Skip to content

Conversation

askalt
Copy link
Contributor

@askalt askalt commented Oct 15, 2024

Which issue does this PR close?

Closes #12738.

Rationale for this change

I investigated the performance degradation in creating physical plans for queries with a large number of columns compared to version 40 and discovered the following:

  1. The main time loss occurs during the cloning of plan nodes during optimizations. We can compare two flame graphs (for version 40 and version 42, attached) and see that in version 42, enforce_distribution spends additional time destroying the vector of AggregateFunExpr. It turns out that in patch f5c47fa, the Arc for storing aggregate expressions was removed. As a result, during calls to with_new_children, fairly heavy structures are being cloned and destroyed.

  2. Some time was also spent on a new optimization: limit pushdown. This is optional and left up to the user to decide whether to use it, so there are no issues here.

This patch restores the Arc wrappers for storing aggregates in the physical plan nodes, and also adds a benchmark aimed at preventing the degradation from recurring.

Additionally:
As can be seen from the flame graphs, a significant amount of time is spent on creating ProjectionMapping. This mapping is only needed when there are eq properties, which are irrelevant for some plans, so it gets built unnecessarily, wasting time. I will raise a separate issue for this.

====

Flamegraphs here:

flamegraphs.zip

@github-actions github-actions bot added physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate proto Related to proto crate labels Oct 15, 2024
@askalt
Copy link
Contributor Author

askalt commented Oct 16, 2024

image

@alamb
Copy link
Contributor

alamb commented Oct 16, 2024

As can be seen from the flame graphs, a significant amount of time is spent on creating ProjectionMapping. This mapping is only needed when there are eq properties, which are irrelevant for some plans, so it gets built unnecessarily, wasting time. I will raise a separate issue for this.

THank you

@alamb
Copy link
Contributor

alamb commented Oct 16, 2024

I have this on my list to review tomorrow if no one beats me to it

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @askalt -- I reviewed this PR carefully and Arc'ing the structure makes quite a bit of sense.

Here is a proposal to reduce copying even more: askalt#1


criterion_group!(benches, criterion_benchmark);
/// Aimed at tracking inefficiencies at the stage of creating/optimizing a physical plan.
fn bench_creation_many_columns(c: &mut Criterion) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please move this benchmark to the planning benchmark (where it will be easier to discover?)

https://github.com/apache/datafusion/blob/main/datafusion/core/benches/sql_planner.rs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@alamb
Copy link
Contributor

alamb commented Oct 17, 2024

Additionally:
As can be seen from the flame graphs, a significant amount of time is spent on creating ProjectionMapping. This mapping is only needed when there are eq properties, which are irrelevant for some plans, so it gets built unnecessarily, wasting time. I will raise a separate issue for this.

It would be great to file an issue for this -- I am happy to do so if you like

@askalt
Copy link
Contributor Author

askalt commented Oct 18, 2024

Additionally:
As can be seen from the flame graphs, a significant amount of time is spent on creating ProjectionMapping. This mapping is only needed when there are eq properties, which are irrelevant for some plans, so it gets built unnecessarily, wasting time. I will raise a separate issue for this.

It would be great to file an issue for this -- I am happy to do so if you like

Yes, I like, thank you!

Patch f5c47fa removed Arc wrappers for AggregateFunctionExpr.
But, it can be inefficient. When physical optimizer decides to replace a node child to other,
it clones the node (with `with_new_children`). Assume, that node is `AggregateExec` than contains
hundreds aggregates and these aggregates are cloned each time.

This patch returns a Arc wrapping to not clone AggregateFunctionExpr itself but clone a pointer.
This patch adds a small optimization that can soft the edges on
some queries. If there are no parent requirements we do not need to
build column mapping.
@askalt askalt force-pushed the askalt/plan-creaton-perf-improvements branch from 9d2d031 to fdb0b33 Compare October 18, 2024 08:06
@alamb alamb added the api change Changes the API exposed to users of the crate label Oct 18, 2024
@alamb alamb merged commit 10af8a7 into apache:main Oct 18, 2024
24 checks passed
@alamb
Copy link
Contributor

alamb commented Oct 18, 2024

Thanks again @askalt -- I am now working on the follow on ticket

@alamb
Copy link
Contributor

alamb commented Oct 19, 2024

Filed #13015 and a follow on issue here #13012

askalt added a commit to tarantool/datafusion that referenced this pull request Nov 1, 2024
…che#12950)

* Add a benchmark for physical plan creation with many aggregates

* Wrap AggregateFunctionExpr with Arc

Patch f5c47fa removed Arc wrappers for AggregateFunctionExpr.
But, it can be inefficient. When physical optimizer decides to replace a node child to other,
it clones the node (with `with_new_children`). Assume, that node is `AggregateExec` than contains
hundreds aggregates and these aggregates are cloned each time.

This patch returns a Arc wrapping to not clone AggregateFunctionExpr itself but clone a pointer.

* Do not build mapping if parent does not require any

This patch adds a small optimization that can soft the edges on
some queries. If there are no parent requirements we do not need to
build column mapping.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api change Changes the API exposed to users of the crate core Core DataFusion crate optimizer Optimizer rules physical-expr Changes to the physical-expr crates proto Related to proto crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Execution plan creation performance degradation (is slower) compared to 0.40.0

2 participants