Improve performance for physical plan creation with many columns #12950
Thank you
I have this on my list to review tomorrow if no one beats me to it
criterion_group!(benches, criterion_benchmark);
/// Aimed at tracking inefficiencies at the stage of creating/optimizing a physical plan.
fn bench_creation_many_columns(c: &mut Criterion) {
Could you please move this benchmark to the planning benchmark, where it will be easier to discover?
https://github.com/apache/datafusion/blob/main/datafusion/core/benches/sql_planner.rs
Done.
It would be great to file an issue for this -- I am happy to do so if you like
Yes, please do, thank you!
Patch f5c47fa removed the Arc wrappers for AggregateFunctionExpr, which can be inefficient. When the physical optimizer decides to replace a child of a node, it clones the node (with `with_new_children`). If that node is an `AggregateExec` that contains hundreds of aggregates, all of those aggregates are cloned each time. This patch restores the Arc wrapping so that a pointer is cloned instead of the AggregateFunctionExpr itself.
This patch adds a small optimization that can soften the edges on some queries. If there are no parent requirements, we do not need to build the column mapping.
9d2d031 to fdb0b33
Thanks again @askalt -- I am now working on the follow-on ticket
…che#12950)
* Add a benchmark for physical plan creation with many aggregates
* Wrap AggregateFunctionExpr with Arc
Patch f5c47fa removed the Arc wrappers for AggregateFunctionExpr, which can be inefficient. When the physical optimizer decides to replace a child of a node, it clones the node (with `with_new_children`). If that node is an `AggregateExec` that contains hundreds of aggregates, all of those aggregates are cloned each time. This patch restores the Arc wrapping so that a pointer is cloned instead of the AggregateFunctionExpr itself.
* Do not build mapping if parent does not require any
This patch adds a small optimization that can soften the edges on some queries. If there are no parent requirements, we do not need to build the column mapping.
Which issue does this PR close?
Closes #12738.
Rationale for this change
I investigated the performance degradation in creating physical plans for queries with a large number of columns compared to version 40 and discovered the following:
The main time loss occurs during the cloning of plan nodes during optimizations. Comparing the two attached flame graphs (version 40 and version 42) shows that in version 42, `enforce_distribution` spends additional time destroying the vector of AggregateFunctionExpr. It turns out that in patch f5c47fa, the Arc for storing aggregate expressions was removed. As a result, during calls to `with_new_children`, fairly heavy structures are being cloned and destroyed.
Some time was also spent on a new optimization: limit pushdown. This is optional and left up to the user to decide whether to use it, so there are no issues here.
This patch restores the Arc wrappers for storing aggregates in the physical plan nodes, and also adds a benchmark aimed at preventing the degradation from recurring.
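To illustrate why the Arc wrappers matter, here is a minimal sketch using only the standard library. `HeavyAggExpr`, `AggNode`, `rebuilt`, and `make_node` are hypothetical stand-ins, not DataFusion types or methods; `rebuilt` only mimics the shape of a `with_new_children` call that constructs a new node from the old one.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a heavy aggregate expression (not DataFusion's type).
#[derive(Clone, Debug)]
struct HeavyAggExpr {
    name: String,
    args: Vec<String>,
}

// Hypothetical plan node that stores its aggregates behind Arc.
#[derive(Clone)]
struct AggNode {
    aggr_exprs: Vec<Arc<HeavyAggExpr>>,
}

impl AggNode {
    // Builds a new node from this one. Cloning the Vec only bumps
    // reference counts; the expressions themselves are not copied.
    fn rebuilt(&self) -> AggNode {
        AggNode {
            aggr_exprs: self.aggr_exprs.clone(),
        }
    }
}

// Creates a node with `n` aggregate expressions.
fn make_node(n: usize) -> AggNode {
    let aggr_exprs = (0..n)
        .map(|i| {
            Arc::new(HeavyAggExpr {
                name: format!("sum_{i}"),
                args: vec![format!("col_{i}")],
            })
        })
        .collect();
    AggNode { aggr_exprs }
}

fn main() {
    let node = make_node(300);
    let copy = node.rebuilt();
    // Both nodes point at the same allocation for each expression.
    assert!(Arc::ptr_eq(&node.aggr_exprs[0], &copy.aggr_exprs[0]));
    assert_eq!(Arc::strong_count(&node.aggr_exprs[0]), 2);
    println!(
        "shared {} expressions, first is {} with {} argument(s)",
        copy.aggr_exprs.len(),
        copy.aggr_exprs[0].name,
        copy.aggr_exprs[0].args.len()
    );
}
```

With a plain `Vec<HeavyAggExpr>`, the same `clone()` would allocate and later drop every `String` and inner `Vec` for each rebuilt node, which is the kind of cost the flame graphs attribute to `enforce_distribution`.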
Additionally:
As can be seen from the flame graphs, a significant amount of time is spent on creating ProjectionMapping. This mapping is only needed when there are equivalence properties, which are irrelevant for some plans, so it gets built unnecessarily, wasting time. I will raise a separate issue for this.
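The general idea of skipping a mapping that nothing consumes can be sketched as an early return. This is only an illustration under assumed names: `build_column_mapping` and `resolve_requirements` are hypothetical helpers, not DataFusion functions, and the real ProjectionMapping is more involved than a name-to-index map.

```rust
use std::collections::HashMap;

// Hypothetical helper: maps each column name to its position.
// The cost grows with the number of columns.
fn build_column_mapping(columns: &[String]) -> HashMap<&str, usize> {
    columns
        .iter()
        .enumerate()
        .map(|(i, c)| (c.as_str(), i))
        .collect()
}

// Hypothetical helper: resolves the columns a parent requires to indices.
// When the parent requires nothing, the mapping is never built.
fn resolve_requirements(columns: &[String], parent_requirements: &[&str]) -> Vec<usize> {
    if parent_requirements.is_empty() {
        return Vec::new();
    }
    let mapping = build_column_mapping(columns);
    parent_requirements
        .iter()
        .filter_map(|name| mapping.get(name).copied())
        .collect()
}

fn main() {
    let columns: Vec<String> = (0..1000).map(|i| format!("col_{i}")).collect();
    // No requirements: returns immediately without touching the columns.
    assert!(resolve_requirements(&columns, &[]).is_empty());
    // Some requirements: the mapping is built and used.
    assert_eq!(
        resolve_requirements(&columns, &["col_7", "col_42"]),
        vec![7, 42]
    );
}
```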
====
Flamegraphs here:
flamegraphs.zip