Description
Is your feature request related to a problem or challenge?
Broken out from #9577 where @mustafasrepo @comphead and @jayzhan211 and I were discussing optimizer performance
TLDR is that the datafusion optimizer is slow. When I did some profiling locally by running the following
cargo bench --bench sql_planner -- physical_plan_tpch_all
My analysis is that almost 40% of the planning time is spent in SimplifyExprs and CommonSubexprEliminate and most of that time is related to copying expressions from what I can tell
While those passes themselves internally make a bunch of clones, which we are improving (e.g. @jayzhan211 on #9628) I think there is a more fundamental structural problem
I think a core challenge is that the OptimizerRule trait pretty much requires copying Exprs
on each pass, as it gets a &LogicalPlan
input, but produces a LogicalPlan
output
// Required methods
fn try_optimize(
&self,
plan: &LogicalPlan,
config: &dyn OptimizerConfig
) -> Result<Option<LogicalPlan>, DataFusionError>;
This mean any pass that works on Exprs
must clone
all Exprs
(by calling LogicalPlan::expressions()
) rewrite them, and then then create a new LogicalPlan
with those new Exprs.
Here is that pattern in the expression simplifier:
Describe the solution you'd like
Find some way to avoid clone'ing exprs during LogicalPlan rewrite
Update: here are the tasks:
Infrastructure Preparation
- Refactor
Optimizer
to use owned plans andTreeNode
API (10% faster planning) #9948 - Avoid
LogicalPlan::clone()
inLogicalPlan::map_children
when possible #9999
Update OptimizerRule
s to avoid copying
-
SimplifyExpressions
: IntroduceOptimizerRule::rewrite
to rewrite in place, rewriteExprSimplifier
(20% faster planning) #9954 -
CommonSubexprEliminate
: Stop copyingExpr
s and LogicalPlans so much during Common Subexpression Elimination #9873 -
DecorrelatePredicateSubquery
: Stop copying LogicalPlan and Exprs inDecorrelatePredicateSubquery
#10289 -
EliminateCrossJoin
: Stop copying LogicalPlan and Exprs inEliminateCrossJoin
#10287 -
EliminateDuplicatedExpr
: refactorEliminateDuplicatedExpr
optimizer pass to avoid clone #10218 -
EliminateFilter
: Stop copying LogicalPlan and Exprs inEliminateFilter
#10288 -
EliminateJoin
: Implement rewrite for EliminateOneUnion and EliminateJoin #10184 -
EliminateLimit
: Stop copying LogicalPlan and Exprs inEliminateLimit
#10212 -
EliminateNestedUnion
: Stop copying LogicalPlan and Exprs inEliminateNestedUnion
#10296 -
EliminateOneUnion
: Implement rewrite for EliminateOneUnion and EliminateJoin #10184 -
EliminateOuterJoin
: RefactorEliminateOuterJoin
to implementOptimizerRule::rewrite()
#10081 -
ExtractEquijoinPredicate
: implement rewrite for ExtractEquijoinPredicate and avoid clone in filter #10165 -
FilterNullJoinKeys
implement rewrite for FilterNullJoinKeys #10166 -
OptimizeProjections
: Stop copying LogicalPlan and Exprs inOptimizeProjections
#10209 -
PropagateEmptyRelation
: Stop copying LogicalPlan and Exprs inPropagateEmptyRelation
#10290 -
PushDownFilter
: Stop copying LogicalPlan and Exprs inPushDownFilter
#10291 -
PushDownLimit
: Stop copying LogicalPlan and Exprs inPushDownLimit
#10292 -
ReplaceDistinctWithAggregate
: Stop copying LogicalPlan and Exprs inReplaceDistinctWithAggregate
#10293 -
RewriteDisjunctivePredicate
: Stop copying LogicalPlan and Exprs inRewriteDisjunctivePredicate
#10213 -
ScalarSubqueryToJoin
: Stop copying LogicalPlan and Exprs inScalarSubqueryToJoin
#10294 -
SingleDistinctToGroupBy
: Stop copying LogicalPlan and Exprs inSingleDistinctToGroupBy
#10295 -
UnwrapCastInComparison
RefactorUnwrapCastInComparison
to implementOptimizerRule::rewrite()
#10087 / RefactorUnwrapCastInComparison
to removeExpr
clones #10115
Update AnalyzerRule
s to avoid copying
-
AnalyzerMisc
: Minor: Avoid copying all expressions inAnalzyer
/check_plan
#9974 -
InlineTableScan
: Avoid copies inInlineTableScan
via TreeNode API #10038 -
ApplyFunctionRewrites
: fixNamedStructField should be rewritten in OperatorToFunction
in subquery regression (changeApplyFunctionRewrites
to use TreeNode API #10032 -
TypeCoercion
: Stop copying LogicalPlan and Exprs inTypeCoercion
#10210 -
TypeCoercion
more: Onyl recompute schema inTypeCoercion
when necessary #10365 -
CountWildcardRule
: (needs a little reworking to avoid clones) Avoid copies inCountWildcardRule
via TreeNode API #10066
Update Other to avoid copying
Describe alternatives you've considered
No response
Additional context
We have talked about various other ways to reduce copying of LogicalPlans as well as its challenges in other tickets: