
[EPIC] Stop copying LogicalPlan during OptimizerPasses #9637

Closed
@alamb

Description


Is your feature request related to a problem or challenge?

Broken out from #9577, where @mustafasrepo, @comphead, @jayzhan211, and I were discussing optimizer performance.

TL;DR: the DataFusion optimizer is slow. I did some profiling locally by running the following:

    cargo bench --bench sql_planner -- physical_plan_tpch_all

My analysis is that almost 40% of the planning time is spent in SimplifyExprs and CommonSubexprEliminate, and, from what I can tell, most of that time goes to copying expressions.

[Screenshot: profiling results, 2024-03-14]

While those passes make many clones internally, which we are improving (e.g. @jayzhan211 in #9628), I think there is a more fundamental structural problem.

I think a core challenge is that the OptimizerRule trait effectively requires copying Exprs on each pass, because it takes a borrowed &LogicalPlan as input but must produce an owned LogicalPlan as output:

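    // Required methods (excerpt from the OptimizerRule trait)
    fn try_optimize(
        &self,
        plan: &LogicalPlan,           // input plan is only borrowed
        config: &dyn OptimizerConfig,
    ) -> Result<Option<LogicalPlan>, DataFusionError>; // new owned plan, or None if unchanged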

This means any pass that works on Exprs must clone all Exprs (by calling LogicalPlan::expressions()), rewrite them, and then create a new LogicalPlan with those new Exprs.
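To illustrate why the borrowed-input, owned-output signature forces a copy, here is a minimal sketch. It does not use DataFusion's real types: `Expr`, `Projection`, `simplify`, and `optimize_borrowed` are hypothetical, heavily simplified stand-ins. Because the rule only receives a reference to the plan, the only way to build the rewritten plan is to deep-clone every expression tree first, which plays the same role as `LogicalPlan::expressions()` in the real code.

```rust
// Hypothetical, simplified stand-ins for DataFusion's Expr and LogicalPlan.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

#[derive(Clone, Debug, PartialEq)]
struct Projection {
    exprs: Vec<Expr>,
}

/// Toy simplifier: folds `Literal + Literal` into a single literal.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => match (simplify(*l), simplify(*r)) {
            (Expr::Literal(a), Expr::Literal(b)) => Expr::Literal(a + b),
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}

/// Borrowing rule, mirroring the `&LogicalPlan -> LogicalPlan` shape:
/// with only a reference, every Expr must be deep-cloned before it can be
/// rewritten and placed into the new plan.
fn optimize_borrowed(plan: &Projection) -> Projection {
    let exprs = plan
        .exprs
        .iter()
        .cloned() // deep copy of every expression tree
        .map(simplify)
        .collect();
    Projection { exprs }
}

fn main() {
    let plan = Projection {
        exprs: vec![
            Expr::Column("a".to_string()),
            Expr::Add(Box::new(Expr::Literal(1)), Box::new(Expr::Literal(2))),
        ],
    };
    let optimized = optimize_borrowed(&plan);
    // The original plan is still alive, so both copies coexist in memory.
    assert_eq!(optimized.exprs[1], Expr::Literal(3));
    println!("{:?}", optimized);
}
```

Note that `plan` remains usable after the call: the caller keeps the old plan and the rule has built a full second copy, and this repeats for every rule in every optimizer pass.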

Here is that pattern in the expression simplifier:

https://github.com/apache/arrow-datafusion/blob/0eec5f8e1d0f55e48f5cdc628fbb5ddd89b91512/datafusion/optimizer/src/simplify_expressions/simplify_exprs.rs#L112-L123

Describe the solution you'd like

Find some way to avoid cloning Exprs during LogicalPlan rewrites.
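One possible direction, sketched under the assumption that a rule could take ownership of the plan instead of borrowing it: the expressions are moved out of the plan, rewritten, and moved back, with no copies. Everything here is hypothetical and illustrative (`Expr`, `Projection`, `simplify`, `optimize_owned`, and the `Rewritten` wrapper are not DataFusion's API). The `changed` flag stands in for the "did anything change" signal that `Option<LogicalPlan>` carries in `try_optimize`.

```rust
// Hypothetical, simplified stand-ins. Expr deliberately does NOT derive
// Clone, so the compiler guarantees the rewrite below never copies an Expr.
#[derive(Debug, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

#[derive(Debug, PartialEq)]
struct Projection {
    exprs: Vec<Expr>,
}

/// Hypothetical result wrapper: the rewritten value plus whether it changed.
struct Rewritten<T> {
    data: T,
    changed: bool,
}

/// Toy simplifier that consumes its input and reports whether it folded
/// any `Literal + Literal` pair.
fn simplify(expr: Expr) -> (Expr, bool) {
    match expr {
        Expr::Add(l, r) => {
            let (l, lc) = simplify(*l);
            let (r, rc) = simplify(*r);
            match (l, r) {
                (Expr::Literal(a), Expr::Literal(b)) => (Expr::Literal(a + b), true),
                (l, r) => (Expr::Add(Box::new(l), Box::new(r)), lc || rc),
            }
        }
        other => (other, false),
    }
}

/// Owned rule: takes the plan by value, so expressions are moved, not cloned.
fn optimize_owned(plan: Projection) -> Rewritten<Projection> {
    let mut changed = false;
    let exprs: Vec<Expr> = plan
        .exprs
        .into_iter() // moves each Expr out of the plan
        .map(|e| {
            let (e, c) = simplify(e);
            changed |= c;
            e
        })
        .collect();
    Rewritten { data: Projection { exprs }, changed }
}

fn main() {
    let plan = Projection {
        exprs: vec![
            Expr::Column("a".to_string()),
            Expr::Add(Box::new(Expr::Literal(1)), Box::new(Expr::Literal(2))),
        ],
    };
    let result = optimize_owned(plan);
    // `plan` has been moved into the rule; only one copy of the Exprs exists.
    assert!(result.changed);
    assert_eq!(result.data.exprs[1], Expr::Literal(3));
}
```

The design choice worth noting is that dropping `Clone` from the toy `Expr` turns "no copies" into something the compiler checks rather than a convention; the cost is that callers must hand over the plan and use the returned one.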

Update: here are the tasks:

Infrastructure Preparation

Update OptimizerRules to avoid copying

Update AnalyzerRules to avoid copying

Update Other to avoid copying

Describe alternatives you've considered

No response

Additional context

We have discussed various other ways to reduce copying of LogicalPlans, and the challenges involved, in other tickets:

Metadata


Labels

enhancement (New feature or request)
