Execution plan creation performance degradation (is slower) compared to 0.40.0

I measured a performance of execution plan creation (and optimization)  using default physical planner.

0.40.0 vs current main (84ac4f9db09a88a9d24c33b26fe968dc5d8e0786).


To measure used the following test (put it into core/src/physical_planner.rs):
```rust
#[tokio::test]
async fn execution_plan_creation_bench() {
    let ctx = SessionContext::new();
    let mut fields = vec![];
    let mut columns = vec![];
    const COLUMNS_NUM: usize = 200;

    for i in 0..COLUMNS_NUM {
        fields.push(Field::new(format!("attribute{}", i), datafusion::arrow::datatypes::DataType::Int32, true));
        columns.push(Int32Array::from(vec![1]));
    }

    let batch = RecordBatch::try_new(
        SchemaRef::new(Schema::new(fields)),
        columns
            .into_iter()
            .map(|it| Arc::new(it) as Arc<dyn Array>)
            .collect(),
    )
    .unwrap();

    ctx.register_batch("t", batch).unwrap();

    let mut aggregates = String::new();
    for i in 0..COLUMNS_NUM {
        if i > 0 {
            aggregates.push_str(", ");
        }
        aggregates.push_str(format!("MAX(attribute{})", i).as_str());
    }

    /*
        SELECT max(attr0), ..., max(attrN) FROM t
    */
    let query = format!("SELECT {} FROM t", aggregates);
    let mut res = 0;

    let statement = ctx.state().sql_to_statement(&query, "Generic").unwrap();
    let plan = ctx.state().statement_to_plan(statement).await.unwrap();
    let planner = DefaultPhysicalPlanner::default();

    let mut duration_sum = 0.0;
    let mut durtaion_count = 0;

    for _ in 0..500 {
        let creation_start = std::time::Instant::now();
        let phys_plan = planner
            .create_physical_plan(&plan, &ctx.state())
            .await
            .unwrap();
        duration_sum += creation_start.elapsed().as_micros() as f64;
        durtaion_count += 1;
        res += phys_plan.children().len();
    }

    println!("mean = {}", duration_sum / (durtaion_count as f64));

    /* Not important. */
    println!("res [not important, just to not discard] = {}", res);
}
```

The query in the test is  `SELECT f(a0), ..., f(a_N) FROM t`, where f - some aggregate function.

The command to test:
```sh
cargo test --release --package datafusion --lib -- physical_planner::tests::execution_plan_creation_bench --exact --show-output
```

On the branch-40 I got the result:
```
mean = 1003.368
```

On the current main (84ac4f9) I got:
```
mean = 1286.524
```

But what is more interesting, when I increase the number of columns in the query (200 -> 1000), got:

branch-40:
```
mean = 8207.86
```

main (84ac4f9):
```
mean = 9836.384
```

So, the difference increases (it seems that the dependency is linear) with columns number growth. 

I create this issue to investigate: is such degradation inevitable? Or maybe there is a an obvious optimization that solves the problem.



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Execution plan creation performance degradation (is slower) compared to 0.40.0 #12738

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Execution plan creation performance degradation (is slower) compared to 0.40.0 #12738

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions