pepijnve
Contributor

@pepijnve pepijnve commented Sep 4, 2025

Which issue does this PR close?

Rationale for this change

The documentation states that WITH ORDER clauses may use non-trivial expressions. It even has an example showing the usage of this feature. In practice this does not work and the implementation is limited to simple column references.

What changes are included in this PR?

  • Add a new physical_expr::create_lex_ordering function that provides a more flexible version of physical_expr::create_ordering. create_ordering, with its single-column constraint, has been retained for backwards compatibility, but should perhaps be deprecated. It does not seem possible to reimplement it in terms of create_lex_ordering, since an ExecutionProps instance is required.
  • Add a new physical_expr::equivalence::project_orderings convenience function that uses the existing sort order projection logic.
  • Adjust the various users of sort order projection to make use of the new implementation.

Are these changes tested?

  • Changed logic is covered by existing tests.
  • Added SQL logic tests to verify a non-trivial WITH ORDER case.

Are there any user-facing changes?

  • The example in the documentation will now actually work.
  • Consumers of sort expressions may now have to deal with arbitrary PhysicalExpr instances rather than only Column.
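To illustrate what a WITH ORDER clause on a non-trivial expression declares, here is a minimal, self-contained sketch. It does not use DataFusion; the `Row` type and the `is_sorted_by_key` and `sample_rows` helpers are hypothetical names introduced for illustration only. It shows data that is ordered by `a + b` without being ordered by `a` or `b` individually, which is the kind of ordering a column-only implementation cannot describe.

```rust
/// Hypothetical row type with two integer columns (illustration only).
#[derive(Debug, Clone, Copy)]
pub struct Row {
    pub a: i64,
    pub b: i64,
}

/// Returns true if `rows` is sorted ascending by the given key expression.
pub fn is_sorted_by_key<K: PartialOrd>(rows: &[Row], key: impl Fn(&Row) -> K) -> bool {
    rows.windows(2).all(|w| key(&w[0]) <= key(&w[1]))
}

/// Sample data that is ordered by `a + b` but by neither column alone.
pub fn sample_rows() -> Vec<Row> {
    vec![
        Row { a: 3, b: 0 }, // a + b = 3
        Row { a: 0, b: 4 }, // a + b = 4
        Row { a: 4, b: 1 }, // a + b = 5
    ]
}
```

Calling `is_sorted_by_key(&sample_rows(), |r| r.a + r.b)` holds for this data, while the same check on `r.a` or `r.b` alone does not.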

@github-actions github-actions bot added physical-expr Changes to the physical-expr crates core Core DataFusion crate catalog Related to the catalog crate datasource Changes to the datasource crate labels Sep 4, 2025
@pepijnve pepijnve force-pushed the issue_17411 branch 7 times, most recently from 4e04412 to 4bbc81a Compare September 5, 2025 15:06
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Sep 5, 2025
@pepijnve pepijnve marked this pull request as ready for review September 8, 2025 07:59
Contributor

@alamb alamb left a comment

Thanks @pepijnve -- this looks good to me. I'll kick off some planning benchmarks just to make sure this doesn't affect them, but I don't expect to see any slowdown

}
None => create_ordering(self.0.source.schema(), &self.0.order)?,
let schema = self.0.source.schema();
let df_schema = DFSchema::try_from(Arc::clone(schema))?;
Contributor

I wonder if (re)creating this DFSchema is necessary -- it feels like at this point we already know the schema information.

However, I also see that we need a DFSchema to correctly create arbitrary PhysicalExprs, so this is probably fine.

Contributor Author

I was a bit concerned about the waste here as well, but I couldn't figure out a simple way to avoid this.

----
physical_plan DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/data/composite_order.csv]]}, projection=[a, b], output_ordering=[a@0 + b@1 ASC NULLS LAST], file_type=csv, has_header=true

# Query ordered by the declared order should be just a table scan
Contributor

nice

02)--CoalesceBatchesExec: target_batch_size=4096
03)----FilterExec: b@2 = 0
04)------DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/core/tests/data/window_2.csv]]}, projection=[a0, a, b, c, d], output_orderings=[[a@1 ASC NULLS LAST, b@2 ASC NULLS LAST], [c@3 ASC NULLS LAST]], file_type=csv, has_header=true
04)------DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/core/tests/data/window_2.csv]]}, projection=[a0, a, b, c, d], output_orderings=[[c@3 ASC NULLS LAST], [a@1 ASC NULLS LAST, b@2 ASC NULLS LAST]], file_type=csv, has_header=true
Contributor

Do you know why the output orderings come out in a different (reverse) order now?

Contributor Author

No, I didn't take the time to try to understand why. That's how they're being emitted by the EquivalenceClass code. I had assumed the order was not important, but if it is I can take a closer look.

Contributor

I don't think it is important.

@alamb
Contributor

alamb commented Sep 10, 2025

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubuntu SMP Fri Jul 25 23:26:08 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing issue_17411 (124953d) to 241b669 diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --bench sql_planner
BENCH_FILTER=
BENCH_BRANCH_NAME=issue_17411
Results will be posted here when complete

@alamb
Contributor

alamb commented Sep 11, 2025

🤖: Benchmark completed

Details

group                                                issue_17411                            main
-----                                                -----------                            ----
logical_aggregate_with_join                          1.00    624.8±4.06µs        ? ?/sec    1.01    631.5±3.42µs        ? ?/sec
logical_plan_optimize                                1.00     179.0±5.63s        ? ?/sec    1.02     182.6±8.59s        ? ?/sec
logical_select_all_from_1000                         1.00     11.0±0.07ms        ? ?/sec    1.04     11.4±0.08ms        ? ?/sec
logical_select_one_from_700                          1.00    412.9±5.21µs        ? ?/sec    1.01    417.5±3.09µs        ? ?/sec
logical_trivial_join_high_numbered_columns           1.00    370.1±3.09µs        ? ?/sec    1.01    372.9±2.20µs        ? ?/sec
logical_trivial_join_low_numbered_columns            1.00    356.0±1.57µs        ? ?/sec    1.01    358.9±2.83µs        ? ?/sec
physical_intersection                                1.00    830.7±4.30µs        ? ?/sec    1.01    842.1±9.45µs        ? ?/sec
physical_join_consider_sort                          1.00   1382.2±8.17µs        ? ?/sec    1.01   1400.8±8.36µs        ? ?/sec
physical_join_distinct                               1.00    347.8±1.66µs        ? ?/sec    1.02    353.7±2.71µs        ? ?/sec
physical_many_self_joins                             1.00     10.2±0.06ms        ? ?/sec    1.02     10.4±0.07ms        ? ?/sec
physical_plan_clickbench_all                         1.06    217.7±7.87ms        ? ?/sec    1.00    205.0±5.02ms        ? ?/sec
physical_plan_clickbench_q1                          1.01      2.7±0.06ms        ? ?/sec    1.00      2.6±0.07ms        ? ?/sec
physical_plan_clickbench_q10                         1.03      3.7±0.12ms        ? ?/sec    1.00      3.6±0.13ms        ? ?/sec
physical_plan_clickbench_q11                         1.03      4.0±0.10ms        ? ?/sec    1.00      3.9±0.15ms        ? ?/sec
physical_plan_clickbench_q12                         1.03      4.2±0.16ms        ? ?/sec    1.00      4.0±0.16ms        ? ?/sec
physical_plan_clickbench_q13                         1.03      3.7±0.11ms        ? ?/sec    1.00      3.6±0.13ms        ? ?/sec
physical_plan_clickbench_q14                         1.03      4.0±0.11ms        ? ?/sec    1.00      3.9±0.12ms        ? ?/sec
physical_plan_clickbench_q15                         1.04      3.8±0.10ms        ? ?/sec    1.00      3.7±0.14ms        ? ?/sec
physical_plan_clickbench_q16                         1.00      3.6±0.07ms        ? ?/sec    1.00      3.6±0.11ms        ? ?/sec
physical_plan_clickbench_q17                         1.00      3.7±0.09ms        ? ?/sec    1.00      3.7±0.11ms        ? ?/sec
physical_plan_clickbench_q18                         1.05      3.2±0.08ms        ? ?/sec    1.00      3.1±0.10ms        ? ?/sec
physical_plan_clickbench_q19                         1.01      4.1±0.11ms        ? ?/sec    1.00      4.1±0.11ms        ? ?/sec
physical_plan_clickbench_q2                          1.05      3.3±0.13ms        ? ?/sec    1.00      3.2±0.12ms        ? ?/sec
physical_plan_clickbench_q20                         1.06      2.9±0.08ms        ? ?/sec    1.00      2.8±0.07ms        ? ?/sec
physical_plan_clickbench_q21                         1.06      3.3±0.08ms        ? ?/sec    1.00      3.1±0.08ms        ? ?/sec
physical_plan_clickbench_q22                         1.03      3.9±0.12ms        ? ?/sec    1.00      3.8±0.09ms        ? ?/sec
physical_plan_clickbench_q23                         1.01      4.2±0.12ms        ? ?/sec    1.00      4.2±0.15ms        ? ?/sec
physical_plan_clickbench_q24                         1.17      5.5±0.21ms        ? ?/sec    1.00      4.7±0.14ms        ? ?/sec
physical_plan_clickbench_q25                         1.07      3.6±0.12ms        ? ?/sec    1.00      3.3±0.11ms        ? ?/sec
physical_plan_clickbench_q26                         1.07      3.3±0.10ms        ? ?/sec    1.00      3.1±0.09ms        ? ?/sec
physical_plan_clickbench_q27                         1.12      3.7±0.24ms        ? ?/sec    1.00      3.3±0.09ms        ? ?/sec
physical_plan_clickbench_q28                         1.03      4.3±0.12ms        ? ?/sec    1.00      4.1±0.15ms        ? ?/sec
physical_plan_clickbench_q29                         1.02      5.0±0.16ms        ? ?/sec    1.00      4.9±0.12ms        ? ?/sec
physical_plan_clickbench_q3                          1.04      3.2±0.10ms        ? ?/sec    1.00      3.0±0.12ms        ? ?/sec
physical_plan_clickbench_q30                         1.03     14.6±0.60ms        ? ?/sec    1.00     14.1±0.48ms        ? ?/sec
physical_plan_clickbench_q31                         1.09      4.5±0.17ms        ? ?/sec    1.00      4.1±0.12ms        ? ?/sec
physical_plan_clickbench_q32                         1.06      4.4±0.16ms        ? ?/sec    1.00      4.1±0.11ms        ? ?/sec
physical_plan_clickbench_q33                         1.02      3.7±0.12ms        ? ?/sec    1.00      3.7±0.29ms        ? ?/sec
physical_plan_clickbench_q34                         1.05      3.4±0.14ms        ? ?/sec    1.00      3.2±0.08ms        ? ?/sec
physical_plan_clickbench_q35                         1.03      3.4±0.08ms        ? ?/sec    1.00      3.3±0.09ms        ? ?/sec
physical_plan_clickbench_q36                         1.07      4.4±0.17ms        ? ?/sec    1.00      4.1±0.10ms        ? ?/sec
physical_plan_clickbench_q37                         1.10      4.6±0.17ms        ? ?/sec    1.00      4.2±0.18ms        ? ?/sec
physical_plan_clickbench_q38                         1.10      4.6±0.15ms        ? ?/sec    1.00      4.2±0.13ms        ? ?/sec
physical_plan_clickbench_q39                         1.10      4.4±0.16ms        ? ?/sec    1.00      4.0±0.13ms        ? ?/sec
physical_plan_clickbench_q4                          1.04      2.8±0.06ms        ? ?/sec    1.00      2.7±0.10ms        ? ?/sec
physical_plan_clickbench_q40                         1.13      5.3±0.18ms        ? ?/sec    1.00      4.7±0.15ms        ? ?/sec
physical_plan_clickbench_q41                         1.13      4.8±0.18ms        ? ?/sec    1.00      4.2±0.13ms        ? ?/sec
physical_plan_clickbench_q42                         1.14      4.7±0.21ms        ? ?/sec    1.00      4.2±0.13ms        ? ?/sec
physical_plan_clickbench_q43                         1.20      5.4±0.26ms        ? ?/sec    1.00      4.5±0.13ms        ? ?/sec
physical_plan_clickbench_q44                         1.05      3.1±0.11ms        ? ?/sec    1.00      2.9±0.09ms        ? ?/sec
physical_plan_clickbench_q45                         1.04      3.0±0.09ms        ? ?/sec    1.00      2.9±0.09ms        ? ?/sec
physical_plan_clickbench_q46                         1.03      3.5±0.11ms        ? ?/sec    1.00      3.4±0.11ms        ? ?/sec
physical_plan_clickbench_q47                         1.02      4.1±0.16ms        ? ?/sec    1.00      4.1±0.12ms        ? ?/sec
physical_plan_clickbench_q48                         1.08      5.3±0.21ms        ? ?/sec    1.00      4.9±0.20ms        ? ?/sec
physical_plan_clickbench_q49                         1.09      5.6±0.25ms        ? ?/sec    1.00      5.1±0.15ms        ? ?/sec
physical_plan_clickbench_q5                          1.02      3.1±0.08ms        ? ?/sec    1.00      3.0±0.08ms        ? ?/sec
physical_plan_clickbench_q50                         1.04      4.8±0.22ms        ? ?/sec    1.00      4.7±0.20ms        ? ?/sec
physical_plan_clickbench_q51                         1.03      3.6±0.14ms        ? ?/sec    1.00      3.5±0.11ms        ? ?/sec
physical_plan_clickbench_q6                          1.04      3.1±0.08ms        ? ?/sec    1.00      3.0±0.09ms        ? ?/sec
physical_plan_clickbench_q7                          1.02      2.7±0.06ms        ? ?/sec    1.00      2.6±0.08ms        ? ?/sec
physical_plan_clickbench_q8                          1.03      3.8±0.09ms        ? ?/sec    1.00      3.7±0.11ms        ? ?/sec
physical_plan_clickbench_q9                          1.01      3.5±0.08ms        ? ?/sec    1.00      3.5±0.09ms        ? ?/sec
physical_plan_tpcds_all                              1.02   1074.3±3.41ms        ? ?/sec    1.00   1053.3±7.82ms        ? ?/sec
physical_plan_tpch_all                               1.03     65.8±0.43ms        ? ?/sec    1.00     63.9±0.61ms        ? ?/sec
physical_plan_tpch_q1                                1.00      2.1±0.01ms        ? ?/sec    1.00      2.1±0.01ms        ? ?/sec
physical_plan_tpch_q10                               1.03      4.0±0.02ms        ? ?/sec    1.00      3.9±0.03ms        ? ?/sec
physical_plan_tpch_q11                               1.05      3.5±0.02ms        ? ?/sec    1.00      3.4±0.03ms        ? ?/sec
physical_plan_tpch_q12                               1.00  1849.7±10.34µs        ? ?/sec    1.00  1844.4±15.32µs        ? ?/sec
physical_plan_tpch_q13                               1.00  1485.6±10.56µs        ? ?/sec    1.00  1486.7±11.55µs        ? ?/sec
physical_plan_tpch_q14                               1.00  1990.1±16.15µs        ? ?/sec    1.00  1991.3±12.82µs        ? ?/sec
physical_plan_tpch_q16                               1.00      2.5±0.02ms        ? ?/sec    1.00      2.5±0.02ms        ? ?/sec
physical_plan_tpch_q17                               1.06      2.6±0.02ms        ? ?/sec    1.00      2.5±0.03ms        ? ?/sec
physical_plan_tpch_q18                               1.00      2.7±0.01ms        ? ?/sec    1.01      2.7±0.02ms        ? ?/sec
physical_plan_tpch_q19                               1.01      3.3±0.02ms        ? ?/sec    1.00      3.3±0.03ms        ? ?/sec
physical_plan_tpch_q2                                1.07      6.0±0.11ms        ? ?/sec    1.00      5.6±0.05ms        ? ?/sec
physical_plan_tpch_q20                               1.02      3.2±0.03ms        ? ?/sec    1.00      3.2±0.07ms        ? ?/sec
physical_plan_tpch_q21                               1.06      4.4±0.03ms        ? ?/sec    1.00      4.1±0.04ms        ? ?/sec
physical_plan_tpch_q22                               1.00      2.7±0.01ms        ? ?/sec    1.00      2.7±0.02ms        ? ?/sec
physical_plan_tpch_q3                                1.04      2.7±0.01ms        ? ?/sec    1.00      2.6±0.02ms        ? ?/sec
physical_plan_tpch_q4                                1.00   1541.2±6.67µs        ? ?/sec    1.01  1551.8±11.19µs        ? ?/sec
physical_plan_tpch_q5                                1.03      3.3±0.02ms        ? ?/sec    1.00      3.2±0.02ms        ? ?/sec
physical_plan_tpch_q6                                1.00    876.1±7.08µs        ? ?/sec    1.01    881.2±8.98µs        ? ?/sec
physical_plan_tpch_q7                                1.00      4.3±0.04ms        ? ?/sec    1.01      4.3±0.06ms        ? ?/sec
physical_plan_tpch_q8                                1.07      5.6±0.16ms        ? ?/sec    1.00      5.2±0.05ms        ? ?/sec
physical_plan_tpch_q9                                1.00      4.1±0.02ms        ? ?/sec    1.00      4.1±0.03ms        ? ?/sec
physical_select_aggregates_from_200                  1.00     16.9±0.08ms        ? ?/sec    1.01     17.0±0.10ms        ? ?/sec
physical_select_all_from_1000                        1.00     24.4±0.17ms        ? ?/sec    1.02     25.0±0.12ms        ? ?/sec
physical_select_one_from_700                         1.00  1057.1±10.09µs        ? ?/sec    1.02   1081.8±8.25µs        ? ?/sec
physical_sorted_union_order_by_10                    1.00     13.2±0.20ms        ? ?/sec    1.00     13.2±0.19ms        ? ?/sec
physical_sorted_union_order_by_100                   1.02       2.1±0.03s        ? ?/sec    1.00       2.0±0.01s        ? ?/sec
physical_sorted_union_order_by_200                   1.01      13.0±0.18s        ? ?/sec    1.00      12.9±0.14s        ? ?/sec
physical_sorted_union_order_by_300                   1.02      39.5±0.29s        ? ?/sec    1.00      38.9±0.54s        ? ?/sec
physical_sorted_union_order_by_50                    1.00    390.4±3.72ms        ? ?/sec    1.00    390.5±4.02ms        ? ?/sec
physical_theta_join_consider_sort                    1.00   1745.6±8.52µs        ? ?/sec    1.01  1762.5±10.74µs        ? ?/sec
physical_unnest_to_join                              1.00   1307.8±9.30µs        ? ?/sec    1.00   1312.9±5.21µs        ? ?/sec
physical_window_function_partition_by_4_on_values    1.00   1302.2±5.14µs        ? ?/sec    1.00   1298.8±5.89µs        ? ?/sec
physical_window_function_partition_by_7_on_values    1.00     34.8±0.15ms        ? ?/sec    1.02     35.4±0.10ms        ? ?/sec
physical_window_function_partition_by_8_on_values    1.00    136.2±0.66ms        ? ?/sec    1.01    138.2±0.29ms        ? ?/sec
with_param_values_many_columns                       1.00    148.7±4.52µs        ? ?/sec    1.01    150.3±5.18µs        ? ?/sec

@alamb
Contributor

alamb commented Sep 11, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubuntu SMP Fri Jul 25 23:26:08 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing issue_17411 (124953d) to 241b669 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb
Contributor

alamb commented Sep 11, 2025

🤖: Benchmark completed

Details

Comparing HEAD and issue_17411
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ issue_17411 ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0     │  2724.96 ms │  2883.22 ms │ 1.06x slower │
│ QQuery 1     │  1368.60 ms │  1466.75 ms │ 1.07x slower │
│ QQuery 2     │  2447.12 ms │  2678.82 ms │ 1.09x slower │
│ QQuery 3     │  1205.52 ms │  1172.61 ms │    no change │
│ QQuery 4     │  2291.41 ms │  2376.86 ms │    no change │
│ QQuery 5     │ 27385.01 ms │ 27708.13 ms │    no change │
│ QQuery 6     │  4140.13 ms │  4248.71 ms │    no change │
│ QQuery 7     │  3741.37 ms │  4334.76 ms │ 1.16x slower │
└──────────────┴─────────────┴─────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary          ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)          │ 45304.13ms │
│ Total Time (issue_17411)   │ 46869.87ms │
│ Average Time (HEAD)        │  5663.02ms │
│ Average Time (issue_17411) │  5858.73ms │
│ Queries Faster             │          0 │
│ Queries Slower             │          4 │
│ Queries with No Change     │          4 │
│ Queries with Failure       │          0 │
└────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ issue_17411 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.20 ms │     2.37 ms │  1.08x slower │
│ QQuery 1     │    50.00 ms │    50.48 ms │     no change │
│ QQuery 2     │   139.23 ms │   133.63 ms │     no change │
│ QQuery 3     │   165.19 ms │   164.52 ms │     no change │
│ QQuery 4     │  1071.85 ms │  1060.62 ms │     no change │
│ QQuery 5     │  1475.94 ms │  1544.15 ms │     no change │
│ QQuery 6     │     2.17 ms │     2.30 ms │  1.06x slower │
│ QQuery 7     │    54.95 ms │    54.03 ms │     no change │
│ QQuery 8     │  1438.20 ms │  1497.52 ms │     no change │
│ QQuery 9     │  1838.06 ms │  1861.33 ms │     no change │
│ QQuery 10    │   379.72 ms │   386.61 ms │     no change │
│ QQuery 11    │   426.25 ms │   443.43 ms │     no change │
│ QQuery 12    │  1351.47 ms │  1369.58 ms │     no change │
│ QQuery 13    │  2125.51 ms │  2191.41 ms │     no change │
│ QQuery 14    │  1264.83 ms │  1306.22 ms │     no change │
│ QQuery 15    │  1227.45 ms │  1217.76 ms │     no change │
│ QQuery 16    │  2760.44 ms │  2715.61 ms │     no change │
│ QQuery 17    │  2644.63 ms │  2686.47 ms │     no change │
│ QQuery 18    │  5498.43 ms │  5035.57 ms │ +1.09x faster │
│ QQuery 19    │   127.09 ms │   125.21 ms │     no change │
│ QQuery 20    │  2046.49 ms │  2047.36 ms │     no change │
│ QQuery 21    │  2368.18 ms │  2349.67 ms │     no change │
│ QQuery 22    │  4483.89 ms │  4037.62 ms │ +1.11x faster │
│ QQuery 23    │ 12849.04 ms │ 12919.60 ms │     no change │
│ QQuery 24    │   215.42 ms │   234.47 ms │  1.09x slower │
│ QQuery 25    │   495.37 ms │   522.15 ms │  1.05x slower │
│ QQuery 26    │   220.22 ms │   218.60 ms │     no change │
│ QQuery 27    │  2947.09 ms │  2952.02 ms │     no change │
│ QQuery 28    │ 23157.97 ms │ 23142.16 ms │     no change │
│ QQuery 29    │   991.05 ms │   985.86 ms │     no change │
│ QQuery 30    │  1348.32 ms │  1319.36 ms │     no change │
│ QQuery 31    │  1348.57 ms │  1333.53 ms │     no change │
│ QQuery 32    │  4515.69 ms │  4490.59 ms │     no change │
│ QQuery 33    │  5826.81 ms │  5722.14 ms │     no change │
│ QQuery 34    │  6030.99 ms │  6093.99 ms │     no change │
│ QQuery 35    │  2083.58 ms │  2077.14 ms │     no change │
│ QQuery 36    │   120.66 ms │   125.11 ms │     no change │
│ QQuery 37    │    54.35 ms │    57.26 ms │  1.05x slower │
│ QQuery 38    │   122.58 ms │   122.68 ms │     no change │
│ QQuery 39    │   203.48 ms │   203.30 ms │     no change │
│ QQuery 40    │    45.17 ms │    46.75 ms │     no change │
│ QQuery 41    │    39.61 ms │    40.37 ms │     no change │
│ QQuery 42    │    35.30 ms │    36.12 ms │     no change │
└──────────────┴─────────────┴─────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary          ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)          │ 95593.44ms │
│ Total Time (issue_17411)   │ 94926.68ms │
│ Average Time (HEAD)        │  2223.10ms │
│ Average Time (issue_17411) │  2207.60ms │
│ Queries Faster             │          2 │
│ Queries Slower             │          5 │
│ Queries with No Change     │         36 │
│ Queries with Failure       │          0 │
└────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ issue_17411 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 174.48 ms │   175.18 ms │     no change │
│ QQuery 2     │  25.04 ms │    25.70 ms │     no change │
│ QQuery 3     │  44.59 ms │    44.60 ms │     no change │
│ QQuery 4     │  27.25 ms │    26.34 ms │     no change │
│ QQuery 5     │  74.23 ms │    73.67 ms │     no change │
│ QQuery 6     │  19.61 ms │    19.13 ms │     no change │
│ QQuery 7     │ 145.96 ms │   140.82 ms │     no change │
│ QQuery 8     │  32.90 ms │    31.99 ms │     no change │
│ QQuery 9     │  83.36 ms │    85.80 ms │     no change │
│ QQuery 10    │  58.51 ms │    57.31 ms │     no change │
│ QQuery 11    │  42.27 ms │    41.12 ms │     no change │
│ QQuery 12    │  51.50 ms │    50.90 ms │     no change │
│ QQuery 13    │  48.05 ms │    45.82 ms │     no change │
│ QQuery 14    │  14.40 ms │    13.28 ms │ +1.08x faster │
│ QQuery 15    │  24.72 ms │    24.35 ms │     no change │
│ QQuery 16    │  24.44 ms │    23.66 ms │     no change │
│ QQuery 17    │ 147.72 ms │   144.33 ms │     no change │
│ QQuery 18    │ 331.26 ms │   321.67 ms │     no change │
│ QQuery 19    │  48.83 ms │    36.64 ms │ +1.33x faster │
│ QQuery 20    │  60.25 ms │    50.18 ms │ +1.20x faster │
│ QQuery 21    │ 222.16 ms │   224.59 ms │     no change │
│ QQuery 22    │  20.51 ms │    20.09 ms │     no change │
└──────────────┴───────────┴─────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary          ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)          │ 1722.02ms │
│ Total Time (issue_17411)   │ 1677.16ms │
│ Average Time (HEAD)        │   78.27ms │
│ Average Time (issue_17411) │   76.23ms │
│ Queries Faster             │         3 │
│ Queries Slower             │         0 │
│ Queries with No Change     │        19 │
│ Queries with Failure       │         0 │
└────────────────────────────┴───────────┘

@alamb
Contributor

alamb commented Sep 14, 2025

I am a little worried about the reported slowdowns in the sql planning benchmarks. I'll try and reproduce them locally

@pepijnve
Contributor Author

Perhaps a two-step approach would be better then: try the “column only” version first and only use the more complex code path as a fallback.

@alamb
Contributor

alamb commented Sep 15, 2025

Perhaps a two-step approach would be better then: try the “column only” version first and only use the more complex code path as a fallback.

This would have the very nice property that it is at least as fast as the current implementation (i.e. no regressions), and we can always try to make the newly supported expressions faster in a follow-on PR.
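The two-step idea discussed here can be sketched roughly as follows. This is not the PR's code and does not use DataFusion types: `SortExpr`, `Resolved`, and all function names are hypothetical stand-ins, and the "general" path simply renders the expression as text where a real implementation would build a physical expression. The point is only the control flow: try the cheap column-only resolution first, and fall back to the general path when any expression is not a bare column.

```rust
/// Hypothetical, simplified sort expression (not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
pub enum SortExpr {
    Column(String),
    Add(Box<SortExpr>, Box<SortExpr>),
}

/// Hypothetical result of resolving a sort expression against a schema.
#[derive(Debug, Clone, PartialEq)]
pub enum Resolved {
    /// Cheap path: a plain column index.
    ColumnIndex(usize),
    /// General path: the expression rendered with resolved column indices.
    Expression(String),
}

fn column_index(schema: &[&str], name: &str) -> Option<usize> {
    schema.iter().position(|c| *c == name)
}

/// Fast path: succeeds only when every sort expression is a bare column.
fn resolve_columns_only(schema: &[&str], exprs: &[SortExpr]) -> Option<Vec<Resolved>> {
    exprs
        .iter()
        .map(|e| match e {
            SortExpr::Column(name) => column_index(schema, name).map(Resolved::ColumnIndex),
            _ => None,
        })
        .collect()
}

/// General path: walks the whole expression tree (stand-in for the costlier work).
fn resolve_general(schema: &[&str], expr: &SortExpr) -> Option<String> {
    match expr {
        SortExpr::Column(name) => column_index(schema, name).map(|i| format!("{name}@{i}")),
        SortExpr::Add(lhs, rhs) => {
            let l = resolve_general(schema, lhs)?;
            let r = resolve_general(schema, rhs)?;
            Some(format!("{l} + {r}"))
        }
    }
}

/// Two-step resolution: column-only first, general path as the fallback.
pub fn resolve_ordering(schema: &[&str], exprs: &[SortExpr]) -> Option<Vec<Resolved>> {
    if let Some(fast) = resolve_columns_only(schema, exprs) {
        return Some(fast);
    }
    exprs
        .iter()
        .map(|e| resolve_general(schema, e).map(Resolved::Expression))
        .collect()
}
```

With this shape, orderings made up only of columns never touch the general path, so their cost should stay close to that of a column-only implementation.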

@pepijnve
Copy link
Contributor Author

I had a quick look at physical_plan_clickbench_q43 using flamegraph, since that showed the largest relative increase. Physical planning hardly shows up in the output. The largest block by far is LogicalPlanBuilder::normalize. Do you see similar output @alamb? Or are you getting something that points in the direction of the changes in this PR?

@pepijnve pepijnve force-pushed the issue_17411 branch 6 times, most recently from 5aafdca to 10dff75 Compare September 23, 2025 17:00
@pepijnve
Contributor Author

@alamb still no luck with the planner benchmarks it seems

@alamb
Contributor

alamb commented Sep 25, 2025

@alamb still no luck with the planner benchmarks it seems

It appears to be blowing up due to the newly added planning benchmarks for window functions. Conveniently, this appears to be almost fixed with

Once that is merged, I'll try (again) and hopefully succeed this time

@alamb
Contributor

alamb commented Sep 25, 2025

I merged this branch up from main to get the fix for #17684 and will try again to get the benchmarks to run

@alamb
Contributor

alamb commented Sep 25, 2025

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubuntu SMP Wed Sep 3 01:55:36 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing issue_17411 (26ca178) to c1d6f34 diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --bench sql_planner
BENCH_FILTER=
BENCH_BRANCH_NAME=issue_17411
Results will be posted here when complete

@alamb
Contributor

alamb commented Sep 26, 2025

Aha -- the benchmark is not running because it is panicking on this branch:

To reproduce:

cargo bench --profile dev --bench sql_planner -- physical_plan_tpcds_all

This passes on main

It panics on this branch like this:

Benchmarking physical_plan_tpcds_all: Warming up for 3.0000 s
thread 'main' panicked at datafusion/core/benches/sql_planner.rs:64:14:
called `Result::unwrap()` on an `Err` value: Internal("Physical input schema should be the same as the one converted from logical input schema. Differences: \n\t- field nullability at index 5 [sales_cnt]: (physical) true vs (logical) false\n\t- field nullability at index 6 [sales_amt]: (physical) true vs (logical) false")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

It is somewhat bad that all the tests pass anyway 🤔

@alamb
Contributor

alamb commented Sep 26, 2025

EDIT -- the same test fails on main too (not related to this PR it seems)

@pepijnve
Contributor Author

Thanks for checking already. I’ll see if I can find some time to get that fixed on main since we have no way to do comparative testing otherwise.

@alamb
Contributor

alamb commented Sep 28, 2025

Here is a PR to temporarily disable the failing test

Comment on lines +241 to +243
// Err result indicates an expression could not be found in the
// projected_schema, stop iterating since rest of the orderings are violated
break;
Member

Should we give a warn!() here?

Contributor Author

It's not really an error case. This happens, for instance, when the input has sort order a, b, but the projection of the node only retains a. The resulting order is just a.

I'm kind of abusing Err here. The plan_err!("") bit is basically just there to abort the transform_up tree walk early since there's no point in continuing. The Err(_) bit is handling it. Not pretty, but I couldn't find a nicer way to achieve the desired result.

FWIW, this is the same behaviour as the code that's already present.
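The projection behaviour described in this reply can be illustrated with a small standalone sketch. It is not the PR's implementation (which works on physical expressions and aborts a transform_up walk via an Err); `project_ordering` is a hypothetical helper that models orderings as lists of column names. It keeps the longest prefix of the ordering whose columns survive the projection and stops at the first one that does not, since later sort keys are meaningless once an earlier one is lost.

```rust
/// Projects a lexicographic ordering through a projection that keeps only the
/// `retained` columns. Stops at the first sort key that is not retained,
/// because the remaining keys no longer describe a valid ordering.
pub fn project_ordering<'a>(ordering: &[&'a str], retained: &[&str]) -> Vec<&'a str> {
    ordering
        .iter()
        .copied()
        .take_while(|col| retained.iter().any(|r| *r == *col))
        .collect()
}
```

For an input ordered by a, b and a projection that only retains a, this yields just a. An input ordered by a, b, c projected onto a and c also yields just a, since c is only meaningful as a tie-breaker after b.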

@alamb
Contributor

alamb commented Sep 29, 2025

Merging up from main to get #17809

@alamb
Contributor

alamb commented Sep 29, 2025

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubuntu SMP Wed Sep 3 01:55:36 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing issue_17411 (91f225a) to 5cc0be5 diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --bench sql_planner
BENCH_FILTER=
BENCH_BRANCH_NAME=issue_17411
Results will be posted here when complete

@alamb
Contributor

alamb commented Sep 30, 2025

🤖: Benchmark completed

Details

group                                                 issue_17411                            main
-----                                                 -----------                            ----
logical_aggregate_with_join                           1.00    625.2±3.55µs        ? ?/sec    1.00    626.7±2.49µs        ? ?/sec
logical_plan_optimize                                 1.04     190.0±5.36s        ? ?/sec    1.00     182.1±4.64s        ? ?/sec
logical_select_all_from_1000                          1.00     10.9±0.10ms        ? ?/sec    1.00     10.9±0.08ms        ? ?/sec
logical_select_one_from_700                           1.00    416.7±1.59µs        ? ?/sec    1.00    417.7±1.22µs        ? ?/sec
logical_trivial_join_high_numbered_columns            1.00    370.8±1.24µs        ? ?/sec    1.00    371.7±1.53µs        ? ?/sec
logical_trivial_join_low_numbered_columns             1.01    359.5±2.84µs        ? ?/sec    1.00    355.7±1.41µs        ? ?/sec
physical_intersection                                 1.00    833.9±9.21µs        ? ?/sec    1.00    834.9±5.42µs        ? ?/sec
physical_join_consider_sort                           1.01  1393.2±17.31µs        ? ?/sec    1.00   1383.8±6.14µs        ? ?/sec
physical_join_distinct                                1.00    348.5±2.52µs        ? ?/sec    1.00    349.0±3.33µs        ? ?/sec
physical_many_self_joins                              1.00      9.5±0.06ms        ? ?/sec    1.00      9.5±0.03ms        ? ?/sec
physical_plan_clickbench_all                          1.02    196.8±3.59ms        ? ?/sec    1.00    193.5±3.22ms        ? ?/sec
physical_plan_clickbench_q1                           1.02      2.5±0.03ms        ? ?/sec    1.00      2.5±0.02ms        ? ?/sec
physical_plan_clickbench_q10                          1.03      3.6±0.06ms        ? ?/sec    1.00      3.4±0.05ms        ? ?/sec
physical_plan_clickbench_q11                          1.01      3.7±0.08ms        ? ?/sec    1.00      3.7±0.07ms        ? ?/sec
physical_plan_clickbench_q12                          1.00      3.8±0.07ms        ? ?/sec    1.01      3.9±0.07ms        ? ?/sec
physical_plan_clickbench_q13                          1.01      3.5±0.06ms        ? ?/sec    1.00      3.4±0.04ms        ? ?/sec
physical_plan_clickbench_q14                          1.00      3.7±0.11ms        ? ?/sec    1.00      3.7±0.05ms        ? ?/sec
physical_plan_clickbench_q15                          1.04      3.7±0.32ms        ? ?/sec    1.00      3.6±0.09ms        ? ?/sec
physical_plan_clickbench_q16                          1.05      3.5±0.13ms        ? ?/sec    1.00      3.3±0.04ms        ? ?/sec
physical_plan_clickbench_q17                          1.00      3.5±0.05ms        ? ?/sec    1.00      3.5±0.06ms        ? ?/sec
physical_plan_clickbench_q18                          1.01      3.0±0.04ms        ? ?/sec    1.00      3.0±0.05ms        ? ?/sec
physical_plan_clickbench_q19                          1.00      3.9±0.05ms        ? ?/sec    1.00      3.9±0.07ms        ? ?/sec
physical_plan_clickbench_q2                           1.02      3.1±0.04ms        ? ?/sec    1.00      3.0±0.04ms        ? ?/sec
physical_plan_clickbench_q20                          1.00      2.7±0.04ms        ? ?/sec    1.01      2.7±0.03ms        ? ?/sec
physical_plan_clickbench_q21                          1.00      3.0±0.06ms        ? ?/sec    1.00      3.0±0.05ms        ? ?/sec
physical_plan_clickbench_q22                          1.02      3.7±0.08ms        ? ?/sec    1.00      3.6±0.05ms        ? ?/sec
physical_plan_clickbench_q23                          1.00      4.0±0.11ms        ? ?/sec    1.00      4.0±0.08ms        ? ?/sec
physical_plan_clickbench_q24                          1.00      4.4±0.08ms        ? ?/sec    1.00      4.4±0.09ms        ? ?/sec
physical_plan_clickbench_q25                          1.00      3.2±0.06ms        ? ?/sec    1.00      3.2±0.04ms        ? ?/sec
physical_plan_clickbench_q26                          1.00      3.0±0.05ms        ? ?/sec    1.00      3.0±0.05ms        ? ?/sec
physical_plan_clickbench_q27                          1.00      3.2±0.04ms        ? ?/sec    1.00      3.2±0.06ms        ? ?/sec
physical_plan_clickbench_q28                          1.00      4.0±0.11ms        ? ?/sec    1.00      4.0±0.09ms        ? ?/sec
physical_plan_clickbench_q29                          1.00      4.4±0.10ms        ? ?/sec    1.01      4.4±0.16ms        ? ?/sec
physical_plan_clickbench_q3                           1.03      3.0±0.06ms        ? ?/sec    1.00      2.9±0.03ms        ? ?/sec
physical_plan_clickbench_q30                          1.00     14.7±0.28ms        ? ?/sec    1.00     14.7±0.31ms        ? ?/sec
physical_plan_clickbench_q31                          1.00      4.0±0.07ms        ? ?/sec    1.02      4.1±0.11ms        ? ?/sec
physical_plan_clickbench_q32                          1.02      4.1±0.07ms        ? ?/sec    1.00      4.0±0.08ms        ? ?/sec
physical_plan_clickbench_q33                          1.01      3.5±0.05ms        ? ?/sec    1.00      3.5±0.05ms        ? ?/sec
physical_plan_clickbench_q34                          1.02      3.2±0.04ms        ? ?/sec    1.00      3.1±0.05ms        ? ?/sec
physical_plan_clickbench_q35                          1.04      3.4±0.13ms        ? ?/sec    1.00      3.2±0.04ms        ? ?/sec
physical_plan_clickbench_q36                          1.01      4.0±0.07ms        ? ?/sec    1.00      4.0±0.08ms        ? ?/sec
physical_plan_clickbench_q37                          1.02      4.2±0.07ms        ? ?/sec    1.00      4.1±0.06ms        ? ?/sec
physical_plan_clickbench_q38                          1.03      4.2±0.08ms        ? ?/sec    1.00      4.1±0.08ms        ? ?/sec
physical_plan_clickbench_q39                          1.05      4.1±0.08ms        ? ?/sec    1.00      3.9±0.07ms        ? ?/sec
physical_plan_clickbench_q4                           1.04      2.7±0.04ms        ? ?/sec    1.00      2.6±0.02ms        ? ?/sec
physical_plan_clickbench_q40                          1.05      4.7±0.12ms        ? ?/sec    1.00      4.5±0.10ms        ? ?/sec
physical_plan_clickbench_q41                          1.05      4.3±0.12ms        ? ?/sec    1.00      4.0±0.08ms        ? ?/sec
physical_plan_clickbench_q42                          1.04      4.2±0.10ms        ? ?/sec    1.00      4.0±0.09ms        ? ?/sec
physical_plan_clickbench_q43                          1.08      4.6±0.11ms        ? ?/sec    1.00      4.2±0.08ms        ? ?/sec
physical_plan_clickbench_q44                          1.01      2.8±0.07ms        ? ?/sec    1.00      2.8±0.04ms        ? ?/sec
physical_plan_clickbench_q45                          1.01      2.8±0.03ms        ? ?/sec    1.00      2.8±0.04ms        ? ?/sec
physical_plan_clickbench_q46                          1.01      3.2±0.06ms        ? ?/sec    1.00      3.2±0.04ms        ? ?/sec
physical_plan_clickbench_q47                          1.00      3.9±0.08ms        ? ?/sec    1.00      3.9±0.10ms        ? ?/sec
physical_plan_clickbench_q48                          1.01      4.7±0.08ms        ? ?/sec    1.00      4.6±0.10ms        ? ?/sec
physical_plan_clickbench_q49                          1.02      5.0±0.09ms        ? ?/sec    1.00      4.8±0.07ms        ? ?/sec
physical_plan_clickbench_q5                           1.01      2.9±0.07ms        ? ?/sec    1.00      2.9±0.04ms        ? ?/sec
physical_plan_clickbench_q50                          1.01      4.6±0.11ms        ? ?/sec    1.00      4.5±0.11ms        ? ?/sec
physical_plan_clickbench_q51                          1.00      3.3±0.05ms        ? ?/sec    1.00      3.3±0.05ms        ? ?/sec
physical_plan_clickbench_q6                           1.02      2.9±0.04ms        ? ?/sec    1.00      2.9±0.04ms        ? ?/sec
physical_plan_clickbench_q7                           1.01      2.6±0.04ms        ? ?/sec    1.00      2.6±0.03ms        ? ?/sec
physical_plan_clickbench_q8                           1.01      3.6±0.06ms        ? ?/sec    1.00      3.5±0.06ms        ? ?/sec
physical_plan_clickbench_q9                           1.01      3.4±0.07ms        ? ?/sec    1.00      3.4±0.07ms        ? ?/sec
physical_plan_tpcds_all                               1.04   1030.5±4.36ms        ? ?/sec    1.00    994.1±2.97ms        ? ?/sec
physical_plan_tpch_all                                1.03     64.1±0.24ms        ? ?/sec    1.00     62.4±0.29ms        ? ?/sec
physical_plan_tpch_q1                                 1.00      2.0±0.01ms        ? ?/sec    1.00      2.0±0.01ms        ? ?/sec
physical_plan_tpch_q10                                1.04      4.0±0.01ms        ? ?/sec    1.00      3.8±0.01ms        ? ?/sec
physical_plan_tpch_q11                                1.06      3.5±0.02ms        ? ?/sec    1.00      3.3±0.01ms        ? ?/sec
physical_plan_tpch_q12                                1.00   1821.3±8.48µs        ? ?/sec    1.00   1822.5±8.82µs        ? ?/sec
physical_plan_tpch_q13                                1.00   1471.1±5.33µs        ? ?/sec    1.00   1474.4±9.94µs        ? ?/sec
physical_plan_tpch_q14                                1.00   1989.9±5.11µs        ? ?/sec    1.01      2.0±0.01ms        ? ?/sec
physical_plan_tpch_q16                                1.00      2.5±0.01ms        ? ?/sec    1.01      2.5±0.03ms        ? ?/sec
physical_plan_tpch_q17                                1.05      2.6±0.01ms        ? ?/sec    1.00      2.4±0.01ms        ? ?/sec
physical_plan_tpch_q18                                1.00      2.7±0.01ms        ? ?/sec    1.00      2.7±0.01ms        ? ?/sec
physical_plan_tpch_q19                                1.02      3.2±0.01ms        ? ?/sec    1.00      3.2±0.02ms        ? ?/sec
physical_plan_tpch_q2                                 1.07      5.8±0.03ms        ? ?/sec    1.00      5.5±0.03ms        ? ?/sec
physical_plan_tpch_q20                                1.02      3.2±0.02ms        ? ?/sec    1.00      3.1±0.02ms        ? ?/sec
physical_plan_tpch_q21                                1.05      4.3±0.02ms        ? ?/sec    1.00      4.0±0.02ms        ? ?/sec
physical_plan_tpch_q22                                1.00      2.7±0.02ms        ? ?/sec    1.01      2.7±0.02ms        ? ?/sec
physical_plan_tpch_q3                                 1.05      2.7±0.01ms        ? ?/sec    1.00      2.6±0.01ms        ? ?/sec
physical_plan_tpch_q4                                 1.00   1527.7±6.87µs        ? ?/sec    1.00  1520.8±14.85µs        ? ?/sec
physical_plan_tpch_q5                                 1.03      3.3±0.01ms        ? ?/sec    1.00      3.2±0.01ms        ? ?/sec
physical_plan_tpch_q6                                 1.00    871.6±6.76µs        ? ?/sec    1.00    870.9±5.50µs        ? ?/sec
physical_plan_tpch_q7                                 1.00      4.2±0.02ms        ? ?/sec    1.00      4.2±0.02ms        ? ?/sec
physical_plan_tpch_q8                                 1.07      5.5±0.05ms        ? ?/sec    1.00      5.1±0.02ms        ? ?/sec
physical_plan_tpch_q9                                 1.00      4.0±0.02ms        ? ?/sec    1.00      4.0±0.02ms        ? ?/sec
physical_select_aggregates_from_200                   1.01     16.8±0.09ms        ? ?/sec    1.00     16.6±0.05ms        ? ?/sec
physical_select_all_from_1000                         1.00     23.8±0.07ms        ? ?/sec    1.01     23.9±0.11ms        ? ?/sec
physical_select_one_from_700                          1.00   1070.3±4.38µs        ? ?/sec    1.01   1080.0±7.26µs        ? ?/sec
physical_sorted_union_order_by_10                     1.00     13.0±0.10ms        ? ?/sec    1.00     13.0±0.06ms        ? ?/sec
physical_sorted_union_order_by_100                    1.00       2.0±0.02s        ? ?/sec    1.00       2.0±0.01s        ? ?/sec
physical_sorted_union_order_by_200                    1.01      12.6±0.13s        ? ?/sec    1.00      12.4±0.09s        ? ?/sec
physical_sorted_union_order_by_300                    1.00      38.9±0.22s        ? ?/sec    1.00      38.9±0.37s        ? ?/sec
physical_sorted_union_order_by_50                     1.00    379.9±3.58ms        ? ?/sec    1.00    381.4±3.05ms        ? ?/sec
physical_theta_join_consider_sort                     1.00  1752.5±14.52µs        ? ?/sec    1.00   1744.3±7.09µs        ? ?/sec
physical_unnest_to_join                               1.00   1301.1±9.64µs        ? ?/sec    1.01   1310.7±4.19µs        ? ?/sec
physical_window_function_partition_by_12_on_values    1.00   1084.9±4.85µs        ? ?/sec    1.02   1105.6±5.89µs        ? ?/sec
physical_window_function_partition_by_30_on_values    1.00      2.2±0.01ms        ? ?/sec    1.00      2.3±0.01ms        ? ?/sec
physical_window_function_partition_by_4_on_values     1.00    653.1±3.96µs        ? ?/sec    1.01    656.6±2.19µs        ? ?/sec
physical_window_function_partition_by_7_on_values     1.00    806.8±2.76µs        ? ?/sec    1.00    809.0±3.47µs        ? ?/sec
physical_window_function_partition_by_8_on_values     1.00    869.4±8.66µs        ? ?/sec    1.01    874.7±6.05µs        ? ?/sec
with_param_values_many_columns                        1.00    140.6±1.51µs        ? ?/sec    1.00    140.2±3.01µs        ? ?/sec

@pepijnve
Contributor Author

@alamb it seems there are still some small increases here and there, but the results are much better than before. Is it still worth looking into these, or is this within the acceptable range? I can dig deeper, but I could use some guidance on how best to profile the benchmarks. Last time I tried, the flame graph seemed to show nothing but noise and framework overhead.
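
One rough way to separate "small increase" from "noise" in a table like the one above is to check whether the two `value±spread` ranges overlap. The sketch below does that for two rows copied from the first benchmark run. It assumes the `±` figure is a measure of spread around the reported value, and it is only a heuristic, not a statistical test. `Measurement`, `ratio` and `ranges_overlap` are names made up for this illustration.

```rust
/// One benchmark cell: the reported value and the ± figure printed next to it.
/// Both are in the same unit (milliseconds in the examples below).
#[derive(Debug, Clone, Copy)]
struct Measurement {
    value: f64,
    spread: f64,
}

/// Relative cost of `branch` compared to `main` (1.0 means identical).
fn ratio(branch: Measurement, main: Measurement) -> f64 {
    branch.value / main.value
}

/// True when [value - spread, value + spread] of `a` and `b` intersect.
fn ranges_overlap(a: Measurement, b: Measurement) -> bool {
    a.value - a.spread <= b.value + b.spread && b.value - b.spread <= a.value + a.spread
}

fn main() {
    // physical_plan_clickbench_q43: 4.6±0.11ms (branch) vs 4.2±0.08ms (main)
    let q43_branch = Measurement { value: 4.6, spread: 0.11 };
    let q43_main = Measurement { value: 4.2, spread: 0.08 };

    // physical_plan_clickbench_q12: 3.8±0.07ms (branch) vs 3.9±0.07ms (main)
    let q12_branch = Measurement { value: 3.8, spread: 0.07 };
    let q12_main = Measurement { value: 3.9, spread: 0.07 };

    for (name, branch, main) in [
        ("clickbench_q43", q43_branch, q43_main),
        ("clickbench_q12", q12_branch, q12_main),
    ] {
        println!(
            "{name}: ratio from rounded values = {:.3}, ranges overlap = {}",
            ratio(branch, main),
            ranges_overlap(branch, main)
        );
    }
}
```

Because the table prints rounded values, a ratio recomputed this way will not exactly match the ratio column. By this heuristic the q43 ranges do not overlap while the q12 ranges do, which matches the impression that only a handful of rows show a difference beyond run-to-run variation.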

@alamb
Contributor

alamb commented Sep 30, 2025

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubuntu SMP Wed Sep 3 01:55:36 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing issue_17411 (91f225a) to 5cc0be5 diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --bench sql_planner
BENCH_FILTER=
BENCH_BRANCH_NAME=issue_17411
Results will be posted here when complete

@alamb
Contributor

alamb commented Oct 1, 2025

🤖: Benchmark completed

Details

group                                                 issue_17411                            main
-----                                                 -----------                            ----
logical_aggregate_with_join                           1.00    627.7±3.61µs        ? ?/sec    1.00    624.7±3.38µs        ? ?/sec
logical_plan_optimize                                 1.04     187.9±6.74s        ? ?/sec    1.00     180.3±4.39s        ? ?/sec
logical_select_all_from_1000                          1.00     10.9±0.08ms        ? ?/sec    1.01     11.0±0.10ms        ? ?/sec
logical_select_one_from_700                           1.00    416.6±1.67µs        ? ?/sec    1.00    416.7±2.08µs        ? ?/sec
logical_trivial_join_high_numbered_columns            1.00    371.3±1.42µs        ? ?/sec    1.00    371.5±1.57µs        ? ?/sec
logical_trivial_join_low_numbered_columns             1.00    357.4±1.35µs        ? ?/sec    1.00    357.1±1.51µs        ? ?/sec
physical_intersection                                 1.00    833.7±3.46µs        ? ?/sec    1.00   834.5±11.18µs        ? ?/sec
physical_join_consider_sort                           1.00   1389.5±6.86µs        ? ?/sec    1.00   1387.2±4.91µs        ? ?/sec
physical_join_distinct                                1.00    348.1±1.44µs        ? ?/sec    1.01    350.2±2.35µs        ? ?/sec
physical_many_self_joins                              1.00      9.5±0.07ms        ? ?/sec    1.00      9.5±0.04ms        ? ?/sec
physical_plan_clickbench_all                          1.02    196.9±3.74ms        ? ?/sec    1.00    193.4±2.78ms        ? ?/sec
physical_plan_clickbench_q1                           1.02      2.6±0.05ms        ? ?/sec    1.00      2.5±0.03ms        ? ?/sec
physical_plan_clickbench_q10                          1.01      3.5±0.08ms        ? ?/sec    1.00      3.5±0.05ms        ? ?/sec
physical_plan_clickbench_q11                          1.00      3.7±0.06ms        ? ?/sec    1.00      3.7±0.08ms        ? ?/sec
physical_plan_clickbench_q12                          1.00      3.8±0.08ms        ? ?/sec    1.00      3.8±0.18ms        ? ?/sec
physical_plan_clickbench_q13                          1.00      3.5±0.05ms        ? ?/sec    1.02      3.5±0.14ms        ? ?/sec
physical_plan_clickbench_q14                          1.02      3.7±0.08ms        ? ?/sec    1.00      3.6±0.05ms        ? ?/sec
physical_plan_clickbench_q15                          1.02      3.6±0.10ms        ? ?/sec    1.00      3.5±0.08ms        ? ?/sec
physical_plan_clickbench_q16                          1.02      3.4±0.08ms        ? ?/sec    1.00      3.3±0.04ms        ? ?/sec
physical_plan_clickbench_q17                          1.00      3.5±0.07ms        ? ?/sec    1.00      3.5±0.06ms        ? ?/sec
physical_plan_clickbench_q18                          1.02      3.0±0.06ms        ? ?/sec    1.00      2.9±0.03ms        ? ?/sec
physical_plan_clickbench_q19                          1.01      3.9±0.09ms        ? ?/sec    1.00      3.9±0.09ms        ? ?/sec
physical_plan_clickbench_q2                           1.02      3.1±0.08ms        ? ?/sec    1.00      3.0±0.10ms        ? ?/sec
physical_plan_clickbench_q20                          1.02      2.7±0.06ms        ? ?/sec    1.00      2.7±0.03ms        ? ?/sec
physical_plan_clickbench_q21                          1.03      3.1±0.05ms        ? ?/sec    1.00      3.0±0.03ms        ? ?/sec
physical_plan_clickbench_q22                          1.02      3.7±0.06ms        ? ?/sec    1.00      3.6±0.06ms        ? ?/sec
physical_plan_clickbench_q23                          1.02      4.0±0.08ms        ? ?/sec    1.00      3.9±0.07ms        ? ?/sec
physical_plan_clickbench_q24                          1.02      4.5±0.09ms        ? ?/sec    1.00      4.3±0.06ms        ? ?/sec
physical_plan_clickbench_q25                          1.01      3.2±0.04ms        ? ?/sec    1.00      3.2±0.04ms        ? ?/sec
physical_plan_clickbench_q26                          1.03      3.0±0.06ms        ? ?/sec    1.00      2.9±0.05ms        ? ?/sec
physical_plan_clickbench_q27                          1.02      3.3±0.14ms        ? ?/sec    1.00      3.2±0.05ms        ? ?/sec
physical_plan_clickbench_q28                          1.03      4.1±0.10ms        ? ?/sec    1.00      3.9±0.07ms        ? ?/sec
physical_plan_clickbench_q29                          1.03      4.5±0.13ms        ? ?/sec    1.00      4.3±0.05ms        ? ?/sec
physical_plan_clickbench_q3                           1.02      3.0±0.07ms        ? ?/sec    1.00      2.9±0.06ms        ? ?/sec
physical_plan_clickbench_q30                          1.02     14.6±0.22ms        ? ?/sec    1.00     14.3±0.27ms        ? ?/sec
physical_plan_clickbench_q31                          1.01      4.0±0.11ms        ? ?/sec    1.00      4.0±0.08ms        ? ?/sec
physical_plan_clickbench_q32                          1.01      4.0±0.09ms        ? ?/sec    1.00      3.9±0.05ms        ? ?/sec
physical_plan_clickbench_q33                          1.02      3.5±0.07ms        ? ?/sec    1.00      3.4±0.09ms        ? ?/sec
physical_plan_clickbench_q34                          1.01      3.1±0.05ms        ? ?/sec    1.00      3.1±0.03ms        ? ?/sec
physical_plan_clickbench_q35                          1.00      3.3±0.05ms        ? ?/sec    1.00      3.3±0.05ms        ? ?/sec
physical_plan_clickbench_q36                          1.02      4.0±0.07ms        ? ?/sec    1.00      3.9±0.08ms        ? ?/sec
physical_plan_clickbench_q37                          1.03      4.2±0.10ms        ? ?/sec    1.00      4.1±0.12ms        ? ?/sec
physical_plan_clickbench_q38                          1.04      4.2±0.15ms        ? ?/sec    1.00      4.1±0.08ms        ? ?/sec
physical_plan_clickbench_q39                          1.04      4.0±0.08ms        ? ?/sec    1.00      3.8±0.08ms        ? ?/sec
physical_plan_clickbench_q4                           1.02      2.7±0.06ms        ? ?/sec    1.00      2.7±0.04ms        ? ?/sec
physical_plan_clickbench_q40                          1.05      4.8±0.13ms        ? ?/sec    1.00      4.5±0.14ms        ? ?/sec
physical_plan_clickbench_q41                          1.07      4.3±0.14ms        ? ?/sec    1.00      4.0±0.08ms        ? ?/sec
physical_plan_clickbench_q42                          1.03      4.2±0.08ms        ? ?/sec    1.00      4.0±0.18ms        ? ?/sec
physical_plan_clickbench_q43                          1.07      4.6±0.07ms        ? ?/sec    1.00      4.2±0.09ms        ? ?/sec
physical_plan_clickbench_q44                          1.00      2.8±0.04ms        ? ?/sec    1.01      2.8±0.04ms        ? ?/sec
physical_plan_clickbench_q45                          1.03      2.8±0.06ms        ? ?/sec    1.00      2.7±0.03ms        ? ?/sec
physical_plan_clickbench_q46                          1.01      3.2±0.04ms        ? ?/sec    1.00      3.2±0.05ms        ? ?/sec
physical_plan_clickbench_q47                          1.01      3.9±0.10ms        ? ?/sec    1.00      3.8±0.05ms        ? ?/sec
physical_plan_clickbench_q48                          1.05      4.7±0.16ms        ? ?/sec    1.00      4.5±0.06ms        ? ?/sec
physical_plan_clickbench_q49                          1.02      4.9±0.08ms        ? ?/sec    1.00      4.8±0.09ms        ? ?/sec
physical_plan_clickbench_q5                           1.01      3.0±0.07ms        ? ?/sec    1.00      2.9±0.07ms        ? ?/sec
physical_plan_clickbench_q50                          1.00      4.6±0.17ms        ? ?/sec    1.00      4.6±0.17ms        ? ?/sec
physical_plan_clickbench_q51                          1.00      3.3±0.08ms        ? ?/sec    1.00      3.3±0.07ms        ? ?/sec
physical_plan_clickbench_q6                           1.04      3.0±0.09ms        ? ?/sec    1.00      2.9±0.06ms        ? ?/sec
physical_plan_clickbench_q7                           1.03      2.6±0.05ms        ? ?/sec    1.00      2.5±0.03ms        ? ?/sec
physical_plan_clickbench_q8                           1.02      3.6±0.12ms        ? ?/sec    1.00      3.5±0.04ms        ? ?/sec
physical_plan_clickbench_q9                           1.01      3.4±0.07ms        ? ?/sec    1.00      3.3±0.04ms        ? ?/sec
physical_plan_tpcds_all                               1.04   1039.4±4.58ms        ? ?/sec    1.00    997.5±3.07ms        ? ?/sec
physical_plan_tpch_all                                1.04     65.3±0.48ms        ? ?/sec    1.00     62.5±0.29ms        ? ?/sec
physical_plan_tpch_q1                                 1.00      2.0±0.01ms        ? ?/sec    1.00      2.0±0.01ms        ? ?/sec
physical_plan_tpch_q10                                1.05      4.0±0.02ms        ? ?/sec    1.00      3.8±0.01ms        ? ?/sec
physical_plan_tpch_q11                                1.06      3.5±0.03ms        ? ?/sec    1.00      3.3±0.01ms        ? ?/sec
physical_plan_tpch_q12                                1.00  1828.9±10.34µs        ? ?/sec    1.00   1821.2±7.81µs        ? ?/sec
physical_plan_tpch_q13                                1.00   1481.4±5.16µs        ? ?/sec    1.00   1479.8±9.30µs        ? ?/sec
physical_plan_tpch_q14                                1.00   1992.9±9.73µs        ? ?/sec    1.00   1989.7±8.90µs        ? ?/sec
physical_plan_tpch_q16                                1.01      2.5±0.02ms        ? ?/sec    1.00      2.5±0.01ms        ? ?/sec
physical_plan_tpch_q17                                1.06      2.6±0.01ms        ? ?/sec    1.00      2.4±0.01ms        ? ?/sec
physical_plan_tpch_q18                                1.00      2.7±0.01ms        ? ?/sec    1.00      2.7±0.02ms        ? ?/sec
physical_plan_tpch_q19                                1.02      3.3±0.01ms        ? ?/sec    1.00      3.2±0.01ms        ? ?/sec
physical_plan_tpch_q2                                 1.07      5.9±0.04ms        ? ?/sec    1.00      5.5±0.03ms        ? ?/sec
physical_plan_tpch_q20                                1.03      3.2±0.02ms        ? ?/sec    1.00      3.1±0.02ms        ? ?/sec
physical_plan_tpch_q21                                1.07      4.3±0.01ms        ? ?/sec    1.00      4.0±0.01ms        ? ?/sec
physical_plan_tpch_q22                                1.00      2.7±0.02ms        ? ?/sec    1.00      2.7±0.01ms        ? ?/sec
physical_plan_tpch_q3                                 1.05      2.7±0.01ms        ? ?/sec    1.00      2.6±0.01ms        ? ?/sec
physical_plan_tpch_q4                                 1.01   1527.1±6.27µs        ? ?/sec    1.00   1515.2±3.40µs        ? ?/sec
physical_plan_tpch_q5                                 1.03      3.3±0.01ms        ? ?/sec    1.00      3.2±0.01ms        ? ?/sec
physical_plan_tpch_q6                                 1.01    876.4±6.89µs        ? ?/sec    1.00   869.2±10.30µs        ? ?/sec
physical_plan_tpch_q7                                 1.01      4.3±0.02ms        ? ?/sec    1.00      4.2±0.02ms        ? ?/sec
physical_plan_tpch_q8                                 1.08      5.6±0.04ms        ? ?/sec    1.00      5.1±0.03ms        ? ?/sec
physical_plan_tpch_q9                                 1.00      4.1±0.02ms        ? ?/sec    1.00      4.0±0.02ms        ? ?/sec
physical_select_aggregates_from_200                   1.01     16.9±0.08ms        ? ?/sec    1.00     16.6±0.06ms        ? ?/sec
physical_select_all_from_1000                         1.00     23.8±0.08ms        ? ?/sec    1.01     23.9±0.08ms        ? ?/sec
physical_select_one_from_700                          1.00   1073.4±5.75µs        ? ?/sec    1.00   1076.4±6.68µs        ? ?/sec
physical_sorted_union_order_by_10                     1.00     13.1±0.10ms        ? ?/sec    1.01     13.1±0.07ms        ? ?/sec
physical_sorted_union_order_by_100                    1.00       2.0±0.02s        ? ?/sec    1.00       2.0±0.02s        ? ?/sec
physical_sorted_union_order_by_200                    1.01      12.6±0.11s        ? ?/sec    1.00      12.5±0.10s        ? ?/sec
physical_sorted_union_order_by_300                    1.01      38.9±0.20s        ? ?/sec    1.00      38.5±0.16s        ? ?/sec
physical_sorted_union_order_by_50                     1.01    383.1±4.72ms        ? ?/sec    1.00    380.0±3.23ms        ? ?/sec
physical_theta_join_consider_sort                     1.00   1743.6±6.29µs        ? ?/sec    1.00   1744.2±7.83µs        ? ?/sec
physical_unnest_to_join                               1.00   1304.5±8.48µs        ? ?/sec    1.00   1298.7±4.81µs        ? ?/sec
physical_window_function_partition_by_12_on_values    1.00   1091.7±7.31µs        ? ?/sec    1.00  1096.4±17.47µs        ? ?/sec
physical_window_function_partition_by_30_on_values    1.00      2.2±0.01ms        ? ?/sec    1.00      2.2±0.01ms        ? ?/sec
physical_window_function_partition_by_4_on_values     1.00    653.3±3.21µs        ? ?/sec    1.01    657.9±5.21µs        ? ?/sec
physical_window_function_partition_by_7_on_values     1.00    811.3±5.27µs        ? ?/sec    1.01    823.3±2.47µs        ? ?/sec
physical_window_function_partition_by_8_on_values     1.00    871.7±4.97µs        ? ?/sec    1.00    873.0±4.95µs        ? ?/sec
with_param_values_many_columns                        1.01    140.3±1.53µs        ? ?/sec    1.00    139.4±1.81µs        ? ?/sec

@alamb
Contributor

alamb commented Oct 1, 2025

I am looking at this a little more closely now

Contributor

@alamb alamb left a comment

I went back and reviewed this PR carefully and I don't think it causes any slowdowns. I also spent some time profiling locally using

cargo bench --profile=profiling --bench sql_planner --  --bench  physical_plan_clickbench_q43

and could not see any difference.

I then also prevented inlining of the functions that could potentially be responsible:

index a4ed8187c..bd73b5488 100644
--- a/datafusion/physical-expr/src/equivalence/projection.rs
+++ b/datafusion/physical-expr/src/equivalence/projection.rs
@@ -211,6 +211,7 @@ pub fn project_orderings(
 /// schema only contains b and not a. The result will be `Some([col("a@0")])`. In other
 /// words, the column reference is reindexed to match the projected schema.
 /// If neither a nor b is present, the result will be None.
+#[inline(never)]
 pub fn project_ordering(
     ordering: &LexOrdering,
     schema: &SchemaRef,
diff --git a/datafusion/physical-expr/src/physical_expr.rs b/datafusion/physical-expr/src/physical_expr.rs
index 2cc484ec6..02b75a0b8 100644
--- a/datafusion/physical-expr/src/physical_expr.rs
+++ b/datafusion/physical-expr/src/physical_expr.rs
@@ -163,6 +163,7 @@ pub fn create_ordering(
 }

 /// Creates a vector of [LexOrdering] from a vector of logical expression
+#[inline(never)]
 pub fn create_lex_ordering(
     schema: &SchemaRef,
     sort_order: &[Vec<SortExpr>],

And I didn't see them show up in any of the traces when I ran it like this:

samply record -- target/profiling/deps/sql_planner-1adcb045f71bd635  --bench  physical_plan_clickbench_q43
[Screenshot 2025-10-01 at 2:32:42 PM]
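
As background on why the `#[inline(never)]` attributes were added before profiling: when the compiler inlines a function into its caller, a sampling profiler may attribute its samples to the caller and show no separate frame for it. The standalone sketch below shows the attribute on a made-up function, `hot_helper`. It is not DataFusion code and the workload is arbitrary.

```rust
use std::hint::black_box;

/// Hypothetical helper standing in for a function under suspicion.
/// `#[inline(never)]` asks the compiler to keep it as a separate function,
/// so that it can appear under its own name in a profile.
#[inline(never)]
fn hot_helper(values: &[u64]) -> u64 {
    values
        .iter()
        .map(|v| v.wrapping_mul(31))
        .fold(0, u64::wrapping_add)
}

fn main() {
    let data: Vec<u64> = (0..1_000).collect();
    let mut total = 0u64;
    for _ in 0..10_000 {
        // `black_box` discourages the optimizer from hoisting or removing the call.
        total = total.wrapping_add(hot_helper(black_box(&data)));
    }
    println!("{total}");
}
```

If a function marked this way still does not appear in the recorded samples, that suggests little time is spent in it, which is the reasoning applied to `project_ordering` and `create_lex_ordering` here.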

Thus I conclude this PR is good to go and I am going to merge it in.

Thank you for your patience and help @pepijnve

@alamb alamb added this pull request to the merge queue Oct 1, 2025
Merged via the queue into apache:main with commit 97e00ef Oct 1, 2025
28 checks passed
@alamb alamb mentioned this pull request Oct 1, 2025
@pepijnve
Contributor Author

pepijnve commented Oct 3, 2025

samply record -- target/profiling/deps/sql_planner-1adcb045f71bd635 --bench physical_plan_clickbench_q43

I'll try samply next time. Was that on your macOS machine or on Linux? I tried essentially the same using the flamegraph crate, but the graph I got out of that (on macOS at least) didn't make much sense to me.

Labels
catalog Related to the catalog crate core Core DataFusion crate datasource Changes to the datasource crate physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)
Development

Successfully merging this pull request may close these issues.

External table with complex order expression not allowed
3 participants