Eliminate all redundant aggregations #17139

findepi · 2025-08-12T09:29:17Z

Before this change, an aggregation with neither GROUP BY nor any
aggregate functions was disallowed. This prevented the optimizer from
removing every redundant aggregate function when all of them were
redundant: the first one was always retained.

This PR removes this optimizer limitation: if every aggregate function
in a global aggregation is redundant, the aggregation is replaced with a
1-row VALUES.

fixes Redundant aggregation elimination regression #17138

Before this change, an aggregation with neither GROUP BY nor any aggregate functions was disallowed. This prevented the optimizer from removing every redundant aggregate function when all of them were redundant: the first one was always retained. This commit removes the limitation, allowing queries to be optimized further.
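The behaviour described above can be illustrated with a toy model. This is only a sketch: `Plan`, `prune_aggregate`, and the string-based expressions are hypothetical stand-ins invented for illustration, not DataFusion's `LogicalPlan` or its `optimize_projections` code. It shows the one case the PR adds: a global aggregation whose aggregate functions are all pruned collapses to a one-row, zero-column relation.

```rust
// Toy model only: every type and function here is hypothetical.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// A base table with an unknown number of rows.
    TableScan { table: String },
    /// An aggregation; an empty `group_by` means a global aggregation.
    Aggregate {
        group_by: Vec<String>,
        aggr_expr: Vec<String>,
        input: Box<Plan>,
    },
    /// Stand-in for `EmptyRelation` with an empty schema.
    EmptyRelation { produce_one_row: bool },
}

/// Keep only the aggregate expressions whose indices appear in `required`.
fn prune_aggregate(plan: Plan, required: &[usize]) -> Plan {
    match plan {
        Plan::Aggregate { group_by, aggr_expr, input } => {
            let kept: Vec<String> = aggr_expr
                .into_iter()
                .enumerate()
                .filter(|(i, _)| required.contains(i))
                .map(|(_, e)| e)
                .collect();
            if group_by.is_empty() && kept.is_empty() {
                // A global aggregation always yields exactly one row,
                // whatever its input, so the input can be dropped too.
                Plan::EmptyRelation { produce_one_row: true }
            } else {
                // With GROUP BY the row count depends on the input,
                // so the Aggregate node has to stay.
                Plan::Aggregate { group_by, aggr_expr: kept, input }
            }
        }
        other => other,
    }
}
```

Note that a grouped aggregation with no remaining aggregate functions is kept in this sketch, since its number of output rows still depends on the distinct group keys.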

findepi · 2025-08-12T15:04:54Z

datafusion/optimizer/src/optimize_projections/mod.rs

-          Projection:
-            Aggregate: groupBy=[[]], aggr=[[count(Int32(1))]]
-              TableScan: ?table? projection=[]
+          EmptyRelation


The table scan gets replaced with a 1-row VALUES (this would be more visible if we merge #17145).

We could further eliminate count(*) on top of a relation with known cardinality. That is a potential follow-up.
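As a rough illustration of that follow-up idea, here is a toy sketch, not an existing DataFusion rule. `Rel`, `known_cardinality`, and `fold_count_star` are hypothetical names, and the model assumes cardinality is only known for literal rows and for single-row results.

```rust
/// Hypothetical toy plan; not DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
enum Rel {
    /// A table whose row count is not known at planning time.
    Scan,
    /// Literal rows, so the row count is known statically.
    Values { rows: u64 },
    /// `SELECT count(*) FROM input` without GROUP BY.
    CountStar(Box<Rel>),
    /// A single-row relation holding one constant.
    Literal(u64),
}

/// Row count of `rel` if it can be determined without running the query.
fn known_cardinality(rel: &Rel) -> Option<u64> {
    match rel {
        Rel::Scan => None,
        Rel::Values { rows } => Some(*rows),
        // A global count and a literal both produce exactly one row.
        Rel::CountStar(_) | Rel::Literal(_) => Some(1),
    }
}

/// Replace `count(*)` with a constant when the input's row count is known.
fn fold_count_star(rel: Rel) -> Rel {
    match rel {
        Rel::CountStar(input) => match known_cardinality(&input) {
            Some(n) => Rel::Literal(n),
            None => Rel::CountStar(input),
        },
        other => other,
    }
}
```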

alamb

Thank you @findepi -- this seems like an improvement to me

I have some suggestions but nothing I think is required before merging

alamb · 2025-08-12T14:30:07Z

datafusion/optimizer/src/optimize_projections/mod.rs

+            let new_aggr_expr = aggregate_reqs.get_at_indices(&aggregate.aggr_expr);
+
+            if new_group_bys.is_empty() && new_aggr_expr.is_empty() {
+                // Global aggregation with no aggregate functions always produces 1 row and no columns.


This seems like a good rule to have, though perhaps it would be easier to find if it lived alongside the related rules for aggregates, for example

https://github.com/apache/datafusion/blob/main/datafusion/optimizer/src/eliminate_group_by_constant.rs

IMO this is a natural place for the logic. We're pruning, and we need to somehow handle the case where we prune the last aggregate function. The code should do something reasonable, and now it does.
The alternative is to let this code create an Aggregate with an empty GROUP BY and no aggregate functions (which is what I had initially), and then have a separate rule that finds such trivial Aggregates and replaces them with VALUES.

Keeping it here just makes sense

alamb · 2025-08-12T15:22:37Z

datafusion/sqllogictest/test_files/issue_17138.slt

@@ -0,0 +1,36 @@
+statement ok


I recommend we name this file more specifically, or put the test in an existing file, perhaps aggregate.slt?

Not a blocker, just a suggestion

This showed up as a regression, so I named this file as a regression test.
We have regression tests tucked into other files, but that creates a risk of inadvertently changing the scenario and thus breaking the regression properties of a test. They are better off isolated.
Would it help if this file were moved under a directory just for regression cases?

Yeah, maybe that would be better. But if there is already precedent for regression cases perhaps we can leave this PR as is and move it in a subsequent PR

alamb · 2025-08-12T15:23:10Z

datafusion/sqllogictest/test_files/subquery.slt

-04)----Projection:
-05)------Aggregate: groupBy=[[]], aggr=[[count(Int64(1))]]
-06)--------TableScan: t2 projection=[]
+04)----EmptyRelation


seems like an improvement to me

comphead · 2025-08-12T18:11:45Z

datafusion/optimizer/src/optimize_projections/mod.rs

+                return Ok(Transformed::yes(LogicalPlan::EmptyRelation(
+                    EmptyRelation {
+                        produce_one_row: true,
+                        schema: Arc::new(DFSchema::empty()),


Not related to this PR, but why do we need a schema for EmptyRelation? The struct's description presumes the schema is always empty:

/// Produces no rows: An empty relation with an empty schema
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct EmptyRelation

We probably can simplify using this relationship

UPD: actually the description is not accurate. EmptyRelation is used with a non-empty schema in joins and some other places, so we need to update the description.

Here we don't produce any symbols, but maybe we use EmptyRelation for other cases as well?
For example, SELECT a, b, c FROM t WHERE false could be replaced with EmptyRelation, but something needs to project the a, b, c symbols. It can be a projection above EmptyRelation or part of EmptyRelation itself.
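To make the two uses concrete, here is a minimal sketch under the second option (the relation carries its own output columns). `EmptyRel`, `filter_false`, and `pruned_global_aggregate` are hypothetical names for illustration; only the `produce_one_row` field mirrors the real `EmptyRelation`, and column names stand in for a full schema.

```rust
/// Hypothetical stand-in loosely modeled on `EmptyRelation`;
/// `columns` plays the role of the schema.
#[derive(Debug, Clone, PartialEq)]
struct EmptyRel {
    produce_one_row: bool,
    columns: Vec<String>,
}

impl EmptyRel {
    /// Number of rows this relation yields.
    fn row_count(&self) -> usize {
        if self.produce_one_row { 1 } else { 0 }
    }
}

/// `SELECT <columns> FROM t WHERE false`: no rows, but the output
/// columns must still be visible to whatever consumes this relation.
fn filter_false(columns: &[&str]) -> EmptyRel {
    EmptyRel {
        produce_one_row: false,
        columns: columns.iter().map(|c| c.to_string()).collect(),
    }
}

/// A global aggregation whose aggregate functions were all pruned:
/// exactly one row and no columns.
fn pruned_global_aggregate() -> EmptyRel {
    EmptyRel { produce_one_row: true, columns: Vec::new() }
}
```

The first constructor needs a non-empty schema while the second does not, which is why a single "empty schema" description does not cover both cases.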

A different question -- why do we have both, EmptyRelation and Values?
The latter is strictly more generic, without any visible downsides. It even has a better name (EmptyRelation is only conditionally an empty relation).
Can we deprecate EmptyRelation and replace with Values?

Both structs have much in common, but EmptyRelation also serves for producing 0 rows, not sure if that can work for Values?

Although an EmptyRelation with produce_one_row == true is probably feasible to replace with Values 🤔

> Both structs have much in common, but EmptyRelation also serves for producing 0 rows, not sure if that can work for Values?

Why not?
SQL doesn't allow empty values, but I see no reason for LP not to allow them.
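The argument can be sketched with a toy relation. `ToyValues` below is a hypothetical simplification (integer cells, column names only), not DataFusion's `Values` node, and nothing here establishes whether the real node accepts an empty row list. It only shows that a list of rows can represent both the zero-row and the one-row, zero-column cases.

```rust
/// Hypothetical, simplified values relation: column names plus literal rows.
#[derive(Debug, Clone, PartialEq)]
struct ToyValues {
    columns: Vec<String>,
    rows: Vec<Vec<i64>>,
}

impl ToyValues {
    /// Build a relation, checking that every row has one cell per column.
    fn new(columns: Vec<String>, rows: Vec<Vec<i64>>) -> Result<Self, String> {
        for (i, row) in rows.iter().enumerate() {
            if row.len() != columns.len() {
                return Err(format!(
                    "row {} has {} cells, expected {}",
                    i,
                    row.len(),
                    columns.len()
                ));
            }
        }
        Ok(ToyValues { columns, rows })
    }

    /// Counterpart of `produce_one_row == false`: columns but no rows.
    fn zero_rows(columns: Vec<String>) -> Self {
        ToyValues { columns, rows: Vec::new() }
    }

    /// Counterpart of `produce_one_row == true` with an empty schema:
    /// one row that has zero cells.
    fn one_empty_row() -> Self {
        ToyValues { columns: Vec::new(), rows: vec![Vec::new()] }
    }
}
```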

findepi · 2025-08-12T20:30:24Z

Merging to unblock

Update tests due to new simplification rules datafusion-testing#10.

findepi added the performance Make DataFusion faster label Aug 12, 2025

github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Aug 12, 2025

Regression test

53a201d

findepi force-pushed the findepi/eliminate-all-redundant-aggs branch from f3601c2 to cb49bfc Compare August 12, 2025 09:33

findepi mentioned this pull request Aug 12, 2025

Redundant aggregation elimination regression #17138

Closed

findepi force-pushed the findepi/eliminate-all-redundant-aggs branch from cb49bfc to 4c92ae8 Compare August 12, 2025 09:56

github-actions bot added the core Core DataFusion crate label Aug 12, 2025

findepi force-pushed the findepi/eliminate-all-redundant-aggs branch from 4c92ae8 to 9e02646 Compare August 12, 2025 10:36

findepi mentioned this pull request Aug 12, 2025

Update tests due to new simplification rules apache/datafusion-testing#10

Merged

findepi requested review from alamb and comphead August 12, 2025 11:21

fixup! Eliminate all redundant aggregations

c79cd3e

alamb approved these changes Aug 12, 2025

comphead reviewed Aug 12, 2025

alamb mentioned this pull request Aug 12, 2025

chore: Clarify EmptyRelation description #17157

Merged

findepi merged commit b786b9a into apache:main Aug 12, 2025
27 checks passed

findepi deleted the findepi/eliminate-all-redundant-aggs branch August 12, 2025 20:30

findepi mentioned this pull request Aug 12, 2025

Differentiate 0-row and 1-row EmptyRelation in EXPLAIN #17145

Merged
