Skip to content

[SPARK-53094][SQL][4.0] Fix CUBE with aggregate containing HAVING clauses #51854

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

peter-toth
Copy link
Contributor

What changes were proposed in this pull request?

This is an alternative PR to #51810 to fix a regresion introduced in Spark 3.2 with #32470.
This PR defers the resolution of not fully resolved UnresolvedHaving nodes from ResolveGroupingAnalytics:

=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics ===
 'Sort ['s DESC NULLS LAST], true                                                                                               'Sort ['s DESC NULLS LAST], true
!+- 'UnresolvedHaving ('count('product) > 2)                                                                                    +- 'UnresolvedHaving ('count(tempresolvedcolumn(product#261, product, false)) > 2)
!   +- 'Aggregate [cube(Vector(0), Vector(1), product#261, region#262)], [product#261, region#262, sum(amount#263) AS s#264L]      +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L]
!      +- SubqueryAlias t                                                                                                             +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]
!         +- LocalRelation [product#261, region#262, amount#263]                                                                         +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]
!                                                                                                                                           +- SubqueryAlias t
!                                                                                                                                              +- LocalRelation [product#261, region#262, amount#263]

to ResolveAggregateFunctions to add the correct aggregate expressions (count(product#261)):

=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions ===
 'Sort ['s DESC NULLS LAST], true                                                                                                                                                                                                                                                                                                                             'Sort ['s DESC NULLS LAST], true
!+- 'UnresolvedHaving (count(tempresolvedcolumn(product#261, product, false)) > cast(2 as bigint))                                                                                                                                                                                                                                                            +- Project [product#269, region#270, s#264L]
!   +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L]                                                                                                                                                                                                                                         +- Filter (count(product)#272L > cast(2 as bigint))
!      +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]         +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L, count(product#261) AS count(product)#272L]
!         +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]                                                                                                                                                                                                                                                       +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]
!            +- SubqueryAlias t                                                                                                                                                                                                                                                                                                                                           +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]
!               +- LocalRelation [product#261, region#262, amount#263]                                                                                                                                                                                                                                                                                                       +- SubqueryAlias t
!                                                                                                                                                                                                                                                                                                                                                                               +- LocalRelation [product#261, region#262, amount#263]

Why are the changes needed?

Fix a correctness isue described in #51810.

Does this PR introduce any user-facing change?

Yes, it fixes a correctness issue.

How was this patch tested?

Added new UT from #51810.

Was this patch authored or co-authored using generative AI tooling?

No.

…uses

This is an alternative PR to apache#51810 to fix a regresion introduced in Spark 3.2 with apache#32470.
This PR defers the resolution of not fully resolved `UnresolvedHaving` nodes from `ResolveGroupingAnalytics`:
```
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics ===
 'Sort ['s DESC NULLS LAST], true                                                                                               'Sort ['s DESC NULLS LAST], true
!+- 'UnresolvedHaving ('count('product) > 2)                                                                                    +- 'UnresolvedHaving ('count(tempresolvedcolumn(product#261, product, false)) > 2)
!   +- 'Aggregate [cube(Vector(0), Vector(1), product#261, region#262)], [product#261, region#262, sum(amount#263) AS s#264L]      +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L]
!      +- SubqueryAlias t                                                                                                             +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]
!         +- LocalRelation [product#261, region#262, amount#263]                                                                         +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]
!                                                                                                                                           +- SubqueryAlias t
!                                                                                                                                              +- LocalRelation [product#261, region#262, amount#263]
```
to `ResolveAggregateFunctions` to add the correct aggregate expressions (`count(product#261)`):
```
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions ===
 'Sort ['s DESC NULLS LAST], true                                                                                                                                                                                                                                                                                                                             'Sort ['s DESC NULLS LAST], true
!+- 'UnresolvedHaving (count(tempresolvedcolumn(product#261, product, false)) > cast(2 as bigint))                                                                                                                                                                                                                                                            +- Project [product#269, region#270, s#264L]
!   +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L]                                                                                                                                                                                                                                         +- Filter (count(product)#272L > cast(2 as bigint))
!      +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]         +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L, count(product#261) AS count(product)#272L]
!         +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]                                                                                                                                                                                                                                                       +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]
!            +- SubqueryAlias t                                                                                                                                                                                                                                                                                                                                           +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]
!               +- LocalRelation [product#261, region#262, amount#263]                                                                                                                                                                                                                                                                                                       +- SubqueryAlias t
!                                                                                                                                                                                                                                                                                                                                                                               +- LocalRelation [product#261, region#262, amount#263]
```

Fix a correctness isue described in apache#51810.

Yes, it fixes a correctness issue.

Added new UT from apache#51810.

No.

Closes apache#51820 from peter-toth/SPARK-53094-fix-cube-having.

Lead-authored-by: Peter Toth <[email protected]>
Co-authored-by: harris233 <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
@peter-toth
Copy link
Contributor Author

cc @dongjoon-hyun

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

peter-toth added a commit that referenced this pull request Aug 6, 2025
…uses

### What changes were proposed in this pull request?

This is an alternative PR to #51810 to fix a regresion introduced in Spark 3.2 with #32470.
This PR defers the resolution of not fully resolved `UnresolvedHaving` nodes from `ResolveGroupingAnalytics`:
```
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics ===
 'Sort ['s DESC NULLS LAST], true                                                                                               'Sort ['s DESC NULLS LAST], true
!+- 'UnresolvedHaving ('count('product) > 2)                                                                                    +- 'UnresolvedHaving ('count(tempresolvedcolumn(product#261, product, false)) > 2)
!   +- 'Aggregate [cube(Vector(0), Vector(1), product#261, region#262)], [product#261, region#262, sum(amount#263) AS s#264L]      +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L]
!      +- SubqueryAlias t                                                                                                             +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]
!         +- LocalRelation [product#261, region#262, amount#263]                                                                         +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]
!                                                                                                                                           +- SubqueryAlias t
!                                                                                                                                              +- LocalRelation [product#261, region#262, amount#263]
```
to `ResolveAggregateFunctions` to add the correct aggregate expressions (`count(product#261)`):
```
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions ===
 'Sort ['s DESC NULLS LAST], true                                                                                                                                                                                                                                                                                                                             'Sort ['s DESC NULLS LAST], true
!+- 'UnresolvedHaving (count(tempresolvedcolumn(product#261, product, false)) > cast(2 as bigint))                                                                                                                                                                                                                                                            +- Project [product#269, region#270, s#264L]
!   +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L]                                                                                                                                                                                                                                         +- Filter (count(product)#272L > cast(2 as bigint))
!      +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]         +- Aggregate [product#269, region#270, spark_grouping_id#268L], [product#269, region#270, sum(amount#263) AS s#264L, count(product#261) AS count(product)#272L]
!         +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]                                                                                                                                                                                                                                                       +- Expand [[product#261, region#262, amount#263, product#266, region#267, 0], [product#261, region#262, amount#263, product#266, null, 1], [product#261, region#262, amount#263, null, region#267, 2], [product#261, region#262, amount#263, null, null, 3]], [product#261, region#262, amount#263, product#269, region#270, spark_grouping_id#268L]
!            +- SubqueryAlias t                                                                                                                                                                                                                                                                                                                                           +- Project [product#261, region#262, amount#263, product#261 AS product#266, region#262 AS region#267]
!               +- LocalRelation [product#261, region#262, amount#263]                                                                                                                                                                                                                                                                                                       +- SubqueryAlias t
!                                                                                                                                                                                                                                                                                                                                                                               +- LocalRelation [product#261, region#262, amount#263]
```

### Why are the changes needed?

Fix a correctness isue described in #51810.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes a correctness issue.

### How was this patch tested?

Added new UT from #51810.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #51854 from peter-toth/SPARK-53094-fix-cube-having-4.0.

Authored-by: Peter Toth <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
@peter-toth
Copy link
Contributor Author

Thank you all for the review.

Merged to branch-4.0 (4.0.1).

@peter-toth peter-toth closed this Aug 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants