[SPARK-53127][SQL] Enable LIMIT ALL to override recursion row limit #51847

Conversation
@Pajaraja Do you need to do the comparison tests against Snowflake and PostgreSQL? The current changes modify how limits are handled in recursive CTEs, which could affect query results and compatibility with other databases.
I think this won't affect compatibility since LIMIT ALL should be a complete no-op in those systems, but they don't have the same row limit as Spark.
LimitAll(withOffset)
} else {
  withOffset.optional(limit) {
    if (forPipeOperators && clause.nonEmpty && clause != PipeOperators.offsetClause) {
to avoid duplicated code:
if (ctx.LIMIT != null) {
  if (forPipeOperators ...) ...
}
Changed!
@@ -1659,6 +1659,15 @@ case class LocalLimit(limitExpr: Expression, child: LogicalPlan) extends OrderPr
    copy(child = newChild)
}

case class LimitAll(child: LogicalPlan) extends OrderPreservingUnaryNode {
let's add a code comment to explain it.
it should be a regular unary node, not OrderPreservingUnaryNode
Added comment, changed parent class.
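For reference, a minimal sketch of roughly where this lands after the review; the doc comment wording, the maxRows pass-through, and the imports are assumptions rather than the PR's exact code:

import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnaryNode}

/**
 * Marker node for LIMIT ALL. It passes every row of the child through
 * unchanged; its only purpose is to signal that any recursion row limit
 * in the subtree below should be lifted.
 */
case class LimitAll(child: LogicalPlan) extends UnaryNode {
  override def output: Seq[Attribute] = child.output
  // LIMIT ALL does not bound the row count, so defer to the child.
  override def maxRows: Option[Long] = child.maxRows
  override protected def withNewChildInternal(newChild: LogicalPlan): LimitAll =
    copy(child = newChild)
}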
applyLimitAllToPlan(la.child, isInLimitAll = true)
case cteRef: CTERelationRef if isInLimitAll =>
  cteRef.copy(isUnlimitedRecursion = Some(true))
case other =>
shall we define an allowlist? It seems wrong to propagate LimitAll all the way down to CTERelationRef, even though there are plan nodes like Sort in the middle that break Limit semantics.
I made an allowlist based on the LimitPushDown optimizer rule and added Filter. Are there any other nodes we should add?
cteRef.copy(isUnlimitedRecursion = Some(true))
// Allow-list for pushing down Limit All.
case _: Project | _: Filter | _: Join | _: Union | _: Offset |
     _: BatchEvalPython | _: ArrowEvalPython | _: SubqueryAlias =>
SubqueryAlias is removed before the optimizer kicks in.
But this happens in the analyzer, right after CTESubstitution, so we have to include it.
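Piecing the fragments above together, the analyzer-side traversal presumably looks something like this sketch (not the PR's exact code; it still uses the Option[Boolean] flag shown in this diff, which a later review comment turns into a plain Boolean):

def applyLimitAllToPlan(plan: LogicalPlan, isInLimitAll: Boolean = false): LogicalPlan =
  plan match {
    // Unwrap the LIMIT ALL marker and start propagating below it.
    case la: LimitAll =>
      applyLimitAllToPlan(la.child, isInLimitAll = true)
    // A CTE reference reached under LIMIT ALL: lift its recursion row limit.
    case cteRef: CTERelationRef if isInLimitAll =>
      cteRef.copy(isUnlimitedRecursion = Some(true))
    // Allowlist of nodes LIMIT ALL may be pushed through; SubqueryAlias is
    // included because this runs in the analyzer, right after CTESubstitution.
    case _: Project | _: Filter | _: Join | _: Union | _: Offset |
         _: BatchEvalPython | _: ArrowEvalPython | _: SubqueryAlias =>
      plan.mapChildren(applyLimitAllToPlan(_, isInLimitAll))
    // Any other node breaks the Limit semantics, so reset the flag but keep
    // descending to pick up LIMIT ALL markers deeper in the tree.
    case other =>
      other.mapChildren(applyLimitAllToPlan(_, isInLimitAll = false))
  }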
throw QueryParsingErrors.multipleQueryResultClausesWithPipeOperatorsUnsupportedError(
  ctx, clause, PipeOperators.limitClause)
}
if (forPipeOperators && clause.nonEmpty && clause != PipeOperators.offsetClause) {
we should only fail if limit != null
Changed.
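That is, something along these lines (a sketch of the requested guard, reusing the identifiers visible in the excerpt above):

// Only raise the pipe-operator error when a LIMIT clause is actually present.
if (limit != null) {
  if (forPipeOperators && clause.nonEmpty && clause != PipeOperators.offsetClause) {
    throw QueryParsingErrors.multipleQueryResultClausesWithPipeOperatorsUnsupportedError(
      ctx, clause, PipeOperators.limitClause)
  }
}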
@@ -194,7 +194,8 @@ case class CTERelationRef(
     override val isStreaming: Boolean,
     statsOpt: Option[Statistics] = None,
     recursive: Boolean = false,
-    override val maxRows: Option[Long] = None) extends LeafNode with MultiInstanceRelation {
+    override val maxRows: Option[Long] = None,
+    isUnlimitedRecursion: Option[Boolean] = None) extends LeafNode with MultiInstanceRelation {
this means we have 3 values: None, Some(true), Some(false). Is it necessary and what does each of them mean?
I initially used an Option to avoid changing golden files. I've now changed it to a single boolean.
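With the Option dropped, the signature would end up roughly as below; the leading parameters come from the existing class rather than this diff, and the default value is an assumption chosen to keep golden files unchanged:

case class CTERelationRef(
    cteId: Long,
    _resolved: Boolean,
    output: Seq[Attribute],
    override val isStreaming: Boolean,
    statsOpt: Option[Statistics] = None,
    recursive: Boolean = false,
    override val maxRows: Option[Long] = None,
    isUnlimitedRecursion: Boolean = false) // was: Option[Boolean] = None
  extends LeafNode with MultiInstanceRelation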
@@ -4281,6 +4282,29 @@ object RemoveTempResolvedColumn extends Rule[LogicalPlan] {
  }
}

object ApplyLimitAll extends Rule[LogicalPlan] {
can we move it to a new file?
@@ -4281,6 +4282,29 @@ object RemoveTempResolvedColumn extends Rule[LogicalPlan] {
  }
}

object ApplyLimitAll extends Rule[LogicalPlan] {
  def applyLimitAllToPlan(plan: LogicalPlan, isInLimitAll: Boolean = false): LogicalPlan = {
- def applyLimitAllToPlan(plan: LogicalPlan, isInLimitAll: Boolean = false): LogicalPlan = {
+ private def applyLimitAllToPlan(plan: LogicalPlan, isInLimitAll: Boolean = false): LogicalPlan = {
@@ -189,14 +189,20 @@ case class InlineCTE(

      case ref: CTERelationRef =>
        val refInfo = cteMap(ref.cteId)

        val cteBody = if (ref.isUnlimitedRecursion) {
          setUnlimitedRecursion(refInfo.cteDef.child, ref.cteId)
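setUnlimitedRecursion itself isn't shown in the excerpt; a purely illustrative guess at its shape, with the traversal and the predicate as assumptions:

// Walk the inlined CTE definition body and mark self-references of the given
// cteId as unlimited, so the recursion runs without a row limit.
private def setUnlimitedRecursion(plan: LogicalPlan, cteId: Long): LogicalPlan =
  plan.transform {
    case ref: CTERelationRef if ref.recursive && ref.cteId == cteId =>
      ref.copy(isUnlimitedRecursion = true)
  }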
@@ -97,6 +97,37 @@ SELECT * FROM t LIMIT 60;

DROP VIEW ZeroAndOne;

-- limited recursion allowed to stop from failing by putting LIMIT ALL
WITH RECURSIVE t(n) MAX RECURSION LEVEL 100 AS (
so LIMIT ALL can override a user-specified MAX RECURSION LEVEL?
@@ -786,6 +786,7 @@ class ParametersSuite extends QueryTest with SharedSparkSession {
}

test("SPARK-50892: parameterized identifier inside a recursive CTE") {
  spark.conf.set("spark.sql.cteRecursionRowLimit", "50")
use withSQLConf
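That is, roughly the following pattern; unlike a bare spark.conf.set, withSQLConf restores the previous value once the block finishes, even if the test fails:

test("SPARK-50892: parameterized identifier inside a recursive CTE") {
  withSQLConf("spark.sql.cteRecursionRowLimit" -> "50") {
    // ... test body runs with the lowered recursion row limit ...
  }
}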
What changes were proposed in this pull request?
Introduce a LimitAll node for LIMIT ALL that gets pushed down into the recursion's UnionAll so that the recursion can return an unbounded number of rows.
Why are the changes needed?
LIMIT should override the recursion row limit, so LIMIT ALL should remove this limit entirely. Currently LIMIT ALL is a complete no-op (it doesn't create any node). We introduce this new node and propagate it through its subtree into any UnionLoop.
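As a hypothetical illustration of the intended behavior (the query and the limit values here are made up for this example):

// Without LIMIT ALL, a deep recursion fails once it produces more rows than
// spark.sql.cteRecursionRowLimit allows. With LIMIT ALL, the limit is lifted
// and the recursion runs until its own termination condition is met.
spark.sql("""
  WITH RECURSIVE t(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM t WHERE n < 10000000
  )
  SELECT * FROM t LIMIT ALL
""").count()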
Does this PR introduce any user-facing change?
No.
How was this patch tested?
New tests in LimitPushdownSuite and a golden file test in cte-recursion; existing golden file tests also cover the change.
Was this patch authored or co-authored using generative AI tooling?
No.