Conversation

Contributor

@Pajaraja Pajaraja commented Aug 5, 2025

What changes were proposed in this pull request?

Introduce a LimitAll node for LIMIT ALL that gets pushed into UnionLoop, so the recursion can return an unbounded number of rows.

Why are the changes needed?

LIMIT should override the recursion row limit, so LIMIT ALL should remove that limit entirely. Currently LIMIT ALL is a complete no-op (it doesn't create any node). We introduce this new node and propagate it through its subtree into any UnionLoop.
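
For illustration, the propagation works roughly like this (a simplified sketch: LimitAll and the isUnlimitedRecursion flag are the new pieces from this PR, and the real rule additionally restricts which operators it traverses, per the review below):

import org.apache.spark.sql.catalyst.plans.logical.{CTERelationRef, LogicalPlan}

object ApplyLimitAllSketch {
  // LIMIT ALL itself adds no limit; it only marks its subtree so that any
  // recursive CTE reference below it drops the recursion row limit.
  def applyLimitAllToPlan(plan: LogicalPlan, isInLimitAll: Boolean = false): LogicalPlan =
    plan match {
      case la: LimitAll =>
        applyLimitAllToPlan(la.child, isInLimitAll = true)
      case cteRef: CTERelationRef if isInLimitAll =>
        cteRef.copy(isUnlimitedRecursion = Some(true))
      case other =>
        other.withNewChildren(other.children.map(applyLimitAllToPlan(_, isInLimitAll)))
    }
}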

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New tests in LimitPushdownSuite, a golden file test in cte-recursion, and existing golden file tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Aug 5, 2025
@violetnspct

@Pajaraja Do you need to do the comparison tests against Snowflake and PostgreSQL? The current changes modify how limits are handled in recursive CTEs, which could affect query results and compatibility with other databases.

@HyukjinKwon HyukjinKwon changed the title [SPARK-53127] Enable LIMIT ALL to override recursion row limit [SPARK-53127][SQL] Enable LIMIT ALL to override recursion row limit Aug 12, 2025
@Pajaraja
Contributor Author

> @Pajaraja Do you need to do the comparison tests against Snowflake and PostgreSQL? The current changes modify how limits are handled in recursive CTEs, which could affect query results and compatibility with other databases.

I think this won't affect compatibility, since LIMIT ALL should be a complete no-op in those systems too, but they don't have the same recursion row limit as Spark.

  LimitAll(withOffset)
} else {
  withOffset.optional(limit) {
    if (forPipeOperators && clause.nonEmpty && clause != PipeOperators.offsetClause) {
Contributor

to avoid duplicated code:

if (ctx.LIMIT != null) {
  if (forPipeOperators ...) ...
}

Contributor Author

Changed!

@@ -1659,6 +1659,15 @@ case class LocalLimit(limitExpr: Expression, child: LogicalPlan) extends OrderPr
    copy(child = newChild)
}

case class LimitAll(child: LogicalPlan) extends OrderPreservingUnaryNode {
Contributor

let's add a code comment to explain it.

Contributor

it should be a regular unary node, not OrderPreservingUnaryNode

Contributor Author

Added comment, changed parent class.
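
For reference, the revised node is presumably along these lines (a sketch: the parent class and the need for a doc comment come from the review; the rest is standard UnaryNode boilerplate, not the exact committed code):

import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnaryNode}

/**
 * Marker node for LIMIT ALL. It imposes no limit of its own; it only signals
 * that recursive CTE references beneath it may ignore the recursion row limit.
 */
case class LimitAll(child: LogicalPlan) extends UnaryNode {
  override def output: Seq[Attribute] = child.output
  override protected def withNewChildInternal(newChild: LogicalPlan): LimitAll =
    copy(child = newChild)
}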

    applyLimitAllToPlan(la.child, isInLimitAll = true)
  case cteRef: CTERelationRef if isInLimitAll =>
    cteRef.copy(isUnlimitedRecursion = Some(true))
  case other =>
Contributor

shall we define an allow-list? It seems wrong to propagate LimitAll all the way down to CTERelationRef when there may be plan nodes like Sort in the middle that break Limit semantics.

Contributor Author

I made an allow-list based on the LimitPushDown optimizer rule and added Filter. Are there any other nodes we should add?

    cteRef.copy(isUnlimitedRecursion = Some(true))
  // Allow-list for pushing down Limit All.
  case _: Project | _: Filter | _: Join | _: Union | _: Offset |
       _: BatchEvalPython | _: ArrowEvalPython | _: SubqueryAlias =>
Contributor

SubqueryAlias is removed before the optimizer kicks in

Contributor Author

But this happens in the analyzer, right after CTESubstitution, so we have to include it.

      throw QueryParsingErrors.multipleQueryResultClausesWithPipeOperatorsUnsupportedError(
        ctx, clause, PipeOperators.limitClause)
    }
    if (forPipeOperators && clause.nonEmpty && clause != PipeOperators.offsetClause) {
Contributor

we should only fail if limit != null

Contributor Author

Changed.
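
The guarded form presumably reads along these lines (a sketch assembled from the condition and error call visible in the surrounding diff, not the exact committed code):

// Only raise the pipe-operators error when a LIMIT clause is actually present.
if (limit != null && forPipeOperators && clause.nonEmpty &&
    clause != PipeOperators.offsetClause) {
  throw QueryParsingErrors.multipleQueryResultClausesWithPipeOperatorsUnsupportedError(
    ctx, clause, PipeOperators.limitClause)
}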

@@ -194,7 +194,8 @@ case class CTERelationRef(
     override val isStreaming: Boolean,
     statsOpt: Option[Statistics] = None,
     recursive: Boolean = false,
-    override val maxRows: Option[Long] = None) extends LeafNode with MultiInstanceRelation {
+    override val maxRows: Option[Long] = None,
+    isUnlimitedRecursion: Option[Boolean] = None) extends LeafNode with MultiInstanceRelation {
Contributor

this means we have 3 values: None, Some(true), Some(false). Is it necessary and what does each of them mean?

Contributor Author

I initially used an Option to avoid changing golden files. I've now changed it to a single boolean.
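
In other words, a minimal sketch of the field change (hypothetical class names; the field name is real):

// Before: three states, where None and Some(false) carried the same meaning.
case class RefBefore(isUnlimitedRecursion: Option[Boolean] = None)
// After: two states, defaulting to "the recursion row limit applies".
case class RefAfter(isUnlimitedRecursion: Boolean = false)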

@@ -4281,6 +4282,29 @@ object RemoveTempResolvedColumn extends Rule[LogicalPlan] {
  }
}

object ApplyLimitAll extends Rule[LogicalPlan] {
Contributor

can we move it to a new file?

object ApplyLimitAll extends Rule[LogicalPlan] {
  def applyLimitAllToPlan(plan: LogicalPlan, isInLimitAll: Boolean = false): LogicalPlan = {
Contributor

Suggested change:
-  def applyLimitAllToPlan(plan: LogicalPlan, isInLimitAll: Boolean = false): LogicalPlan = {
+  private def applyLimitAllToPlan(plan: LogicalPlan, isInLimitAll: Boolean = false): LogicalPlan = {

@@ -189,14 +189,20 @@ case class InlineCTE(

    case ref: CTERelationRef =>
      val refInfo = cteMap(ref.cteId)

      val cteBody = if (ref.isUnlimitedRecursion) {
        setUnlimitedRecursion(refInfo.cteDef.child, ref.cteId)
Contributor

Suggested change:
- setUnlimitedRecursion(refInfo.cteDef.child, ref.cteId)
+ setUnlimitedRecursion(refInfo.cteDef.child, ref.cteId)

@@ -97,6 +97,37 @@ SELECT * FROM t LIMIT 60;

DROP VIEW ZeroAndOne;

-- limited recursion kept from failing by adding LIMIT ALL
WITH RECURSIVE t(n) MAX RECURSION LEVEL 100 AS (
Contributor

so LIMIT ALL can override a user-specified MAX RECURSION LEVEL?

@@ -786,6 +786,7 @@ class ParametersSuite extends QueryTest with SharedSparkSession {
  }

  test("SPARK-50892: parameterized identifier inside a recursive CTE") {
    spark.conf.set("spark.sql.cteRecursionRowLimit", "50")
Contributor

use withSQLConf
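
A sketch of the suggested pattern, which scopes the config to the block and restores the previous value afterwards:

test("SPARK-50892: parameterized identifier inside a recursive CTE") {
  withSQLConf("spark.sql.cteRecursionRowLimit" -> "50") {
    // ... test body unchanged ...
  }
}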
