Unnest Correlated Subquery #17110

Draft
wants to merge 204 commits into
base: main
4ba36c0
chore: add test
duongcongtoai Feb 3, 2025
79eaca3
chore: more progress
duongcongtoai Feb 10, 2025
7ed0831
temp
duongcongtoai Mar 18, 2025
cc97879
Merge remote-tracking branch 'origin/main' into 14554-unnest-subquery…
duongcongtoai Mar 18, 2025
5096937
Merge remote-tracking branch 'origin/main' into 14554-unnest-subquery…
duongcongtoai Apr 10, 2025
68fd9ca
chore: some work
duongcongtoai Apr 16, 2025
ace332e
chore: some work on indexed algebra
duongcongtoai Apr 27, 2025
da8980c
chore: more progress
duongcongtoai May 4, 2025
483e3ac
chore: impl projection pull up
duongcongtoai May 4, 2025
f14b145
chore: complete unnesting simple subquery
duongcongtoai May 6, 2025
0cd8143
chore: correct join condition
duongcongtoai May 8, 2025
cc3e01c
chore: handle exist query
duongcongtoai May 8, 2025
9b5daa2
test: in sq test
duongcongtoai May 10, 2025
f26baf8
test: exist with no dependent column
duongcongtoai May 10, 2025
37852c1
test: exist with dependent columns
duongcongtoai May 10, 2025
2544478
Merge remote-tracking branch 'origin/main' into 14554-subquery-unnest…
duongcongtoai May 10, 2025
e984a55
chore: remove redundant clone
duongcongtoai May 11, 2025
94aba08
feat: dummy implementation for aggregation
duongcongtoai May 13, 2025
0f039fe
feat: handle count bug
duongcongtoai May 15, 2025
898bdc4
feat: add sq alias step
duongcongtoai May 16, 2025
1a600b6
test: simple count decorrelate
duongcongtoai May 16, 2025
6ce21b3
chore: some work to support multiple subqueries per level
duongcongtoai May 17, 2025
67923d4
feat: support multiple subqueries decorrelation untested
duongcongtoai May 19, 2025
64538cc
feat: correct node rewriting rule
duongcongtoai May 19, 2025
957403f
fix: subquery alias
duongcongtoai May 19, 2025
a465459
fix: adjust test case expectation
duongcongtoai May 19, 2025
479ae64
feat: convert sq to dependent joins
duongcongtoai May 24, 2025
2171e52
feat: impl dependent join rewriter
duongcongtoai May 24, 2025
9d26437
chore: clean up unused function
duongcongtoai May 24, 2025
24d1223
chore: clean up debug slt
duongcongtoai May 24, 2025
3533cd1
chore: simple logical plan type for dependent join
duongcongtoai May 24, 2025
e1002f8
fix: recursive dependent join rewrite
duongcongtoai May 24, 2025
7ba92f1
Merge remote-tracking branch 'origin/main' into 14554-subquery-unnest…
duongcongtoai May 24, 2025
e3c77d6
chore: some more note on further implementation
duongcongtoai May 24, 2025
1ae0926
chore: lint
duongcongtoai May 24, 2025
d15c2aa
chore: clippy
duongcongtoai May 24, 2025
e5baf2c
fix: test
duongcongtoai May 25, 2025
11dbb80
doc: draw diagram
duongcongtoai May 25, 2025
5856213
fix: proto
duongcongtoai May 25, 2025
a3f11a8
chore: revert unrelated change
duongcongtoai May 25, 2025
e2d9d14
chore: lint
duongcongtoai May 25, 2025
b298426
fix: subtrait
duongcongtoai May 25, 2025
cb1a757
fix: subtrait again
duongcongtoai May 25, 2025
baef066
fix: fail test
duongcongtoai May 25, 2025
a07b3b0
chore: clippy
duongcongtoai May 25, 2025
2a828ed
fix: allow OuterRefColumn for non-adjacent outer relation
duongcongtoai May 25, 2025
dea0b70
fix: accidentally pushdown filter with subquery
duongcongtoai May 25, 2025
5ed2d24
chore: clippy
duongcongtoai May 25, 2025
c2caf37
chore: rm debug details
duongcongtoai May 25, 2025
cec566a
fix: breaking changes
duongcongtoai May 25, 2025
699424d
fix: lateral join losing its outer ref columns
duongcongtoai May 25, 2025
4edaf61
test: more test case for other decorrelation
duongcongtoai May 25, 2025
244a778
doc: better comments
duongcongtoai May 26, 2025
32db3a9
chore: add depth and data_type to correlated columns
duongcongtoai May 26, 2025
50d26f3
chore: rm snapshot
duongcongtoai May 26, 2025
b09e370
Merge branch 'main' into 14554-subquery-unnest-framework
duongcongtoai May 26, 2025
28dc7a4
feat: support alias and join
duongcongtoai May 26, 2025
cf830cb
feat: add lateral join fields to dependent join
duongcongtoai May 26, 2025
95994da
feat: rewrite lateral join
duongcongtoai May 27, 2025
9745a4f
feat: rewrite projection
duongcongtoai May 28, 2025
c2bf4d3
refactor: split rewrite logic
duongcongtoai May 28, 2025
c083501
feat: impl other api of logical plan for dependent join
duongcongtoai May 28, 2025
9512ccc
chore: rm debug file
duongcongtoai May 28, 2025
4f99adb
Merge branch 'plann-recursive-subquery' into 14554-subquery-unnest-fr…
duongcongtoai May 29, 2025
0cf0b69
Merge remote-tracking branch 'origin/main' into 14554-subquery-unnest…
duongcongtoai May 29, 2025
db8b918
chore: fix logical plan apis for dependent join
duongcongtoai May 29, 2025
8a8b10c
fix: some test
duongcongtoai May 29, 2025
98d1c27
fix: not expose subquery expr for dependentjoin
duongcongtoai May 29, 2025
080a365
Merge branch '14554-subquery-unnest-framework' into 14554-subquery-un…
duongcongtoai May 29, 2025
f75a512
chore: regen plan
duongcongtoai May 29, 2025
09cf86a
chore: dummy implementation of decorrelation
duongcongtoai Jun 1, 2025
8b6df12
chore: fix delim scan
duongcongtoai Jun 1, 2025
81fc0ef
chore: park some work
duongcongtoai Jun 2, 2025
a46a778
add LogicalPlan delim_get
irenjj Jun 2, 2025
62af637
feat: impl join expr from subquery
duongcongtoai Jun 2, 2025
1c1e4a9
Merge branch '14554-subquery-unnest-framework-fixed-planner' into del…
irenjj Jun 2, 2025
86e8acc
fix test
irenjj Jun 2, 2025
8edd44d
feat: more work on aggregation pushdown
duongcongtoai Jun 3, 2025
be56e09
fix: do not perform delim on the very left node
duongcongtoai Jun 3, 2025
3eb2ee5
feat: correctly support aggregation pushdown
duongcongtoai Jun 3, 2025
b31dfa6
chore: some more note for later impl
duongcongtoai Jun 4, 2025
350021a
chore: adjust comment
duongcongtoai Jun 4, 2025
8ae8c2c
fix conflict
irenjj Jun 4, 2025
3681104
Merge pull request #5 from irenjj/delim_get
duongcongtoai Jun 4, 2025
3061081
chore: also pushdown parent correlated columns
duongcongtoai Jun 4, 2025
1aae78a
feat: recursive query decorrelate
duongcongtoai Jun 4, 2025
496703d
fix: not expose subquery expr for dependentjoin
duongcongtoai May 29, 2025
11020e5
Merge pull request #6 from irenjj/more_plan_support_for_dependent_join
duongcongtoai Jun 5, 2025
926c916
fix: handle the case 2 tables having same col name
duongcongtoai Jun 5, 2025
7f9253b
chore: update snapshot test
duongcongtoai Jun 5, 2025
4c52eb7
fix: use indexmap for deterministic output
duongcongtoai Jun 5, 2025
a2abb7c
fix: update snapshot test
duongcongtoai Jun 5, 2025
47ace22
spilt into rewrite_dependent_join & decorrelate_dependent_join
irenjj Jun 6, 2025
32ba413
Merge pull request #7 from irenjj/split
duongcongtoai Jun 6, 2025
10f9aeb
chore: add data type to correlated column
duongcongtoai Jun 7, 2025
92bb175
fix: not expose subquery expr for dependentjoin
duongcongtoai May 29, 2025
29eff4b
spilt into rewrite_dependent_join & decorrelate_dependent_join
irenjj Jun 6, 2025
f4e332e
fix: cherry-pick conflict
duongcongtoai Jun 7, 2025
2a324bd
chore: move left over commit from feature branch
duongcongtoai Jun 7, 2025
f0c9f0b
chore: minor import format
duongcongtoai Jun 7, 2025
5e67945
Merge remote-tracking branch 'origin/main' into 14554-subquery-unnest…
duongcongtoai Jun 7, 2025
e964d6e
chore: clippy
duongcongtoai Jun 7, 2025
309511c
Merge remote-tracking branch 'origin/main' into 14554-subquery-unnest…
duongcongtoai Jun 7, 2025
2eb723e
fix: err msg
duongcongtoai Jun 7, 2025
b8a8de8
test: some more test cases
duongcongtoai Jun 7, 2025
a3d0b65
refactor: shared rewrite function
duongcongtoai Jun 7, 2025
8e858b4
refactor: remove all unwrap
duongcongtoai Jun 7, 2025
30300d1
fix: test expectation
duongcongtoai Jun 7, 2025
a93f901
fix subquery in join filter
irenjj Jun 8, 2025
4aed14f
rename
irenjj Jun 8, 2025
6f2ce78
add todo
irenjj Jun 8, 2025
7534a49
Merge pull request #9 from irenjj/subquery_in_join_filter
duongcongtoai Jun 8, 2025
c330c24
Merge branch 'main' into 14554-subquery-unnest-framework
duongcongtoai Jun 8, 2025
5be430a
chore: more constraint on correlated subquery in join filter
duongcongtoai Jun 8, 2025
7dc3dd9
Merge pull request #10 from duongcongtoai/dependent-join-multiple-sub…
duongcongtoai Jun 9, 2025
6b9afab
Merge remote-tracking branch 'origin/main' into 14554-subquery-unnest…
duongcongtoai Jun 9, 2025
50fbc63
Merge remote-tracking branch 'myfork/14554-subquery-unnest-framework'…
duongcongtoai Jun 9, 2025
2988b21
Merge remote-tracking branch 'myfork/14554-subquery-unnest-framework'…
duongcongtoai Jun 9, 2025
b8f10b9
add join kind: delimjoin & add deliminator
irenjj Jun 12, 2025
35fbfd7
fix build
irenjj Jun 12, 2025
478de4a
add more impl
irenjj Jun 14, 2025
5585c89
add DelimCandidateVisitor
irenjj Jun 14, 2025
b7693b0
DelimCandidateVisitor collection subplan size for every node
irenjj Jun 14, 2025
925a650
replace with apply_children
irenjj Jun 14, 2025
14a93aa
add DelimCandidateVisitor & DelimCandidatesCollector to collect all j…
irenjj Jun 14, 2025
6627434
add left child iterator for join
irenjj Jun 15, 2025
ae9f303
replace with new candidate collector
irenjj Jun 15, 2025
30d963f
remove inequality join with delim_scan
irenjj Jun 15, 2025
a177e35
construct new filter in remove_inequality_join_with_delim_scan
irenjj Jun 16, 2025
32ff734
rewrite the whole plan with candidate
irenjj Jun 16, 2025
e668554
replace old column
irenjj Jun 16, 2025
aca2e8f
remove unnecessary tests
irenjj Jun 16, 2025
ef47baf
add collect_node
irenjj Jun 18, 2025
1cda278
replace delim join with child/filter & update id
irenjj Jun 18, 2025
0a95ebc
add simple deliminator test
irenjj Jun 19, 2025
096c468
fix test
irenjj Jun 20, 2025
78ad520
add new join func
irenjj Jun 27, 2025
7973dc2
fix issues
irenjj Jun 27, 2025
baa3bf4
Merge pull request #13 from irenjj/deliminator
duongcongtoai Jun 28, 2025
ec88887
Merge remote-tracking branch 'upstream/main' into 14554-subquery-unne…
irenjj Jun 28, 2025
4541d12
Merge pull request #16 from irenjj/merge_main
duongcongtoai Jun 28, 2025
5746672
fix: wrong domain from higher depth being pushdown
duongcongtoai Jun 29, 2025
e2b1998
chore: maintain correlated map
duongcongtoai Jun 29, 2025
8ceda79
revert other changes
irenjj Jun 29, 2025
81422da
chore: add paper query
duongcongtoai Jun 30, 2025
6c7a1e9
Merge pull request #17 from irenjj/delim_scan_schema
duongcongtoai Jun 30, 2025
259fc3b
Merge remote-tracking branch 'myfork/14554-subquery-unnest-framework-…
duongcongtoai Jun 30, 2025
33ef27d
Merge pull request #20 from duongcongtoai/fix-wrong-domain-push-down
duongcongtoai Jul 1, 2025
0e8e871
fix: no need to call init
duongcongtoai Jul 2, 2025
9553e8b
add push down join support & add delim scan split in different outer …
irenjj Jul 3, 2025
624d6c5
Merge pull request #15 from irenjj/push_down_join
duongcongtoai Jul 4, 2025
d86fc63
test: test case for independent join
duongcongtoai Jul 4, 2025
3b6c5f6
test: test for for independent join
duongcongtoai Jul 4, 2025
9173511
add join condition for push down both sides
irenjj Jul 3, 2025
056f231
fix test
irenjj Jul 4, 2025
6a9d6fb
do some refactor on the current framework
irenjj Jul 4, 2025
f84aaaa
Revert "do some refactor on the current framework"
irenjj Jul 4, 2025
32998fc
fix detect_correlated_expressions and fix multi delim scan create logic
irenjj Jul 4, 2025
8ec5eda
fix
irenjj Jul 4, 2025
422c232
fix empty dependent join
irenjj Jul 4, 2025
6caf98c
add new example
irenjj Jul 5, 2025
ab3c547
Merge remote-tracking branch 'myfork/14554-subquery-unnest-framework-…
duongcongtoai Jul 5, 2025
8660245
fix: existing snapshots
duongcongtoai Jul 6, 2025
70d534a
Merge pull request #21 from duongcongtoai/fix-buggy-init-code-cor-dec…
duongcongtoai Jul 6, 2025
dff764b
Merge remote-tracking branch 'subquery/14554-subquery-unnest-framewor…
irenjj Jul 6, 2025
00a0757
feat: add tree print for logical plan
duongcongtoai Jul 6, 2025
6717d3f
chore: unify tree render impl
duongcongtoai Jul 6, 2025
11c6788
fix test snapshot
irenjj Jul 6, 2025
29368e1
rm unnecessary test
irenjj Jul 6, 2025
dbf9cbf
update
irenjj Jul 6, 2025
8acbc53
fix filter in join
irenjj Jul 6, 2025
59faab5
Merge pull request #24 from duongcongtoai/feat-print-tree-logical-plan
duongcongtoai Jul 6, 2025
ea8d173
fix match check
irenjj Jul 6, 2025
71e69b9
Revert "feat: add tree print for logical plan"
duongcongtoai Jul 6, 2025
4e94e86
Merge pull request #26 from duongcongtoai/revert-24-feat-print-tree-l…
duongcongtoai Jul 6, 2025
3f09791
Merge branch 'fix_detect_correlated_expressions' into decorrelate_phy…
irenjj Jul 7, 2025
6599385
add order by limit support
irenjj Jul 5, 2025
76f9225
add distinct support
irenjj Jul 5, 2025
f6dc64d
push down sort
irenjj Jul 5, 2025
d2d0d60
push down table scan
irenjj Jul 5, 2025
988df0f
push down window
irenjj Jul 5, 2025
c538294
add dummy test
irenjj Jul 6, 2025
d4f53ef
add limit test
irenjj Jul 6, 2025
06f1b31
add window test
irenjj Jul 6, 2025
f01a4e4
change test
irenjj Jul 9, 2025
b87a09e
replace alias with projection
irenjj Jul 9, 2025
02e3588
fix join condition
irenjj Jul 10, 2025
6e37887
delimget physical plan
irenjj Jul 10, 2025
4d5629e
add agg for delim scan
irenjj Jul 10, 2025
291ce55
Merge remote-tracking branch 'upstream/main' into decorrelate_physical
irenjj Jul 10, 2025
73f61b3
extract negative from In subquery
irenjj Jul 11, 2025
f177824
full Not subquery support
irenjj Jul 11, 2025
8496041
refactor flatten project logic
irenjj Jul 11, 2025
bd63c90
refactor push down agg
irenjj Jul 11, 2025
17a6820
rename func name
irenjj Jul 11, 2025
4e9071e
treat subquery alias as alias instead of table
irenjj Jul 13, 2025
47df36b
fix delim join condition projection issue
irenjj Jul 13, 2025
f60abc7
refactor
irenjj Jul 13, 2025
433e026
detect_correlated_expressions of current plan instead left
irenjj Jul 13, 2025
9bdf838
fix limit row_number wrong data type
irenjj Jul 13, 2025
fee0468
update test
irenjj Aug 4, 2025
29b591e
fix subquery actual type
irenjj Aug 4, 2025
8c04441
add single join type
irenjj Jun 30, 2025
6e7085d
fix multi batch issue for single join
irenjj Jul 6, 2025
2 changes: 1 addition & 1 deletion datafusion-cli/Cargo.toml
@@ -31,7 +31,7 @@ rust-version = { workspace = true }
all-features = true

[features]
-default = []
+default = ["backtrace"]
backtrace = ["datafusion/backtrace"]

[dependencies]
2 changes: 1 addition & 1 deletion datafusion/common/src/config.rs
@@ -783,7 +783,7 @@ config_namespace! {
pub skip_failed_rules: bool, default = false

/// Number of times that the optimizer will attempt to optimize the plan
-pub max_passes: usize, default = 3
+pub max_passes: usize, default = 1

/// When set to true, the physical plan optimizer will run a top down
/// process to reorder the join keys
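
For reviewers weighing the `max_passes` default, a toy model of a bounded optimizer loop may help show what a lower cap implies. This is only a sketch under assumptions, not DataFusion's optimizer: `Plan`, `Rule`, `peel_one`, and `optimize_with_passes` are hypothetical names, and the "plan" is reduced to a nesting depth that one rule application lowers by one.

```rust
// Hypothetical stand-ins: a "plan" is just a nesting depth here.
type Plan = u32;
type Rule = fn(Plan) -> Plan;

/// Hypothetical rule: removes one level of nesting per application.
fn peel_one(depth: Plan) -> Plan {
    depth.saturating_sub(1)
}

/// Runs every rule once per pass, for at most `max_passes` passes,
/// stopping early when a full pass leaves the plan unchanged.
/// Returns the final plan and the number of passes actually run.
fn optimize_with_passes(plan: Plan, rules: &[Rule], max_passes: usize) -> (Plan, usize) {
    let mut current = plan;
    let mut passes_run = 0;
    for _ in 0..max_passes {
        let before = current;
        for rule in rules {
            current = rule(current);
        }
        passes_run += 1;
        if current == before {
            // Fixed point reached: further passes would change nothing.
            break;
        }
    }
    (current, passes_run)
}
```

In this toy model, a plan of depth 3 ends at depth 2 when `max_passes` is 1 and reaches depth 0 when `max_passes` is 3, so a rule that needs several passes to converge stops short under the lower cap.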
2 changes: 1 addition & 1 deletion datafusion/common/src/functional_dependencies.rs
@@ -360,7 +360,7 @@ impl FunctionalDependencies {
left_func_dependencies.extend(right_func_dependencies);
left_func_dependencies
}
-JoinType::LeftSemi | JoinType::LeftAnti | JoinType::LeftMark => {
+JoinType::LeftSemi | JoinType::LeftAnti | JoinType::LeftMark | JoinType::LeftSingle => {
// These joins preserve functional dependencies of the left side:
left_func_dependencies
}
5 changes: 5 additions & 0 deletions datafusion/common/src/join_type.rs
@@ -72,6 +72,8 @@ pub enum JoinType {
/// Same logic as the LeftMark Join above, however it returns a record for each record from the
/// right input.
RightMark,

+LeftSingle,
}

impl JoinType {
@@ -94,6 +96,7 @@ impl JoinType {
JoinType::RightAnti => JoinType::LeftAnti,
JoinType::LeftMark => JoinType::RightMark,
JoinType::RightMark => JoinType::LeftMark,
+JoinType::LeftSingle => unreachable!(), // TODO: add right single support
}
}

@@ -126,6 +129,7 @@ impl Display for JoinType {
JoinType::RightAnti => "RightAnti",
JoinType::LeftMark => "LeftMark",
JoinType::RightMark => "RightMark",
+JoinType::LeftSingle => "LeftSingle",
};
write!(f, "{join_type}")
}
@@ -147,6 +151,7 @@ impl FromStr for JoinType {
"RIGHTANTI" => Ok(JoinType::RightAnti),
"LEFTMARK" => Ok(JoinType::LeftMark),
"RIGHTMARK" => Ok(JoinType::RightMark),
+"LEFTSINGLE" => Ok(JoinType::LeftSingle),
_ => _not_impl_err!("The join type {s} does not exist or is not implemented"),
}
}
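
A round-trip check between `Display` and `FromStr` is one way to catch a match arm whose literal can never be hit. The sketch below assumes, as the all-caps arms suggest, that the input is upper-cased before matching; `MiniJoinType` and `round_trips` are hypothetical names used for illustration, not part of DataFusion.

```rust
use std::fmt;
use std::str::FromStr;

// Hypothetical, trimmed-down stand-in for the real enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MiniJoinType {
    LeftMark,
    RightMark,
    LeftSingle,
}

impl MiniJoinType {
    const ALL: [MiniJoinType; 3] = [
        MiniJoinType::LeftMark,
        MiniJoinType::RightMark,
        MiniJoinType::LeftSingle,
    ];
}

impl fmt::Display for MiniJoinType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            MiniJoinType::LeftMark => "LeftMark",
            MiniJoinType::RightMark => "RightMark",
            MiniJoinType::LeftSingle => "LeftSingle",
        };
        write!(f, "{name}")
    }
}

impl FromStr for MiniJoinType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Assumption: input is normalized to upper case, so every arm
        // literal must itself be entirely upper case to be reachable.
        match s.to_uppercase().as_str() {
            "LEFTMARK" => Ok(MiniJoinType::LeftMark),
            "RIGHTMARK" => Ok(MiniJoinType::RightMark),
            "LEFTSINGLE" => Ok(MiniJoinType::LeftSingle),
            _ => Err(format!(
                "The join type {s} does not exist or is not implemented"
            )),
        }
    }
}

/// True when every variant's display name parses back to the same variant.
fn round_trips() -> bool {
    MiniJoinType::ALL
        .iter()
        .all(|jt| jt.to_string().parse::<MiniJoinType>() == Ok(*jt))
}
```

A test asserting `round_trips()` would fail for any variant whose arm literal contains a lower-case letter, since the upper-cased input could never equal it.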
61 changes: 59 additions & 2 deletions datafusion/core/src/physical_planner.rs
@@ -78,8 +78,8 @@ use datafusion_expr::expr::{
use datafusion_expr::expr_rewriter::unnormalize_cols;
use datafusion_expr::logical_plan::builder::wrap_projection_for_join_if_necessary;
use datafusion_expr::{
-Analyze, DescribeTable, DmlStatement, Explain, ExplainFormat, Extension, FetchType,
-Filter, JoinType, RecursiveQuery, SkipType, StringifiedPlan, WindowFrame,
+Analyze, DelimGet, DescribeTable, DmlStatement, Explain, ExplainFormat, Extension,
+FetchType, Filter, JoinType, RecursiveQuery, SkipType, StringifiedPlan, WindowFrame,
WindowFrameBound, WriteOp,
};
use datafusion_physical_expr::aggregate::{AggregateExprBuilder, AggregateFunctionExpr};
@@ -1311,6 +1311,63 @@ impl DefaultPhysicalPlanner {
"Unsupported logical plan: Analyze must be root of the plan"
)
}
+LogicalPlan::DependentJoin(_) => {
+    return internal_err!(
+        "Optimizers have not completely removed dependent join"
+    )
+}
+LogicalPlan::DelimGet(DelimGet {
+    table_name,
+    projected_schema,
+    ..
+}) => {
+    let resolved = session_state.resolve_table_ref(table_name.clone());
+    if let Ok(schema) = session_state.schema_for_ref(resolved.clone()) {
+        if let Some(table) = schema.table(&resolved.table).await? {
+            let mut proj = vec![];
+            for (i, field) in table.schema().fields().iter().enumerate() {
+                for iter in projected_schema.as_ref().iter() {
+                    if iter.1 == field {
+                        proj.push(i);
+                    }
+                }
+            }
+
+            // First create the scan execution plan.
+            let scan_plan =
+                table.scan(session_state, Some(&proj), &[], None).await?;
+
+            // Now add aggregation to eliminate duplicated rows.
+            // Create a PhysicalGroupBy over every scanned column, so that
+            // identical rows collapse into a single group.
+            let schema = &scan_plan.schema();
+            let group_exprs: Vec<(Arc<dyn PhysicalExpr>, String)> = (0
+                ..schema.fields().len())
+                .map(|i| {
+                    let name = schema.field(i).name().to_string();
+                    let expr = Arc::new(Column::new(&name, i))
+                        as Arc<dyn PhysicalExpr>;
+                    (expr, name)
+                })
+                .collect();
+
+            let group_by = PhysicalGroupBy::new_single(group_exprs);
+
+            // Create the AggregateExec with no aggregate expressions to deduplicate the rows
+            Arc::new(AggregateExec::try_new(
+                AggregateMode::Final,
+                group_by,
+                vec![], // No aggregate expressions, just grouping to deduplicate
+                vec![], // No filters
+                scan_plan.clone(),
+                scan_plan.schema(),
+            )?)
+        } else {
+            return internal_err!("no table provider");
+        }
+    } else {
+        return internal_err!("empty schema");
+    }
+}
};
Ok(exec_node)
}
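
The `DelimGet` arm does two things: it maps the projected fields back to column indices in the table schema, then groups by every scanned column with no aggregate expressions so that duplicate rows collapse. A minimal sketch of both ideas in plain Rust follows. It deliberately avoids DataFusion's API: `projection_indices`, `distinct_rows`, and `Row` are hypothetical names, fields are compared by name only, and a hash set stands in for the aggregate operator.

```rust
use std::collections::HashSet;

// Hypothetical row type: one nullable integer per column.
type Row = Vec<Option<i64>>;

/// Returns the positions (in table order) of the table fields that also
/// appear in the projected field list. Simplification: each index is
/// emitted at most once, and fields are matched by name only.
fn projection_indices(table_fields: &[&str], projected: &[&str]) -> Vec<usize> {
    table_fields
        .iter()
        .enumerate()
        .filter(|(_, field)| projected.contains(*field))
        .map(|(i, _)| i)
        .collect()
}

/// Grouping by all columns with no aggregates is equivalent to DISTINCT:
/// each distinct row forms one group. Keeps the first occurrence of each row.
fn distinct_rows(rows: &[Row]) -> Vec<Row> {
    let mut seen: HashSet<&Row> = HashSet::new();
    let mut out = Vec::new();
    for row in rows {
        // `insert` returns true only the first time a row is seen.
        if seen.insert(row) {
            out.push(row.clone());
        }
    }
    out
}
```

Note that `NULL` values (modeled here as `None`) compare equal to each other for grouping purposes in this sketch, which matches the usual SQL `GROUP BY` treatment of nulls.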
13 changes: 12 additions & 1 deletion datafusion/expr/src/expr.rs
@@ -3169,7 +3169,18 @@ pub const UNNEST_COLUMN_PREFIX: &str = "UNNEST";
impl Display for Expr {
fn fmt(&self, f: &mut Formatter) -> fmt::Result {
match self {
-Expr::Alias(Alias { expr, name, .. }) => write!(f, "{expr} AS {name}"),
+Expr::Alias(Alias {
+    expr,
+    relation,
+    name,
+    ..
+}) => {
+    if let Some(relation) = relation {
+        write!(f, "{expr} AS {relation}.{name}")
+    } else {
+        write!(f, "{expr} AS {name}")
+    }
+}
Expr::Column(c) => write!(f, "{c}"),
Expr::OuterReferenceColumn(_, c) => {
write!(f, "{OUTER_REFERENCE_COLUMN_PREFIX}({c})")
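
The `Alias` display change in this file can be illustrated in isolation. The sketch below is an assumption-laden stand-in: `MiniAlias` is a hypothetical struct with string fields, whereas the real `Alias` holds an expression and an optional table reference.

```rust
use std::fmt;

// Hypothetical stand-in for the real `Alias` type.
struct MiniAlias {
    expr: String,
    relation: Option<String>,
    name: String,
}

impl fmt::Display for MiniAlias {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let Self { expr, relation, name } = self;
        match relation {
            // Qualified alias: include the relation before the name.
            Some(relation) => write!(f, "{expr} AS {relation}.{name}"),
            // Unqualified alias: same output as before the change.
            None => write!(f, "{expr} AS {name}"),
        }
    }
}
```

With this shape, an alias without a relation prints exactly as it did previously, so only plans that carry a qualified alias would see different display output.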