Closed
Labels: bug (Something isn't working)
Description
Describe the bug
I see duplicated OR clauses on the DynamicFilterPhysicalExpr that I receive in the consumer
for an execution plan like this:
ProjectionExec: expr=[c0@0 as c0, c1@1 as c1, c2@2 as c2]
  CoalescePartitionsExec: fetch=5
    CoalesceBatchesExec: target_batch_size=8192, fetch=5
      HashJoinExec: mode=CollectLeft, join_type=Inner, on=[(c0@0, c32@32)]
        CoalesceBatchesExec: target_batch_size=8192
          FilterExec: c0@0 IS NOT NULL
            DataSourceExec: partitions=1, partition_sizes=[1]
        RepartitionExec: partitioning=RoundRobinBatch(16), input_partitions=1
          CooperativeExec
            DataSourceExec: partitions=1
The bounds predicates arrive as 16 identical disjuncts, one per (right-side) output partition, it seems:
(
("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
OR ("c32" >= 'db-01' AND "c32" <= 'keb-03')
)
This is probably related to this comment. I wrote some logic in the consumer node to deduplicate the predicates, but it seems worth handling in DataFusion itself.
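For reference, the kind of consumer-side deduplication mentioned above could look roughly like the sketch below. It is a standalone toy model, not the actual consumer code and not DataFusion's API: `Bounds`, `dedup_disjuncts`, and `render` are hypothetical names, and each disjunct is assumed to reduce to a (column, min, max) triple.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for one bounds predicate: `min <= column <= max`.
/// A real consumer would work on physical expressions instead.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Bounds {
    column: String,
    min: String,
    max: String,
}

/// Drop repeated disjuncts, keeping the first occurrence of each in order.
fn dedup_disjuncts(disjuncts: Vec<Bounds>) -> Vec<Bounds> {
    let mut seen = HashSet::new();
    disjuncts
        .into_iter()
        // `HashSet::insert` returns false when the value was already present.
        .filter(|b| seen.insert(b.clone()))
        .collect()
}

/// Render the remaining disjuncts as an OR chain, in the style shown above.
fn render(disjuncts: &[Bounds]) -> String {
    disjuncts
        .iter()
        .map(|b| {
            format!(
                "(\"{c}\" >= '{lo}' AND \"{c}\" <= '{hi}')",
                c = b.column,
                lo = b.min,
                hi = b.max
            )
        })
        .collect::<Vec<_>>()
        .join(" OR ")
}
```

Feeding the 16 identical disjuncts from this report through `dedup_disjuncts` leaves a single bounds check on "c32". This only papers over the symptom in the consumer; it does not change how many predicates the join produces.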
Following the code, in CollectLeft mode we derive the number of output predicates from the right side's partition count. But if I understand correctly, CollectLeft collects the left side into a single partition, so every right-side partition should, in theory, see the same bounds?
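To make the question concrete, here is a toy model of the expectation, assuming that bounds are computed from the build (left) side only. `Mode` and `expected_bounds_predicates` are hypothetical names for illustration, not DataFusion types or functions, and the Partitioned arm is an assumption about the other mode rather than something verified in the code.

```rust
/// Toy model of the two join modes relevant here (not DataFusion's own type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    CollectLeft,
    Partitioned,
}

/// Number of distinct bounds predicates one would expect, assuming bounds
/// come from the build (left) side only.
fn expected_bounds_predicates(mode: Mode, right_partitions: usize) -> usize {
    match mode {
        // The left side is collected once and shared by every right-side
        // partition, so there is only one set of bounds to report.
        Mode::CollectLeft => 1,
        // Assumption: each partition builds from its own slice of the left
        // side, so the bounds can differ per partition.
        Mode::Partitioned => right_partitions,
    }
}
```

Under this model the plan above, with 16 right-side partitions in CollectLeft mode, would yield one bounds predicate rather than 16.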
To Reproduce
No response
Expected behavior
No response
Additional context
No response