Is your feature request related to a problem? Please describe.
`test_except` has been failing since before the DataFusion migration because we haven't yet implemented the join code required for `EXCEPT` clauses:
self = <dask_sql.physical.rel.logical.join.DaskJoinPlugin object at 0x7f54b2038340>, rel = <dask_planner.LogicalPlan object at 0x7f54b200aa50>, context = <dask_sql.context.Context object at 0x7f54dc803040>

    def convert(self, rel: "LogicalPlan", context: "dask_sql.Context") -> DataContainer:
        # Joining is a bit more complicated, so lets do it in steps:
        join = rel.join()
        # 1. We now have two inputs (from left and right), so we fetch them both
        dc_lhs, dc_rhs = self.assert_inputs(rel, 2, context)
        cc_lhs = dc_lhs.column_container
        cc_rhs = dc_rhs.column_container
        # 2. dask's merge will do some smart things with columns, which have the same name
        # on lhs an rhs (which also includes reordering).
        # However, that will confuse our column numbering in SQL.
        # So we make our life easier by converting the column names into unique names
        # We will convert back in the end
        cc_lhs_renamed = cc_lhs.make_unique("lhs")
        cc_rhs_renamed = cc_rhs.make_unique("rhs")
        dc_lhs_renamed = DataContainer(dc_lhs.df, cc_lhs_renamed)
        dc_rhs_renamed = DataContainer(dc_rhs.df, cc_rhs_renamed)
        df_lhs_renamed = dc_lhs_renamed.assign()
        df_rhs_renamed = dc_rhs_renamed.assign()
        join_type = join.getJoinType()
>       join_type = self.JOIN_TYPE_MAPPING[str(join_type)]
E       KeyError: 'ANTI'

dask_sql/physical/rel/logical/join.py:74: KeyError
Describe the solution you'd like
Queries with `EXCEPT` appear to be parsed successfully by DataFusion, so all we should need to do is add support for `ANTI` joins.
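For reference, an anti join keeps the rows of the left input that have no matching key in the right input, and `EXCEPT` (without `ALL`) returns the distinct left rows that do not appear on the right. Below is a minimal sketch of those semantics in pandas. `anti_join` and `except_distinct` are hypothetical helpers, not dask-sql functions, and the sketch assumes both inputs use the same column names (SQL actually matches `EXCEPT` columns by position). The sketch does not rely on a dedicated anti-join `how` option in `merge`. Instead, it emulates one with a left merge plus `indicator=True`. If dask's `merge` also lacks such an option, supporting `'ANTI'` may need a special case in `DaskJoinPlugin.convert` rather than just a new `JOIN_TYPE_MAPPING` entry. That is an assumption about how the mapping is used, not something confirmed in this issue.

```python
import pandas as pd


def anti_join(lhs: pd.DataFrame, rhs: pd.DataFrame, on: list[str]) -> pd.DataFrame:
    """Hypothetical helper: rows of lhs whose `on` key has no match in rhs."""
    # Deduplicate the right-hand keys so the left merge cannot multiply lhs rows.
    rhs_keys = rhs[on].drop_duplicates()
    # indicator=True adds a "_merge" column: "both" if the key was found in rhs,
    # "left_only" if it was not.
    merged = lhs.merge(rhs_keys, on=on, how="left", indicator=True)
    # An anti join keeps only the unmatched rows; then drop the helper column.
    result = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
    return result.reset_index(drop=True)


def except_distinct(lhs: pd.DataFrame, rhs: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper sketching SQL EXCEPT: distinct lhs rows absent from rhs."""
    # EXCEPT compares whole rows, so every column is part of the join key.
    # Assumes lhs and rhs share column names.
    return anti_join(lhs.drop_duplicates(), rhs, on=list(lhs.columns))


# Roughly: SELECT a FROM t1 EXCEPT SELECT a FROM t2
t1 = pd.DataFrame({"a": [1, 2, 2, 3]})
t2 = pd.DataFrame({"a": [2, 4]})
print(except_distinct(t1, t2)["a"].tolist())  # [1, 3]
```

Deduplicating the right-hand keys before merging matters because a key that occurs several times in `rhs` would otherwise duplicate the matching `lhs` rows. Those rows are filtered out anyway, but the intermediate result would grow for no benefit.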