[ENH] Implement 'ANTI' joins to support EXCEPT syntax #879

Open
@charlesbluca

Is your feature request related to a problem? Please describe.
test_except has been failing since before the DataFusion migration because we haven't implemented the join code required for EXCEPT clauses:

self = <dask_sql.physical.rel.logical.join.DaskJoinPlugin object at 0x7f54b2038340>, rel = <dask_planner.LogicalPlan object at 0x7f54b200aa50>, context = <dask_sql.context.Context object at 0x7f54dc803040>

    def convert(self, rel: "LogicalPlan", context: "dask_sql.Context") -> DataContainer:
        # Joining is a bit more complicated, so lets do it in steps:
    
        join = rel.join()
    
        # 1. We now have two inputs (from left and right), so we fetch them both
        dc_lhs, dc_rhs = self.assert_inputs(rel, 2, context)
        cc_lhs = dc_lhs.column_container
        cc_rhs = dc_rhs.column_container
    
        # 2. dask's merge will do some smart things with columns, which have the same name
        # on lhs an rhs (which also includes reordering).
        # However, that will confuse our column numbering in SQL.
        # So we make our life easier by converting the column names into unique names
        # We will convert back in the end
        cc_lhs_renamed = cc_lhs.make_unique("lhs")
        cc_rhs_renamed = cc_rhs.make_unique("rhs")
    
        dc_lhs_renamed = DataContainer(dc_lhs.df, cc_lhs_renamed)
        dc_rhs_renamed = DataContainer(dc_rhs.df, cc_rhs_renamed)
    
        df_lhs_renamed = dc_lhs_renamed.assign()
        df_rhs_renamed = dc_rhs_renamed.assign()
    
        join_type = join.getJoinType()
>       join_type = self.JOIN_TYPE_MAPPING[str(join_type)]
E       KeyError: 'ANTI'

dask_sql/physical/rel/logical/join.py:74: KeyError

Describe the solution you'd like
DataFusion appears to parse EXCEPT queries successfully, so the only missing piece should be support for 'ANTI' joins in DaskJoinPlugin, whose JOIN_TYPE_MAPPING currently has no 'ANTI' entry.
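
For reference, here is a minimal sketch of anti-join semantics in plain pandas, using a left merge with `indicator=True` and keeping only the unmatched rows. This is not the dask-sql implementation: the helper name `anti_join` and the sample frames are hypothetical, and the real plugin would also have to handle the renamed lhs/rhs columns and arbitrary join conditions. The trailing `drop_duplicates()` reflects the assumption that EXCEPT (without ALL) returns distinct rows.

```python
import pandas as pd


def anti_join(lhs, rhs, on):
    """Return the rows of lhs that have no match in rhs on the columns in `on`.

    Hypothetical helper for illustration only; not part of dask-sql.
    """
    # Deduplicate the rhs keys so the left merge cannot multiply lhs rows.
    rhs_keys = rhs[on].drop_duplicates()
    merged = lhs.merge(rhs_keys, on=on, how="left", indicator=True)
    # "left_only" marks lhs rows for which no matching rhs key was found.
    unmatched = merged[merged["_merge"] == "left_only"]
    return unmatched.drop(columns="_merge").reset_index(drop=True)


# Made-up sample data: only the row (1, "x") appears on both sides.
lhs = pd.DataFrame({"a": [1, 2, 2, 3], "b": ["x", "y", "y", "z"]})
rhs = pd.DataFrame({"a": [1, 4], "b": ["x", "w"]})

# Roughly: SELECT a, b FROM lhs EXCEPT SELECT a, b FROM rhs
result = (
    anti_join(lhs, rhs, on=["a", "b"])
    .drop_duplicates()
    .reset_index(drop=True)
)
print(result)
```

Dask's `merge` also accepts an `indicator` argument, so the same approach may carry over to the Dask dataframes used in `convert`, though that has not been verified here.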

Labels: SQL grammar (improvements to or issues with SQL syntax), enhancement (new feature or request), python (affects Python API)