Skip to content

Conversation

dharanad
Copy link
Contributor

@dharanad dharanad commented Jun 15, 2024

Which issue does this PR close?

Closes #10907 and is part of #8708

Rationale for this change

What changes are included in this PR?

  • moved BitAnd, BitOr, BitXor to UDAF

Are these changes tested?

Running existing test cases

cargo build
cargo test
cargo test --test sqllogictests 

Are there any user-facing changes?

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jun 15, 2024
@dharanad dharanad changed the title remove bit and or xor from expr Convert BitAnd, BitOr, BitXor to UDAF Jun 15, 2024
@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Jun 15, 2024
@dharanad dharanad marked this pull request as ready for review June 16, 2024 16:31
dharanad added 3 commits June 16, 2024 22:51
…o-udaf

# Conflicts:
#	datafusion/functions-aggregate/src/lib.rs
#	datafusion/physical-expr/src/aggregate/build_in.rs
#	datafusion/proto/src/physical_plan/to_proto.rs
@dharanad
Copy link
Contributor Author

@jayzhan211 / @alamb can you please help me with a review here.

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, remember to add the test in roundtrip_expr_api in datafusion/proto/tests/cases/roundtrip_logical_plan.rs

@dharanad
Copy link
Contributor Author

cc: @jayzhan211 I have made the changes and also updated the UT.

@dharanad dharanad requested a review from jayzhan211 June 17, 2024 08:42
true
}

fn create_groups_accumulator(&self) -> Result<Box<dyn GroupsAccumulator>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think create_groups_accumulator is not supported in new UDAF, but we should

Copy link
Contributor Author

@dharanad dharanad Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Can you help me, how to do i test groups accumulator ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment @jayzhan211

It seems like the groups accumulator code was moved from the existing function here

https://github.com/apache/datafusion/pull/10930/files#diff-9b0874795d7dc847d4137426e205de0a2e3b0b56c03ad80d55b000aa020ca132L438-L462

Are you saying we shouldn't support the groups accumulator? Or that we should and it is missing from some places?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update -- NM I see now now that @dharanad added the groups accumulator code

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @dharanad -- this is so cool to see

true
}

fn create_groups_accumulator(&self) -> Result<Box<dyn GroupsAccumulator>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment @jayzhan211

It seems like the groups accumulator code was moved from the existing function here

https://github.com/apache/datafusion/pull/10930/files#diff-9b0874795d7dc847d4137426e205de0a2e3b0b56c03ad80d55b000aa020ca132L438-L462

Are you saying we shouldn't support the groups accumulator? Or that we should and it is missing from some places?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @dharanad -- this looks good to me.

Let's wait for @jayzhan211 to give it another review as well

@jayzhan211
Copy link
Contributor

👍 Thanks @dharanad and @alamb

@jayzhan211 jayzhan211 merged commit 2daadb7 into apache:main Jun 17, 2024
@dharanad
Copy link
Contributor Author

Thank you very much @jayzhan211 and @alamb for your help and time.

@alamb
Copy link
Contributor

alamb commented Jun 17, 2024

Thank you for the great initial contribution @dharanad

@dharanad dharanad deleted the dharanad/convert-bit-and-or-xor-to-udaf branch June 17, 2024 15:46
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* remove bit and or xor from expr

* remove bit and or xor from physical expr and proto

* add proto regen changes

* impl BitAnd, BitOr, BitXor UADF

* add support for float

* removing support for float

* refactor helper macros

* clippy'fy

* simplify Bitwise operation

* add documentation

* formatting

* fix lint issue

* remove XorDistinct

* update roundtrip_expr_api test

* linting

* support groups accumulator
Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this pull request Jul 24, 2024
The functions now take a single expression instead of a Vec<_>.

Ref: apache/datafusion#10930
Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this pull request Jul 24, 2024
The functions now take a single expression instead of a Vec<_>.

Ref: apache/datafusion#10930
Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this pull request Jul 25, 2024
The functions now take a single expression instead of a Vec<_>.

Ref: apache/datafusion#10930
andygrove pushed a commit to apache/datafusion-python that referenced this pull request Jul 31, 2024
* chore: update datafusion deps

* feat: impl ExecutionPlan::static_name() for DatasetExec

This required trait method was added upstream [0] and recommends to simply forward to `static_name`.

[0]: apache/datafusion#10266

* feat: update first_value and last_value wrappers.

Upstream signatures were changed for the new new `AggregateBuilder` api [0].

This simply gets the code to work. We should better incorporate that API into `datafusion-python`.

[0] apache/datafusion#10560

* migrate count to UDAF

Builtin Count was removed upstream.

TBD whether we want to re-implement `count_star` with new API.

Ref: apache/datafusion#10893

* migrate approx_percentile_cont, approx_distinct, and approx_median to UDAF

Ref: approx_distinct apache/datafusion#10851
Ref: approx_median apache/datafusion#10840
Ref: approx_percentile_cont and _with_weight apache/datafusion#10917

* migrate avg to UDAF

Ref: apache/datafusion#10964

* migrage corr to UDAF

Ref: apache/datafusion#10884

* migrate grouping to UDAF

Ref: apache/datafusion#10906

* add alias `mean` for UDAF `avg`

* migrate stddev to UDAF

Ref: apache/datafusion#10827

* remove rust alias for stddev

The python wrapper now provides stddev_samp alias.

* migrage var_pop to UDAF

Ref: apache/datafusion#10836

* migrate regr_* functions to UDAF

Ref: apache/datafusion#10898

* migrate bitwise functions to UDAF

The functions now take a single expression instead of a Vec<_>.

Ref: apache/datafusion#10930

* add missing variants for ScalarValue with todo

* fix typo in approx_percentile_cont

* add distinct arg to count

* comment out failing test

`approx_percentile_cont` is now returning a DoubleArray instead of an IntArray.

This may be a bug upstream; it requires further investigation.

* update tests to expect lowercase `sum` in query plans

This was changed upstream.

Ref: apache/datafusion#10831

* update ScalarType data_type map

* add docs dependency pickleshare

* re-implement count_star

* lint: ruff python lint

* lint: rust cargo fmt

* include name of window function in error for find_window_fn

* refactor `find_window_fn` for debug clarity

* search default aggregate functions by both name and aliases

The alias list no longer includes the name of the function.

Ref: apache/datafusion#10658

* fix markdown in find_window_fn docs

* parameterize test_window_functions

`first_value` and `last_value` are currently failing and marked as xfail.

* add test ids to test_simple_select tests marked xfail

* update find_window_fn to search built-ins first

The behavior of `first_value` and `last_value` UDAFs currently does not match the built-in behavior.
This allowed me to remove `marks=pytest.xfail` from the window tests.

* improve first_call and last_call use of the builder API

* remove trailing todos

* fix examples/substrait.py

* chore: remove explicit aliases from functions.rs

Ref: #779

* remove `array_fn!` aliases

* remove alias rules for `expr_fn_vec!`

* remove alias rules from `expr_fn!` macro

* remove unnecessary pyo3 var-arg signatures in functions.rs

* remove pyo3 signatures that provided defaults for first_value and last_value

* parametrize test_string_functions

* test regr_ function wrappers

Closes #778
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Convert BitAnd, BitOr, BitXor to UDAF
3 participants