-
Notifications
You must be signed in to change notification settings - Fork 3.7k
GH-39220: [Python] Let RecordBatch.filter accept a boolean expression in addition to mask array #43043
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-39220: [Python] Let RecordBatch.filter accept a boolean expression in addition to mask array #43043
Conversation
…ession in addition to mask array
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean Table.filter()
will no longer work if PyArrow was not built with Acero? I forget how optional that is considered.
It will still work without Acero for a plain boolean mask (in that case we only require the compute kernel), only for filtering with an expression it needs Acero. (and note this was already the case for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. Thanks for explaining!
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 577cafb. There was 1 benchmark result indicating a performance regression:
The full Conbench report has more details. It also includes information about 12 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…ession in addition to mask array (apache#43043) ### Rationale for this change `Table.filter()` already accepted either a boolean mask array or a boolean expression. But the equivalent method on `RecordBatch` only accepted the array. This makes both methods consistent in accepting both types of mask. ### What changes are included in this PR? Consolidate the `Table.filter` and `RecordBatch.fitler` methods into a single shared method on the base class, and expanded the `_filter_table` Acero helper to also work with RecordBatch in addition to Table (and ensure to return a batch if the input was a batch) ### Are these changes tested? Yes * GitHub Issue: apache#39220 Authored-by: Joris Van den Bossche <[email protected]> Signed-off-by: Will Jones <[email protected]>
Rationale for this change
Table.filter()
already accepted either a boolean mask array or a boolean expression. But the equivalent method onRecordBatch
only accepted the array. This makes both methods consistent in accepting both types of mask.What changes are included in this PR?
Consolidate the
Table.filter
andRecordBatch.fitler
methods into a single shared method on the base class, and expanded the_filter_table
Acero helper to also work with RecordBatch in addition to Table (and ensure to return a batch if the input was a batch)Are these changes tested?
Yes
RecordBatch.filter
to take anExpression
in addition to a boolean maskArray
#39220