[Python] Extend RecordBatch.filter to take an Expression in addition to a boolean mask Array #39220
Comments
Looks like this is supported in Rust as well: https://arrow.apache.org/rust/arrow_select/filter/fn.filter_record_batch.html
jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue on Jun 25, 2024:
…ession in addition to mask array
wjones127 pushed a commit that referenced this issue on Jun 26, 2024:
… in addition to mask array (#43043)
### Rationale for this change
`Table.filter()` already accepted either a boolean mask array or a boolean expression, but the equivalent method on `RecordBatch` only accepted the array. This change makes both methods consistent in accepting both types of mask.
### What changes are included in this PR?
Consolidate the `Table.filter` and `RecordBatch.filter` methods into a single shared method on the base class, and expand the `_filter_table` Acero helper to also work with `RecordBatch` in addition to `Table` (ensuring that a batch is returned if the input was a batch).
### Are these changes tested?
Yes
* GitHub Issue: #39220
Authored-by: Joris Van den Bossche
Signed-off-by: Will Jones
Issue resolved by pull request 43043
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this issue on Jul 9, 2024:
…ession in addition to mask array (apache#43043)
kevinjqliu added a commit to apache/iceberg-python that referenced this issue on Feb 13, 2025:
This [PR](apache/arrow#39220) from Apache Arrow was merged to allow filtering a `pa.RecordBatch` directly with a boolean expression. I believe pyiceberg is currently using pyarrow version 19.0.0; filtering a `pa.RecordBatch` with an expression was introduced in the Python bindings in version 17.0.0.
I have not run the integration tests because my Docker setup is broken for some reason. I believe this test should cover this change: https://github.com/apache/iceberg-python/blob/dfbee4b0023ff8442c7d27194dc2edc66b15142d/tests/integration/test_deletes.py#L314
Closes #1050
---------
Co-authored-by: Kevin Liu
Co-authored-by: Gabriel Igliozzi
Describe the enhancement requested
Currently RecordBatches can only be filtered using a boolean mask `Array`, unlike Tables, which can be filtered using either a mask or an `Expression`. It would be useful to allow `RecordBatch.filter` to also accept an `Expression`, to make it consistent with `Table.filter`. See also discussion here.
Component(s)
Python