Skip to content

RowSelection::and_then is slow #7458

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Tracked by #7456
alamb opened this issue Apr 29, 2025 · 1 comment
Open
Tracked by #7456

RowSelection::and_then is slow #7458

alamb opened this issue Apr 29, 2025 · 1 comment
Labels
enhancement Any new improvement worthy of a entry in the changelog parquet Changes to the parquet crate

Comments

@alamb
Copy link
Contributor

alamb commented Apr 29, 2025

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

This ticket records the symptoms reported by @mbutrovich in (discord) where they see inconsistent performance. It appears the root cause is allocations related to computing the RowSelection to evaluate multiple predicates:

In our case it's currently RowSelection::and_then, so I'm trying to make sense of that function and see if there's a more efficient way to go about it other than the iter().cloned() over both inputs, mutating those, and building the output one element at a time

i was wondering about the better representation of Vec

I'm coming at Rust from C and C++, and a struct with a uint64 and a bool stuck on teh end is just gonna end up aligned to 64 bits with a bunch of padding on the end between each one. Is Rust going to do something similar?

Background:

RowSelection::and_then is used to combine the results of multiple ArrowPredicates in a RowFilter -- see source:

Here is the code for RowSelection::and_then.

Describe the solution you'd like
I would like the combination of multiple RowSelections to go faster

Describe alternatives you've considered
Some suggestions from @Dandandan in discord:

selectors can reduce allocations in from log(N) to 1 allocations using Vec::with_capacity(len_left + len_right)
Alternatively: the self.selectors allocation probably could be reused for the new one
Any better way to represent Vec ?

Here is one idea for better representing RowSelection instead of Vec<RowSelector>

Additional context

@alamb alamb added the enhancement Any new improvement worthy of a entry in the changelog label Apr 29, 2025
@alamb alamb added the parquet Changes to the parquet crate label Apr 29, 2025
@alamb
Copy link
Contributor Author

alamb commented Apr 30, 2025

Like always, @XiangpengHao was ahead of us, and he also mentioned the issue with and_then in #7363 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Any new improvement worthy of a entry in the changelog parquet Changes to the parquet crate
Projects
None yet
Development

No branches or pull requests

1 participant