Dandandan
Contributor

@Dandandan Dandandan commented Jul 3, 2025

Which issue does this PR close?

Rationale for this change

sort string_view[0-400] nulls to indices 2^12                                      1.00     45.2±1.37µs        ? ?/sec    1.01     45.8±1.74µs        ? ?/sec
sort string_view[0-400] to indices 2^12                                            1.00     69.1±1.98µs        ? ?/sec    1.00     69.1±4.24µs        ? ?/sec
sort string_view[10] nulls to indices 2^12                                         1.00     40.8±1.81µs        ? ?/sec    1.37     55.7±3.90µs        ? ?/sec
sort string_view[10] to indices 2^12                                               1.00     52.8±0.35µs        ? ?/sec    1.63     85.9±1.46µs        ? ?/sec
sort string_view_inlined[0-12] nulls to indices 2^12                               1.00     40.9±1.99µs        ? ?/sec    1.29     52.6±1.76µs        ? ?/sec
sort string_view_inlined[0-12] to indices 2^12                                     1.00     50.6±0.27µs        ? ?/sec    1.68    85.0±12.24µs        ? ?/sec

What changes are included in this PR?

Speeds up sorting by specializing on batches that contain only inline views (no data buffers).
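As a rough illustration of why an all-inline batch can be sorted without touching any data buffer, here is a minimal sketch in plain Rust. It models a view as a `u128` following the Arrow view layout (4-byte little-endian length, then up to 12 zero-padded data bytes). The function names `make_inline_view`, `inline_sort_key` and `sort_inline_to_indices` are hypothetical and are not the arrow-rs implementation from this PR.

```rust
/// Pack a value of at most 12 bytes into a 16-byte inline view:
/// bytes 0..4 hold the length (little-endian u32), bytes 4..16 hold the
/// data, zero-padded on the right. (Hypothetical helper for this sketch.)
fn make_inline_view(data: &[u8]) -> u128 {
    assert!(data.len() <= 12, "only values of at most 12 bytes are inline");
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    bytes[4..4 + data.len()].copy_from_slice(data);
    u128::from_le_bytes(bytes)
}

/// Turn an inline view into an integer whose numeric order matches the
/// lexicographic byte order of the value. The 12 data bytes are read
/// big-endian into the high 96 bits; the length fills the low 32 bits so
/// that a value sorts before any longer value that only adds zero bytes.
fn inline_sort_key(view: u128) -> u128 {
    let bytes = view.to_le_bytes();
    let len = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let mut data = [0u8; 16];
    data[..12].copy_from_slice(&bytes[4..16]);
    u128::from_be_bytes(data) | len as u128
}

/// Sort inline views and return the permutation of indices.
/// Only integer comparisons are needed; no data buffer is read.
fn sort_inline_to_indices(views: &[u128]) -> Vec<u32> {
    let mut keyed: Vec<(u32, u128)> = views
        .iter()
        .enumerate()
        .map(|(i, v)| (i as u32, inline_sort_key(*v)))
        .collect();
    keyed.sort_unstable_by_key(|&(_, key)| key);
    keyed.into_iter().map(|(i, _)| i).collect()
}
```

Computing the key once per value and then comparing plain integers is the design choice this sketch is meant to show. Null handling, sort options and limits are left out.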

Are these changes tested?

Yes, by existing tests.

Are there any user-facing changes?

no

@github-actions github-actions bot added the "arrow" (Changes to the arrow crate) label Jul 3, 2025
@Dandandan Dandandan changed the title from "Speedup byte view ops some more" to "Speedup sorting for inline views" Jul 3, 2025
@Dandandan Dandandan changed the title from "Speedup sorting for inline views" to "Speedup sorting for inline views for 1.4x - 1.7x improvement" Jul 3, 2025
@Dandandan Dandandan marked this pull request as ready for review July 3, 2025 10:28
@Dandandan
Contributor Author

FYI @zhuqi-lucas

@zhuqi-lucas
Contributor

Amazing work @Dandandan, reviewing now.

Contributor

@zhuqi-lucas zhuqi-lucas left a comment

LGTM, thank you @Dandandan!

_ => value_indices.len(),
};
// 3.a Check if all views are inline (no data buffers)
if values.data_buffers().is_empty() {
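To make the dispatch in this hunk concrete, the following is a minimal sketch under stated assumptions. `ViewColumn` is a hypothetical stand-in for a byte-view array (not the arrow-rs type) and the helper names are invented for illustration. When there are no data buffers, every value is inline and can be compared directly from its view. Otherwise a generic path resolves values through the buffers.

```rust
/// Hypothetical model of a byte-view column: 16-byte views plus optional
/// out-of-line data buffers. Not the arrow-rs type.
struct ViewColumn {
    views: Vec<u128>,
    data_buffers: Vec<Vec<u8>>,
}

/// Read a little-endian u32 from a view at byte offset `at`.
fn read_u32(bytes: &[u8; 16], at: usize) -> u32 {
    u32::from_le_bytes([bytes[at], bytes[at + 1], bytes[at + 2], bytes[at + 3]])
}

impl ViewColumn {
    /// Build a column; values longer than 12 bytes go into one data buffer.
    fn from_values(values: &[&[u8]]) -> Self {
        let mut views = Vec::with_capacity(values.len());
        let mut buffer: Vec<u8> = Vec::new();
        for v in values {
            let mut bytes = [0u8; 16];
            bytes[..4].copy_from_slice(&(v.len() as u32).to_le_bytes());
            if v.len() <= 12 {
                bytes[4..4 + v.len()].copy_from_slice(v); // inline data
            } else {
                bytes[4..8].copy_from_slice(&v[..4]); // 4-byte prefix
                // bytes 8..12: buffer index, left as 0 (single buffer)
                bytes[12..16].copy_from_slice(&(buffer.len() as u32).to_le_bytes());
                buffer.extend_from_slice(v);
            }
            views.push(u128::from_le_bytes(bytes));
        }
        let data_buffers = if buffer.is_empty() { vec![] } else { vec![buffer] };
        ViewColumn { views, data_buffers }
    }

    /// Generic path: resolve a view to its bytes, reading a buffer if needed.
    fn value(&self, i: usize) -> Vec<u8> {
        let bytes = self.views[i].to_le_bytes();
        let len = read_u32(&bytes, 0) as usize;
        if len <= 12 {
            bytes[4..4 + len].to_vec()
        } else {
            let buf = read_u32(&bytes, 8) as usize;
            let off = read_u32(&bytes, 12) as usize;
            self.data_buffers[buf][off..off + len].to_vec()
        }
    }
}

/// Inline-only comparison key: (zero-padded data, length).
fn inline_parts(view: u128) -> ([u8; 12], u32) {
    let bytes = view.to_le_bytes();
    let mut data = [0u8; 12];
    data.copy_from_slice(&bytes[4..16]);
    (data, read_u32(&bytes, 0))
}

fn sort_to_indices(col: &ViewColumn) -> Vec<u32> {
    let mut indices: Vec<u32> = (0..col.views.len() as u32).collect();
    if col.data_buffers.is_empty() {
        // Fast path: all views are inline, so the views alone decide the order.
        indices.sort_unstable_by_key(|&i| inline_parts(col.views[i as usize]));
    } else {
        // Generic path: compare the resolved byte values.
        indices.sort_unstable_by(|&a, &b| col.value(a as usize).cmp(&col.value(b as usize)));
    }
    indices
}
```

The empty-buffers check is a cheap, conservative test: in this model a column with no data buffers cannot contain an out-of-line value, so the fast path is safe whenever the check passes.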
Contributor

Thank you @Dandandan, it's very clear that we optimize the no-data-buffers path!

@Dandandan Dandandan changed the title from "Speedup sorting for inline views for 1.4x - 1.7x improvement" to "Speedup sorting for inline views: 1.4x - 1.7x improvement" Jul 3, 2025
@alamb
Contributor

alamb commented Jul 3, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing speedup_byte_view (b2a1dd0) to 52ad7d7 diff
BENCH_NAME=sort_kernel
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench sort_kernel
BENCH_FILTER=
BENCH_BRANCH_NAME=speedup_byte_view
Results will be posted here when complete

@alamb
Contributor

alamb commented Jul 3, 2025

🤖: Benchmark completed

Details

group                                                   main                                   speedup_byte_view
-----                                                   ----                                   -----------------
lexsort (bool, bool) 2^12                               1.00    117.1±0.39µs        ? ?/sec    1.01    118.8±0.31µs        ? ?/sec
lexsort (bool, bool) nulls 2^12                         1.00    155.9±0.66µs        ? ?/sec    1.05    163.9±0.32µs        ? ?/sec
lexsort (f32, f32) 2^10                                 1.01     45.3±0.08µs        ? ?/sec    1.00     45.0±0.12µs        ? ?/sec
lexsort (f32, f32) 2^12                                 1.00    212.2±0.27µs        ? ?/sec    1.00    211.9±0.39µs        ? ?/sec
lexsort (f32, f32) 2^12 limit 10                        1.00     38.1±0.24µs        ? ?/sec    1.01     38.6±0.04µs        ? ?/sec
lexsort (f32, f32) 2^12 limit 100                       1.00     40.5±0.04µs        ? ?/sec    1.00     40.5±0.06µs        ? ?/sec
lexsort (f32, f32) 2^12 limit 1000                      1.00     78.6±0.77µs        ? ?/sec    1.00     78.3±0.11µs        ? ?/sec
lexsort (f32, f32) 2^12 limit 2^12                      1.00    212.3±0.42µs        ? ?/sec    1.00    211.8±0.29µs        ? ?/sec
lexsort (f32, f32) nulls 2^10                           1.00     53.5±0.14µs        ? ?/sec    1.03     55.3±0.13µs        ? ?/sec
lexsort (f32, f32) nulls 2^12                           1.00    252.3±0.33µs        ? ?/sec    1.03    259.3±0.61µs        ? ?/sec
lexsort (f32, f32) nulls 2^12 limit 10                  1.00     85.0±0.24µs        ? ?/sec    1.05     89.3±0.21µs        ? ?/sec
lexsort (f32, f32) nulls 2^12 limit 100                 1.00     86.2±0.17µs        ? ?/sec    1.05     90.2±0.19µs        ? ?/sec
lexsort (f32, f32) nulls 2^12 limit 1000                1.00     95.9±0.27µs        ? ?/sec    1.06    101.3±0.21µs        ? ?/sec
lexsort (f32, f32) nulls 2^12 limit 2^12                1.00    252.5±0.50µs        ? ?/sec    1.03    259.2±0.45µs        ? ?/sec
rank f32 2^12                                           1.00     68.1±0.34µs        ? ?/sec    1.00     68.4±0.16µs        ? ?/sec
rank f32 nulls 2^12                                     1.00     37.7±0.09µs        ? ?/sec    1.00     37.9±0.11µs        ? ?/sec
rank string[10] 2^12                                    1.00    235.9±0.65µs        ? ?/sec    1.02    239.5±0.64µs        ? ?/sec
rank string[10] nulls 2^12                              1.00    114.6±0.15µs        ? ?/sec    1.04    118.7±0.26µs        ? ?/sec
sort f32 2^12                                           1.08     65.0±0.37µs        ? ?/sec    1.00     60.4±0.45µs        ? ?/sec
sort f32 nulls 2^12                                     1.00     32.1±0.10µs        ? ?/sec    1.00     32.1±0.07µs        ? ?/sec
sort f32 nulls to indices 2^12                          1.00     70.3±0.61µs        ? ?/sec    1.04     73.4±0.27µs        ? ?/sec
sort f32 to indices 2^12                                1.07     77.3±0.25µs        ? ?/sec    1.00     72.2±0.17µs        ? ?/sec
sort i32 2^10                                           1.17      8.5±0.02µs        ? ?/sec    1.00      7.3±0.01µs        ? ?/sec
sort i32 2^12                                           1.17     42.1±0.11µs        ? ?/sec    1.00     36.1±0.12µs        ? ?/sec
sort i32 nulls 2^10                                     1.00      5.4±0.01µs        ? ?/sec    1.00      5.4±0.01µs        ? ?/sec
sort i32 nulls 2^12                                     1.01     22.9±0.07µs        ? ?/sec    1.00     22.6±0.07µs        ? ?/sec
sort i32 nulls to indices 2^10                          1.00     12.5±0.54µs        ? ?/sec    1.08     13.5±0.02µs        ? ?/sec
sort i32 nulls to indices 2^12                          1.00     52.8±0.12µs        ? ?/sec    1.10     58.1±0.13µs        ? ?/sec
sort i32 to indices 2^10                                1.01     11.5±0.03µs        ? ?/sec    1.00     11.4±0.02µs        ? ?/sec
sort i32 to indices 2^12                                1.02     54.9±0.18µs        ? ?/sec    1.00     53.7±0.27µs        ? ?/sec
sort primitive run 2^12                                 1.00      6.2±0.01µs        ? ?/sec    1.00      6.2±0.01µs        ? ?/sec
sort primitive run to indices 2^12                      1.00      9.0±0.25µs        ? ?/sec    1.04      9.4±0.01µs        ? ?/sec
sort string[10] dict nulls to indices 2^12              1.00    169.2±0.35µs        ? ?/sec    1.05    176.9±0.35µs        ? ?/sec
sort string[10] dict to indices 2^12                    1.00    300.2±0.46µs        ? ?/sec    1.02    305.3±0.59µs        ? ?/sec
sort string[10] nulls to indices 2^12                   1.00    137.1±0.40µs        ? ?/sec    1.05    143.4±7.07µs        ? ?/sec
sort string[10] to indices 2^12                         1.00    229.9±0.84µs        ? ?/sec    1.01    231.6±0.57µs        ? ?/sec
sort string_view[0-400] nulls to indices 2^12           1.00     82.7±0.20µs        ? ?/sec    1.06     87.4±0.28µs        ? ?/sec
sort string_view[0-400] to indices 2^12                 1.01    121.9±0.21µs        ? ?/sec    1.00    120.5±0.29µs        ? ?/sec
sort string_view[10] nulls to indices 2^12              1.47    108.6±0.22µs        ? ?/sec    1.00     73.7±0.32µs        ? ?/sec
sort string_view[10] to indices 2^12                    1.67    173.1±0.25µs        ? ?/sec    1.00    103.6±0.27µs        ? ?/sec
sort string_view_inlined[0-12] nulls to indices 2^12    1.45    103.3±0.14µs        ? ?/sec    1.00     71.1±0.63µs        ? ?/sec
sort string_view_inlined[0-12] to indices 2^12          1.74    163.3±0.21µs        ? ?/sec    1.00     93.7±0.36µs        ? ?/sec

@Dandandan
Contributor Author

sorting is about to become even more crazy fast!

@alamb
Contributor

alamb commented Jul 3, 2025

> sorting is about to become even more crazy fast!

Just when you think you can't make sorting faster... along come @zhuqi-lucas and @Dandandan 🥳 🐶

Contributor

@alamb alamb left a comment

Thank you @Dandandan and @zhuqi-lucas -- this looks great as well. I agree with @zhuqi-lucas that optimizing the short strings / no buffers case makes a lot of sense.

@Dandandan Dandandan merged commit e6cb61f into apache:main Jul 3, 2025
20 of 21 checks passed