Optimize heapsort #93765

Merged
merged 1 commit into from
Jun 20, 2022
Conversation

zhangyunhao116
Contributor

The new implementation is about 10% faster than the previous one (measured by sorting 1000 random items).
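For readers unfamiliar with the algorithm under discussion, here is a minimal textbook heapsort in Rust. It is a sketch for orientation only, not the diff from this PR: the function name `heapsort`, the helper `sift_down`, and the `is_less` parameter are illustrative choices.

```rust
/// Sorts `v` in place with a textbook binary max-heap heapsort.
/// `is_less(a, b)` must return true when `a` orders before `b`.
/// Illustrative sketch only; not the standard library's code.
pub fn heapsort<T, F>(v: &mut [T], mut is_less: F)
where
    F: FnMut(&T, &T) -> bool,
{
    // Restore the max-heap property for the subtree rooted at `node`,
    // considering only the prefix `v[..end]`.
    fn sift_down<T, F>(v: &mut [T], mut node: usize, end: usize, is_less: &mut F)
    where
        F: FnMut(&T, &T) -> bool,
    {
        loop {
            // Left child of `node` in the implicit binary tree.
            let mut child = 2 * node + 1;
            if child >= end {
                break;
            }
            // Pick the greater of the two children.
            if child + 1 < end && is_less(&v[child], &v[child + 1]) {
                child += 1;
            }
            // Stop once the parent is not less than its greater child.
            if !is_less(&v[node], &v[child]) {
                break;
            }
            v.swap(node, child);
            node = child;
        }
    }

    let len = v.len();

    // Build the heap bottom-up, starting from the last parent node.
    for node in (0..len / 2).rev() {
        sift_down(v, node, len, &mut is_less);
    }

    // Repeatedly move the maximum to the end and shrink the heap.
    for end in (1..len).rev() {
        v.swap(0, end);
        sift_down(v, 0, end, &mut is_less);
    }
}
```

Calling `heapsort(&mut v, |a, b| a < b)` sorts `v` in ascending order. Almost all of the running time is spent in `sift_down`, so that loop is the natural place to look for constant-factor savings.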

@rust-highfive
Contributor

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @joshtriplett (or someone else) soon.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 8, 2022
@the8472
Member

the8472 commented Feb 8, 2022

> sorting random 1000 items

Usually changes to sorting are benchmarked against a bunch of different data sets, such as already sorted, reverse-sorted, a concatenation of two sorted runs, etc.
But there don't seem to be any in the standard library benches themselves. Maybe look at previous PRs and see if someone posted external benchmarks that can be used to test those scenarios too.
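The input patterns mentioned above can be generated with a few lines of Rust. This is a minimal sketch that uses only the standard library; the helper names (`XorShift64`, `random`, `ascending`, `descending`, `two_sorted_runs`, `time_sort`) are hypothetical and are not taken from any existing benchmark suite.

```rust
use std::time::{Duration, Instant};

/// Tiny xorshift generator so the sketch needs no external crates.
pub struct XorShift64(pub u64);

impl XorShift64 {
    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Pseudo-random input; the same seed always yields the same data.
pub fn random(len: usize, seed: u64) -> Vec<u64> {
    // The xorshift state must be non-zero.
    let mut rng = XorShift64(seed.max(1));
    (0..len).map(|_| rng.next_u64()).collect()
}

/// Already sorted input: 0, 1, 2, ...
pub fn ascending(len: usize) -> Vec<u64> {
    (0..len as u64).collect()
}

/// Reverse-sorted input: len - 1, ..., 1, 0.
pub fn descending(len: usize) -> Vec<u64> {
    (0..len as u64).rev().collect()
}

/// Concatenation of two sorted runs, each starting again at 0.
pub fn two_sorted_runs(len: usize) -> Vec<u64> {
    let half = len / 2;
    let mut v: Vec<u64> = (0..half as u64).collect();
    v.extend(0..(len - half) as u64);
    v
}

/// Times a single `sort_unstable` call on `input` and checks the result.
pub fn time_sort(mut input: Vec<u64>) -> Duration {
    let start = Instant::now();
    input.sort_unstable();
    let elapsed = start.elapsed();
    assert!(input.windows(2).all(|w| w[0] <= w[1]));
    elapsed
}
```

A single `time_sort(descending(1000))` measurement is far too noisy to draw conclusions from; a statistical harness that repeats the measurement many times is the better choice for real comparisons.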

@zhangyunhao116
Contributor Author

> > sorting random 1000 items
>
> Usually changes to sorting are benchmarked against a bunch of different data sets, such as already sorted, reverse-sorted, a concatenation of two sorted runs, etc. But there don't seem to be any in the standard library benches themselves. Maybe look at previous PRs and see if someone posted external benchmarks that can be used to test those scenarios too.

I created a benchmark suite. The script compares the sort with sortboost.

Benchmark script: https://gist.github.com/zhangyunhao116/1d40de341ba24462615d04ae21fcac81
(mainly based on https://docs.rs/crate/pdqsort/latest/source/benches/bench.rs)

Result:

sort_small_random       time:   [144.60 ns 144.96 ns 145.36 ns]                              
                        change: [-1.1857% -0.7782% -0.4072%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

sort_small_ascending    time:   [35.500 ns 35.614 ns 35.736 ns]                                  
                        change: [-9.6611% -9.0568% -8.2390%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

sort_small_descending   time:   [29.978 ns 30.041 ns 30.101 ns]                                   
                        change: [-6.8975% -6.6595% -6.4402%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  4 (4.00%) high severe

sort_small_big_random   time:   [262.48 ns 263.47 ns 264.54 ns]                                  
                        change: [-0.1157% +0.2622% +0.6097%] (p = 0.17 > 0.05)
                        No change in performance detected.
Found 35 outliers among 100 measurements (35.00%)
  9 (9.00%) low severe
  3 (3.00%) low mild
  7 (7.00%) high mild
  16 (16.00%) high severe

sort_small_big_ascending
                        time:   [207.50 ns 208.16 ns 208.85 ns]
                        change: [-3.7178% -2.8530% -2.0372%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  8 (8.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high severe

sort_small_big_descending
                        time:   [163.15 ns 163.25 ns 163.34 ns]
                        change: [+0.6363% +1.4359% +2.0507%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) low severe
  2 (2.00%) low mild

sort_medium_random      time:   [2.9320 us 2.9336 us 2.9352 us]                                
                        change: [-4.7123% -4.4968% -4.2811%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 24 outliers among 100 measurements (24.00%)
  1 (1.00%) low severe
  16 (16.00%) low mild
  1 (1.00%) high mild
  6 (6.00%) high severe

sort_medium_ascending   time:   [592.44 ns 593.33 ns 594.34 ns]                                   
                        change: [-26.795% -26.598% -26.407%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

sort_medium_descending  time:   [506.31 ns 507.14 ns 508.29 ns]                                    
                        change: [-25.662% -24.674% -23.639%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe

sort_large_random       time:   [539.69 us 540.42 us 541.23 us]                              
                        change: [-5.1022% -4.7602% -4.4391%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

sort_large_ascending    time:   [326.84 us 327.91 us 329.21 us]                                 
                        change: [-7.0931% -6.7599% -6.3780%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

sort_large_descending   time:   [338.89 us 340.89 us 342.66 us]                                  
                        change: [-6.5248% -6.0734% -5.6059%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 20 outliers among 100 measurements (20.00%)
  19 (19.00%) high mild
  1 (1.00%) high severe

sort_large_mostly_ascending
                        time:   [351.16 us 351.54 us 351.94 us]
                        change: [-4.3671% -4.2394% -4.1178%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

sort_large_mostly_descending
                        time:   [358.32 us 358.54 us 358.82 us]
                        change: [-7.7897% -7.6157% -7.4594%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) low mild
  7 (7.00%) high mild
  2 (2.00%) high severe

Benchmarking sort_large_big_random: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.0s, enable flat sampling, or reduce sample count to 60.
sort_large_big_random   time:   [1.1880 ms 1.1898 ms 1.1919 ms]                                   
                        change: [-2.0795% -1.8401% -1.6394%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  6 (6.00%) low mild
  3 (3.00%) high mild
  10 (10.00%) high severe

sort_large_big_ascending
                        time:   [878.54 us 880.05 us 882.04 us]
                        change: [-0.5682% +0.2753% +0.8210%] (p = 0.57 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

sort_large_big_descending
                        time:   [840.28 us 841.62 us 843.39 us]
                        change: [-0.2935% -0.0882% +0.1958%] (p = 0.51 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
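To relate the `change` intervals above to absolute times, one can back out the baseline time from a new time and a relative change. The sketch below assumes the change is defined as (new - old) / old, which is the usual convention but is an assumption here; the function names `implied_baseline` and `speedup` are hypothetical.

```rust
/// Baseline time implied by a new time and a relative change in percent,
/// assuming change = (new - old) / old.
pub fn implied_baseline(new_time: f64, change_percent: f64) -> f64 {
    new_time / (1.0 + change_percent / 100.0)
}

/// Speedup factor (old / new) implied by a relative change in percent.
pub fn speedup(change_percent: f64) -> f64 {
    1.0 / (1.0 + change_percent / 100.0)
}
```

For `sort_medium_ascending`, `implied_baseline(593.33, -26.598)` gives roughly 808 ns, and `speedup(-26.598)` is roughly 1.36.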

@zhangyunhao116
Contributor Author

@joshtriplett Kindly ping :)

@zhangyunhao116
Contributor Author

r? rust-lang/libs

@rust-highfive rust-highfive assigned m-ou-se and unassigned joshtriplett Apr 6, 2022
@JohnCSimon
Member

Triage:
@m-ou-se - what is the state of this review?

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label May 11, 2022
@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 20, 2022
@JohnTitor
Member

Could you rebase instead of merging? We have a no-merge policy.

Member

@m-ou-se m-ou-se left a comment

This looks good to me. Can you squash/rebase the commits? Thanks!

@zhangyunhao116
Contributor Author

Done. PTAL, thanks!

@m-ou-se
Member

m-ou-se commented Jun 20, 2022

@bors r+

Thanks!

@bors
Collaborator

bors commented Jun 20, 2022

📌 Commit 98507f2 has been approved by m-ou-se

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 20, 2022
@matthiaskrgr
Member

@bors rollup=never

@bors
Collaborator

bors commented Jun 20, 2022

⌛ Testing commit 98507f2 with merge 5750a6a...

@bors
Collaborator

bors commented Jun 20, 2022

☀️ Test successful - checks-actions
Approved by: m-ou-se
Pushing 5750a6a to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jun 20, 2022
@bors bors merged commit 5750a6a into rust-lang:master Jun 20, 2022
@rustbot rustbot added this to the 1.63.0 milestone Jun 20, 2022
@joshtriplett joshtriplett added the relnotes-perf Performance improvements that should be mentioned in the release notes. label Jun 20, 2022
@rust-timer
Collaborator

Finished benchmarking commit (5750a6a): comparison url.

Instruction count

  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 🎉 relevant improvement found

| | mean¹ | max | count² |
|---|---|---|---|
| Regressions 😿 (primary) | N/A | N/A | 0 |
| Regressions 😿 (secondary) | N/A | N/A | 0 |
| Improvements 🎉 (primary) | N/A | N/A | 0 |
| Improvements 🎉 (secondary) | -0.2% | -0.2% | 1 |
| All 😿🎉 (primary) | N/A | N/A | 0 |

Max RSS (memory usage)

Results
  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: mixed results

| | mean¹ | max | count² |
|---|---|---|---|
| Regressions 😿 (primary) | N/A | N/A | 0 |
| Regressions 😿 (secondary) | 4.1% | 4.1% | 1 |
| Improvements 🎉 (primary) | N/A | N/A | 0 |
| Improvements 🎉 (secondary) | -1.8% | -1.8% | 1 |
| All 😿🎉 (primary) | N/A | N/A | 0 |

Cycles

Results
  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 😿 relevant regressions found

| | mean¹ | max | count² |
|---|---|---|---|
| Regressions 😿 (primary) | N/A | N/A | 0 |
| Regressions 😿 (secondary) | 3.1% | 3.8% | 2 |
| Improvements 🎉 (primary) | N/A | N/A | 0 |
| Improvements 🎉 (secondary) | N/A | N/A | 0 |
| All 😿🎉 (primary) | N/A | N/A | 0 |

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes

ritchie46 added a commit to ritchie46/rayon that referenced this pull request Jun 24, 2022
rust-lang/rust#93765 shows
a 10% speedup and had been merged into std.

Given that rayon's code was an exact copy of the
heapsort in std, this PR implements the same
optimization.
bors bot added a commit to rayon-rs/rayon that referenced this pull request Jun 24, 2022
950: Keep heapsorts implementation equal to std r=cuviper a=ritchie46

rust-lang/rust#93765 shows
a 10% speedup and had been merged into std.

Given that rayon's code was an exact copy of the
heapsort in std, this PR implements the same
optimization.

Co-authored-by: Ritchie Vink
@zhangyunhao116 zhangyunhao116 deleted the heapsort branch March 3, 2023 11:28
Labels
merged-by-bors This PR was explicitly merged by bors. relnotes-perf Performance improvements that should be mentioned in the release notes. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.