Optimize heapsort #93765

zhangyunhao116 · 2022-02-08T09:28:26Z

The new implementation is about 10% faster than the previous one (sorting 1000 random items).
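For readers who want the shape of the routine being tuned, here is a minimal binary-heap heapsort sketch. It is an illustration only: the function name `heapsort`, the `T: Ord` bound, and the structure of `sift_down` are assumptions made for this example, not the code changed in this PR.

```rust
/// Sorts `v` in place with a binary max-heap (illustrative sketch, not the PR's code).
fn heapsort<T: Ord>(v: &mut [T]) {
    // Restore the max-heap property for the subtree rooted at `node`,
    // considering only the first `len` elements of `v`.
    fn sift_down<T: Ord>(v: &mut [T], mut node: usize, len: usize) {
        loop {
            let mut child = 2 * node + 1; // index of the left child
            if child >= len {
                break; // `node` is a leaf
            }
            // Pick the greater of the two children when a right child exists.
            if child + 1 < len && v[child] < v[child + 1] {
                child += 1;
            }
            // Stop once the parent is not smaller than its greater child.
            if v[node] >= v[child] {
                break;
            }
            v.swap(node, child);
            node = child;
        }
    }

    let len = v.len();
    // Build the heap bottom-up, starting from the last parent node.
    for node in (0..len / 2).rev() {
        sift_down(v, node, len);
    }
    // Repeatedly move the current maximum to the end and shrink the heap.
    for end in (1..len).rev() {
        v.swap(0, end);
        sift_down(v, 0, end);
    }
}

fn main() {
    let mut data = vec![5, 1, 4, 2, 3, 2];
    heapsort(&mut data);
    println!("{:?}", data); // prints [1, 2, 2, 3, 4, 5]
}
```

Almost all of the running time is spent in `sift_down`, so small changes to its bounds checks and comparisons are the kind of thing that can move a benchmark by a few percent.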

rust-highfive · 2022-02-08T09:28:28Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @joshtriplett (or someone else) soon.

Please see the contribution instructions for more information.

the8472 · 2022-02-08T21:25:59Z

sorting random 1000 items

Usually changes to sorting are benchmarked against a bunch of different data sets, such as already sorted, reverse-sorted, a concatenation of two sorted runs, etc.
But there don't seem to be any in the standard library benches themselves. Maybe look at previous PRs and see if someone posted external benchmarks that can be used to test those scenarios too.
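As a rough illustration of the input patterns mentioned above, the sketch below builds each data set using only the standard library. The helper names (`random`, `ascending`, `descending`, `two_sorted_runs`) and the small xorshift generator are assumptions made for this example; they are not taken from any existing benchmark suite.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Pseudo-random input; the same seed always yields the same data.
fn random(len: usize, seed: u64) -> Vec<u64> {
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    (0..len).map(|_| rng.next_u64()).collect()
}

/// Already sorted input.
fn ascending(len: usize) -> Vec<u64> {
    (0..len as u64).collect()
}

/// Reverse-sorted input.
fn descending(len: usize) -> Vec<u64> {
    (0..len as u64).rev().collect()
}

/// Two sorted runs placed back to back.
fn two_sorted_runs(len: usize, seed: u64) -> Vec<u64> {
    let mut v = random(len, seed);
    let mid = len / 2;
    v[..mid].sort_unstable();
    v[mid..].sort_unstable();
    v
}

fn main() {
    let inputs: Vec<(&str, Vec<u64>)> = vec![
        ("random", random(1000, 42)),
        ("ascending", ascending(1000)),
        ("descending", descending(1000)),
        ("two_sorted_runs", two_sorted_runs(1000, 42)),
    ];
    for (name, mut v) in inputs {
        v.sort_unstable();
        // Every pattern must end up sorted, whatever its starting shape.
        assert!(v.windows(2).all(|w| w[0] <= w[1]));
        println!("{}: ok ({} items)", name, v.len());
    }
}
```

Feeding each pattern to the sort under test, at several sizes, shows whether a change helps only on random data or across the board.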

zhangyunhao116 · 2022-02-09T04:58:24Z

sorting random 1000 items

Usually changes to sorting are benchmarked against a bunch of different data sets, such as already sorted, reverse-sorted, a concatenation of two sorted runs, etc. But there don't seem to be any in the standard library benches themselves. Maybe look at previous PRs and see if someone posted external benchmarks that can be used to test those scenarios too.

I created a benchmark suite. The script compares sort with sortboost.

Benchmark script: https://gist.github.com/zhangyunhao116/1d40de341ba24462615d04ae21fcac81
(mainly based on https://docs.rs/crate/pdqsort/latest/source/benches/bench.rs)

Result:

sort_small_random       time:   [144.60 ns 144.96 ns 145.36 ns]                              
                        change: [-1.1857% -0.7782% -0.4072%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

sort_small_ascending    time:   [35.500 ns 35.614 ns 35.736 ns]                                  
                        change: [-9.6611% -9.0568% -8.2390%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

sort_small_descending   time:   [29.978 ns 30.041 ns 30.101 ns]                                   
                        change: [-6.8975% -6.6595% -6.4402%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  4 (4.00%) high severe

sort_small_big_random   time:   [262.48 ns 263.47 ns 264.54 ns]                                  
                        change: [-0.1157% +0.2622% +0.6097%] (p = 0.17 > 0.05)
                        No change in performance detected.
Found 35 outliers among 100 measurements (35.00%)
  9 (9.00%) low severe
  3 (3.00%) low mild
  7 (7.00%) high mild
  16 (16.00%) high severe

sort_small_big_ascending
                        time:   [207.50 ns 208.16 ns 208.85 ns]
                        change: [-3.7178% -2.8530% -2.0372%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  8 (8.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high severe

sort_small_big_descending
                        time:   [163.15 ns 163.25 ns 163.34 ns]
                        change: [+0.6363% +1.4359% +2.0507%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) low severe
  2 (2.00%) low mild

sort_medium_random      time:   [2.9320 us 2.9336 us 2.9352 us]                                
                        change: [-4.7123% -4.4968% -4.2811%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 24 outliers among 100 measurements (24.00%)
  1 (1.00%) low severe
  16 (16.00%) low mild
  1 (1.00%) high mild
  6 (6.00%) high severe

sort_medium_ascending   time:   [592.44 ns 593.33 ns 594.34 ns]                                   
                        change: [-26.795% -26.598% -26.407%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

sort_medium_descending  time:   [506.31 ns 507.14 ns 508.29 ns]                                    
                        change: [-25.662% -24.674% -23.639%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe

sort_large_random       time:   [539.69 us 540.42 us 541.23 us]                              
                        change: [-5.1022% -4.7602% -4.4391%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

sort_large_ascending    time:   [326.84 us 327.91 us 329.21 us]                                 
                        change: [-7.0931% -6.7599% -6.3780%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

sort_large_descending   time:   [338.89 us 340.89 us 342.66 us]                                  
                        change: [-6.5248% -6.0734% -5.6059%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 20 outliers among 100 measurements (20.00%)
  19 (19.00%) high mild
  1 (1.00%) high severe

sort_large_mostly_ascending
                        time:   [351.16 us 351.54 us 351.94 us]
                        change: [-4.3671% -4.2394% -4.1178%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

sort_large_mostly_descending
                        time:   [358.32 us 358.54 us 358.82 us]
                        change: [-7.7897% -7.6157% -7.4594%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) low mild
  7 (7.00%) high mild
  2 (2.00%) high severe

Benchmarking sort_large_big_random: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.0s, enable flat sampling, or reduce sample count to 60.
sort_large_big_random   time:   [1.1880 ms 1.1898 ms 1.1919 ms]                                   
                        change: [-2.0795% -1.8401% -1.6394%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  6 (6.00%) low mild
  3 (3.00%) high mild
  10 (10.00%) high severe

sort_large_big_ascending
                        time:   [878.54 us 880.05 us 882.04 us]
                        change: [-0.5682% +0.2753% +0.8210%] (p = 0.57 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

sort_large_big_descending
                        time:   [840.28 us 841.62 us 843.39 us]
                        change: [-0.2935% -0.0882% +0.1958%] (p = 0.51 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
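The verdict lines in the output above look consistent with a two-step rule: first check whether the change is statistically significant (p below 0.05, as printed), then check whether the whole change interval lies beyond a noise band. The sketch below encodes that reading. The 1% noise band, the `Verdict` enum, and the `classify` function are assumptions chosen to match the numbers shown here, not a description of the benchmarking tool's internals.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Improved,
    Regressed,
    WithinNoise,
    NoChange,
}

/// Classifies a benchmark change.
/// `lower` and `upper` bound the relative change as fractions (-0.05 means 5% faster).
fn classify(lower: f64, upper: f64, p_value: f64) -> Verdict {
    const SIGNIFICANCE: f64 = 0.05; // matches the "(p = ... < 0.05)" lines above
    const NOISE: f64 = 0.01; // assumed 1% noise band

    if p_value >= SIGNIFICANCE {
        return Verdict::NoChange; // difference is not statistically significant
    }
    if lower < -NOISE && upper < -NOISE {
        Verdict::Improved // whole interval is faster than the noise band
    } else if lower > NOISE && upper > NOISE {
        Verdict::Regressed // whole interval is slower than the noise band
    } else {
        Verdict::WithinNoise // significant, but too small to matter
    }
}

fn main() {
    // Intervals copied from the results above, converted from percent to fractions.
    let rows = [
        ("sort_small_random", -0.011857, -0.004072, 0.00),
        ("sort_medium_ascending", -0.26795, -0.26407, 0.00),
        ("sort_small_big_descending", 0.006363, 0.020507, 0.00),
        ("sort_large_big_ascending", -0.005682, 0.008210, 0.57),
    ];
    for (name, lower, upper, p) in rows {
        println!("{}: {:?}", name, classify(lower, upper, p));
    }
}
```

Under this reading, a row such as sort_small_random is reported as within noise even though p is below 0.05, because the upper end of its interval (about -0.4%) falls inside the assumed 1% band.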

zhangyunhao116 · 2022-03-02T02:59:26Z

@joshtriplett Kindly ping :)

zhangyunhao116 · 2022-04-06T08:18:22Z

r? rust-lang/libs

JohnCSimon · 2022-05-08T00:49:37Z

Triage:
@m-ou-se - what is the state of this review?

JohnTitor · 2022-06-20T07:46:20Z

Could you rebase instead of merging? We have a no-merge policy.

m-ou-se

This looks good to me. Can you squash/rebase the commits? Thanks!

zhangyunhao116 · 2022-06-20T09:14:52Z

Done. PTAL, thanks!

m-ou-se · 2022-06-20T09:21:11Z

@bors r+

Thanks!

bors · 2022-06-20T09:21:12Z

📌 Commit 98507f2 has been approved by m-ou-se

matthiaskrgr · 2022-06-20T18:06:23Z

@bors rollup=never

bors · 2022-06-20T18:09:33Z

⌛ Testing commit 98507f2 with merge 5750a6a...

bors · 2022-06-20T22:15:49Z

☀️ Test successful - checks-actions
Approved by: m-ou-se
Pushing 5750a6a to master...

rust-timer · 2022-06-20T23:48:21Z

Finished benchmarking commit (5750a6a): comparison url.

Instruction count

Primary benchmarks: no relevant changes found
Secondary benchmarks: 🎉 relevant improvement found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	-0.2%	-0.2%	1
All 😿🎉 (primary)	N/A	N/A	0

Max RSS (memory usage)

Results

Primary benchmarks: no relevant changes found
Secondary benchmarks: mixed results

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	4.1%	4.1%	1
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	-1.8%	-1.8%	1
All 😿🎉 (primary)	N/A	N/A	0

Cycles

Results

Primary benchmarks: no relevant changes found
Secondary benchmarks: 😿 relevant regressions found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	3.1%	3.8%	2
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	N/A	N/A	0

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

¹ the arithmetic mean of the percent change
² number of relevant changes

rayon-rs/rayon#950 (Keep heapsorts implementation equal to std, r=cuviper a=ritchie46): rust-lang/rust#93765 shows a 10% speedup and has been merged into std. Given that rayon's code was an exact copy of the heapsort in std, this PR implements the same optimization. Co-authored-by: Ritchie Vink

rust-highfive assigned joshtriplett Feb 8, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 8, 2022

rust-highfive assigned m-ou-se and unassigned joshtriplett Apr 6, 2022

rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label May 11, 2022

JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 20, 2022

m-ou-se approved these changes Jun 20, 2022

Optimize heapsort

98507f2

zhangyunhao116 force-pushed the heapsort branch from c2f1a54 to 98507f2 Compare June 20, 2022 08:31

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 20, 2022

bors added the merged-by-bors This PR was explicitly merged by bors. label Jun 20, 2022

bors merged commit 5750a6a into rust-lang:master Jun 20, 2022

rustbot added this to the 1.63.0 milestone Jun 20, 2022

joshtriplett added the relnotes-perf Performance improvements that should be mentioned in the release notes. label Jun 20, 2022

ritchie46 mentioned this pull request Jun 24, 2022

Keep heapsorts implementation equal to std rayon-rs/rayon#950

Merged

zhangyunhao116 deleted the heapsort branch March 3, 2023 11:28
