Skip to content

sort: use pdqsort #50154

Closed
Closed
@zhangyunhao116

Description

@zhangyunhao116

Abstract

From ByteDance Programming Language Team

We suggest using pdqsort in the sort package. The new algorithm is mainly based on pattern-defeating quicksort by Orson Peters. Both Rust and C++ Boost have used pdqsort in their library.

  • Across all benchmarks, pdqsort is never significantly slower than the previous algorithm(in the built-in sort package).
  • In common patterns, pdqsort is often faster (i.e. 10x faster in sorted slices).

Implementations

  • Gerrit CL: https://go-review.googlesource.com/c/go/+/371574
  • stdpdqsort: based on sort.Interface. Same as the Gerrit CL, but contains more tests and benchmarks.
  • pdqsort: based on generics(need Go 1.18), about 2x ~ 60x faster than built-in package(only support Ordered types). Since Go1.18 is not released, this implementation is just a demo for now.

Algorithm details

The new algorithm is mainly based on pdqsort, but with these modifications:

  • Disable the optimization from BlockQuicksort, since its poor performance in Go.
  • Modify some parameters according to the benchmark results.

Known issues

  • The new algorithm is still an unstable sorting algorithm, it may rearrange equal elements differently compared to the original sorting algorithm in the sort package.

Benchmarks

Go version: go1.18-d6c4583ad4 linux/amd64

CPU: Intel 11700k(8C16T)

OS: ubuntu 20.04

MEMORY: 16G x 2 (3200MHz)

  • Benchmark from sort package. (go test sort -count=20 -run=NOTEST -bench=.)
name                   old time/op  new time/op  delta
SearchWrappers-16      72.8ns ± 3%  75.1ns ± 2%   +3.25%  (p=0.000 n=20+10)
SortString1K-16        85.2µs ± 3%  86.2µs ± 5%     ~     (p=0.247 n=19+10)
SortString1K_Slice-16  84.6µs ± 4%  86.1µs ± 4%     ~     (p=0.120 n=20+10)
StableString1K-16       112µs ± 5%   112µs ± 5%     ~     (p=0.604 n=19+10)
SortInt1K-16           44.8µs ± 3%  43.2µs ± 2%   -3.68%  (p=0.000 n=20+10)
SortInt1K_Sorted-16    28.2µs ± 3%   3.3µs ± 3%  -88.16%  (p=0.000 n=19+10)
SortInt1K_Reversed-16  29.4µs ± 3%   4.8µs ± 2%  -83.59%  (p=0.000 n=20+10)
SortInt1K_Mod8-16      25.1µs ± 2%  20.0µs ± 2%  -20.35%  (p=0.000 n=18+10)
StableInt1K-16         51.3µs ± 3%  50.9µs ± 2%     ~     (p=0.562 n=20+9)
StableInt1K_Slice-16   49.5µs ± 2%  50.7µs ± 4%   +2.55%  (p=0.009 n=19+10)
SortInt64K-16          4.73ms ± 3%  4.49ms ± 4%   -5.08%  (p=0.000 n=20+10)
SortInt64K_Slice-16    4.51ms ± 3%  4.35ms ± 1%   -3.42%  (p=0.000 n=20+8)
StableInt64K-16        4.85ms ± 2%  4.82ms ± 2%     ~     (p=0.267 n=20+10)
Sort1e2-16             27.9µs ± 1%  28.1µs ± 2%     ~     (p=0.198 n=20+10)
Stable1e2-16           56.6µs ± 2%  55.0µs ± 2%   -2.88%  (p=0.000 n=20+10)
Sort1e4-16             5.51ms ± 1%  5.36ms ± 1%   -2.58%  (p=0.000 n=19+9)
Stable1e4-16           17.8ms ± 1%  17.3ms ± 1%   -2.40%  (p=0.000 n=20+10)
Sort1e6-16              833ms ± 1%   807ms ± 1%   -3.02%  (p=0.000 n=20+10)
Stable1e6-16            3.49s ± 2%   3.44s ± 1%   -1.41%  (p=0.001 n=20+10)
  • Benchmark from stdpdqsort. (go test -run=NOTEST -bench=. -cpu=1 -benchtime=1000x -count=10 -timeout=60m > release.txt && benchstat release.txt)

    • Random: data[i] = rand.Int()

    • Sorted: data[i] = i

    • NearlySorted:

      for i := range x {
      	x[i] = i
      }
      for i := 0; i < len(x)/20; i++ {
      	a, b := rand.Intn(len(x)), rand.Intn(len(x))
      	x[a], x[b] = x[b], x[a]
      }
    • Reversed: data[i] = len(data) - i

    • NearlyReversed:

      for i := range x {
      	x[i] = len(x) - i
      }
      for i := 0; i < len(x)/20; i++ {
      	a, b := rand.Intn(len(x)), rand.Intn(len(x))
      	x[a], x[b] = x[b], x[a]
      }
    • Mod8: data[i] = i % 8

    name                          time/op
    Random/pdqsort_64             2.44µs ± 1%
    Random/stdsort_64             2.38µs ± 0%
    Random/pdqsort_256            13.1µs ± 7%
    Random/stdsort_256            13.8µs ± 0%
    Random/pdqsort_1024           62.3µs ± 1%
    Random/stdsort_1024           62.2µs ± 0%
    Random/pdqsort_4096            289µs ± 0%
    Random/stdsort_4096            290µs ± 0%
    Random/pdqsort_65536          6.08ms ± 0%
    Random/stdsort_65536          6.11ms ± 0%
    Sorted/pdqsort_64              249ns ± 0%
    Sorted/stdsort_64              709ns ± 1%
    Sorted/pdqsort_256             577ns ± 0%
    Sorted/stdsort_256            3.36µs ± 0%
    Sorted/pdqsort_1024           1.97µs ± 0%
    Sorted/stdsort_1024           16.6µs ± 0%
    Sorted/pdqsort_4096           7.31µs ± 0%
    Sorted/stdsort_4096           78.9µs ± 0%
    Sorted/pdqsort_65536           115µs ± 0%
    Sorted/stdsort_65536          1.73ms ± 0%
    NearlySorted/pdqsort_64        906ns ± 2%
    NearlySorted/stdsort_64       1.00µs ± 2%
    NearlySorted/pdqsort_256      4.52µs ± 1%
    NearlySorted/stdsort_256      5.01µs ± 1%
    NearlySorted/pdqsort_1024     21.2µs ± 0%
    NearlySorted/stdsort_1024     25.6µs ± 4%
    NearlySorted/pdqsort_4096      101µs ± 0%
    NearlySorted/stdsort_4096      117µs ± 1%
    NearlySorted/pdqsort_65536    2.14ms ± 0%
    NearlySorted/stdsort_65536    2.46ms ± 0%
    Reversed/pdqsort_64            320ns ± 1%
    Reversed/stdsort_64            845ns ± 0%
    Reversed/pdqsort_256           831ns ± 1%
    Reversed/stdsort_256          3.71µs ± 0%
    Reversed/pdqsort_1024         2.88µs ± 1%
    Reversed/stdsort_1024         17.9µs ± 1%
    Reversed/pdqsort_4096         10.9µs ± 1%
    Reversed/stdsort_4096         83.9µs ± 0%
    Reversed/pdqsort_65536         172µs ± 0%
    Reversed/stdsort_65536        1.80ms ± 0%
    NearlyReversed/pdqsort_64     1.17µs ± 1%
    NearlyReversed/stdsort_64     1.41µs ± 1%
    NearlyReversed/pdqsort_256    5.63µs ± 1%
    NearlyReversed/stdsort_256    7.24µs ± 1%
    NearlyReversed/pdqsort_1024   28.9µs ± 4%
    NearlyReversed/stdsort_1024   38.3µs ± 0%
    NearlyReversed/pdqsort_4096    136µs ± 1%
    NearlyReversed/stdsort_4096    176µs ± 0%
    NearlyReversed/pdqsort_65536  2.73ms ± 1%
    NearlyReversed/stdsort_65536  3.59ms ± 0%
    Mod8/pdqsort_64                878ns ± 1%
    Mod8/stdsort_64                941ns ± 0%
    Mod8/pdqsort_256              3.01µs ± 1%
    Mod8/stdsort_256              3.78µs ± 0%
    Mod8/pdqsort_1024             10.6µs ± 1%
    Mod8/stdsort_1024             14.3µs ± 0%
    Mod8/pdqsort_4096             45.7µs ± 0%
    Mod8/stdsort_4096             59.4µs ± 1%
    Mod8/pdqsort_65536             754µs ± 1%
    Mod8/stdsort_65536            1.01ms ± 0%
    AllEqual/pdqsort_64            250ns ± 0%
    AllEqual/stdsort_64            376ns ± 2%
    AllEqual/pdqsort_256           578ns ± 0%
    AllEqual/stdsort_256          1.02µs ± 1%
    AllEqual/pdqsort_1024         1.97µs ± 0%
    AllEqual/stdsort_1024         3.69µs ± 0%
    AllEqual/pdqsort_4096         7.20µs ± 0%
    AllEqual/stdsort_4096         14.0µs ± 0%
    AllEqual/pdqsort_65536         115µs ± 0%
    AllEqual/stdsort_65536         223µs ± 0%
    

Activity

added this to the Proposal milestone on Dec 14, 2021
gopherbot

gopherbot commented on Dec 14, 2021

@gopherbot
Contributor

Change https://golang.org/cl/371574 mentions this issue: sort: use pdqsort

robpike

robpike commented on Dec 14, 2021

@robpike
Contributor

Without running the same set of benchmarks on the same data with the new and old code, it's hard to see the improvement.

zhangyunhao116

zhangyunhao116 commented on Dec 14, 2021

@zhangyunhao116
ContributorAuthor

Updated, add Sorted Reversed Mod8 to the built-in benchmark.

esote

esote commented on Dec 14, 2021

@esote

Reran the stdpdqsort benchmarks to get a delta. EC2 t2.micro (1 vCPU, 1 GiB mem), goos: linux, goarch: amd64

name                       old time/op    new time/op    delta
Random/sort_64               6.57µs ± 2%    6.67µs ± 1%   +1.52%  (p=0.002 n=10+9)
Random/sort_256              30.6µs ± 5%    31.2µs ± 5%     ~     (p=0.052 n=10+10)
Random/sort_1024              142µs ± 1%     144µs ± 1%   +1.73%  (p=0.000 n=10+10)
Random/sort_4096              674µs ± 1%     674µs ± 0%     ~     (p=0.739 n=10+10)
Random/sort_65536            14.2ms ± 0%    14.1ms ± 0%   -0.68%  (p=0.000 n=10+10)
Sorted/sort_64               2.87µs ± 4%    1.31µs ± 8%  -54.49%  (p=0.000 n=10+10)
Sorted/sort_256              11.7µs ± 0%     2.4µs ± 3%  -79.34%  (p=0.000 n=10+10)
Sorted/sort_1024             54.8µs ± 6%     7.0µs ± 2%  -87.31%  (p=0.000 n=10+10)
Sorted/sort_4096              253µs ± 1%      25µs ± 1%  -90.31%  (p=0.000 n=9+10)
Sorted/sort_65536            5.53ms ± 0%    0.35ms ± 1%  -93.61%  (p=0.000 n=9+10)
NearlySorted/sort_64         3.36µs ± 1%    3.13µs ± 5%   -6.69%  (p=0.000 n=8+10)
NearlySorted/sort_256        14.7µs ± 1%    13.2µs ± 1%   -9.63%  (p=0.000 n=9+8)
NearlySorted/sort_1024       69.2µs ± 4%    59.6µs ± 1%  -13.94%  (p=0.000 n=10+8)
NearlySorted/sort_4096        320µs ± 1%     283µs ± 1%  -11.45%  (p=0.000 n=9+10)
NearlySorted/sort_65536      6.79ms ± 0%    6.03ms ± 1%  -11.17%  (p=0.000 n=9+10)
Reversed/sort_64             3.25µs ± 3%    1.53µs ± 8%  -52.90%  (p=0.000 n=8+10)
Reversed/sort_256            12.7µs ± 2%     3.1µs ± 3%  -75.43%  (p=0.000 n=10+10)
Reversed/sort_1024           55.5µs ± 1%     9.8µs ± 3%  -82.40%  (p=0.000 n=9+10)
Reversed/sort_4096            265µs ± 1%      36µs ± 3%  -86.58%  (p=0.000 n=10+8)
Reversed/sort_65536          5.68ms ± 0%    0.51ms ± 0%  -90.98%  (p=0.000 n=9+10)
NearlyReversed/sort_64       4.44µs ± 1%    3.93µs ± 2%  -11.38%  (p=0.000 n=7+9)
NearlyReversed/sort_256      21.1µs ± 1%    16.8µs ± 1%  -20.16%  (p=0.000 n=10+10)
NearlyReversed/sort_1024     96.8µs ± 3%    77.3µs ± 2%  -20.13%  (p=0.000 n=10+9)
NearlyReversed/sort_4096      453µs ± 1%     365µs ± 1%  -19.42%  (p=0.000 n=10+10)
NearlyReversed/sort_65536    9.22ms ± 0%    7.40ms ± 2%  -19.75%  (p=0.000 n=8+10)
Mod8/sort_64                 3.39µs ± 4%    3.32µs ± 7%     ~     (p=0.190 n=9+10)
Mod8/sort_256                12.5µs ± 1%    10.3µs ± 1%  -17.87%  (p=0.000 n=10+10)
Mod8/sort_1024               49.1µs ± 6%    39.6µs ± 6%  -19.43%  (p=0.000 n=9+10)
Mod8/sort_4096                200µs ± 2%     149µs ± 2%  -25.53%  (p=0.000 n=10+10)
Mod8/sort_65536              2.98ms ± 0%    2.20ms ± 1%  -26.34%  (p=0.000 n=10+10)
AllEqual/sort_64             1.68µs ± 6%    1.33µs ± 7%  -20.73%  (p=0.000 n=10+10)
AllEqual/sort_256            3.94µs ± 1%    2.49µs ± 3%  -36.69%  (p=0.000 n=9+10)
AllEqual/sort_1024           12.9µs ± 1%     6.9µs ± 3%  -46.14%  (p=0.000 n=9+10)
AllEqual/sort_4096           46.4µs ± 5%    24.8µs ± 1%  -46.68%  (p=0.000 n=10+8)
AllEqual/sort_65536           697µs ± 0%     352µs ± 1%  -49.50%  (p=0.000 n=10+10)
[Geo mean]                   78.7µs         40.8µs       -48.17%
ianlancetaylor

ianlancetaylor commented on Dec 14, 2021

@ianlancetaylor
Contributor

I think the only reason this could be considered as a proposal is that it will change the sorting of equal elements when using sort.Sort. If that is OK then this is just an implementation change. (Personally, I think that the change in sorting is OK.)

esote

esote commented on Dec 15, 2021

@esote

Since sort.Sort is not guaranteed to be stable I think it's safe to change. If particular sorting for equal elements was needed, sort.Stable or a total ordering function for Less should be used

rsc

rsc commented on Dec 15, 2021

@rsc
Contributor

We've changed the sort results in the past. There was a release that greatly reduced the number of comparisons but changed sort orders. So changing again if we have significant optimizations to apply should be fine (modulo reviews, code being fine, etc).

changed the title [-]proposal: sort: use pdqsort[/-] [+]sort: use pdqsort[/+] on Dec 15, 2021
added
NeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.
and removed on Dec 15, 2021

13 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeNeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @rsc@ianlancetaylor@robpike@gopherbot@MaoJianwei

        Issue actions

          sort: use pdqsort · Issue #50154 · golang/go