Closed
Description
Abstract
From ByteDance Programming Language Team
We suggest using pdqsort in the sort
package. The new algorithm is mainly based on pattern-defeating quicksort by Orson Peters. Both Rust and C++ Boost have used pdqsort in their library.
- Across all benchmarks, pdqsort is never significantly slower than the previous algorithm(in the built-in
sort
package). - In common patterns, pdqsort is often faster (i.e. 10x faster in sorted slices).
Implementations
- Gerrit CL: https://go-review.googlesource.com/c/go/+/371574
- stdpdqsort: based on
sort.Interface
. Same as the Gerrit CL, but contains more tests and benchmarks. - pdqsort: based on generics(need Go 1.18), about 2x ~ 60x faster than built-in package(only support
Ordered
types). Since Go1.18 is not released, this implementation is just a demo for now.
Algorithm details
The new algorithm is mainly based on pdqsort, but with these modifications:
- Disable the optimization from BlockQuicksort, since its poor performance in Go.
- Modify some parameters according to the benchmark results.
Known issues
- The new algorithm is still an unstable sorting algorithm, it may rearrange equal elements differently compared to the original sorting algorithm in the
sort
package.
Benchmarks
Go version: go1.18-d6c4583ad4 linux/amd64
CPU: Intel 11700k(8C16T)
OS: ubuntu 20.04
MEMORY: 16G x 2 (3200MHz)
- Benchmark from
sort
package. (go test sort -count=20 -run=NOTEST -bench=.
)
name old time/op new time/op delta
SearchWrappers-16 72.8ns ± 3% 75.1ns ± 2% +3.25% (p=0.000 n=20+10)
SortString1K-16 85.2µs ± 3% 86.2µs ± 5% ~ (p=0.247 n=19+10)
SortString1K_Slice-16 84.6µs ± 4% 86.1µs ± 4% ~ (p=0.120 n=20+10)
StableString1K-16 112µs ± 5% 112µs ± 5% ~ (p=0.604 n=19+10)
SortInt1K-16 44.8µs ± 3% 43.2µs ± 2% -3.68% (p=0.000 n=20+10)
SortInt1K_Sorted-16 28.2µs ± 3% 3.3µs ± 3% -88.16% (p=0.000 n=19+10)
SortInt1K_Reversed-16 29.4µs ± 3% 4.8µs ± 2% -83.59% (p=0.000 n=20+10)
SortInt1K_Mod8-16 25.1µs ± 2% 20.0µs ± 2% -20.35% (p=0.000 n=18+10)
StableInt1K-16 51.3µs ± 3% 50.9µs ± 2% ~ (p=0.562 n=20+9)
StableInt1K_Slice-16 49.5µs ± 2% 50.7µs ± 4% +2.55% (p=0.009 n=19+10)
SortInt64K-16 4.73ms ± 3% 4.49ms ± 4% -5.08% (p=0.000 n=20+10)
SortInt64K_Slice-16 4.51ms ± 3% 4.35ms ± 1% -3.42% (p=0.000 n=20+8)
StableInt64K-16 4.85ms ± 2% 4.82ms ± 2% ~ (p=0.267 n=20+10)
Sort1e2-16 27.9µs ± 1% 28.1µs ± 2% ~ (p=0.198 n=20+10)
Stable1e2-16 56.6µs ± 2% 55.0µs ± 2% -2.88% (p=0.000 n=20+10)
Sort1e4-16 5.51ms ± 1% 5.36ms ± 1% -2.58% (p=0.000 n=19+9)
Stable1e4-16 17.8ms ± 1% 17.3ms ± 1% -2.40% (p=0.000 n=20+10)
Sort1e6-16 833ms ± 1% 807ms ± 1% -3.02% (p=0.000 n=20+10)
Stable1e6-16 3.49s ± 2% 3.44s ± 1% -1.41% (p=0.001 n=20+10)
-
Benchmark from stdpdqsort. (
go test -run=NOTEST -bench=. -cpu=1 -benchtime=1000x -count=10 -timeout=60m > release.txt && benchstat release.txt
)-
Random:
data[i] = rand.Int()
-
Sorted:
data[i] = i
-
NearlySorted:
for i := range x { x[i] = i } for i := 0; i < len(x)/20; i++ { a, b := rand.Intn(len(x)), rand.Intn(len(x)) x[a], x[b] = x[b], x[a] }
-
Reversed:
data[i] = len(data) - i
-
NearlyReversed:
for i := range x { x[i] = len(x) - i } for i := 0; i < len(x)/20; i++ { a, b := rand.Intn(len(x)), rand.Intn(len(x)) x[a], x[b] = x[b], x[a] }
-
Mod8:
data[i] = i % 8
name time/op Random/pdqsort_64 2.44µs ± 1% Random/stdsort_64 2.38µs ± 0% Random/pdqsort_256 13.1µs ± 7% Random/stdsort_256 13.8µs ± 0% Random/pdqsort_1024 62.3µs ± 1% Random/stdsort_1024 62.2µs ± 0% Random/pdqsort_4096 289µs ± 0% Random/stdsort_4096 290µs ± 0% Random/pdqsort_65536 6.08ms ± 0% Random/stdsort_65536 6.11ms ± 0% Sorted/pdqsort_64 249ns ± 0% Sorted/stdsort_64 709ns ± 1% Sorted/pdqsort_256 577ns ± 0% Sorted/stdsort_256 3.36µs ± 0% Sorted/pdqsort_1024 1.97µs ± 0% Sorted/stdsort_1024 16.6µs ± 0% Sorted/pdqsort_4096 7.31µs ± 0% Sorted/stdsort_4096 78.9µs ± 0% Sorted/pdqsort_65536 115µs ± 0% Sorted/stdsort_65536 1.73ms ± 0% NearlySorted/pdqsort_64 906ns ± 2% NearlySorted/stdsort_64 1.00µs ± 2% NearlySorted/pdqsort_256 4.52µs ± 1% NearlySorted/stdsort_256 5.01µs ± 1% NearlySorted/pdqsort_1024 21.2µs ± 0% NearlySorted/stdsort_1024 25.6µs ± 4% NearlySorted/pdqsort_4096 101µs ± 0% NearlySorted/stdsort_4096 117µs ± 1% NearlySorted/pdqsort_65536 2.14ms ± 0% NearlySorted/stdsort_65536 2.46ms ± 0% Reversed/pdqsort_64 320ns ± 1% Reversed/stdsort_64 845ns ± 0% Reversed/pdqsort_256 831ns ± 1% Reversed/stdsort_256 3.71µs ± 0% Reversed/pdqsort_1024 2.88µs ± 1% Reversed/stdsort_1024 17.9µs ± 1% Reversed/pdqsort_4096 10.9µs ± 1% Reversed/stdsort_4096 83.9µs ± 0% Reversed/pdqsort_65536 172µs ± 0% Reversed/stdsort_65536 1.80ms ± 0% NearlyReversed/pdqsort_64 1.17µs ± 1% NearlyReversed/stdsort_64 1.41µs ± 1% NearlyReversed/pdqsort_256 5.63µs ± 1% NearlyReversed/stdsort_256 7.24µs ± 1% NearlyReversed/pdqsort_1024 28.9µs ± 4% NearlyReversed/stdsort_1024 38.3µs ± 0% NearlyReversed/pdqsort_4096 136µs ± 1% NearlyReversed/stdsort_4096 176µs ± 0% NearlyReversed/pdqsort_65536 2.73ms ± 1% NearlyReversed/stdsort_65536 3.59ms ± 0% Mod8/pdqsort_64 878ns ± 1% Mod8/stdsort_64 941ns ± 0% Mod8/pdqsort_256 3.01µs ± 1% Mod8/stdsort_256 3.78µs ± 0% Mod8/pdqsort_1024 10.6µs ± 1% Mod8/stdsort_1024 14.3µs ± 0% Mod8/pdqsort_4096 45.7µs ± 0% Mod8/stdsort_4096 59.4µs ± 1% Mod8/pdqsort_65536 754µs ± 1% Mod8/stdsort_65536 1.01ms ± 0% AllEqual/pdqsort_64 250ns ± 0% AllEqual/stdsort_64 376ns ± 2% AllEqual/pdqsort_256 578ns ± 0% AllEqual/stdsort_256 1.02µs ± 1% AllEqual/pdqsort_1024 1.97µs ± 0% AllEqual/stdsort_1024 3.69µs ± 0% AllEqual/pdqsort_4096 7.20µs ± 0% AllEqual/stdsort_4096 14.0µs ± 0% AllEqual/pdqsort_65536 115µs ± 0% AllEqual/stdsort_65536 223µs ± 0%
-
Metadata
Metadata
Assignees
Labels
Type
Projects
Milestone
Relationships
Development
No branches or pull requests
Activity
gopherbot commentedon Dec 14, 2021
Change https://golang.org/cl/371574 mentions this issue:
sort: use pdqsort
robpike commentedon Dec 14, 2021
Without running the same set of benchmarks on the same data with the new and old code, it's hard to see the improvement.
zhangyunhao116 commentedon Dec 14, 2021
Updated, add
Sorted
Reversed
Mod8
to the built-in benchmark.esote commentedon Dec 14, 2021
Reran the stdpdqsort benchmarks to get a delta. EC2 t2.micro (1 vCPU, 1 GiB mem), goos: linux, goarch: amd64
ianlancetaylor commentedon Dec 14, 2021
I think the only reason this could be considered as a proposal is that it will change the sorting of equal elements when using
sort.Sort
. If that is OK then this is just an implementation change. (Personally, I think that the change in sorting is OK.)esote commentedon Dec 15, 2021
Since
sort.Sort
is not guaranteed to be stable I think it's safe to change. If particular sorting for equal elements was needed,sort.Stable
or a total ordering function for Less should be usedrsc commentedon Dec 15, 2021
We've changed the sort results in the past. There was a release that greatly reduced the number of comparisons but changed sort orders. So changing again if we have significant optimizations to apply should be fine (modulo reviews, code being fine, etc).
[-]proposal: sort: use pdqsort[/-][+]sort: use pdqsort[/+]13 remaining items