-
Notifications
You must be signed in to change notification settings - Fork 3.9k
roachpb: Use make and copy pattern for high profile byte slice copies #4963
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachpb: Use make and copy pattern for high profile byte slice copies #4963
Conversation
@@ -112,11 +112,15 @@ func MakeKey(keys ...[]byte) []byte { | |||
// BytesNext returns the next possible byte by appending an \x00. | |||
func BytesNext(b []byte) []byte { | |||
// TODO(spencer): Do we need to enforce KeyMaxLength here? | |||
return append(append([]byte(nil), b...), 0) | |||
bn := make([]byte, len(b)+1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to add a comment referring to the PR so that this isn't accidentally reverted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
LGTM |
LGTM. Seems like this deserves a bug report upstream, too. |
Benchmarking and examining assembly demonstrated that the following patterns are not 100% equal when making a deep copy of a byte slice: ``` d := make([]byte, len(c)) copy(d, c) ``` ``` d = append([]byte(nil), c...) ``` It turns out that the first pattern is actually about a constant (independent of buffer size) **12 ns** faster. This is pretty insignificant and shouldn't matter in most cases, but actually makes a difference for a few key manipulation routines we use extremely heavily (`PrefixEnd`, `Next`, etc.). Interestingly, the first way will actually perform worse for a nil byte slice, so this shouldn't be used in general cases where nil checks weren't already in use. ``` BenchmarkBBytes-4 20000000 59.6 ns/op 32 B/op 1 allocs/op BenchmarkBString-4 20000000 71.0 ns/op 32 B/op 1 allocs/op ``` ``` name old time/op new time/op delta Bank_Cockroach-2 427µs ± 1% 427µs ± 3% ~ (p=0.853 n=10+10) Select1_Cockroach-2 59.9µs ± 2% 60.0µs ± 1% ~ (p=0.436 n=10+10) Select2_Cockroach-2 1.03ms ± 1% 1.03ms ± 1% ~ (p=0.762 n=8+10) Insert1_Cockroach-2 499µs ± 1% 499µs ± 0% ~ (p=0.573 n=10+8) Insert10_Cockroach-2 884µs ± 2% 891µs ± 1% +0.78% (p=0.043 n=10+10) Insert100_Cockroach-2 4.58ms ± 2% 4.60ms ± 1% ~ (p=0.400 n=10+9) Insert1000_Cockroach-2 44.0ms ± 1% 44.2ms ± 1% ~ (p=0.315 n=9+10) Update1_Cockroach-2 758µs ± 2% 756µs ± 1% ~ (p=0.631 n=10+10) Update10_Cockroach-2 1.41ms ± 1% 1.41ms ± 2% ~ (p=0.356 n=9+10) Update100_Cockroach-2 7.69ms ± 3% 7.66ms ± 2% ~ (p=0.497 n=10+9) Update1000_Cockroach-2 67.8ms ± 1% 68.0ms ± 1% ~ (p=0.211 n=9+10) Delete1_Cockroach-2 848µs ± 2% 839µs ± 3% ~ (p=0.165 n=10+10) Delete10_Cockroach-2 2.63ms ± 2% 2.60ms ± 1% -1.07% (p=0.035 n=10+9) Delete100_Cockroach-2 21.6ms ± 1% 21.4ms ± 1% -0.95% (p=0.001 n=10+9) Delete1000_Cockroach-2 225ms ± 1% 226ms ± 2% ~ (p=0.579 n=10+10) Scan1_Cockroach-2 267µs ± 1% 268µs ± 1% ~ (p=0.065 n=10+9) Scan10_Cockroach-2 315µs ± 2% 317µs ± 1% ~ (p=0.133 n=10+9) Scan100_Cockroach-2 675µs ± 3% 669µs ± 2% ~ (p=0.218 n=10+10) Scan1000_Cockroach-2 4.03ms ± 1% 3.96ms ± 1% -1.71% (p=0.000 n=10+10) Scan10000_Cockroach-2 41.3ms ± 2% 40.9ms ± 1% ~ (p=0.105 n=10+10) Scan1000Limit1_Cockroach-2 283µs ± 1% 284µs ± 1% ~ (p=0.218 n=10+10) Scan1000Limit10_Cockroach-2 330µs ± 2% 329µs ± 1% ~ (p=0.912 n=10+10) Scan1000Limit100_Cockroach-2 692µs ± 3% 683µs ± 2% -1.38% (p=0.023 n=10+10) PgbenchQuery_Cockroach-2 3.97ms ± 4% 3.86ms ± 2% -2.66% (p=0.043 n=10+9) name old alloc/op new alloc/op delta Bank_Cockroach-2 68.5kB ± 0% 68.5kB ± 0% ~ (p=0.239 n=10+10) Select1_Cockroach-2 3.90kB ± 0% 3.90kB ± 0% ~ (p=0.132 n=10+10) Select2_Cockroach-2 80.7kB ± 0% 80.2kB ± 0% -0.54% (p=0.000 n=9+7) Insert1_Cockroach-2 23.0kB ± 0% 23.0kB ± 0% ~ (p=0.795 n=10+9) Insert10_Cockroach-2 67.1kB ± 0% 67.1kB ± 0% ~ (p=0.912 n=10+10) Insert100_Cockroach-2 508kB ± 0% 508kB ± 0% ~ (p=0.483 n=10+9) Insert1000_Cockroach-2 4.57MB ± 1% 4.57MB ± 1% ~ (p=0.853 n=10+10) Update1_Cockroach-2 36.8kB ± 0% 36.8kB ± 0% ~ (p=0.857 n=9+10) Update10_Cockroach-2 100kB ± 0% 100kB ± 0% ~ (p=0.631 n=10+10) Update100_Cockroach-2 710kB ± 1% 706kB ± 0% ~ (p=0.211 n=10+9) Update1000_Cockroach-2 6.12MB ± 0% 6.13MB ± 1% ~ (p=0.780 n=9+10) Delete1_Cockroach-2 35.9kB ± 0% 35.8kB ± 0% -0.19% (p=0.001 n=10+9) Delete10_Cockroach-2 109kB ± 0% 108kB ± 0% -0.25% (p=0.000 n=10+9) Delete100_Cockroach-2 818kB ± 0% 816kB ± 0% -0.30% (p=0.000 n=10+8) Delete1000_Cockroach-2 7.38MB ± 0% 7.37MB ± 0% -0.05% (p=0.000 n=8+8) Scan1_Cockroach-2 13.3kB ± 1% 13.2kB ± 0% ~ (p=0.435 n=10+10) Scan10_Cockroach-2 17.1kB ± 0% 17.0kB ± 0% -0.62% (p=0.000 n=9+10) Scan100_Cockroach-2 49.3kB ± 0% 48.5kB ± 0% -1.64% (p=0.000 n=8+10) Scan1000_Cockroach-2 331kB ± 0% 329kB ± 0% -0.63% (p=0.000 n=10+10) Scan10000_Cockroach-2 5.72MB ± 0% 5.71MB ± 0% -0.04% (p=0.000 n=9+10) Scan1000Limit1_Cockroach-2 14.0kB ± 0% 14.0kB ± 0% ~ (p=0.986 n=9+10) Scan1000Limit10_Cockroach-2 17.8kB ± 0% 17.8kB ± 0% -0.44% (p=0.000 n=9+10) Scan1000Limit100_Cockroach-2 50.0kB ± 0% 49.2kB ± 0% -1.65% (p=0.000 n=8+10) PgbenchQuery_Cockroach-2 225kB ±10% 215kB ± 0% -4.50% (p=0.009 n=10+8) name old allocs/op new allocs/op delta Bank_Cockroach-2 1.67k ± 0% 1.67k ± 0% -0.02% (p=0.029 n=7+10) Select1_Cockroach-2 97.0 ± 0% 97.0 ± 0% ~ (all samples are equal) Select2_Cockroach-2 2.70k ± 0% 2.70k ± 0% ~ (p=0.350 n=9+7) Insert1_Cockroach-2 356 ± 0% 357 ± 0% ~ (p=0.465 n=10+10) Insert10_Cockroach-2 906 ± 0% 907 ± 0% ~ (p=0.491 n=10+10) Insert100_Cockroach-2 6.18k ± 0% 6.18k ± 0% ~ (p=0.511 n=10+9) Insert1000_Cockroach-2 59.0k ± 0% 59.0k ± 0% ~ (p=0.754 n=10+10) Update1_Cockroach-2 719 ± 0% 719 ± 0% ~ (p=0.412 n=9+8) Update10_Cockroach-2 1.41k ± 0% 1.41k ± 0% ~ (p=0.871 n=10+10) Update100_Cockroach-2 7.99k ± 2% 7.95k ± 0% ~ (p=0.140 n=10+9) Update1000_Cockroach-2 70.4k ± 0% 70.5k ± 0% ~ (p=0.556 n=9+10) Delete1_Cockroach-2 692 ± 0% 692 ± 0% ~ (p=0.165 n=8+9) Delete10_Cockroach-2 1.70k ± 0% 1.70k ± 0% ~ (p=0.968 n=10+9) Delete100_Cockroach-2 11.4k ± 0% 11.4k ± 0% ~ (p=0.090 n=10+8) Delete1000_Cockroach-2 108k ± 0% 108k ± 0% ~ (p=0.294 n=8+8) Scan1_Cockroach-2 276 ± 1% 276 ± 1% ~ (p=1.000 n=10+10) Scan10_Cockroach-2 356 ± 0% 355 ± 0% ~ (p=1.000 n=10+10) Scan100_Cockroach-2 1.08k ± 0% 1.08k ± 0% ~ (p=0.704 n=8+10) Scan1000_Cockroach-2 8.29k ± 0% 8.29k ± 0% ~ (p=0.173 n=9+10) Scan10000_Cockroach-2 80.4k ± 0% 80.4k ± 0% ~ (p=0.182 n=8+10) Scan1000Limit1_Cockroach-2 298 ± 1% 298 ± 0% ~ (p=1.000 n=10+10) Scan1000Limit10_Cockroach-2 377 ± 0% 378 ± 0% ~ (p=0.370 n=10+10) Scan1000Limit100_Cockroach-2 1.10k ± 0% 1.10k ± 0% ~ (p=0.173 n=9+10) PgbenchQuery_Cockroach-2 4.04k ± 7% 3.91k ± 0% -3.06% (p=0.028 n=10+8) ```
6e7db23
to
fc0bcbc
Compare
@tamird It's not necessarily a bug, it's just that the one-liner adds a few extra instructions and uses |
Why isn't that a bug? Is there any difference in behaviour? Seems like bad codegen to me. |
Filed it upstream golang/go#14718. I'm guessing it will expressed as expected behavior, but I'm curious what the Go team says nevertheless. |
roachpb: Use make and copy pattern for high profile byte slice copies
Benchmarking and examining assembly demonstrated that the following patterns
are not 100% equal when making a deep copy of a byte slice:
It turns out that the first pattern is actually about a constant
(independent of buffer size) 12 ns faster. This is pretty
insignificant and shouldn't matter in almost all cases, but actually makes a
difference for a few key manipulation routines we use extremely heavily
(
PrefixEnd
,Next
, etc.). Interestingly, the first way will actuallyperform worse for a nil byte slice, so this shouldn't be used in general
cases where nil checks aren't already in use.
Perf Impact: