Skip to content

roachpb: Use make and copy pattern for high profile byte slice copies #4963

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

nvanbenschoten
Copy link
Contributor

Benchmarking and examining assembly demonstrated that the following patterns
are not 100% equal when making a deep copy of a byte slice:

d := make([]byte, len(c))
copy(d, c)
d = append([]byte(nil), c...)

It turns out that the first pattern is actually about a constant
(independent of buffer size) 12 ns faster. This is pretty
insignificant and shouldn't matter in almost all cases, but actually makes a
difference for a few key manipulation routines we use extremely heavily
(PrefixEnd, Next, etc.). Interestingly, the first way will actually
perform worse for a nil byte slice, so this shouldn't be used in general
cases where nil checks aren't already in use.

BenchmarkBBytes-4   20000000            59.6 ns/op        32 B/op          1 allocs/op
BenchmarkBString-4  20000000            71.0 ns/op        32 B/op          1 allocs/op
Perf Impact:
name                          old time/op    new time/op    delta
Bank_Cockroach-2                 427µs ± 1%     427µs ± 3%    ~           (p=0.853 n=10+10)
Select1_Cockroach-2             59.9µs ± 2%    60.0µs ± 1%    ~           (p=0.436 n=10+10)
Select2_Cockroach-2             1.03ms ± 1%    1.03ms ± 1%    ~            (p=0.762 n=8+10)
Insert1_Cockroach-2              499µs ± 1%     499µs ± 0%    ~            (p=0.573 n=10+8)
Insert10_Cockroach-2             884µs ± 2%     891µs ± 1%  +0.78%        (p=0.043 n=10+10)
Insert100_Cockroach-2           4.58ms ± 2%    4.60ms ± 1%    ~            (p=0.400 n=10+9)
Insert1000_Cockroach-2          44.0ms ± 1%    44.2ms ± 1%    ~            (p=0.315 n=9+10)
Update1_Cockroach-2              758µs ± 2%     756µs ± 1%    ~           (p=0.631 n=10+10)
Update10_Cockroach-2            1.41ms ± 1%    1.41ms ± 2%    ~            (p=0.356 n=9+10)
Update100_Cockroach-2           7.69ms ± 3%    7.66ms ± 2%    ~            (p=0.497 n=10+9)
Update1000_Cockroach-2          67.8ms ± 1%    68.0ms ± 1%    ~            (p=0.211 n=9+10)
Delete1_Cockroach-2              848µs ± 2%     839µs ± 3%    ~           (p=0.165 n=10+10)
Delete10_Cockroach-2            2.63ms ± 2%    2.60ms ± 1%  -1.07%         (p=0.035 n=10+9)
Delete100_Cockroach-2           21.6ms ± 1%    21.4ms ± 1%  -0.95%         (p=0.001 n=10+9)
Delete1000_Cockroach-2           225ms ± 1%     226ms ± 2%    ~           (p=0.579 n=10+10)
Scan1_Cockroach-2                267µs ± 1%     268µs ± 1%    ~            (p=0.065 n=10+9)
Scan10_Cockroach-2               315µs ± 2%     317µs ± 1%    ~            (p=0.133 n=10+9)
Scan100_Cockroach-2              675µs ± 3%     669µs ± 2%    ~           (p=0.218 n=10+10)
Scan1000_Cockroach-2            4.03ms ± 1%    3.96ms ± 1%  -1.71%        (p=0.000 n=10+10)
Scan10000_Cockroach-2           41.3ms ± 2%    40.9ms ± 1%    ~           (p=0.105 n=10+10)
Scan1000Limit1_Cockroach-2       283µs ± 1%     284µs ± 1%    ~           (p=0.218 n=10+10)
Scan1000Limit10_Cockroach-2      330µs ± 2%     329µs ± 1%    ~           (p=0.912 n=10+10)
Scan1000Limit100_Cockroach-2     692µs ± 3%     683µs ± 2%  -1.38%        (p=0.023 n=10+10)
PgbenchQuery_Cockroach-2        3.97ms ± 4%    3.86ms ± 2%  -2.66%         (p=0.043 n=10+9)

name                          old alloc/op   new alloc/op   delta
Bank_Cockroach-2                68.5kB ± 0%    68.5kB ± 0%    ~           (p=0.239 n=10+10)
Select1_Cockroach-2             3.90kB ± 0%    3.90kB ± 0%    ~           (p=0.132 n=10+10)
Select2_Cockroach-2             80.7kB ± 0%    80.2kB ± 0%  -0.54%          (p=0.000 n=9+7)
Insert1_Cockroach-2             23.0kB ± 0%    23.0kB ± 0%    ~            (p=0.795 n=10+9)
Insert10_Cockroach-2            67.1kB ± 0%    67.1kB ± 0%    ~           (p=0.912 n=10+10)
Insert100_Cockroach-2            508kB ± 0%     508kB ± 0%    ~            (p=0.483 n=10+9)
Insert1000_Cockroach-2          4.57MB ± 1%    4.57MB ± 1%    ~           (p=0.853 n=10+10)
Update1_Cockroach-2             36.8kB ± 0%    36.8kB ± 0%    ~            (p=0.857 n=9+10)
Update10_Cockroach-2             100kB ± 0%     100kB ± 0%    ~           (p=0.631 n=10+10)
Update100_Cockroach-2            710kB ± 1%     706kB ± 0%    ~            (p=0.211 n=10+9)
Update1000_Cockroach-2          6.12MB ± 0%    6.13MB ± 1%    ~            (p=0.780 n=9+10)
Delete1_Cockroach-2             35.9kB ± 0%    35.8kB ± 0%  -0.19%         (p=0.001 n=10+9)
Delete10_Cockroach-2             109kB ± 0%     108kB ± 0%  -0.25%         (p=0.000 n=10+9)
Delete100_Cockroach-2            818kB ± 0%     816kB ± 0%  -0.30%         (p=0.000 n=10+8)
Delete1000_Cockroach-2          7.38MB ± 0%    7.37MB ± 0%  -0.05%          (p=0.000 n=8+8)
Scan1_Cockroach-2               13.3kB ± 1%    13.2kB ± 0%    ~           (p=0.435 n=10+10)
Scan10_Cockroach-2              17.1kB ± 0%    17.0kB ± 0%  -0.62%         (p=0.000 n=9+10)
Scan100_Cockroach-2             49.3kB ± 0%    48.5kB ± 0%  -1.64%         (p=0.000 n=8+10)
Scan1000_Cockroach-2             331kB ± 0%     329kB ± 0%  -0.63%        (p=0.000 n=10+10)
Scan10000_Cockroach-2           5.72MB ± 0%    5.71MB ± 0%  -0.04%         (p=0.000 n=9+10)
Scan1000Limit1_Cockroach-2      14.0kB ± 0%    14.0kB ± 0%    ~            (p=0.986 n=9+10)
Scan1000Limit10_Cockroach-2     17.8kB ± 0%    17.8kB ± 0%  -0.44%         (p=0.000 n=9+10)
Scan1000Limit100_Cockroach-2    50.0kB ± 0%    49.2kB ± 0%  -1.65%         (p=0.000 n=8+10)
PgbenchQuery_Cockroach-2         225kB ±10%     215kB ± 0%  -4.50%         (p=0.009 n=10+8)

name                          old allocs/op  new allocs/op  delta
Bank_Cockroach-2                 1.67k ± 0%     1.67k ± 0%  -0.02%         (p=0.029 n=7+10)
Select1_Cockroach-2               97.0 ± 0%      97.0 ± 0%    ~     (all samples are equal)
Select2_Cockroach-2              2.70k ± 0%     2.70k ± 0%    ~             (p=0.350 n=9+7)
Insert1_Cockroach-2                356 ± 0%       357 ± 0%    ~           (p=0.465 n=10+10)
Insert10_Cockroach-2               906 ± 0%       907 ± 0%    ~           (p=0.491 n=10+10)
Insert100_Cockroach-2            6.18k ± 0%     6.18k ± 0%    ~            (p=0.511 n=10+9)
Insert1000_Cockroach-2           59.0k ± 0%     59.0k ± 0%    ~           (p=0.754 n=10+10)
Update1_Cockroach-2                719 ± 0%       719 ± 0%    ~             (p=0.412 n=9+8)
Update10_Cockroach-2             1.41k ± 0%     1.41k ± 0%    ~           (p=0.871 n=10+10)
Update100_Cockroach-2            7.99k ± 2%     7.95k ± 0%    ~            (p=0.140 n=10+9)
Update1000_Cockroach-2           70.4k ± 0%     70.5k ± 0%    ~            (p=0.556 n=9+10)
Delete1_Cockroach-2                692 ± 0%       692 ± 0%    ~             (p=0.165 n=8+9)
Delete10_Cockroach-2             1.70k ± 0%     1.70k ± 0%    ~            (p=0.968 n=10+9)
Delete100_Cockroach-2            11.4k ± 0%     11.4k ± 0%    ~            (p=0.090 n=10+8)
Delete1000_Cockroach-2            108k ± 0%      108k ± 0%    ~             (p=0.294 n=8+8)
Scan1_Cockroach-2                  276 ± 1%       276 ± 1%    ~           (p=1.000 n=10+10)
Scan10_Cockroach-2                 356 ± 0%       355 ± 0%    ~           (p=1.000 n=10+10)
Scan100_Cockroach-2              1.08k ± 0%     1.08k ± 0%    ~            (p=0.704 n=8+10)
Scan1000_Cockroach-2             8.29k ± 0%     8.29k ± 0%    ~            (p=0.173 n=9+10)
Scan10000_Cockroach-2            80.4k ± 0%     80.4k ± 0%    ~            (p=0.182 n=8+10)
Scan1000Limit1_Cockroach-2         298 ± 1%       298 ± 0%    ~           (p=1.000 n=10+10)
Scan1000Limit10_Cockroach-2        377 ± 0%       378 ± 0%    ~           (p=0.370 n=10+10)
Scan1000Limit100_Cockroach-2     1.10k ± 0%     1.10k ± 0%    ~            (p=0.173 n=9+10)
PgbenchQuery_Cockroach-2         4.04k ± 7%     3.91k ± 0%  -3.06%         (p=0.028 n=10+8)

Review on Reviewable

@nvanbenschoten nvanbenschoten added this to the Beta milestone Mar 8, 2016
@@ -112,11 +112,15 @@ func MakeKey(keys ...[]byte) []byte {
// BytesNext returns the next possible byte by appending an \x00.
func BytesNext(b []byte) []byte {
// TODO(spencer): Do we need to enforce KeyMaxLength here?
return append(append([]byte(nil), b...), 0)
bn := make([]byte, len(b)+1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to add a comment referring to the PR so that this isn't accidentally reverted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@petermattis
Copy link
Collaborator

LGTM

@tamird
Copy link
Contributor

tamird commented Mar 8, 2016

LGTM. Seems like this deserves a bug report upstream, too.

Benchmarking and examining assembly demonstrated that the following patterns
are not 100% equal when making a deep copy of a byte slice:

```
d := make([]byte, len(c))
copy(d, c)
```

```
d = append([]byte(nil), c...)
```

It turns out that the first pattern is actually about a constant
(independent of buffer size) **12 ns** faster. This is pretty
insignificant and shouldn't matter in most cases, but actually makes a
difference for a few key manipulation routines we use extremely heavily
(`PrefixEnd`, `Next`, etc.). Interestingly, the first way will actually
perform worse for a nil byte slice, so this shouldn't be used in general
cases where nil checks weren't already in use.

```
BenchmarkBBytes-4	20000000	        59.6 ns/op	      32 B/op	       1 allocs/op
BenchmarkBString-4	20000000	        71.0 ns/op	      32 B/op	       1 allocs/op
```

```
name                          old time/op    new time/op    delta
Bank_Cockroach-2                 427µs ± 1%     427µs ± 3%    ~           (p=0.853 n=10+10)
Select1_Cockroach-2             59.9µs ± 2%    60.0µs ± 1%    ~           (p=0.436 n=10+10)
Select2_Cockroach-2             1.03ms ± 1%    1.03ms ± 1%    ~            (p=0.762 n=8+10)
Insert1_Cockroach-2              499µs ± 1%     499µs ± 0%    ~            (p=0.573 n=10+8)
Insert10_Cockroach-2             884µs ± 2%     891µs ± 1%  +0.78%        (p=0.043 n=10+10)
Insert100_Cockroach-2           4.58ms ± 2%    4.60ms ± 1%    ~            (p=0.400 n=10+9)
Insert1000_Cockroach-2          44.0ms ± 1%    44.2ms ± 1%    ~            (p=0.315 n=9+10)
Update1_Cockroach-2              758µs ± 2%     756µs ± 1%    ~           (p=0.631 n=10+10)
Update10_Cockroach-2            1.41ms ± 1%    1.41ms ± 2%    ~            (p=0.356 n=9+10)
Update100_Cockroach-2           7.69ms ± 3%    7.66ms ± 2%    ~            (p=0.497 n=10+9)
Update1000_Cockroach-2          67.8ms ± 1%    68.0ms ± 1%    ~            (p=0.211 n=9+10)
Delete1_Cockroach-2              848µs ± 2%     839µs ± 3%    ~           (p=0.165 n=10+10)
Delete10_Cockroach-2            2.63ms ± 2%    2.60ms ± 1%  -1.07%         (p=0.035 n=10+9)
Delete100_Cockroach-2           21.6ms ± 1%    21.4ms ± 1%  -0.95%         (p=0.001 n=10+9)
Delete1000_Cockroach-2           225ms ± 1%     226ms ± 2%    ~           (p=0.579 n=10+10)
Scan1_Cockroach-2                267µs ± 1%     268µs ± 1%    ~            (p=0.065 n=10+9)
Scan10_Cockroach-2               315µs ± 2%     317µs ± 1%    ~            (p=0.133 n=10+9)
Scan100_Cockroach-2              675µs ± 3%     669µs ± 2%    ~           (p=0.218 n=10+10)
Scan1000_Cockroach-2            4.03ms ± 1%    3.96ms ± 1%  -1.71%        (p=0.000 n=10+10)
Scan10000_Cockroach-2           41.3ms ± 2%    40.9ms ± 1%    ~           (p=0.105 n=10+10)
Scan1000Limit1_Cockroach-2       283µs ± 1%     284µs ± 1%    ~           (p=0.218 n=10+10)
Scan1000Limit10_Cockroach-2      330µs ± 2%     329µs ± 1%    ~           (p=0.912 n=10+10)
Scan1000Limit100_Cockroach-2     692µs ± 3%     683µs ± 2%  -1.38%        (p=0.023 n=10+10)
PgbenchQuery_Cockroach-2        3.97ms ± 4%    3.86ms ± 2%  -2.66%         (p=0.043 n=10+9)

name                          old alloc/op   new alloc/op   delta
Bank_Cockroach-2                68.5kB ± 0%    68.5kB ± 0%    ~           (p=0.239 n=10+10)
Select1_Cockroach-2             3.90kB ± 0%    3.90kB ± 0%    ~           (p=0.132 n=10+10)
Select2_Cockroach-2             80.7kB ± 0%    80.2kB ± 0%  -0.54%          (p=0.000 n=9+7)
Insert1_Cockroach-2             23.0kB ± 0%    23.0kB ± 0%    ~            (p=0.795 n=10+9)
Insert10_Cockroach-2            67.1kB ± 0%    67.1kB ± 0%    ~           (p=0.912 n=10+10)
Insert100_Cockroach-2            508kB ± 0%     508kB ± 0%    ~            (p=0.483 n=10+9)
Insert1000_Cockroach-2          4.57MB ± 1%    4.57MB ± 1%    ~           (p=0.853 n=10+10)
Update1_Cockroach-2             36.8kB ± 0%    36.8kB ± 0%    ~            (p=0.857 n=9+10)
Update10_Cockroach-2             100kB ± 0%     100kB ± 0%    ~           (p=0.631 n=10+10)
Update100_Cockroach-2            710kB ± 1%     706kB ± 0%    ~            (p=0.211 n=10+9)
Update1000_Cockroach-2          6.12MB ± 0%    6.13MB ± 1%    ~            (p=0.780 n=9+10)
Delete1_Cockroach-2             35.9kB ± 0%    35.8kB ± 0%  -0.19%         (p=0.001 n=10+9)
Delete10_Cockroach-2             109kB ± 0%     108kB ± 0%  -0.25%         (p=0.000 n=10+9)
Delete100_Cockroach-2            818kB ± 0%     816kB ± 0%  -0.30%         (p=0.000 n=10+8)
Delete1000_Cockroach-2          7.38MB ± 0%    7.37MB ± 0%  -0.05%          (p=0.000 n=8+8)
Scan1_Cockroach-2               13.3kB ± 1%    13.2kB ± 0%    ~           (p=0.435 n=10+10)
Scan10_Cockroach-2              17.1kB ± 0%    17.0kB ± 0%  -0.62%         (p=0.000 n=9+10)
Scan100_Cockroach-2             49.3kB ± 0%    48.5kB ± 0%  -1.64%         (p=0.000 n=8+10)
Scan1000_Cockroach-2             331kB ± 0%     329kB ± 0%  -0.63%        (p=0.000 n=10+10)
Scan10000_Cockroach-2           5.72MB ± 0%    5.71MB ± 0%  -0.04%         (p=0.000 n=9+10)
Scan1000Limit1_Cockroach-2      14.0kB ± 0%    14.0kB ± 0%    ~            (p=0.986 n=9+10)
Scan1000Limit10_Cockroach-2     17.8kB ± 0%    17.8kB ± 0%  -0.44%         (p=0.000 n=9+10)
Scan1000Limit100_Cockroach-2    50.0kB ± 0%    49.2kB ± 0%  -1.65%         (p=0.000 n=8+10)
PgbenchQuery_Cockroach-2         225kB ±10%     215kB ± 0%  -4.50%         (p=0.009 n=10+8)

name                          old allocs/op  new allocs/op  delta
Bank_Cockroach-2                 1.67k ± 0%     1.67k ± 0%  -0.02%         (p=0.029 n=7+10)
Select1_Cockroach-2               97.0 ± 0%      97.0 ± 0%    ~     (all samples are equal)
Select2_Cockroach-2              2.70k ± 0%     2.70k ± 0%    ~             (p=0.350 n=9+7)
Insert1_Cockroach-2                356 ± 0%       357 ± 0%    ~           (p=0.465 n=10+10)
Insert10_Cockroach-2               906 ± 0%       907 ± 0%    ~           (p=0.491 n=10+10)
Insert100_Cockroach-2            6.18k ± 0%     6.18k ± 0%    ~            (p=0.511 n=10+9)
Insert1000_Cockroach-2           59.0k ± 0%     59.0k ± 0%    ~           (p=0.754 n=10+10)
Update1_Cockroach-2                719 ± 0%       719 ± 0%    ~             (p=0.412 n=9+8)
Update10_Cockroach-2             1.41k ± 0%     1.41k ± 0%    ~           (p=0.871 n=10+10)
Update100_Cockroach-2            7.99k ± 2%     7.95k ± 0%    ~            (p=0.140 n=10+9)
Update1000_Cockroach-2           70.4k ± 0%     70.5k ± 0%    ~            (p=0.556 n=9+10)
Delete1_Cockroach-2                692 ± 0%       692 ± 0%    ~             (p=0.165 n=8+9)
Delete10_Cockroach-2             1.70k ± 0%     1.70k ± 0%    ~            (p=0.968 n=10+9)
Delete100_Cockroach-2            11.4k ± 0%     11.4k ± 0%    ~            (p=0.090 n=10+8)
Delete1000_Cockroach-2            108k ± 0%      108k ± 0%    ~             (p=0.294 n=8+8)
Scan1_Cockroach-2                  276 ± 1%       276 ± 1%    ~           (p=1.000 n=10+10)
Scan10_Cockroach-2                 356 ± 0%       355 ± 0%    ~           (p=1.000 n=10+10)
Scan100_Cockroach-2              1.08k ± 0%     1.08k ± 0%    ~            (p=0.704 n=8+10)
Scan1000_Cockroach-2             8.29k ± 0%     8.29k ± 0%    ~            (p=0.173 n=9+10)
Scan10000_Cockroach-2            80.4k ± 0%     80.4k ± 0%    ~            (p=0.182 n=8+10)
Scan1000Limit1_Cockroach-2         298 ± 1%       298 ± 0%    ~           (p=1.000 n=10+10)
Scan1000Limit10_Cockroach-2        377 ± 0%       378 ± 0%    ~           (p=0.370 n=10+10)
Scan1000Limit100_Cockroach-2     1.10k ± 0%     1.10k ± 0%    ~            (p=0.173 n=9+10)
PgbenchQuery_Cockroach-2         4.04k ± 7%     3.91k ± 0%  -3.06%         (p=0.028 n=10+8)
```
@nvanbenschoten
Copy link
Contributor Author

@tamird It's not necessarily a bug, it's just that the one-liner adds a few extra instructions and uses runtime.growslice_n instead of runtime.makeslice.

@tamird
Copy link
Contributor

tamird commented Mar 8, 2016

Why isn't that a bug? Is there any difference in behaviour? Seems like bad codegen to me.

@nvanbenschoten
Copy link
Contributor Author

Filed it upstream golang/go#14718. I'm guessing it will expressed as expected behavior, but I'm curious what the Go team says nevertheless.

nvanbenschoten added a commit that referenced this pull request Mar 8, 2016
roachpb: Use make and copy pattern for high profile byte slice copies
@nvanbenschoten nvanbenschoten merged commit b9af601 into cockroachdb:master Mar 8, 2016
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/range branch March 8, 2016 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants