Description
Go version
go version go1.22.4 linux/amd64
Output of go env in your module/workspace:
GO111MODULE='on'
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/user/.cache/go-build'
GOENV='/home/user/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/user/go-repos/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/user/go-repos:/opt/go/path:/home/user/go-code'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/user/go/src/github.com/golang/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/user/go/src/github.com/golang/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.4'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/user/go/src/github.com/uber-go/zap/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1740765925=/tmp/go-build -gno-record-gcc-switches'
What did you do?
We have been doing some performance testing of Go tip at Uber in preparation for Go 1.23.
What did you see happen?
We have noticed a degradation of around 8% on Linux machines with many cores (96) across all of Zap’s Field logging benchmarks. These benchmarks look something like this:
logger := New(
    zapcore.NewCore(
        zapcore.NewJSONEncoder(NewProductionConfig().EncoderConfig),
        &ztest.Discarder{}, // No actual I/O; logs get discarded.
        DebugLevel,
    ),
)
b.ResetTimer()
b.RunParallel(func(pb *testing.PB) {
    for pb.Next() {
        logger.Info("Boolean.", Bool("foo", true))
    }
})
We don’t have an isolated Linux environment available to us, so these results are susceptible to a slight noisy-neighbor problem, but we have consistently seen some amount of degradation in these benchmarks:
$ go version
go version go1.22.4 linux/amd64
$ go test -bench Field -run nounittests -count 25 . | tee go1.22.4.log
$ ~/go/src/github.com/golang/go4/bin/go version
go version devel go1.23-93bbf719a6 Wed Jun 5 17:30:16 2024 +0000 linux/amd64
$ ~/go/src/github.com/golang/go4/bin/go test -bench Field -run nounittests -count 25 . | tee 93bbf719a6.log
$ benchstat go1.22.4.log 93bbf719a6.log
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
│ go1.22.4.log │ 93bbf719a6.log │
│ sec/op │ sec/op vs base │
BoolField-96 110.6n ± 6% 127.9n ± 8% +15.64% (p=0.001 n=25)
ByteStringField-96 139.7n ± 1% 149.9n ± 2% +7.30% (p=0.000 n=25)
Float64Field-96 112.2n ± 4% 125.2n ± 4% +11.59% (p=0.000 n=25)
IntField-96 108.7n ± 3% 116.2n ± 2% +6.90% (p=0.000 n=25)
Int64Field-96 105.9n ± 4% 113.2n ± 2% +6.89% (p=0.009 n=25)
StringField-96 104.4n ± 2% 115.4n ± 4% +10.54% (p=0.000 n=25)
StringerField-96 105.4n ± 3% 115.5n ± 4% +9.58% (p=0.000 n=25)
TimeField-96 109.6n ± 2% 117.4n ± 2% +7.12% (p=0.000 n=25)
DurationField-96 111.6n ± 3% 121.9n ± 3% +9.23% (p=0.000 n=25)
ErrorField-96 108.4n ± 2% 115.7n ± 4% +6.73% (p=0.000 n=25)
ErrorsField-96 184.1n ± 2% 205.1n ± 4% +11.41% (p=0.000 n=25)
StackField-96 713.0n ± 3% 813.3n ± 3% +14.07% (p=0.000 n=25)
ObjectField-96 117.2n ± 2% 130.9n ± 3% +11.69% (p=0.000 n=25)
ReflectField-96 317.6n ± 2% 346.0n ± 3% +8.94% (p=0.000 n=25)
10Fields-96 584.7n ± 2% 622.4n ± 4% +6.45% (p=0.000 n=25)
100Fields-96 5.919µ ± 3% 5.630µ ± 5% ~ (p=0.073 n=25)
geomean 196.5n 213.4n +8.61%
We fiddled with GOMAXPROCS a bit and noticed the degradation is definitely related to parallelism.
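For reference, a stripped-down sketch of this kind of reduced-parallelism run (not our exact harness: the package, benchmark name, and GOMAXPROCS value below are arbitrary, and io.Discard stands in for zap's internal ztest.Discarder; a go test -cpu sweep achieves the same thing from the command line):

    package zapbench // hypothetical package for illustration

    import (
        "io"
        "runtime"
        "testing"

        "go.uber.org/zap"
        "go.uber.org/zap/zapcore"
    )

    func BenchmarkBoolFieldReducedParallelism(b *testing.B) {
        prev := runtime.GOMAXPROCS(16) // arbitrary reduced core count
        defer runtime.GOMAXPROCS(prev)

        logger := zap.New(
            zapcore.NewCore(
                zapcore.NewJSONEncoder(zap.NewProductionConfig().EncoderConfig),
                zapcore.AddSync(io.Discard), // no actual I/O, like ztest.Discarder
                zapcore.DebugLevel,
            ),
        )
        b.ResetTimer()
        // RunParallel runs GOMAXPROCS goroutines by default, so lowering
        // GOMAXPROCS directly reduces contention in the logging loop.
        b.RunParallel(func(pb *testing.PB) {
            for pb.Next() {
                logger.Info("Boolean.", zap.Bool("foo", true))
            }
        })
    }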
We didn’t see a whole lot in CPU profiles other than a general increase of about 2-4% of samples taken in the runtime package.
We were able to use git bisect to identify e995aa95cb5f379c1df5d5511ee09970261d877f as one cause. Specifically, the added calls to nanotime() seem to cause degradation in these highly parallelized benchmarks. However, this commit alone does not seem to account for the entire degradation:
$ ~/go/src/github.com/golang/go3/bin/go version
go version devel go1.23-e995aa95cb Mon Apr 8 21:43:16 2024 +0000 linux/amd64
$ ~/go/src/github.com/golang/go3/bin/go test -bench Field -run nounittests -count 25 . | tee e995aa95cb.log
$ benchstat go1.22.4.log e995aa95cb.log
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
│ go1.22.4.log │ e995aa95cb.log │
│ sec/op │ sec/op vs base │
BoolField-96 110.6n ± 6% 121.1n ± 6% +9.49% (p=0.004 n=25)
ByteStringField-96 139.7n ± 1% 145.9n ± 2% +4.44% (p=0.002 n=25)
Float64Field-96 112.2n ± 4% 121.1n ± 1% +7.93% (p=0.000 n=25)
IntField-96 108.7n ± 3% 112.5n ± 2% +3.50% (p=0.009 n=25)
Int64Field-96 105.9n ± 4% 111.4n ± 3% ~ (p=0.200 n=25)
StringField-96 104.4n ± 2% 111.5n ± 2% +6.80% (p=0.000 n=25)
StringerField-96 105.4n ± 3% 113.4n ± 3% +7.59% (p=0.000 n=25)
TimeField-96 109.6n ± 2% 117.6n ± 2% +7.30% (p=0.000 n=25)
DurationField-96 111.6n ± 3% 116.8n ± 2% +4.66% (p=0.000 n=25)
ErrorField-96 108.4n ± 2% 113.7n ± 2% +4.89% (p=0.002 n=25)
ErrorsField-96 184.1n ± 2% 201.7n ± 4% +9.56% (p=0.000 n=25)
StackField-96 713.0n ± 3% 770.9n ± 2% +8.12% (p=0.000 n=25)
ObjectField-96 117.2n ± 2% 127.2n ± 3% +8.53% (p=0.000 n=25)
ReflectField-96 317.6n ± 2% 349.4n ± 5% +10.01% (p=0.000 n=25)
10Fields-96 584.7n ± 2% 620.5n ± 5% +6.12% (p=0.005 n=25)
100Fields-96 5.919µ ± 3% 6.046µ ± 25% ~ (p=0.064 n=25)
geomean 196.5n 209.5n +6.62%
We weren’t able to reliably identify any additional commits beyond this one that accounted for more of the degradation.
Note: this is not a duplicate of #67857, but rather an investigation of different Zap benchmark degradations.
What did you expect to see?
No practical degradation.
ianlancetaylor commented on Jun 7, 2024
CC @golang/runtime
mknyszek commented on Jun 7, 2024
These logging benchmarks appear to involve incredibly tight loops, which leads me to believe that it's possibly due to unfortunate microarchitectural effects -- see my comment on #67857.
And my next question would be: does this in any way correspond to production-level regressions? Unless the world is getting stopped a lot, I would not expect 15*GOMAXPROCS cpu-ns of CPU time to have any meaningful effect on performance in any real program (referencing your bisection result). I could see it having an effect in a microbenchmark that stops the world a lot, though.
One thing I might suggest trying is setting GOMEMLIMIT=<something big> and GOGC=off in your benchmarks to try and eliminate or minimize the GC cost of your benchmark (allocation cost will still be measured, though, which I assume you care about). The reason I say this is because microbenchmarks can end up being a torture test for the GC in a way that makes the benchmark less useful. In other words, the costs dominating your benchmark do not reflect the actual costs of your code in real-world contexts.
For example, a benchmark that allocates in a loop, immediately drops the memory, and has a teeny tiny live heap (a lot like most microbenchmarks!) is going to see a very, very high number of GC cycles, because it is going to sit at the minimum total heap size of 4 MiB. This means tons of STWs and CPU time spent on mark work that does not translate at all to real production systems with sizable live heaps. The tricky part with the GC is that its frequency (and thus its cost) is proportional to your live heap via GOGC, so considering the GC as "part of the benchmark" is really hard to do in a useful way. It's much easier if the benchmark measures the performance of some larger end-to-end system.
I'm curious to know if setting GOMEMLIMIT to something big and GOGC=off makes the problem go away completely. If it does, I am less concerned about these regressions. If it doesn't, then I think this becomes much more important to fix. Also, if this regression shows up in more end-to-end benchmarks, or in production, that's much more cause for alarm.
I'll also add: are you able to compare profiles before and after (pprof -diff_base=old.prof new.prof)? Even an 8% regression should stick out like a sore thumb.
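A minimal sketch of this suggestion applied from inside a benchmark rather than via environment variables: runtime/debug's SetGCPercent and SetMemoryLimit are the in-process equivalents of GOGC and GOMEMLIMIT. The benchmark name and its allocate-and-drop body below are hypothetical stand-ins for the zap Field benchmarks, not code from this thread.

    package zapbench // hypothetical package for illustration

    import (
        "math"
        "runtime/debug"
        "testing"
    )

    var sink []byte // package-level sink so the allocation is not optimized away

    func BenchmarkAllocNoGC(b *testing.B) {
        prevGC := debug.SetGCPercent(-1)                 // equivalent of GOGC=off
        prevLimit := debug.SetMemoryLimit(math.MaxInt64) // effectively no memory limit
        defer func() {
            debug.SetMemoryLimit(prevLimit)
            debug.SetGCPercent(prevGC)
        }()

        b.RunParallel(func(pb *testing.PB) {
            for pb.Next() {
                // Allocate and immediately drop memory, like the Field benchmarks do.
                // With the GC disabled, this garbage accumulates for the run,
                // which is the point: GC cost drops out of the measurement.
                sink = make([]byte, 128)
            }
        })
    }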
JacobOaks commented on Jun 7, 2024
Hey @mknyszek - thanks for the quick response & ideas!
Your explanation w.r.t. microbenchmarks being a GC torture test makes sense - these tests allocate memory that quickly becomes dead, which leads to constant allocation against a small heap target, causing GC to be triggered often. I suppose this would exacerbate any GC performance degradations that are otherwise minuscule in larger benchmarks/applications.
It's unfortunately not realistic for us to actually test Go tip in production - but we may be able to do this once an rc version is tagged.
I did turn off GC and re-ran these benchmarks, and as you predicted, this does seem to result in no degradation between the versions. Like I mentioned before, we had a hard time discerning any differences in profiles between the two versions, but I tried to make the issue worse by forcing more GC with GOGC=50 in hopes that differences between the profiles would surface better. With that, I did find that gcBgMarkWorker saw about a 3% increase in CPU time, which seems in line with your explanation. There weren't any other significant differences that I could tell (happy to share the profiles if it helps).
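One way to produce comparable profiles is go test -bench Field -cpuprofile cpu.prof under each toolchain, then go tool pprof -diff_base=old.prof new.prof. For a standalone reproducer, a minimal runtime/pprof sketch might look like the following; the output file name and the workload function are hypothetical placeholders, not code from this thread.

    package main // hypothetical standalone reproducer

    import (
        "log"
        "os"
        "runtime/pprof"
    )

    var sink int // keeps the workload from being optimized away

    // workload is a placeholder for the hot logging loop being measured.
    func workload() {
        for i := 0; i < 50_000_000; i++ {
            sink += i
        }
    }

    func main() {
        f, err := os.Create("cpu.prof") // hypothetical output file name
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        // Profile only the hot section so old/new toolchain runs line up.
        if err := pprof.StartCPUProfile(f); err != nil {
            log.Fatal(err)
        }
        defer pprof.StopCPUProfile()

        workload()
    }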
mknyszek commented on Jun 7, 2024
No problem. These reports are important. :) I hope my response did not discourage future reports. Thanks for the thorough issue you filed -- it's genuinely helpful that y'all do performance testing against tip.
Yeah, it's a bit unfortunate how easy it is to generate benchmarks that do this.
Understandable. Hopefully the RC gives us some more feedback.
Got it, that's good to know! Thanks for checking.
That's unfortunate; just to clarify, that's even with the automated diffing? The only other thing I might suggest is doing the same kind of differential profile with Linux perf if you can, which tends to produce more accurate results overall.
I'm going to leave this issue open for now while I investigate #67822, in case it becomes relevant.
thanm commented on Jun 12, 2024
I ran these benchmarks on my high-core 2-socket machine, with and without link-time randomization (as described in this note: #67822 (comment)).
Here's what I see with no randomization; rock steady:
Now here's what happens when I add in text layout randomization (10 instances of -count=3 runs, each with a random seed):
Note the p values, which are now all over the map. So I strongly suspect that, in addition to the GC artifacts mentioned by @mknyszek, some additional fraction of the apparent degradation is due to loop alignment (e.g. just happening to get the "right" alignment in one case and the "wrong" alignment in another).
JacobOaks commented on Jun 13, 2024
Hmm... FWIW I wasn't able to reproduce this using the script from #67822 (comment), including the same 10 seeds. Perhaps I'm doing something wrong.
No linker randomization:
Linker randomization:
I did sanity-check that go was being invoked correctly.
Very strange.