Description
I'm seeing a reliable ~3x slowdown with go test -race
on a particular set of tests doing a lot of concurrent operations in several goroutines. I haven't yet spent the time to reduce the test case or to identify the go commit introducing the regression.
Tried with go 1.11:
$ go version; go env
go version go1.11 darwin/amd64
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/mr/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/mr/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/Cellar/go/1.11/libexec"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.11/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/ct/bl4_z3g51ks8239_r2k07v_40000gn/T/go-build102870134=/tmp/go-build -gno-record-gcc-switches -fno-common"
and go tip:
$ gotip version; gotip env
go version devel +7c96f9b527 Fri Aug 31 02:14:24 2018 +0000 darwin/amd64
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/mr/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/mr/go"
GOPROXY=""
GORACE=""
GOROOT="/Users/mr/gotip/src/github.com/golang/go"
GOTMPDIR=""
GOTOOLDIR="/Users/mr/gotip/src/github.com/golang/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/ct/bl4_z3g51ks8239_r2k07v_40000gn/T/go-build924877100=/tmp/go-build -gno-record-gcc-switches -fno-common"
To reproduce:
go get github.com/influxdata/platform
cd $GOPATH/src/github.com/influxdata/platform
git checkout origin/test/taskservice
dep ensure -v -vendor-only
go test -race -count=5 ./task
(That branch will likely get rebased onto master in the near future, so I've got a fork at mark-rushakoff/platform@a8269bc with the commit as it is today.)
I'm consistently seeing times like:
$ go test -race -count=5 ./task
ok github.com/influxdata/platform/task 8.513s
$ gotip test -race -count=5 ./task
ok github.com/influxdata/platform/task 25.863s
For a small test suite that doesn't exercise concurrency, the slowdown is negligible:
$ go test -race -count=5 ./task/options/
ok github.com/influxdata/platform/task/options 1.602s
$ gotip test -race -count=5 ./task/options/
ok github.com/influxdata/platform/task/options 1.655s
I might spend some time bisecting go to identify where the slowdown was introduced, or I might take a look at trace output, but I wanted to file this issue in case anyone else had time and interest to look at it before I do.
Activity
ALTree commentedon Aug 31, 2018
If I'm reading this correctly the regression is in tip but not in 1.11? You have to bisect between 227 commits, which will take 8 steps... at ~2 minutes at step, it's less than 20 minutes. It seems doable to me.
mark-rushakoff commentedon Aug 31, 2018
I'm seeing an even more drastic slowdown without using -race:
mark-rushakoff commentedon Aug 31, 2018
Okay, I bisected to find be10ad7 (
internal/poll: use F_FULLFSYNC fcntl for FD.Fsync on OS X
) was the first "bad" commit.The test suite in question is exercising some concurrent operations backed by a coreos/bbolt database. The commit message details look like the commit was a necessary change:
Maybe this is just a necessary regression for file correctness on Darwin?
Here's the tail of my bisect run, if that's useful information:
/cc @odeke-em @ianlancetaylor
[-]cmd/go: performance regression on `go test -race` at tip[/-][+]cmd/go: performance regression after F_FULLFSYNC fcntl change on Darwin[/+]odeke-em commentedon Sep 2, 2018
Thank you for the investigation and report @mark-rushakoff!
In deed, the regression is present but for consistency and correctness to avoid data corruption, the change seems inevitable.
Could we please compare the speeds and throughput on Linux vs on OS X vs Windows?
mark-rushakoff commentedon Sep 3, 2018
Unfortunately, I don't have access to a Linux or Windows machine to compare results. Fortunately, I've created a repository with a smaller reproduction case so that someone with Linux or Windows can easily help: https://github.com/mark-rushakoff/goissue27415
The case there still depends on coreos/bbolt, which is included in vendor. I tried calling
(*os.File).Sync()
but wasn't able to reproduce the issue (consistent with the comment in the commit); and I tried usingsyscall.Fsync
but also wasn't able to reproduce it, although it's very possible that was user error on my part.The benchmark in the repository is this:
And by comparing
go version go1.11 darwin/amd64
withgo version devel +be10ad7622 Wed Aug 22 22:26:40 2018 +0000 darwin/amd64
, I got the following benchstat output:(Edit: updated code and benchstat for simpler test case)
odeke-em commentedon Sep 3, 2018
At @mark-rushakoff I was able to toggle remote tests/benchmarks on both my Linux PowerEdge server and Macbook Pro. However, as you'll see the numbers of cores vary with results. The results are just to compare the range of tolerance/overhead across the various OSes
Linux
OS X Macbook Pro
Forcibly setting
GOMAXPROCS
to 4 since it is an 8 core machineGOMAXPROCS=4 go test -bench=. -count=10
Results
The results are quite interesting, on OS X even with that change, those writes are apparently ~85% faster than on Linux
Unfortunately I don't have a WIndows machine but I'll kindly page @alexbrainman to help with those benchmarks if available
However, I've also got a state of the art Macbook Pro with Flash storage while my Linux machine has a hard drive -- this might be an explanation for the discrepancy: the comparison of apples to oranges, and since that change doesn't even affect any other OS I think we just need to make a judgement call on whether we can tolerate data corruption for speed.
ianlancetaylor commentedon Sep 3, 2018
On Darwin it seems clearly correct to use
F_FULLFSYNC
, and a reduction in speed of calls to theSync
method is an inevitable result. This sounds like working as intended. If you want to trade off data corruption for speed, don't call theSync
method.mark-rushakoff commentedon Sep 3, 2018
That's fair @ianlancetaylor. I assume this performance change will surprise many other users; maybe this issue should be repurposed to track that the
F_FULLSYNC
change and its consequences are clearly documented in the Go1.12 release notes?alexbrainman commentedon Sep 4, 2018
I did not bother with the benchmarks, because
https://go-review.googlesource.com/c/go/+/130676
did not change any windows code. Please, let me know, if I am wrong.
Alex
ianlancetaylor commentedon Sep 7, 2018
I added a RELNOTES tag to the CL, so we will add something to the release notes. Closing this issue.
odeke-em commentedon Sep 7, 2018
@alexbrainman no, we meant to say just raw performance numbers regardless of the operating system, to compare with what OS X was seeing, for ballparking how OS X would be suffering. Anyways, no need for it anymore :)
Thank you Ian!
7 remaining items