Description
What version of Go are you using (`go version`)?
$ go version
go version go1.12.5 linux/amd64
Does this issue reproduce with the latest release?
yes
What operating system and processor architecture are you using (`go env`)?
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/spin/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/.../go"
GOPROXY=""
GORACE=""
GOROOT="/.../1.12.5/libexec"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.12.5/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/q5/b4nzm8zj3g5gcjs4rhkpvczh0000gn/T/go-build533400238=/tmp/go-build -gno-record-gcc-switches -fno-common"
What did you do?
My company spotted serious performance degradation in one of our high-load services. The service handles about 100K permanent TCP connections and performs complex processing for about 75K partner requests per second. When we tried to upgrade Go from 1.11.6 to 1.12.5, we noticed that performance fell by a third. The service begins to seriously misbehave about 15-30 minutes after start.
After investigation I found that the performance degradation is caused by commits 1678b2c5..873bd47d. Commits 1678b2c5..457c8f4f cause slight performance degradation. Performance of the service built with 873bd47d and later is absolutely horrible, as is memory consumption. I also noted that CPU load from 873bd47d dropped from 100% to 67%. The last commit with acceptable performance is 6e9fb11b3.
Very strange benchmark:
package main

import (
	"os"
	"runtime"
	"testing"
	"time"
)

func BenchmarkWrite(b *testing.B) {
	f, err := os.Create("/dev/null")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	buf := make([]byte, 1024*100)
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			for i := 0; i < 100000; i++ {
				f.SetDeadline(time.Now().Add(5 * time.Minute))
				f.Write(buf)
				f.SetDeadline(time.Time{})
				runtime.GC()
			}
		}
	})
	b.StopTimer()
}
Maybe not all of the ops make sense, but the results speak for themselves. The benchmarks were executed with Go built from source at 6e9fb11b..873bd47d. Here are the brief results; you can find details in the attached archives.
6e9fb11b (OK):
$ CGO_ENABLED=0 $HOME/go.dev/bin/go test -run=XXX -bench=BenchmarkWrite ./...
goos: linux
goarch: amd64
pkg: ...
BenchmarkWrite-2 1 17839662490 ns/op 93816 B/op 61 allocs/op
PASS
873bd47d (degraded):
...
goos: linux
goarch: amd64
pkg: ...
BenchmarkWrite-2 1 41872237656 ns/op 6708104 B/op 68744 allocs/op
PASS
Detailed results:
0-6e9fb11b.zip
1-1678b2c5.zip
2-143b13ae.zip
3-457c8f4f.zip
4-873bd47.zip
What did you expect to see?
Normal performance like in Go 1.11.6.
What did you see instead?
Serious performance degradation with the service built with Go 1.12+.
ianlancetaylor commented on Jun 3, 2019
CC @aclements @mknyszek
mknyszek commented on Jun 3, 2019
Just to be totally clear, this investigation was done using your service and not the benchmark below, correct?
Can you comment more on memory consumption? Do you mean RSS?
Also, since it's related to the changes that appear to cause the regression, do you run your service with GODEBUG=gcstoptheworld=1 or GODEBUG=gcstoptheworld=2?
That is a strange benchmark indeed. The many calls to runtime.GC() are going to cause this benchmark to run really slowly in the first place. Does this benchmark relate to your service in any way?
akaspin commented on Jun 4, 2019
@mknyszek This strange benchmark is just a quick and dirty attempt to simulate the service load. Of course there are no runtime.GC() calls in the service. The benchmark was used with git bisect to find the suspicious commit. After that, the service was built with the corresponding Go commits and deployed. Results of the deployment with 873bd47d confirmed that the "new" GC is absolutely ineffective under high load.
gcstoptheworld is not used. There are no notes about the "new" GC in the documentation, and the docs about gcstoptheworld are unchanged. In our company we have never used gcstoptheworld with any Go version.
I want to clarify again: right now the service built with Go 1.11 works absolutely well without any GODEBUG options.
mknyszek commented on Jun 12, 2019
Thanks for the information, and belated thanks for all the detail and the reproducer. My apologies for how long it's taken to get back to you.
I just want to be absolutely clear that 873bd47d and the commits before it are in fact the source of the problem for your service. Did you try the commit before 1678b2c5 (6e9fb11b) against your service? If you're confident your service is also running just fine with 6e9fb11b, then I'll dig deeper into what could be going wrong with the benchmark.
akaspin commented on Jun 12, 2019
@mknyszek Thanks for the response. Yes, as I wrote earlier, I made test deployments with commits from 6e9fb11b to 873bd47d. I can confirm that 6e9fb11b works well: it behaves exactly the same as the build with Go 1.11.6 (our current production version). 1678b2c5 gives slight performance degradation. 873bd47d gives 30-40% performance degradation.
akaspin commented on Jun 22, 2019
@mknyszek Just some notes which may be useful during the investigation (Go 1.11.6):
In the service we make heavy use of sync.Pool, sync.Map, large maps, etc.
Average stats:
alloc: 55GB, sys: 90GB
450M
idle: 32GB, inuse: 52GB, released: 16GB, sys: 85GB
mspan: 690MB, stack: 890MB
10ms
2
2.5%
65GB
60GB
100K
Hope this helps.
jayeshwebcluesglobal commented on Aug 2, 2019
@akaspin I am doing almost the same as what you have described above and am facing the same issue. We have almost 600 TCP/IP endpoints connected with another 600, so roughly 600x600 TCP/IP connections. Right now the connections are temporary, so we create and close a connection every time, and this consumes a huge amount of memory in a short period that the garbage collector does not free up in time. Do you suggest using permanent connections as you have done in your project? And did you find a solution for your issue? If you have solved it, can you please guide us on how we can solve it? We are using gRPC with protobuf.
akaspin commented on Aug 2, 2019
@jayeshwebcluesglobal We just pinned the Go version to 1.11.10.
jayeshwebcluesglobal commented on Aug 5, 2019
@akaspin Thanks! It seems we are running an older version (1.9.3); we will check the issue by upgrading to 1.11.10.
jayeshwebcluesglobal commented on Aug 5, 2019
@akaspin I want to get your view on one thing: we are making 600x600 TCP connections. Should we keep them permanent, or close and reopen them whenever we want to make a request to each other?