Closed
Description
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (`go version`)?
go version devel +dfad3f8 Tue Apr 5 16:10:33 2016 +0300 linux/amd64
What operating system and processor architecture are you using (`go env`)?
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GORACE=""
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build040466407=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
What did you do?
I run an HTTP server serving more than a million concurrent TCP keep-alive connections on a system with 32 CPU cores. Each connection has read and write deadlines set via `net.Conn.SetReadDeadline` and `net.Conn.SetWriteDeadline`.
What did you expect to see?
Server performance should scale with `GOMAXPROCS` up to the number of CPU cores.
What did you see instead?
The `addtimer` and `deltimer` functions from `runtime/time.go` are at the top of the CPU profile. `iowait` reaches 20% due to memory ping-pong across CPU cores inside the `siftupTimer` and `siftdownTimer` functions in `runtime/time.go`.
@dvyukov , could you look into this?
dvyukov commented on Apr 6, 2016
What did you see instead? How does performance depend on the number of cores?
I don't understand the relation between iowait and memory ping-pong. iowait means waiting for IO, like a hard drive; memory accesses are not IO.
What exactly does the profile look like?
Would it be possible to change it to SetDeadline? That would set up 1 timer instead of 2.
dvyukov commented on Apr 6, 2016
We probably can merge this into #6239
valyala commented on Apr 6, 2016
The `iowait` and `system` CPU shares grow with the number of CPU cores. I'm not an expert in `iowait`, but the fact is that `iowait` completely vanishes from 20% to 0% when deadlines on `net.TCPConn` connections are disabled. See the following image:
Connection deadlines are enabled during 15:43-15:56.
Connection deadlines are disabled starting from 15:56.
Here is CPU profile for the process with deadlines enabled:
Here is CPU profile for the app with connection deadlines disabled:
No, this won't work, since read and write deadlines are configured independently in our application.
valyala commented on Apr 6, 2016
FYI, this is not an OS scalability problem, since the CPU usage graph above until 15:33 corresponds to 'app process per CPU core' mode of the app with enabled connection deadlines. We had to switch to this mode due to the issue with timer scalability in Go runtime.
dvyukov commented on Apr 6, 2016
Thanks for the detailed info!
This confirms that the main issue is the global timers mutex. siftdown/up consume ~3% of time. If we distribute timers, then siftdown/up should become faster as well (smaller per-P heap + better locality).
Unfortunately this is not trivial to do. There is #6239 for this.
Sounds pretty bad. @rsc @aclements
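A hedged sketch of the distribution dvyukov outlines (the real fix lives in the runtime and landed much later as per-P timer buckets; the sharded structure below only illustrates the idea): shard timers across several independently locked heaps so that concurrent timer traffic from different cores rarely contends on one mutex or one backing array.

```go
package main

import (
	"container/heap"
	"math"
	"sync"
)

// timerHeap is a min-heap of expiry times (nanoseconds).
type timerHeap []int64

func (h timerHeap) Len() int            { return len(h) }
func (h timerHeap) Less(i, j int) bool  { return h[i] < h[j] }
func (h timerHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *timerHeap) Push(x interface{}) { *h = append(*h, x.(int64)) }
func (h *timerHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

type shard struct {
	mu sync.Mutex
	h  timerHeap
}

// shardedTimers distributes timers over independently locked heaps,
// standing in for the per-P heaps proposed in #6239.
type shardedTimers struct {
	shards []shard
}

func newShardedTimers(n int) *shardedTimers {
	return &shardedTimers{shards: make([]shard, n)}
}

// add places a timer on one shard; only that shard's lock is taken,
// so adds from different cores usually proceed in parallel.
func (s *shardedTimers) add(id int, when int64) {
	sh := &s.shards[id%len(s.shards)]
	sh.mu.Lock()
	heap.Push(&sh.h, when)
	sh.mu.Unlock()
}

// earliest scans shards for the soonest expiry (MaxInt64 if empty).
func (s *shardedTimers) earliest() int64 {
	best := int64(math.MaxInt64)
	for i := range s.shards {
		sh := &s.shards[i]
		sh.mu.Lock()
		if len(sh.h) > 0 && sh.h[0] < best {
			best = sh.h[0]
		}
		sh.mu.Unlock()
	}
	return best
}
```

Besides cutting lock contention, smaller per-shard heaps mean shorter sift paths and better cache locality, matching the "smaller per-P heap + better locality" point above.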
Server optimization: reduce the number of SetReadDeadline/SetWriteDea…
Client optimization: reduce the number of SetReadDeadline/SetWriteDea…
do not use SetReadDeadline/SetWriteDeadline refers to golang/go#15133
gopherbot commented on Jan 6, 2017
CL https://golang.org/cl/34784 mentions this issue.
valyala commented on Jan 10, 2017
The CL fixes the timers scalability issue for us. Simple repro:
Hammer the following http server with more than 100K concurrent HTTP keep-alive connections and send several requests per second on each connection. The server must run on a multi-core machine with GOMAXPROCS=NumCPU.
CPU load should look like this (graph image not preserved), with the `iowait` share gone.
Optimized connection SetDeadline.