Description
- What version of Go are you using (go version)?

go version go1.6.2 windows/amd64

- What operating system and processor architecture are you using (go env)?
set GOARCH=amd64
set GOBIN=
set GOEXE=.exe
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=C:\dev\Go
set GORACE=
set GOROOT=C:\Go
set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
set GO15VENDOREXPERIMENT=1
set CC=gcc
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0
set CXX=g++
set CGO_ENABLED=1

- What did you do?
Hello, I am writing a proof-of-concept program that should replace a currently running C application.
The server part loads a SQLite db3 file into memory (using maps) and then waits for text commands on a UDP socket.
The client reads a text file with commands, sends them to the server, and waits for a response, measuring the server's reply time.
My tests were run on the same machine (localhost) to avoid network delays.
While running the client, the response time jumps from 3 milliseconds to more than 200 milliseconds, and at the same moment the server (running with SET GODEBUG=gctrace=1) displays this line:
gc 14 @18.400s 2%: 0.50+380+1.0 ms clock, 2.0+294/380/266+4.0 ms cpu, 344->344->182 MB, 359 MB goal, 4 P
I think that a 200ms STW pause is a bit too much, and I would like to know how to fix this behavior.
I have a Dropbox folder with the server and client sources, plus the corresponding SQLite DB and command text file
(https://www.dropbox.com/sh/e2timsyjifvp61i/AAAmDoKSP7NANjDlsYcGcBvGa?dl=0).
The client must be executed with 2 parameters:
- serverip:port (4420)
- sendername (L1)
My concern is not throughput but latency: our application drives an automated machine that must have the response back within 10ms.
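For context, the client's measurement loop presumably has roughly the following shape (a sketch only, not the actual Dropbox sources; the stdin-based command feed is an assumption, and the sendername parameter is ignored here):

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
	"time"
)

func main() {
	// Usage: client serverip:port sendername (e.g. client localhost:4420 L1)
	conn, err := net.Dial("udp", os.Args[1])
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	reply := make([]byte, 1500)
	sc := bufio.NewScanner(os.Stdin) // one text command per line
	for sc.Scan() {
		start := time.Now()
		if _, err := conn.Write(sc.Bytes()); err != nil {
			panic(err)
		}
		if _, err := conn.Read(reply); err != nil { // block until the server replies
			panic(err)
		}
		fmt.Printf("rtt: %.3f ms\n", float64(time.Since(start))/float64(time.Millisecond))
	}
}
```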
Activity
genez commented on May 26, 2016
Maybe related to this issue: #9477
genez commented on May 26, 2016
Just changing the key of the maps from string to int drops the GC time to 5 milliseconds.
RLH commented on May 26, 2016
First, kudos for measuring end-to-end latencies instead of just focusing on GC latencies. Did the end-to-end latency also drop when you changed the map keys? Could you send the complete before and after gctrace? How many goroutines does the application have, and what is the nature of the goroutines' work?
Reading the gctrace line you sent: the 2% is the overall time spent in GC, and the 0.50 and 1.0 ms figures are the STW times. The 380 ms is the time spent doing the concurrent mark. During the concurrent mark we have seen the GC starve the mutators (your application) in certain situations, and we fixed several things in this area for 1.7. You might want to try tip to see whether this will be a problem going forward.
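For reference, the fields of such a gctrace line break down roughly as follows (per the runtime package's GODEBUG=gctrace=1 documentation of that era):

```
gc 14 @18.400s 2%           cycle 14, 18.4s since program start, 2% of time spent in GC since start
0.50+380+1.0 ms clock       wall clock: STW sweep termination + concurrent mark/scan + STW mark termination
2.0+294/380/266+4.0 ms cpu  CPU: sweep termination + (assist / background / idle mark) + mark termination
344->344->182 MB            heap size at GC start -> heap size at GC end -> live heap
359 MB goal                 goal heap size for this cycle
4 P                         number of processors used
```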
genez commented on May 26, 2016
Sorry, but the test with int keys was not valid: my conversion to int was failing, so every item was being added at key 0.
I tried with [20]byte keys, but I have the same issue (a sketch of that key conversion follows the trace below):
C:\dev\ETSGO\etsgo>etsgo
gc 1 @0.046s 2%: 1.1+1.8+0.75 ms clock, 2.3+0/1.8/0.80+1.5 ms cpu, 4->4->2 MB, 5 MB goal, 4 P
gc 2 @0.066s 4%: 0+3.0+1.2 ms clock, 0+1.0/1.0/1.0+4.8 ms cpu, 4->4->2 MB, 5 MB goal, 4 P
gc 3 @0.085s 5%: 0+13+1.0 ms clock, 0+3.0/2.0/1.0+4.0 ms cpu, 5->7->6 MB, 6 MB goal, 4 P
gc 4 @0.131s 4%: 0+3.0+0 ms clock, 0+0/0/3.0+0 ms cpu, 7->7->4 MB, 9 MB goal, 4 P
gc 5 @0.171s 3%: 0+7.0+0 ms clock, 0+4.0/3.0/9.0+0 ms cpu, 10->10->9 MB, 11 MB goal, 4 P
gc 6 @0.179s 5%: 0+7.0+1.0 ms clock, 0+3.0/7.0/0+4.3 ms cpu, 13->13->13 MB, 18 MB goal, 4 P
gc 7 @0.304s 5%: 0+19+0 ms clock, 0+19/9.0/10+0 ms cpu, 27->27->18 MB, 28 MB goal, 4 P
gc 8 @0.326s 6%: 0+11+0.50 ms clock, 0+7.5/11/9.0+2.0 ms cpu, 26->26->26 MB, 37 MB goal, 4 P
gc 9 @0.586s 5%: 0+28+0 ms clock, 0+28/15/14+0 ms cpu, 54->54->37 MB, 55 MB goal, 4 P
gc 10 @0.698s 6%: 0+66+1.0 ms clock, 0+0/57/9.0+4.0 ms cpu, 55->56->51 MB, 75 MB goal, 4 P
gc 11 @1.191s 5%: 0+65+1.0 ms clock, 0+65/34/31+4.0 ms cpu, 104->104->75 MB, 105 MB goal, 4 P
gc 12 @1.576s 4%: 0+111+1.0 ms clock, 0+0/4.5/109+4.0 ms cpu, 117->118->72 MB, 150 MB goal, 4 P
gc 13 @2.396s 4%: 0+124+0.50 ms clock, 0+124/57/66+2.0 ms cpu, 173->173->150 MB, 174 MB goal, 4 P
gc 14 @2.532s 6%: 0+178+0.54 ms clock, 0+62/178/0+2.1 ms cpu, 212->214->213 MB, 300 MB goal, 4 P
gc 15 @4.635s 6%: 0.49+316+1.0 ms clock, 1.9+316/172/143+4.0 ms cpu, 437->437->300 MB, 438 MB goal, 4 P
gc 16 @4.978s 8%: 0+553+0.47 ms clock, 0+115/553/0+1.9 ms cpu, 424->430->426 MB, 601 MB goal, 4 P
Ready. Loaded 1002662 serials
gc 17 @6.215s 14%: 0+0+303 ms clock, 0+115/553/0+1215 ms cpu, 452->452->283 MB, 452 MB goal, 4 P (forced)
GC took 304.8374ms
GC forced
gc 18 @126.525s 0%: 0+525+0.53 ms clock, 0+0/525/3.0+2.1 ms cpu, 283->283->283 MB, 566 MB goal, 4 P
scvg0: inuse: 335, idle: 227, sys: 563, released: 0, consumed: 563 (MB)
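A minimal sketch of the kind of string-to-[20]byte key conversion involved (the helper name is hypothetical; the comment shows how a silently failing conversion, as in the int test, collapses every item onto one key):

```go
package main

import "fmt"

// toKey converts a string code into a fixed-size array key.
// copy zero-pads short codes and truncates codes longer than 20 bytes.
func toKey(code string) [20]byte {
	var k [20]byte
	copy(k[:], code)
	return k
}

func main() {
	codes := map[[20]byte]int{}
	codes[toKey("ABC123")] = 1
	codes[toKey("XYZ789")] = 2
	// Two distinct keys here. In the broken int test, a failed
	// string-to-int conversion left every item at key 0 instead.
	fmt.Println(len(codes)) // 2
}
```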
genez commented on May 26, 2016
I am going to try with Go tip tomorrow and let you know the results
genez commented on May 26, 2016
tested with Go tip
go version devel +2168f2a Thu May 26 16:27:24 2016 +0000 windows/amd64
same results (client side):
min: 0.451400 | avg: 0.873292 | max: 218.594800
I am attaching the gctrace file.
thanks
gctrace.txt
ianlancetaylor commented on May 26, 2016
CC @aclements
RLH commented on May 26, 2016
By "client side" I'm assuming we are talking about end-to-end latency, which is what matters. I'm also assuming the trace is from the server and the numbers are in ms.
From the gctrace I notice this line
gc 17 @5.964s 10%: 0+0+235 ms clock, 0+275/166/108+942 ms cpu, 456->456->283 MB, 456 MB goal, 4 P (forced)
indicating a 235ms STW pause. The "(forced)" annotation means that GC 17 is the result of a call to runtime.GC(), which does a STW collection instead of a concurrent GC. Removing the runtime.GC() call from the server code will eliminate that STW pause, which may be the problem. If the client pauses could be matched with the server GC pauses, that would help confirm it is a GC issue. It says "Ready" just before the call to runtime.GC(); what does that mean?
From the trace it also looks like the code spends 5 seconds populating the in-(Go-)heap database. This can be deduced by looking at the heap sizes, which reach 456MB at GC 17; the server then goes into a steady state where no GC activity happens between 5.96 seconds and 34.93 seconds.
Let us know if removing the runtime.GC() call helps and whether the client pauses are coordinated with the server GC times. Finally, if it is easy to calculate 95, 99, and 99.9 percentiles on the client, that would also be interesting.
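A minimal sketch of computing those percentiles on the client, assuming the round-trip times are collected into a slice of milliseconds (names are illustrative, not from the actual client):

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the p-th percentile (0 < p <= 100) of the
// sorted samples using the nearest-rank method.
func percentile(sortedMs []float64, p float64) float64 {
	idx := int(p/100*float64(len(sortedMs))+0.5) - 1
	if idx < 0 {
		idx = 0
	}
	if idx >= len(sortedMs) {
		idx = len(sortedMs) - 1
	}
	return sortedMs[idx]
}

func main() {
	rtts := []float64{0.45, 0.62, 0.80, 1.1, 218.6} // example RTT samples in ms
	sort.Float64s(rtts)
	for _, p := range []float64{95, 99, 99.9} {
		fmt.Printf("p%v: %.3f ms\n", p, percentile(rtts, p))
	}
}
```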
[Issue retitled from "High garbage collection latency (+200ms) with test client/server programs" to "runtime: >200ms GC latency in test programs (calling runtime.GC?)"]

genez commented on May 27, 2016
Let me clarify a bit more what's happening on the server side:
The data is held in maps of type map[[20]byte]FullCode.
One map is "global" with all codes; the other maps are subsets (distinct by Type).
The objects are structured exactly like this:
```go
type FullCode struct {
	Type     int
	Code     string
	Status   int
	Children map[[20]byte]FullCode
}
```
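Presumably the population step looks something like the following sketch (loadedCodes and toKey are assumptions for illustration; FullCode is the struct above):

```go
// buildIndexes builds the "global" map plus the per-Type subset maps
// from the rows loaded out of the SQLite file.
func buildIndexes(loadedCodes []FullCode) (map[[20]byte]FullCode, map[int]map[[20]byte]FullCode) {
	global := make(map[[20]byte]FullCode)
	byType := make(map[int]map[[20]byte]FullCode)
	for _, fc := range loadedCodes {
		key := toKey(fc.Code) // string code -> fixed-size array key
		global[key] = fc
		sub := byType[fc.Type]
		if sub == nil {
			sub = make(map[[20]byte]FullCode)
			byType[fc.Type] = sub
		}
		sub[key] = fc
	}
	return global, byType
}
```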
After startup, my server now performs a manual GC:
```go
func timeGC() time.Duration {
	start := time.Now()
	runtime.GC()
	return time.Since(start)
}
```
At this stage, the testclient program starts and begins sending UDP requests to the server.
After a few seconds I can see that the client detects an RTT > 200ms, and exactly at the same time the server prints out a GC trace line.
The trace line I am talking about is this:
gc 18 @126.525s 0%: 0+525+0.53 ms clock, 0+0/525/3.0+2.1 ms cpu, 283->283->283 MB, 566 MB goal, 4 P
I can record a video and upload it to YouTube to better explain the situation, if that's OK with you.
aclements commented on May 27, 2016
This is almost certainly #10345 (which is a generalization of #9477), given the map with 1 million objects. @genez, would you be able to test a small patch to the runtime? There's an easy way to confirm or deny the connection with #10345 if we can tweak the runtime.
dlsniper commented on May 27, 2016
Hi @RLH and @aclements, I've tested this out for a bit yesterday when it came up on Slack.
The issue doesn't happen with 200k or 300k items in the maps. As soon as the count gets bigger, there will be a pause at some moment during the test, when the GC runs automatically.
With a lower count of items, the pause varies from 1-2 ms to 25 ms, which is within acceptable limits.
To me it looks like #9477.
Since the maps in this case are allocated once and the elements are never deleted, could we maybe get a way to track whether the state has changed between GC runs, so that the GC doesn't need to do anything for them during the STW phase? Something like: the GC computes a hash of the map while running the parallel phase, and in the STW phase it compares it with the hash from the previous run; if they are the same, it skips any work for that map?
genez commented on May 27, 2016
@aclements Yes, I can test the patch; just let me know how to apply it.
Thanks