
runtime: >200ms GC latency in test programs (calling runtime.GC?) #15847

Closed
@genez

Description

@genez
  1. What version of Go are you using (go version)?
    go version go1.6.2 windows/amd64
  2. What operating system and processor architecture are you using (go env)?
    set GOARCH=amd64
    set GOBIN=
    set GOEXE=.exe
    set GOHOSTARCH=amd64
    set GOHOSTOS=windows
    set GOOS=windows
    set GOPATH=C:\dev\Go
    set GORACE=
    set GOROOT=C:\Go
    set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
    set GO15VENDOREXPERIMENT=1
    set CC=gcc
    set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0
    set CXX=g++
    set CGO_ENABLED=1
  3. What did you do?
    Hello, I am writing a proof-of-concept program that should replace a currently running C application.
    The server part loads a SQLite db3 file into memory (using maps) and then waits for text commands on a UDP socket.

The client reads a text file with commands, sends them to the server, and waits for a response.
The client measures the server's reply time.

My tests were run on the same machine (localhost) to avoid network delays.

While running the client, the response time jumps from 3 milliseconds to more than 200 milliseconds,
and at the same moment the server prints this line (I am running it with SET GODEBUG=gctrace=1):

gc 14 @18.400s 2%: 0.50+380+1.0 ms clock, 2.0+294/380/266+4.0 ms cpu, 344->344->182 MB, 359 MB goal, 4 P

I think that a 200ms STW pause is a bit too much, and I would like to know how to fix this behavior.
I have a Dropbox folder with the server and client sources, plus the corresponding SQLite DB and command text file
I have a dropbox folder with server and client sources, and corresponding SQLITE DB and command text file
(https://www.dropbox.com/sh/e2timsyjifvp61i/AAAmDoKSP7NANjDlsYcGcBvGa?dl=0)

The client must be executed with two parameters:

  • serverip:port (4420)
  • sendername (L1)

My concern is not throughput but latency.
Our application drives an automated machine that must receive the response within 10ms.

Activity

genez

genez commented on May 26, 2016

@genez
Author

Maybe related to this issue: #9477

genez

genez commented on May 26, 2016

@genez
Author

Just changing the key of the maps from string to int drops the GC time to 5 milliseconds.

RLH

RLH commented on May 26, 2016

@RLH
Contributor

First, kudos for measuring end-to-end latencies instead of just focusing on GC latencies. Did the end-to-end latency also drop when you changed the map keys? Could you send the complete before and after gctrace? How many goroutines does the application have, and what is the nature of their work?

Reading the gctrace line you sent: the 2% is the overall time spent in GC, the 0.50 and the 1.0 ms are the STW times, and the 380 ms is the time spent doing the concurrent mark. During the concurrent mark we have seen the GC starve the mutators (your application) in certain situations, and we fixed several things in this area for 1.7. You might want to try tip to see whether this will be a problem going forward.

genez

genez commented on May 26, 2016

@genez
Author

Sorry, but the test with int keys was not valid:
my conversion to int was failing, so every item was being added at key 0.

I tried with [20]byte keys, but I see the same issue.

C:\dev\ETSGO\etsgo>etsgo
gc 1 @0.046s 2%: 1.1+1.8+0.75 ms clock, 2.3+0/1.8/0.80+1.5 ms cpu, 4->4->2 MB, 5 MB goal, 4 P
gc 2 @0.066s 4%: 0+3.0+1.2 ms clock, 0+1.0/1.0/1.0+4.8 ms cpu, 4->4->2 MB, 5 MB goal, 4 P
gc 3 @0.085s 5%: 0+13+1.0 ms clock, 0+3.0/2.0/1.0+4.0 ms cpu, 5->7->6 MB, 6 MB goal, 4 P
gc 4 @0.131s 4%: 0+3.0+0 ms clock, 0+0/0/3.0+0 ms cpu, 7->7->4 MB, 9 MB goal, 4 P
gc 5 @0.171s 3%: 0+7.0+0 ms clock, 0+4.0/3.0/9.0+0 ms cpu, 10->10->9 MB, 11 MB goal, 4 P
gc 6 @0.179s 5%: 0+7.0+1.0 ms clock, 0+3.0/7.0/0+4.3 ms cpu, 13->13->13 MB, 18 MB goal, 4 P
gc 7 @0.304s 5%: 0+19+0 ms clock, 0+19/9.0/10+0 ms cpu, 27->27->18 MB, 28 MB goal, 4 P
gc 8 @0.326s 6%: 0+11+0.50 ms clock, 0+7.5/11/9.0+2.0 ms cpu, 26->26->26 MB, 37 MB goal, 4 P
gc 9 @0.586s 5%: 0+28+0 ms clock, 0+28/15/14+0 ms cpu, 54->54->37 MB, 55 MB goal, 4 P
gc 10 @0.698s 6%: 0+66+1.0 ms clock, 0+0/57/9.0+4.0 ms cpu, 55->56->51 MB, 75 MB goal, 4 P
gc 11 @1.191s 5%: 0+65+1.0 ms clock, 0+65/34/31+4.0 ms cpu, 104->104->75 MB, 105 MB goal, 4 P
gc 12 @1.576s 4%: 0+111+1.0 ms clock, 0+0/4.5/109+4.0 ms cpu, 117->118->72 MB, 150 MB goal, 4 P
gc 13 @2.396s 4%: 0+124+0.50 ms clock, 0+124/57/66+2.0 ms cpu, 173->173->150 MB, 174 MB goal, 4 P
gc 14 @2.532s 6%: 0+178+0.54 ms clock, 0+62/178/0+2.1 ms cpu, 212->214->213 MB, 300 MB goal, 4 P
gc 15 @4.635s 6%: 0.49+316+1.0 ms clock, 1.9+316/172/143+4.0 ms cpu, 437->437->300 MB, 438 MB goal, 4 P
gc 16 @4.978s 8%: 0+553+0.47 ms clock, 0+115/553/0+1.9 ms cpu, 424->430->426 MB, 601 MB goal, 4 P
Ready. Loaded 1002662 serialsgc 17 @6.215s 14%: 0+0+303 ms clock, 0+115/553/0+1215 ms cpu, 452->452->283 MB, 452 MB goal, 4 P (forced)
GC took 304.8374ms
GC forced
gc 18 @126.525s 0%: 0+525+0.53 ms clock, 0+0/525/3.0+2.1 ms cpu, 283->283->283 MB, 566 MB goal, 4 P
scvg0: inuse: 335, idle: 227, sys: 563, released: 0, consumed: 563 (MB)

genez

genez commented on May 26, 2016

@genez
Author

I am going to try with Go tip tomorrow and let you know the results

genez

genez commented on May 26, 2016

@genez
Author

tested with Go tip

go version devel +2168f2a Thu May 26 16:27:24 2016 +0000 windows/amd64

same results (client side):
min: 0.451400 | avg: 0.873292 | max: 218.594800

I am attaching gctrace file

thanks
gctrace.txt

added this to the Go1.7Maybe milestone on May 26, 2016
RLH

RLH commented on May 26, 2016

@RLH
Contributor

By client side I'm assuming we are talking about end to end, which is what matters. I'm also assuming the trace is from the server and the numbers are in ms.

From the gctrace I notice this line

gc 17 @5.964s 10%: 0+0+235 ms clock, 0+275/166/108+942 ms cpu, 456->456->283 MB, 456 MB goal, 4 P (forced)

indicating a 235ms STW pause. The "(forced)" annotation means that GC 17 is the result of a call to runtime.GC(), which does a STW collection instead of a concurrent GC. Removing the runtime.GC() call from the server code will eliminate that STW pause, which may be the problem. If the client pauses could be matched with the server GC pauses, that would help confirm it is a GC issue. It says "Ready" just before the call to runtime.GC(); what does that mean?

From the trace it also looks like the code spends 5 seconds populating the in-(Go-)heap database. This can be deduced from the heap sizes, which reach 456MB at GC 17, after which the server goes into a steady state with no GC activity between 5.96 seconds and 34.93 seconds.

Let us know if removing the runtime.GC() call helps and whether the client pauses line up with the server GC times. Finally, if it is easy to calculate 95, 99, and 99.9 percentiles on the client, that would also be interesting.

changed the title [-]High garbage collection latency (+200ms) with test client/server programs[/-] [+]runtime: >200ms GC latency in test programs (calling runtime.GC?)[/+] on May 27, 2016
genez

genez commented on May 27, 2016

@genez
Author

Let me clarify a bit more what's happening at client side:

  • testclient is run with GOGC=off just to make sure that the delay is not related to client GC
  • server is started with GOGC=100 and GODEBUG=gctrace=1.
  • when the server starts up, it reads 1 million "objects" from a SQLite DB into 5 distinct maps
    (map[[20]byte]FullCode).
    One map is "global" with all codes; the other maps are subsets (distinct by Type)

The objects are structured exactly like this:

type FullCode struct {
    Type     int
    Code     string
    Status   int
    Children map[[20]byte]FullCode
}

After startup, my server performs a manual GC:

func timeGC() time.Duration {
    start := time.Now()
    runtime.GC()
    return time.Since(start)
}

At this stage, the testclient program starts and begins sending UDP requests to the server.
After a few seconds I can see that the client detects an RTT > 200ms, and at exactly the same time the server prints out a GC trace line.

The trace line I am talking about is this

gc 18 @126.525s 0%: 0+525+0.53 ms clock, 0+0/525/3.0+2.1 ms cpu, 283->283->283 MB, 566 MB goal, 4 P

I can record a video and upload it to YouTube to better explain the situation, if that's OK with you.

aclements

aclements commented on May 27, 2016

@aclements
Member

This is almost certainly #10345 (which is a generalization of #9477), given the map with 1 million objects. @genez, would you be able to test a small patch to the runtime? There's an easy way to confirm or deny the connection with #10345 if we can tweak the runtime.

dlsniper

dlsniper commented on May 27, 2016

@dlsniper
Contributor

Hi @RLH and @aclements, I tested this a bit yesterday when it came up on Slack.
The issue doesn't happen with 200k or 300k items in the maps. As soon as the count gets bigger, there is a pause at some moment during the test when the GC runs automatically.
With a lower item count, the pause varies from 1-2 ms to 25 ms, which is within acceptable limits.
To me it looks like #9477.

Since the maps in this case are allocated once and elements are never deleted, could we maybe get a way to track whether a map has changed between GC runs, so that the GC doesn't need to do anything for it during the STW phase? Something like: the GC computes a hash of the map while running the parallel phase, and in the STW phase it compares that hash with the previous one; if they are unchanged, it does nothing for that map?

genez

genez commented on May 27, 2016

@genez
Author

@aclements yes, I can test the patch; just let me know how to apply it.

Thanks

39 remaining items
