runtime: efficiency of collection 1.5.3 vs 1.6 #15068
Comments
CC @RLH @aclements
It looks like for some reason your application is actually retaining a lot more memory in Go 1.6, since the STW ("forced") GC lines should be showing precise retained memory (there's no floating garbage and STW GC doesn't have any control systems that can get into bad feedback loops). The GC could be incorrectly retaining something, but that seems unlikely, which would mean this isn't a GC problem per se. Could you try using the heap profiler (https://godoc.org/runtime/pprof#Profile) to see what objects are being retained in Go 1.6?
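For reference, a minimal sketch of dumping a heap profile that way with runtime/pprof (the file name and the point at which it is triggered are illustrative; importing net/http/pprof and fetching /debug/pprof/heap works just as well):

```go
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

// dumpHeapProfile writes the current heap profile to path; inspect it later
// with `go tool pprof <binary> <path>`.
func dumpHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	runtime.GC() // get up-to-date allocation statistics before writing
	return pprof.Lookup("heap").WriteTo(f, 0)
}

func main() {
	if err := dumpHeapProfile("heap.pprof"); err != nil {
		log.Fatal(err)
	}
}
```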
Any difference if you disable HTTP/2? Set GODEBUG=http2client=0,http2server=0. And if so, try to narrow it down to either client or server.
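For completeness, a sketch of disabling HTTP/2 programmatically instead of via GODEBUG (net/http documents that a non-nil, empty TLSNextProto map turns it off on either side); the address is illustrative:

```go
package main

import (
	"crypto/tls"
	"net/http"
)

func main() {
	// Client side: a non-nil, empty TLSNextProto map disables HTTP/2
	// for this transport only.
	client := &http.Client{
		Transport: &http.Transport{
			TLSNextProto: map[string]func(string, *tls.Conn) http.RoundTripper{},
		},
	}
	_ = client

	// Server side: same idea for the server.
	srv := &http.Server{
		Addr:         ":8443",
		TLSNextProto: map[string]func(*http.Server, *tls.Conn, http.Handler){},
	}
	_ = srv
}
```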
CL https://golang.org/cl/21451 mentions this issue.
Heap dump attached.
Here is the go1.6 top10 dump:
and go1.5.3 (virtually the same code, just a different compiler):
and a bit earlier from the same 1.5.3 run:
For comparison, a heap profile dump from the 1.5.3 build.
@bioothod, thanks for the heap dumps. I'm not sure why your top10s are reporting deferproc and _Cfunc_session_get_routes; those don't appear anywhere as allocation sites in the raw heap dump files. Are you sure you ran pprof with the same binary that generated each profile?

From the raw heap dumps, it's clear that the vast majority of your memory in both cases is coming from 10MB allocations at github.com/bioothod/elliptics-go/elliptics.NewReadSeekerKeyOffsetSize (github.com/bioothod/elliptics-go/elliptics/reader.go:89). What's not clear is why there are far more of those sticking around in Go 1.6 than there were in Go 1.5.3. What happens with the ReadSeeker constructed there?

@randall77, would it be possible to use the heap dumper to figure out why this allocation is being retained in 1.6 and not in 1.5.3?
I probably used a different container. I've redone it and made sure to use the matching binary; top10 indeed shows it now.
1.6:
1.5.3:
By "memory usage", do you mean what's being reported by gctrace or what's being reported by the OS? The profiles are showing only 300MB and 165MB in use, respectively, which should match reasonably closely with what gctrace says. If you still have the raw heap profile files from your new run, could you post them? Are both versions under the same amount of load? |
Attached are 1.5.3/1.6 gctraces and heap dumps from a clean run of about 2 minutes under the same load. top10 shows about a 10x difference, but the OS reported 768MB (1.5.3) vs 2.9GB (1.6).
1.6:
1.5.3:
Thanks. top10 is somewhat approximate, but the OS memory load you reported matches pretty closely with the peak heap size in both gctraces (which, in both cases, was a bit before the end of the trace, which also contributes to why top10 is showing a somewhat lower number). Is it possible for me to run this locally to reproduce it (or, at least, run some analysis)?
Here are the steps to reproduce the problem with a synthetic load, though:
How are you monitoring memory usage? Top will not tell you the truth. Also, Go cannot tell the operating system to take back virtual address space it has already mapped.
Of course. You may also detect that something is wrong by trying to move your mouse and observing kswapd going crazy, but that's of course not the real truth either, since gctrace will tell you something like the dump below while the proxy eats 4+ GB of resident RAM.
As you may expect, the "real truth" told by the heap dump and pprof is a little misleading, since the Go proxy has eaten the memory and hasn't returned it; at the very least, kswapd and OS performance degrade heavily. It could be that this is memory used by the C++ libraries, but I doubt it, since there is nowhere near that number of allocations in the C++ library for 10k requests, even if it leaks like hell. Second, a subsequent run of the same test eats noticeably less additional memory.
@aclements I have a different idea about how this leak has been created, but I need your opinion on how to debug it. There will be two parts, both related to the leak but from quite different perspectives.

The first part is about the leak itself, i.e. about the memory used by the Go proxy. It is unrelated to the Go version (at least 1.5.1, 1.5.3 and 1.6 behave similarly, just with different volumes of leaked RAM). The containers and test I crafted above lead to substantial (or plain huge) memory consumption because of a quite ugly 10MB allocation per request; that design choice is out of scope for this post. I modified the proxy to allocate a 10MB slice on every call to the HTTP handler, read data from a small file on disk into that 10MB area, and then return it to the HTTP client.

It looks like Go 1.6 introduced something new to the web server (or goroutine scheduling, or anything else) that works more in parallel, which ends up calling the HTTP handler more times in parallel, and each parallel call allocates a 10MB buffer. I emphasize that there is no C++ code involved in this new test, only a plain Go HTTP handler which does the following (error checks omitted to save space in the post).
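A rough sketch of the kind of handler described above, reconstructed from the description rather than taken from the original post (the file path and port are illustrative):

```go
package main

import (
	"net/http"
	"os"
)

// handler mimics the reproduction described above: allocate a fresh 10MB
// buffer on every request, fill the start of it from a small file on disk,
// and write the whole buffer back to the client.
func handler(w http.ResponseWriter, r *http.Request) {
	buf := make([]byte, 10*1024*1024) // 10MB allocation per request

	f, err := os.Open("/tmp/small-file") // illustrative path
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	defer f.Close()
	f.Read(buf) // the small file only fills the beginning of the buffer

	w.Write(buf) // return the whole 10MB buffer to the client
}

func main() {
	http.HandleFunc("/", handler)
	http.ListenAndServe(":8080", nil)
}
```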
Since the amount of memory used by this test is the same as with my previous C++/elliptics reader test, I suspect that running this new test compiled with Go 1.6 will end up eating more memory, just like the previous test did. Heap dump, pprof top10 and gctrace for both versions follow, which brings us to the second problem.

1.5:
1.6 is quite different: it allocates memory much faster and releases it much slower. I had to interrupt the test since it ate 10+ GB of memory and the computer started lagging heavily. I show the whole test duration below, i.e. 36 seconds compared to the 72+ seconds above where the test completed, while here it is far from complete. Also, the 1.6 timings are much worse, but that should be related to the memory consumption: since it became worse, it takes more time to allocate the buffer, which in turn means handling an HTTP request takes longer.
Basically, the second problem is that after some changes in 1.6 the web server started to call HTTP handlers much more frequently in parallel, and memory is released by Go much more slowly than in 1.5. And to recap, the first problem is that after Go releases memory, i.e. the GC magic tells us in gctrace that the memory is no longer used, it is not returned to the kernel, and the Go proxy continues to occupy all of that memory somewhere. This could be the glibc allocator cache, or some other cache if Go uses a custom allocator. Either way, this all ends up in a situation where allocating a big slice in an HTTP handler will lead to an OOM kill after a while if the number of requests handled in parallel is rather large. 1.6 makes this problem much sharper. I will cook up a much smaller test with this allocation strategy.
It could also be that there were no changes that led to more parallel HTTP handlers, and it is all down to the fact that Go 1.6 releases memory much more slowly, so it merely looks like more handlers are allocating more memory in parallel. Please notice that in the 1.6 gctrace above, Go stopped invoking automatic GC at some point; only the handcrafted runtime.GC() calls remain.
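For context, the handcrafted collections are the once-a-second runtime.GC() calls mentioned in the original report; a minimal sketch of that pattern:

```go
package main

import (
	"runtime"
	"time"
)

// forceGCLoop triggers a full collection every second; these runs show up
// as "(forced)" lines in GODEBUG=gctrace=1 output.
func forceGCLoop() {
	for range time.Tick(time.Second) {
		runtime.GC()
	}
}

func main() {
	go forceGCLoop()
	select {} // a real program would serve traffic here
}
```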
A fun fact is that when using smaller concurrency on the client, Go eats/leaks less memory; it looks like allocated memory is being reused. If this memory resides in a glibc cache (does Go use the glibc allocator, and can I change it to something else by exporting LD_PRELOAD pointing to a library with different malloc/free symbols like tcmalloc/jemalloc?), this could partially explain the problem (the allocation part, i.e. the first problem above), although without offering a solution. I will cook up a boost::asio test with the same allocation strategy to check whether it is a system problem, but I do believe that test will release memory to the kernel.
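Not something proposed in this thread, but one way to make that reuse explicit regardless of allocator behaviour is a sync.Pool of buffers; a sketch, assuming the same 10MB-per-request pattern as above:

```go
package main

import (
	"net/http"
	"sync"
)

// bufPool recycles the 10MB request buffers instead of allocating a fresh
// one per request, so concurrent handlers share a bounded working set.
var bufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 10*1024*1024) },
}

func handler(w http.ResponseWriter, r *http.Request) {
	buf := bufPool.Get().([]byte)
	defer bufPool.Put(buf)

	// ... fill buf from the backing store as before ...
	w.Write(buf)
}

func main() {
	http.HandleFunc("/", handler)
	http.ListenAndServe(":8080", nil)
}
```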
Timed out. |
@bioothod, to answer some of your questions: Go has its own memory manager from top to bottom (it doesn't use glibc at all, and garbage collection is so tightly integrated with allocation that it's probably impossible to build a good garbage collector on top of glibc malloc).

The numbers reported by gctrace (and the heap profiler, etc.) and by top are different because they are answering different questions, not because either of them is wrong. In particular, while Go's memory manager does return memory to the system, there are a few minutes of hysteresis built in to that mechanism (because there's overhead to returning memory to the system and then asking for it back), so the numbers in top should roughly match the peak memory usage over the last five minutes or so. There's also a difference between heap memory and total memory, since there are non-heap allocations such as stacks and runtime-internal memory (which should be small, but #15319 suggests they may not always be as small as they're supposed to be).

If you can create a small Go program that reproduces the issue, we may have a handle on debugging this (though it may have to wait for 1.8). Right now there's simply too much going on in this issue to effectively debug it.
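A sketch of how to see these different numbers side by side from inside the process, using runtime.MemStats (the interpretation in the comments is approximate):

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	// HeapInuse is roughly what gctrace and the heap profiler report;
	// HeapSys is what the heap has obtained from the OS, and HeapReleased
	// is the part already given back. Top's resident number is closer to
	// Sys minus HeapReleased, plus non-heap memory such as stacks.
	fmt.Printf("HeapInuse:    %d MB\n", m.HeapInuse>>20)
	fmt.Printf("HeapIdle:     %d MB\n", m.HeapIdle>>20)
	fmt.Printf("HeapReleased: %d MB\n", m.HeapReleased>>20)
	fmt.Printf("HeapSys:      %d MB\n", m.HeapSys>>20)
	fmt.Printf("StackSys:     %d MB\n", m.StackSys>>20)
	fmt.Printf("Sys (total):  %d MB\n", m.Sys>>20)
}
```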
What version of Go are you using (go version)?
go 1.5.3/1.6 linux amd64

What operating system and processor architecture are you using (go env)?
Trusty

I have an HTTPS proxy which uses cgo to work with a storage system. If the app is built using 1.5.3, memory usage is much lower than when go1.6 is used. gctraces were collected for both cases and are provided below; in both of them runtime.GC() is called every second. Even cgocheck=2 does not lead to a crash.

1.5.3:

1.6:

Similar memory usage is expected, as the app works in the same conditions under the same load. Memory usage differs by about 100 times.