runtime: RSS creeps over 1GB even though heap is 4MB #13552

Closed · aclements opened this issue Dec 9, 2015 · 14 comments

@aclements
Member

This bug was originally reported by Ilia Kandrashou in https://groups.google.com/d/msg/golang-nuts/AVstzvKAej4/v1F_wreDCAAJ.

The test program at http://play.golang.org/p/zMHtw0-8Ph causes the process' RSS to slowly creep up over time. On a test amd64 machine running Linux 4.1.13-1-lts with GOMAXPROCS=16 at current master (b8b65c1), the RSS reaches 1GB after about 2 hours. According to gctrace=1, the live heap remains under 4MB the whole time (which it should):

...
gc 20460 @8457.750s 0%: 0.23+0+0+2.4+0.64 ms clock, 1.6+0+0+0.67/8.0/0+4.5 ms cpu, 4->4->0 MB, 5 MB goal, 16 P
gc 20461 @8458.117s 0%: 0.15+0+0+1.7+0.78 ms clock, 1.1+0+0+0.95/3.9/0+5.4 ms cpu, 4->4->0 MB, 5 MB goal, 16 P
gc 20462 @8458.477s 0%: 0.17+0+0+1.3+0.67 ms clock, 1.1+0+0+0.46/3.9/0+4.7 ms cpu, 4->4->0 MB, 5 MB goal, 16 P
...

The scavenger trace indicates that we're losing all of this memory in idle spans:

scvg59: inuse: 16, idle: 1018, sys: 1035, released: 0, consumed: 1035 (MB)

In Go 1.5.1, the RSS surpassed 1GB in less than 30 seconds, while the gctrace never reported a live heap over 22MB, so we've at least improved things. I haven't tested Go 1.5.2 yet, but according to the original report, the memory grows slowly but surely.

aclements self-assigned this Dec 9, 2015
aclements added this to the Go1.6 milestone Dec 9, 2015
@tuxdna

tuxdna commented Dec 9, 2015

The program given in the above post works fine on:

$ go version
go version go1.3.3 linux/amd64

However, it becomes a memory hog on:

$ go version
go version go1.5.2 linux/amd64

If I wanted to investigate the root cause, where should I start?

@aclements
Member Author

If I wanted to investigate the root cause, where should I start?

I think the key is figuring out where all of those idle spans are coming from. It appears we're allocating new spans when we should be reusing existing ones, since the live heap never exceeds 5MB (I just checked the gctrace), yet we have hundreds of megabytes' worth of spans. Another piece of the puzzle is figuring out why the scavenger is releasing almost none of these spans back to the OS. This might mean we're systematically cycling through all of the spans, so no individual span stays idle long enough to be scavenged.
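
One quick experiment (not from the thread, just a suggestion for anyone reproducing this) to separate "idle but not yet scavenged" from "actually lost" memory: periodically force a scavenge and watch whether RSS drops. A minimal sketch, assuming it is dropped into the test program with "runtime/debug" and "time" imported and started with go forceScavenge() in main:

func forceScavenge() {
    // debug.FreeOSMemory runs a GC and then returns as much memory to the
    // OS as possible. If RSS falls sharply after each call, the memory is
    // sitting in idle spans the background scavenger simply hasn't released
    // yet, rather than being referenced by anything live.
    for range time.Tick(time.Minute) {
        debug.FreeOSMemory()
    }
}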

@aclements
Member Author

b8b65c1.txt is the gctrace of this program running for about 1 hour and 45 minutes. Its RSS reached ~750MB. The live heap (the second heap size in the gctrace line) never exceeded 5 MB (checked with grep -v -- '->[45]->' log.master). Curiously, the scavenger released a total of only 8MB the whole time.

@tuxdna

tuxdna commented Dec 9, 2015

On reading this code:

https://github.com/golang/go/blob/master/src/runtime/mheap.go#L872

It could be that the stack is actually using more memory than the heap. In the test program (repeated below) we are only allocating channels:

package main

const lenth = 1 << 16 // 64KB
type array [lenth]byte
var defaultArray array

func main() {
    // start workers
    for i := 0; i < 200; i++ {
        go testloop()
    }

    // sleep forever
    sleep := make(chan struct{})
    <-sleep
}

func testloop() {
    for {
        c := make(chan struct{})
        go test(c)
        _ = <-c
    }
}

func test(c chan struct{}) {
    a := defaultArray // copies the 64KB array onto this goroutine's stack
    a[0] = a[lenth-1]
    c <- struct{}{}
}

My analysis could be totally wrong.
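
One way to check this from inside the program (hypothetical, not something posted in the thread) is to log runtime.MemStats periodically: StackInuse versus HeapIdle shows directly whether the memory is held by live goroutine stacks or by idle spans. A sketch, meant to be started with go logMemStats() at the top of main in the program above, with "log", "runtime", and "time" imported:

func logMemStats() {
    var m runtime.MemStats
    for range time.Tick(10 * time.Second) {
        runtime.ReadMemStats(&m)
        // HeapAlloc stays tiny while HeapIdle/HeapSys keep growing and
        // HeapReleased stays near zero -- the in-process counterparts of
        // the gctrace and scvg numbers quoted earlier.
        log.Printf("HeapAlloc=%dKB StackInuse=%dMB HeapInuse=%dMB HeapIdle=%dMB HeapReleased=%dMB HeapSys=%dMB",
            m.HeapAlloc>>10, m.StackInuse>>20, m.HeapInuse>>20, m.HeapIdle>>20, m.HeapReleased>>20, m.HeapSys>>20)
    }
}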

@aclements
Member Author

I believe you're right that the channel and stacks are the only allocations. It turns out that a, despite being rather large, is small enough to be stack-allocated.

However, there's also not much stack in use. I ran it with Go 1.5.1, let the heap reach 3.8GB (which took a matter of seconds) and here's runtime.memstats:

{alloc = 0, total_alloc = 0, sys = 0, nlookup = 0, nmalloc = 0, nfree = 0, heap_alloc = 0, 
  heap_sys = 3812622336, heap_idle = 3778732032, heap_inuse = 33890304, heap_released = 0, 
  heap_objects = 0, stacks_inuse = 23560192, stacks_sys = 0, mspan_inuse = 0, 
  mspan_sys = 5718016, mcache_inuse = 0, mcache_sys = 16384, buckhash_sys = 1442838, 
  gc_sys = 119151616, other_sys = 5080074, next_gc = 4194304, last_gc = 1449702135103864578, 
  pause_total_ns = 1178124667, pause_ns = {...}, pause_end = {...}, numgc = 16, 
  gc_cpu_fraction = 0.05263619482552593, enablegc = true, debuggc = false, by_size = {...}, tinyallocs = 0, heap_live = 1847312, 
  heap_scan = 195904, heap_marked = 1847312, heap_reachable = 265520}

Some of these fields are out of date because they're cached in various places, but heap_sys and heap_idle are many gigabytes, while stacks_inuse (which is updated eagerly) is only 23MB. It may be that these spans were allocated for stack use, but we're not holding on to them for stack use.

@aclements
Member Author

I suspect the rapid memory explosion in Go 1.5.1 is a consequence of #11466. Because a in test is large, it pushes test's stack over the "small" limit, which means that, currently, when that goroutine exits, that stack isn't available for reuse until the next GC cycle finishes. If the GC falls behind just a little, it can quickly rack up these stacks, since there's also currently no GC back-pressure on stack allocations (unlike regular allocations).

I suspect this is better on tip simply because GC scheduling is a lot tighter, so the window for falling behind is much smaller. However, this doesn't explain why there's a slow RSS growth on tip over the course of thousands of GCs, since each GC cycle should clean up all of these stacks and make their spans available for reuse and the allocations following that should simply reuse them.
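
One way to probe the large-stack hypothesis (this experiment is not from the thread, and the names below are made up for illustration) is to shrink the per-goroutine frame so it fits in the small, cached stack sizes and check whether the RSS creep disappears. The exact "small" threshold is platform- and version-dependent, but 1KB is comfortably under it. A drop-in variant of test for the original program:

const smallLen = 1 << 10 // 1KB instead of 64KB
type smallArray [smallLen]byte
var defaultSmall smallArray

// testSmall mirrors test, but copies only 1KB, so the goroutine's stack
// stays within the small size classes and should be recycled through the
// normal small-stack caches when the goroutine exits.
func testSmall(c chan struct{}) {
    a := defaultSmall
    a[0] = a[smallLen-1]
    c <- struct{}{}
}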

@aclements
Member Author

I've confirmed my hypothesis about Go 1.5.1. I set ulimit -v to 1GB, let it crash, and found that the npages across all of the to-be-freed stacks in stackFreeQueue total 967MB. So #11466 explains the rapid memory consumption in Go 1.5.1. But it doesn't yet explain the slow growth on tip.

/cc @RLH. This is an interesting case where delaying the freeing of stacks is really an issue.

@tuxdna

tuxdna commented Dec 10, 2015

@aclements Thanks for the pointers; they were very helpful for understanding what is going on underneath.

I was wondering where the frequency of GC cycles is set, either explicitly via tooling/APIs or implicitly in the Go compiler's code.

@aclements
Member Author

@tuxdna, the scheduling of GC cycles and GC work is done dynamically by a control system in the runtime: https://golang.org/s/go15gcpacing. For much of the implementation, see gcControllerState and where the runtime sets memstats.next_gc.
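
For completeness (general Go knowledge, not something specific from the thread): the knob that is exposed explicitly is GOGC, the heap-growth percentage the pacer targets. It can be set in the environment or changed at runtime, and a collection can also be forced by hand. A minimal sketch:

package main

import (
    "runtime"
    "runtime/debug"
)

func main() {
    // Equivalent to running with GOGC=50: trigger the next collection when
    // the heap has grown 50% beyond the live heap left by the previous
    // cycle, i.e. roughly twice as often as the default GOGC=100.
    debug.SetGCPercent(50)

    // A cycle can also be forced explicitly. Everything else about when GC
    // runs is decided dynamically by the pacer described in go15gcpacing.
    runtime.GC()
}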

@aclements
Member Author

But it doesn't yet explain the slow growth on tip.

It turns out there's simply a long, low-probability tail on how much memory is tied up in unfreed large stack spans. That means if you run the program long enough, you'll eventually have a single GC cycle that ties up over a GB in stack spans (if you were sufficiently unlucky, I don't think there's anything stopping this from happening early in an execution).

I added a debug print to report how much memory in large stack spans is being freed during mark termination and here's what the distribution looked like after just two minutes:

N 691  sum 45753  mean 66.2127  gmean 54.7903  std dev 51.492  variance 2651.43

     min 10
   1%ile 17
   5%ile 27
  25%ile 38
  median 49
  75%ile 71
  95%ile 180.7
  99%ile 267.767
     max 400

⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠒⠲⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡖ 0.026
⠀⠀⠀⠀⠀⠀⠀⠀⢀⠞⠁⠀⠀⠉⠳⣄⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇
⠠⠤⠤⠤⠤⠤⠴⠒⠉⠀⠀⠀⠀⠀⠀⠀⠀⠉⠓⠒⠒⠒⠒⠒⠢⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠄⠧ 0.000
⠈⠉⠉⠋⠉⠉⠉⠉⠉⠉⠉⠉⠉⠙⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠋⠉⠉⠉⠉⠉⠉⠉⠉⠉⠋⠉⠉⠉⠉⠉⠉⠉⠉⠉⠙⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠋⠉⠉⠉⠉⠉⠉⠉⠉⠉⠙⠉⠉⠉⠁
   0        50         100       150       200        250       300

@randall77
Contributor

Couldn't we allow stackalloc to allocate from stackFreeQueue? If we kept the free queue bucketed by size it would be easy to reuse the stacks and not depend on GC to free them before reallocation.
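
For intuition, here is a minimal sketch of the bucketing idea in ordinary Go. It is only an illustration: the runtime's stackFreeQueue holds spans, not byte slices, and the real change lives in the stack allocator itself.

// freeLists illustrates "keep the free queue bucketed by size": freed stacks
// are filed under their size, and an allocation of that size pops one back
// out instead of waiting for a GC cycle to return it to the heap.
type freeLists struct {
    buckets map[int][][]byte // stack size in bytes -> freed stacks of that size
}

func (f *freeLists) free(stack []byte) {
    if f.buckets == nil {
        f.buckets = make(map[int][][]byte)
    }
    f.buckets[len(stack)] = append(f.buckets[len(stack)], stack)
}

func (f *freeLists) alloc(size int) []byte {
    if s := f.buckets[size]; len(s) > 0 {
        stack := s[len(s)-1]
        f.buckets[size] = s[:len(s)-1]
        return stack // reuse a stack freed earlier, without waiting for GC
    }
    return make([]byte, size) // nothing cached: allocate fresh
}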

@aclements
Member Author

Couldn't we allow stackalloc to allocate from stackFreeQueue? If we kept the free queue bucketed by size it would be easy to reuse the stacks and not depend on GC to free them before reallocation.

I've done exactly that, actually. I'm trying (and failing) to write a test and I'll send out a CL.

@randall77
Contributor

Great minds think alike.

@gopherbot
Contributor

CL https://golang.org/cl/17814 mentions this issue.

golang locked and limited conversation to collaborators Dec 14, 2016