
runtime: being unable to allocate across scavenged and unscavenged memory leads to fragmentation blow-up in some cases #35848

Open
@savalin

Description


What version of Go are you using (go version)?

$ go version
go version go1.12.13 darwin/amd64
go version go1.13.4 darwin/amd64

Does this issue reproduce with the latest release?

Yes. Tested on

go version go1.4-bootstrap-20170531 darwin/amd64

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/qwe/Library/Caches/go-build"
GOENV="/Users/qwe/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY="none"
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/qwe/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.13.4/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.13.4/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/qv/37vc1dv526n0drc9jljj4v240000gp/T/go-build366955304=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

The original issue description is posted in the gonum library repo: gonum/gonum#1174

In short: we keep a huge graph in memory and need to update it periodically.
After initializing a new graph, we switch the pointer to the new graph and wait for the GC to free the unused memory.
But the runtime never reuses all of HeapIdle; instead it asks the OS for new pages, increasing HeapSys indefinitely.

A basic self-contained demo: https://github.com/savalin/example/blob/master/main.go.
It's simplified code, but even this version shows some memory-management strangeness.

What did you expect to see?

Reuse of HeapIdle memory instead of the runtime asking the OS for new memory.

What did you see instead?

HeapIdle/HeapSys grow without bound (until the OOM killer wakes up and kills the app).

We have downgraded Go to v1.11.13 as a workaround, but we hope to be able to use newer Go releases.

Activity

added
NeedsInvestigation — Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
on Nov 26, 2019
dmitshur (Member) commented on Nov 26, 2019

/cc @aclements @mknyszek

(Please let me know if there are other people who are better to ping about an issue like this.)

mknyszek (Contributor) commented on Nov 26, 2019

@savalin FWIW, I can't reproduce this at tip. HeapSys stays nice and low. This leads me to an interesting conclusion. Go 1.12 introduced the concept of treating memory that has been returned to the OS separately from memory that hasn't. This policy was carried through into Go 1.13, but in Go 1.14 we no longer do this because it would be expensive and cumbersome to do so.

The reason this decision was made in the first place was to accommodate the more aggressive mechanism for returning memory to the OS in Go 1.13 (#30333). The decision was deemed a good idea because jemalloc and other allocators follow a similar rule for returning memory to the OS. It seems like for whatever reason this decision is causing fragmentation which really hurts this application.

Anyway, it shouldn't matter for future releases. This bug is effectively fixed for the Go 1.14 release. Please give the tip of the master branch a try and see if you get the same results.

Finally, this made me realize that debug.FreeOSMemory no longer works as it's supposed to in Go 1.14, so I'll open up an issue and fix that.

CC @aclements, since this is the first consequence we're seeing of that original decision.

Please continue reading if you'd like to see some additional notes I wrote while sussing this out.

  • HeapSys is expected to never decrease since we never unmap memory in the runtime. Instead we call madvise which increases HeapReleased.
  • In Go 1.13 we added a more aggressive scavenging mechanism (runtime: smarter scavenging #30333) which should keep HeapSys - HeapReleased closer to your actual working set, not accounting for certain amounts of fragmentation.
  • I noticed in the other thread that you tried GODEBUG=madvdontneed=1, but here you specify darwin/amd64; note that that option only has an effect on Linux, where it properly drops RSS counters at the cost of additional page faults (the default lets the OS overcommit, and it just kicks out MADV_FREE'd pages when it needs them). We don't do the equivalent that drops RSS counters on darwin/amd64 because it's non-trivial to do, though perhaps we could now that we rely on libSystem.
  • Looking at your example, HeapIdle == HeapReleased after you call debug.FreeOSMemory, which is exactly what that does. I'm not sure why your OOM killer is triggering because HeapSys is just virtual memory. Do you have ulimit -v set? Perhaps this is a Darwin overcommit thing we need to work around.
  • There were temporarily some issues in Go 1.12/1.13 where fragmentation caused HeapSys to keep getting larger (runtime: treap implementation of find() doesn't return the best-fit span #31616) because of a bug. We also noticed a problem in Go 1.12 when we preferred unscavenged memory over scavenged memory in the allocator, so we decided to take the best fit between the two instead, which resolved the issue.
changed the title from "HeapIdle is increasing infinitely on big data volume allocations" to "runtime: being unable to allocate across scavenged and unscavenged memory leads to fragmentation blow-up in some cases" on Nov 26, 2019
mknyszek (Contributor) commented on Nov 26, 2019

@savalin I will close this once you're able to verify that this is fixed for you. If you're still experiencing problems, we'll move on from there.

gopherbot self-assigned this on Nov 27, 2019
removed
NeedsInvestigation
on Nov 27, 2019
added this to the Backlog milestone on Nov 27, 2019
mknyszek (Contributor) commented on Dec 2, 2019

@savalin My mistake. This doesn't reproduce at tip as long as you put a runtime.GC() call at the very top of the for loop (when all the heap memory is actually dead).

I can get it to increase the same as in #35890. What @randall77 mentioned in that one is pretty much what I'm seeing here: it caps out at around 20 GiB.

added
NeedsInvestigation
on Jun 23, 2022

Metadata

Labels: NeedsInvestigation
Milestone: Backlog
Participants: @cagedmantis, @savalin, @mknyszek, @dmitshur, @gopherbot