Description
What version of Go are you using (go version)?

$ go version
go version go1.19.2 linux/amd64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE=""
GOENV=""
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH=""
GOPRIVATE=""
GOPROXY=""
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.19.2"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/tmp/bug/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3528461531=/tmp/go-build -gno-record-gcc-switches"
What did you do?
After mass-upgrading dependencies in a proprietary HTTP + gRPC reverse proxy, I observed unbounded memory growth when serving gRPC streams.
The full data flow is:
gRPC client stream -> gRPC proxy -> gRPC service stream
The implicated component, which depends on x/net and is where the memory leak is observed, is the gRPC proxy.
I bisected the behavior to commit 2e0b12c274b70e5e836cf4bb0cda99c44f61a4d1 of golang.org/x/net. Prior to this commit, memory usage while serving unidirectional (client -> server) streams remained constant. At this commit and later, memory usage grows consistently for as long as the stream is active.
Please see #56315 (comment) for a minimally complete example of the data flow described above that demonstrates the bug.
The client logic is to repeatedly send messages to the server over a unidirectional stream, each about 32 KB in size. The server logic is to read the message and noop (discard the data).
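For illustration, a minimal sketch of that client loop might look like the following. The pb import path and the generated ServiceClient, Stream, and Message identifiers are assumptions standing in for generated gRPC stubs; this is not the actual code from the workload.

```go
package main

import (
	"context"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	pb "example.com/bug/proto" // hypothetical generated package
)

func main() {
	// Dial the proxy over plaintext HTTP/2.
	conn, err := grpc.Dial("localhost:7000",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Open a client-streaming (unidirectional) RPC and send ~32 KB messages forever.
	stream, err := pb.NewServiceClient(conn).Stream(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	payload := make([]byte, 32*1024)
	for {
		if err := stream.Send(&pb.Message{Data: payload}); err != nil {
			log.Fatal(err)
		}
	}
}
```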
I don't have enough knowledge of the HTTP/2 specification to do any investigation into 2e0b12c274b70e5e836cf4bb0cda99c44f61a4d1 itself, so I don't have an RCA.
What did you expect to see?
I expected to be able to serve the gRPC stream without unbounded memory growth.
What did you see instead?
I observe what appears to be a memory leak regression starting from this commit.
I verified this behavior by watching pmap -x $(pidof ...) over time, for the good and bad commits identified by the bisect, against the same client logic.
After running the test workload:
Here is an example heap usage graph captured with pprof at version v0.0.0-20220920183852-bf014ff85ad5.

Here is an example heap usage graph captured with pprof at version v0.0.0-20221019024206-cb67ada4b0ad.
cc @neild
Activity
neild commented on Oct 19, 2022
https://go.dev/cl/432038 changes the frequency with which the HTTP/2 server sends window size updates to clients. Previously it would send updates to extend the stream window immediately upon receiving data; now it sends updates only when at least half the current window has been consumed. This reduces the frequency of window updates and should improve performance overall.
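As a rough illustration of the difference (a simplified sketch, not the actual x/net code; the function name and threshold handling here are assumptions):

```go
// shouldSendWindowUpdate sketches the two strategies described above for a
// single stream's receive window. Before the change, the server refilled the
// window as soon as the handler consumed any data; after the change, it waits
// until at least half of the window has been consumed.
func shouldSendWindowUpdate(consumedSinceLastUpdate, window int32) bool {
	// Old behavior (roughly): return consumedSinceLastUpdate > 0
	return consumedSinceLastUpdate >= window/2
}
```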
Your heap profiles show that your heap is pretty much entirely consumed by HTTP/2 stream buffers, used to hold data being received or sent. Buffers are allocated as needed from a set of pools of varying size classes; the largest class is 16384 bytes.
I'm not sure why this change would lead to more buffers being consumed. I don't see any obvious way for a new memory leak to have been introduced here; perhaps I'm missing something, but I suspect any change is a result of the pattern of data arriving having changed. This change might result in clients sending data in fewer, larger chunks; perhaps this results in more bursty memory usage? Or perhaps clients are sending data faster, and running ahead of the ability of the server to forward it?
Do you configure the HTTP/2 server flow control limits at all? (Server.MaxUploadBufferPerConnection and Server.MaxUploadBufferPerStream in http2.Server.)
LINKIWI commented on Oct 20, 2022
Thanks for taking a look.
I am also not clear on how this change ultimately results in unbounded memory usage. I set up a closed-environment test workload comparing the behavior at both the good and bad commits of golang.org/x/net (with the same pattern of data sent by the client), and can reliably reproduce unbounded memory growth with the commit in question.

That hypothesis makes sense, though in practice I haven't observed bursty memory usage. This graph illustrates the unbounded heap usage growth over time, from ~15 MB to ~80 GB, as measured by /memory/classes/heap/objects:bytes from Go's runtime metrics system. (The sudden drop was due to an OOM kill.)
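For reference, a metric like this can be sampled directly with Go's runtime/metrics package; a minimal sketch:

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

func main() {
	// Sample the live heap object bytes metric referenced above.
	samples := []metrics.Sample{{Name: "/memory/classes/heap/objects:bytes"}}
	metrics.Read(samples)
	if samples[0].Value.Kind() == metrics.KindUint64 {
		fmt.Println("heap objects bytes:", samples[0].Value.Uint64())
	}
}
```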
MaxUploadBufferPerConnection and MaxUploadBufferPerStream are not set explicitly, so the defaults are being used.

LINKIWI commented on Oct 20, 2022
I have created a minimally complete and verifiable example workload that demonstrates the issue.
Source tree:

go.mod:
proto/service.proto:
proxy/main.go:
server/main.go:
client/main.go:

In words, this is:

- A gRPC proxy listening on localhost:7000, proxying to localhost:7001 (a sketch of its forwarding loop appears at the end of this comment).
- A gRPC server listening on localhost:7001.
- A gRPC client sending a unidirectional stream of messages to localhost:7000.

First, generate gRPC code:
To reproduce the bug: golang.org/x/net@latest.
To verify that this is a regression: golang.org/x/net@v0.0.0-20220920183852-bf014ff85ad5.
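The file listings themselves are not reproduced above. As a rough sketch of the shape of such a proxy, assuming the generated stubs live in a hypothetical pb package and that the proxy serves gRPC through golang.org/x/net/http2 via h2c (which is what makes the x/net flow control settings relevant), the forwarding loop might look like this; the real proxy/main.go may differ:

```go
package main

import (
	"io"
	"log"
	"net/http"

	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	pb "example.com/bug/proto" // hypothetical generated package
)

// proxy forwards every message of an inbound client stream to the backend service.
type proxy struct {
	pb.UnimplementedServiceServer
	backend pb.ServiceClient // connected to the gRPC server on localhost:7001
}

func (p *proxy) Stream(in pb.Service_StreamServer) error {
	out, err := p.backend.Stream(in.Context())
	if err != nil {
		return err
	}
	for {
		msg, err := in.Recv()
		if err == io.EOF {
			// Client finished sending; close the backend stream and relay its response.
			resp, err := out.CloseAndRecv()
			if err != nil {
				return err
			}
			return in.SendAndClose(resp)
		}
		if err != nil {
			return err
		}
		if err := out.Send(msg); err != nil {
			return err
		}
	}
}

func main() {
	conn, err := grpc.Dial("localhost:7001",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	gs := grpc.NewServer()
	pb.RegisterServiceServer(gs, &proxy{backend: pb.NewServiceClient(conn)})

	// Serve the gRPC handler through x/net/http2 (h2c), so the http2.Server
	// flow control discussed in this thread applies to the inbound streams.
	srv := &http.Server{
		Addr:    "localhost:7000",
		Handler: h2c.NewHandler(gs, &http2.Server{}),
	}
	log.Fatal(srv.ListenAndServe())
}
```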
ZekeLu commented on Oct 21, 2022
I think I have found out what's wrong. There is no memory leak. It's just that the proxy writes to the buffer (data from the client to the proxy) too fast and reads from the buffer (data from the proxy to the server) too slowly. As indicated by @neild, you should configure the HTTP/2 server flow control limits.
The proxy writes to the buffer too fast thanks to the improvement of golang/net@2e0b12c. To verify this hypothesis:

1. Add log.Println("chunks:", len(b.chunks)) after this line: https://github.com/golang/net/blob/master/http2/databuffer.go#L91, then run the test. You should find that the number of chunks keeps growing.
2. Configure the HTTP/2 server flow control limits in the proxy and run the test again; b.chunks grows more slowly or does not grow.

A limit of 1 << 13 works for me:
LINKIWI commented on Oct 22, 2022
Thanks for looking into this @ZekeLu. I think you are right. I tested a more restrictive MaxUploadBufferPerConnection and MaxUploadBufferPerStream in the proxy at golang.org/x/net@v0.0.0-20221019024206-cb67ada4b0ad and can confirm that the example test workload no longer causes unbounded memory growth. I guess this was a latent behavior in the proxy that was only exposed by the optimization introduced in 2e0b12c274b70e5e836cf4bb0cda99c44f61a4d1.

That said, I think there are a couple of remaining open questions:
(1) The example workload described in the issue seems fairly vanilla. Is there an opportunity to improve the "out-of-the-box" default http2.Server behavior to avoid the original problem? For example, does it make sense to change the default values of MaxUploadBufferPerConnection and MaxUploadBufferPerStream to 1 << 13?

(2) The original incident described in the memory usage graph above increased total heap allocation size from 15 MB to 80 GB (which exhausted all 80 GB of on-disk swap space on the system in question). It sounds like there should be some capability to set/tune the maximum http2.Server buffer usage across all connections and streams to avoid this potential failure mode.

gopherbot commented on Oct 22, 2022
Change https://go.dev/cl/444816 mentions this issue:
http2: calculate a correct window increment size for a stream
ZekeLu commented on Oct 22, 2022
My two cents:

(1) Maybe 1 << 13 works for most of the applications, but I think it's better to leave the decision to the developers. Maybe it should be documented to remind developers to provide a proper configuration; I'm not sure.

(2) MaxUploadBufferPerConnection and MaxUploadBufferPerStream affect the buffer indirectly, so this question seems essentially the same as question 1 to me.

I found a new issue while investigating this one. It seems that the increment size for a stream could be wrong in some cases. For example, when the flow control configuration is like this:

The proxy will panic with http: panic serving 127.0.0.1:44136: internal error; sent too many window updates without decrements?. I have sent CL 444816 to fix it.
gopherbot commented on Nov 4, 2022
Change https://go.dev/cl/448055 mentions this issue:
Revert "http2: Send WindowUpdates when remaining bytes are below a threshold"
neild commented on Nov 4, 2022
I've discovered another issue with https://go.dev/cl/432038, which is effectively disabling server pushback on clients. (Sorry, I really should have detected that in review. Mea culpa.)
Sent rollback.