Description
What version of Go are you using (go version)?
$ go version
go version go1.16.5 darwin/amd64
Does this issue reproduce with the latest release?
yes
What operating system and processor architecture are you using (go env)?
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/amandachow/Library/Caches/go-build"
GOENV="/Users/amandachow/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/amandachow/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/amandachow/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.16.5"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/_f/1tl8g2w95wx34b7zp53y0kqc0000gp/T/go-build310981905=/tmp/go-build -gno-record-gcc-switches -fno-common"
What's the issue?
I use httputil.ReverseProxy with http2.Transport. Downstream clients were hitting what looked like connection flakes: either their streams were closed with RST_STREAM, or they received an error response from our ErrorHandler. After some investigation, I found a race condition: when a stream is closed on the http2.Server side, the corresponding stream on the http2.Transport side can be closed while a DATA frame for it is still being processed. This caused the entire TCP client connection to error and close. It is the same issue described here: https://go.googlesource.com/net/+/6c4ac8bdbf06a105c4baf3dcda28bd9b0fb15588
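For context, here is a minimal sketch of the kind of setup involved. The backend URL, TLS config, and ErrorHandler body below are illustrative, not taken from our service; the point is an httputil.ReverseProxy whose Transport is an http2.Transport, so proxied requests multiplex as streams over a shared HTTP/2 connection to the backend.

package main

import (
	"crypto/tls"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"

	"golang.org/x/net/http2"
)

func main() {
	// Hypothetical backend; the real target and TLS settings will differ.
	backend, err := url.Parse("https://backend.internal:8443")
	if err != nil {
		log.Fatal(err)
	}

	proxy := httputil.NewSingleHostReverseProxy(backend)

	// Proxied requests go out as HTTP/2 streams on a shared connection.
	proxy.Transport = &http2.Transport{
		TLSClientConfig: &tls.Config{ServerName: backend.Hostname()},
	}

	// When the shared connection errors out, every in-flight request fails
	// and lands here (illustrative handler, not the real one).
	proxy.ErrorHandler = func(w http.ResponseWriter, r *http.Request, err error) {
		log.Printf("proxy error for %s: %v", r.URL.Path, err)
		w.WriteHeader(http.StatusBadGateway)
	}

	log.Fatal(http.ListenAndServe(":8080", proxy))
}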
The stream is getting closed between the clientConn unlock and the bufPipe write in processData. The write to the now-closed bufPipe fails, and that error closes the entire clientConn, erroring all other open streams.
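As a toy, self-contained illustration of that window (none of the names below are from x/net; pipe, stream, processData, and closeStream are made up): the "is the stream reset" check happens under the lock, the pipe write happens after the lock is released, and a concurrent close that lands in between makes the write fail. Whether the error actually fires depends on goroutine scheduling, which is exactly what makes it a flake.

package main

import (
	"errors"
	"fmt"
	"sync"
)

// pipe is a toy stand-in for the stream's buffered body pipe.
type pipe struct {
	mu  sync.Mutex
	err error // set once the pipe has been closed
	buf []byte
}

func (p *pipe) CloseWithError(err error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.err = err
}

func (p *pipe) Write(b []byte) (int, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.err != nil {
		return 0, p.err // analogous to writing to a closed bufPipe
	}
	p.buf = append(p.buf, b...)
	return len(b), nil
}

type stream struct {
	mu       sync.Mutex // stands in for cc.mu
	didReset bool
	bufPipe  pipe
}

// processData mimics the problematic ordering: check under the lock,
// write after the lock has been released.
func (s *stream) processData(data []byte) error {
	s.mu.Lock()
	didReset := s.didReset
	s.mu.Unlock()
	// <-- a concurrent closeStream can run in this window
	if !didReset {
		if _, err := s.bufPipe.Write(data); err != nil {
			return err // in the real transport this error takes down the whole connection
		}
	}
	return nil
}

func (s *stream) closeStream() {
	s.mu.Lock()
	s.didReset = true
	s.mu.Unlock()
	s.bufPipe.CloseWithError(errors.New("stream closed"))
}

func main() {
	var s stream
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); s.closeStream() }()
	go func() {
		defer wg.Done()
		if err := s.processData([]byte("frame")); err != nil {
			fmt.Println("data write failed:", err)
		}
	}()
	wg.Wait()
}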
I tried holding the lock across the bufPipe write. I made the following change, ran it in production, and the errors stopped:
 	// Check connection-level flow control.
 	cc.mu.Lock()
+	defer cc.mu.Unlock()
 	if cs.inflow.available() >= int32(f.Length) {
 		cs.inflow.take(int32(f.Length))
 	} else {
-		cc.mu.Unlock()
 		return ConnectionError(ErrCodeFlowControl)
 	}
 	// Return any padded flow control now, since we won't
@@ -2246,11 +2246,13 @@ func (rl *clientConnReadLoop) processData(f *DataFrame) error {
 		cc.bw.Flush()
 		cc.wmu.Unlock()
 	}
-	cc.mu.Unlock()
 	if len(data) > 0 && !didReset {
 		if _, err := cs.bufPipe.Write(data); err != nil {
 			rl.endStreamError(cs, err)
 			return err
 		}