
x/net/http2: transport race in closing stream while processing DATA frame, kills connection. #47076

Open
@amandachow

Description


What version of Go are you using (go version)?

$ go version
go1.16.5 darwin/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/amandachow/Library/Caches/go-build"
GOENV="/Users/amandachow/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/amandachow/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/amandachow/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.16.5"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/_f/1tl8g2w95wx34b7zp53y0kqc0000gp/T/go-build310981905=/tmp/go-build -gno-record-gcc-switches -fno-common"

What's the issue?

I use httputil.ReverseProxy with http2.Transport. Downstream clients were hitting what looked like connection flakes: either having their streams closed with RST_STREAM, or receiving an error response from our ErrorHandler. After some investigation, I found a race condition: when a stream is closed on the http2.Server side, the http2.Transport side can close that stream while a DATA frame for it is still being processed. This causes the entire TCP client connection to error and close. It is the same issue described here: https://go.googlesource.com/net/+/6c4ac8bdbf06a105c4baf3dcda28bd9b0fb15588
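For context, this is roughly how the proxy is wired up. It is only a sketch: the backend URL, certificate paths, and the ErrorHandler body are hypothetical placeholders, not our production configuration.

package main

import (
	"crypto/tls"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"

	"golang.org/x/net/http2"
)

func main() {
	// Hypothetical backend address; the real value differs.
	backend, err := url.Parse("https://backend.internal:8443")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(backend)
	// Route proxied requests over HTTP/2.
	proxy.Transport = &http2.Transport{
		TLSClientConfig: &tls.Config{NextProtos: []string{"h2"}},
	}
	// Errors from a broken client connection surface here.
	proxy.ErrorHandler = func(w http.ResponseWriter, r *http.Request, err error) {
		log.Printf("proxy error: %v", err)
		w.WriteHeader(http.StatusBadGateway)
	}
	log.Fatal(http.ListenAndServeTLS(":443", "cert.pem", "key.pem", proxy))
}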

The stream is closed in the window between the ClientConn unlock and the bufPipe write. The write then fails because the bufPipe has already been closed, and as a result the entire ClientConn is closed, erroring out every other open stream on that connection.
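To illustrate the shape of the race, here is a standalone sketch. It is not the actual x/net/http2 code; pipe is a simplified stand-in for the transport's bufPipe, and ccMu stands in for ClientConn.mu.

package main

import (
	"errors"
	"fmt"
	"sync"
)

// pipe is a simplified stand-in for the transport's bufPipe: once
// CloseWithError has been called, any later Write fails.
type pipe struct {
	mu  sync.Mutex
	err error
}

func (p *pipe) CloseWithError(err error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.err = err
}

func (p *pipe) Write(data []byte) (int, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.err != nil {
		return 0, p.err
	}
	return len(data), nil
}

func main() {
	var ccMu sync.Mutex // stands in for ClientConn.mu
	p := &pipe{}

	var wg sync.WaitGroup
	wg.Add(2)

	// Read loop: flow-control bookkeeping happens under the connection
	// mutex, but the pipe write happens after it has been released.
	go func() {
		defer wg.Done()
		ccMu.Lock()
		// ... flow-control checks on the stream would happen here ...
		ccMu.Unlock()
		// The stream can be closed in this window.
		if _, err := p.Write([]byte("DATA")); err != nil {
			// In the real transport this error tears down the whole connection.
			fmt.Println("write after close:", err)
		}
	}()

	// Stream teardown (e.g. the server reset or finished the stream).
	go func() {
		defer wg.Done()
		ccMu.Lock()
		p.CloseWithError(errors.New("stream closed"))
		ccMu.Unlock()
	}()

	wg.Wait()
}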

I tried holding the lock across the bufPipe write. I made the following change, ran it in production, and the errors stopped:

                // Check connection-level flow control.
                cc.mu.Lock()
+               defer cc.mu.Unlock()
                if cs.inflow.available() >= int32(f.Length) {
                        cs.inflow.take(int32(f.Length))
                } else {
-                       cc.mu.Unlock()
                        return ConnectionError(ErrCodeFlowControl)
                }
                // Return any padded flow control now, since we won't
@@ -2246,11 +2246,13 @@ func (rl *clientConnReadLoop) processData(f *DataFrame) error {
                        cc.bw.Flush()
                        cc.wmu.Unlock()
                }
-               cc.mu.Unlock()
                if len(data) > 0 && !didReset {
                        if _, err := cs.bufPipe.Write(data); err != nil {
                                rl.endStreamError(cs, err)
                                return err
                        }

    Labels

    NeedsInvestigation: Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
