
x/net/http2: random read deadlock #39812

Closed
@Xfennec

Description

Hi!

What version of Go are you using (go version)?

$ go version
go version go1.13.1 linux/amd64

(seen since at least go1.10.4)

Does this issue reproduce with the latest release?

Currently testing with 1.14.3

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/xfennec/.cache/go-build"
GOENV="/home/xfennec/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/xfennec/Quiris/Go"
GOPRIVATE=""
GOPROXY="direct"
GOROOT="/usr/lib/golang"
GOSUMDB="off"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/golang/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build988333864=/tmp/go-build -gno-record-gcc-switches"

What did you do?

We use a chain of two HTTP proxies written in Go to route incoming traffic to our virtual machines:

          incoming traffic
                 |
                 v
          +-------------+
          |   primary   |
          +--+-------+--+
             |       |
        +----+       +----+   HTTP/2
        |                 |
        v                 v
+---------------+    +---------------+
| secondary (1) |    | secondary (2) |
+---+---+---+---+    +---+---+---+---+
    |   |   |            |   |   |
    v   v   v            v   v   v

             VMs, HTTP/1.1

What did you expect to see?

Stable service over time.

What did you see instead?

Every ~10 days, we detect a total stall of requests between our frontend ("primary") proxy and one of the secondary proxies. New requests are accepted, sent to the secondary, then to the VM, but the response stalls somewhere between the secondary and the primary proxy.

All requests to this secondary proxy then wait forever, since they are all multiplexed over the same HTTP/2 connection.

Restarting one of the proxies, or even killing the "faulty" HTTP/2 connection between them (using ss -K), makes the service work perfectly again.

We've had this issue for months now, and I still fail to see any common pattern (time of day, server load, requests, …) that would be the trigger. I've also tried to write a few reproduction programs, without success.

Here's a sample of the stack on the primary proxy during one of the outages:

goroutine 72163481 [sync.Cond.Wait, 5 minutes]:
runtime.goparkunlock(...)
        /usr/lib/golang/src/runtime/proc.go:310
sync.runtime_notifyListWait(0xc001b55300, 0x0)
        /usr/lib/golang/src/runtime/sema.go:510 +0xf8
sync.(*Cond).Wait(0xc001b552f0)
        /usr/lib/golang/src/sync/cond.go:56 +0x9d
net/http.(*http2pipe).Read(0xc001b552e8, 0xc0004e0000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
        /usr/lib/golang/src/net/http/h2_bundle.go:3512 +0xa6
net/http.http2transportResponseBody.Read(0xc001b552c0, 0xc0004e0000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
        /usr/lib/golang/src/net/http/h2_bundle.go:8448 +0xac
net/http/httputil.(*ReverseProxy).copyBuffer(0xc00014f810, 0x7f4cf39cc878, 0xc0000100a0, 0x7f4cf3a120f8, 0xc001b552c0, 0xc0004e0000, 0x8000, 0x8000, 0xc0006876d0, 0x920ca887, ...)
        /usr/lib/golang/src/net/http/httputil/reverseproxy.go:408 +0xa7
net/http/httputil.(*ReverseProxy).copyResponse(0xc00014f810, 0x7f4cf39cc878, 0xc0000100a0, 0x7f4cf3a120f8, 0xc001b552c0, 0x0, 0x0, 0x0)
        /usr/lib/golang/src/net/http/httputil/reverseproxy.go:396 +0xb5
net/http/httputil.(*ReverseProxy).ServeHTTP(0xc00014f810, 0x859e00, 0xc0000100a0, 0xc00111fc00)
        /usr/lib/golang/src/net/http/httputil/reverseproxy.go:289 +0x7e2
main.(*ProxyServer).serveReverseProxy(0xc00006f1a0, 0xc00148e680, 0x7d6772, 0x5, 0x859e00, 0xc0000100a0, 0xc00111fc00, 0xc001f07100)
        /home/xfennec/Quiris/Go/src/github.com/OnitiFR/mulch/cmd/mulch-proxy/proxy.go:208 +0x277
main.(*ProxyServer).handleRequest(0xc00006f1a0, 0x859e00, 0xc0000100a0, 0xc00111fc00)
        /home/xfennec/Quiris/Go/src/github.com/OnitiFR/mulch/cmd/mulch-proxy/proxy.go:287 +0x608
net/http.HandlerFunc.ServeHTTP(0xc0000d98b0, 0x859e00, 0xc0000100a0, 0xc00111ac00)
        /usr/lib/golang/src/net/http/server.go:2007 +0x44
net/http.(*ServeMux).ServeHTTP(0xc0000124c0, 0x859e00, 0xc0000100a0, 0xc00111ac00)
        /usr/lib/golang/src/net/http/server.go:2387 +0x1bd
net/http.serverHandler.ServeHTTP(0xc0001460e0, 0x859e00, 0xc0000100a0, 0xc00111ac00)
        /usr/lib/golang/src/net/http/server.go:2802 +0xa4
net/http.initNPNRequest.ServeHTTP(0x85a8c0, 0xc0018b72f0, 0xc000a38380, 0xc0001460e0, 0x859e00, 0xc0000100a0, 0xc00111ac00)
        /usr/lib/golang/src/net/http/server.go:3366 +0x8d
net/http.(*http2serverConn).runHandler(0xc0008b9500, 0xc0000100a0, 0xc00111ac00, 0xc0003f52c0)
        /usr/lib/golang/src/net/http/h2_bundle.go:5713 +0x9f
created by net/http.(*http2serverConn).processHeaders
        /usr/lib/golang/src/net/http/h2_bundle.go:5447 +0x4eb

At the same time, on the secondary proxy, I see this:

goroutine 32086483 [select, 5 minutes]:
net/http.(*http2serverConn).writeDataFromHandler(0xc0024f7e00, 0xc0002b8d10, 0xc0007b2000, 0x53, 0x1000, 0xc000d4fc01, 0x1d, 0x6779ed)
        /usr/lib/golang/src/net/http/h2_bundle.go:4577 +0x23f
net/http.(*http2responseWriterState).writeChunk(0xc001188380, 0xc0007b2000, 0x53, 0x1000, 0xc00170fdb8, 0x1, 0xc000bf0340)
        /usr/lib/golang/src/net/http/h2_bundle.go:6043 +0x475
net/http.http2chunkWriter.Write(0xc001188380, 0xc0007b2000, 0x53, 0x1000, 0xc00144c420, 0x855e20, 0xc0003e0c80)
        /usr/lib/golang/src/net/http/h2_bundle.go:5926 +0x49
bufio.(*Writer).Flush(0xc000bf1340, 0x0, 0x0)
        /usr/lib/golang/src/bufio/bufio.go:593 +0x75
net/http.(*http2responseWriter).Flush(0xc001582088)
        /usr/lib/golang/src/net/http/h2_bundle.go:6122 +0x42
net/http.(*http2responseWriter).handlerDone(0xc001582088)
        /usr/lib/golang/src/net/http/h2_bundle.go:6258 +0x46
net/http.(*http2serverConn).runHandler.func1(0xc001582088, 0xc00170ff67, 0xc0024f7e00)
        /usr/lib/golang/src/net/http/h2_bundle.go:5711 +0x2bf
net/http.(*http2serverConn).runHandler(0xc0024f7e00, 0xc001582088, 0xc0020c2a00, 0xc0003509e0)
        /usr/lib/golang/src/net/http/h2_bundle.go:5715 +0xa5
created by net/http.(*http2serverConn).processHeaders
        /usr/lib/golang/src/net/http/h2_bundle.go:5447 +0x4eb
  • How can I make this bug report more helpful?
  • Any idea for sample code that would replicate this issue without waiting ~10 days each time?
  • As a workaround, what timeout/deadline setting at the Transport level would mitigate this issue?

Metadata

Labels: FrozenDueToAge, WaitingForInfo (issue is not actionable because of missing required information)