Description
Hi!
What version of Go are you using (go version)?

```
$ go version
go version go1.13.1 linux/amd64
```

(seen from at least go1.10.4)
Does this issue reproduce with the latest release?
Currently testing with 1.14.3
What operating system and processor architecture are you using (go env)?

go env Output

```
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/xfennec/.cache/go-build"
GOENV="/home/xfennec/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/xfennec/Quiris/Go"
GOPRIVATE=""
GOPROXY="direct"
GOROOT="/usr/lib/golang"
GOSUMDB="off"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/golang/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build988333864=/tmp/go-build -gno-record-gcc-switches"
```
What did you do?
We use a chain of two HTTP proxies written in Go to route incoming traffic to our virtual machines:
```
         incoming traffic
                 |
                 v
          +-------------+
          |   primary   |
          +--+-------+--+
             |       |
        +----+       +----+   HTTP/2
        |                 |
        v                 v
+---------------+ +---------------+
| secondary (1) | | secondary (2) |
+---+---+---+---+ +---+---+---+---+
    |   |   |         |   |   |
    v   v   v         v   v   v
           VMs, HTTP/1.1
```
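For reference, here is a minimal sketch of this setup (not the actual mulch-proxy code; hostname, port and certificate paths are placeholders). The primary is an HTTPS server that forwards requests with httputil.ReverseProxy; since the secondary negotiates "h2" over TLS, the default transport multiplexes every request to a given secondary onto a single HTTP/2 connection:

```go
// Minimal sketch of the primary proxy (placeholders, not the real mulch-proxy code).
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Address of one secondary proxy. It serves HTTPS with ALPN "h2",
	// so the default transport uses a single HTTP/2 connection to it.
	secondary, err := url.Parse("https://secondary-1.example.com")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(secondary)

	// The primary itself serves HTTPS (HTTP/2 enabled by default).
	log.Fatal(http.ListenAndServeTLS(":443", "cert.pem", "key.pem", proxy))
}
```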
What did you expect to see?
Stable service over time.
What did you see instead?
Every ~10 days, we detect a total stall of requests between our front-end ("primary") proxy and one of the secondary proxies. New requests are accepted, sent to the secondary, then to the VM, but the response is stalled somewhere between the secondary and the primary proxy.
All requests to this secondary proxy then wait forever, since they are all multiplexed inside the same HTTP/2 connection.
Restarting one of the proxies, or even killing the "faulty" HTTP/2 connection between our proxies (using ss -K), makes the service work perfectly again.
We've had this issue for months now, and I still fail to see any common pattern (time of day, server load, requests, …) that would be the trigger. I've tried to write a few test/repro programs, without success either.
Here's a sample of the stack on the primary proxy during one of the outages:
```
goroutine 72163481 [sync.Cond.Wait, 5 minutes]:
runtime.goparkunlock(...)
	/usr/lib/golang/src/runtime/proc.go:310
sync.runtime_notifyListWait(0xc001b55300, 0x0)
	/usr/lib/golang/src/runtime/sema.go:510 +0xf8
sync.(*Cond).Wait(0xc001b552f0)
	/usr/lib/golang/src/sync/cond.go:56 +0x9d
net/http.(*http2pipe).Read(0xc001b552e8, 0xc0004e0000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
	/usr/lib/golang/src/net/http/h2_bundle.go:3512 +0xa6
net/http.http2transportResponseBody.Read(0xc001b552c0, 0xc0004e0000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
	/usr/lib/golang/src/net/http/h2_bundle.go:8448 +0xac
net/http/httputil.(*ReverseProxy).copyBuffer(0xc00014f810, 0x7f4cf39cc878, 0xc0000100a0, 0x7f4cf3a120f8, 0xc001b552c0, 0xc0004e0000, 0x8000, 0x8000, 0xc0006876d0, 0x920ca887, ...)
	/usr/lib/golang/src/net/http/httputil/reverseproxy.go:408 +0xa7
net/http/httputil.(*ReverseProxy).copyResponse(0xc00014f810, 0x7f4cf39cc878, 0xc0000100a0, 0x7f4cf3a120f8, 0xc001b552c0, 0x0, 0x0, 0x0)
	/usr/lib/golang/src/net/http/httputil/reverseproxy.go:396 +0xb5
net/http/httputil.(*ReverseProxy).ServeHTTP(0xc00014f810, 0x859e00, 0xc0000100a0, 0xc00111fc00)
	/usr/lib/golang/src/net/http/httputil/reverseproxy.go:289 +0x7e2
main.(*ProxyServer).serveReverseProxy(0xc00006f1a0, 0xc00148e680, 0x7d6772, 0x5, 0x859e00, 0xc0000100a0, 0xc00111fc00, 0xc001f07100)
	/home/xfennec/Quiris/Go/src/github.com/OnitiFR/mulch/cmd/mulch-proxy/proxy.go:208 +0x277
main.(*ProxyServer).handleRequest(0xc00006f1a0, 0x859e00, 0xc0000100a0, 0xc00111fc00)
	/home/xfennec/Quiris/Go/src/github.com/OnitiFR/mulch/cmd/mulch-proxy/proxy.go:287 +0x608
net/http.HandlerFunc.ServeHTTP(0xc0000d98b0, 0x859e00, 0xc0000100a0, 0xc00111ac00)
	/usr/lib/golang/src/net/http/server.go:2007 +0x44
net/http.(*ServeMux).ServeHTTP(0xc0000124c0, 0x859e00, 0xc0000100a0, 0xc00111ac00)
	/usr/lib/golang/src/net/http/server.go:2387 +0x1bd
net/http.serverHandler.ServeHTTP(0xc0001460e0, 0x859e00, 0xc0000100a0, 0xc00111ac00)
	/usr/lib/golang/src/net/http/server.go:2802 +0xa4
net/http.initNPNRequest.ServeHTTP(0x85a8c0, 0xc0018b72f0, 0xc000a38380, 0xc0001460e0, 0x859e00, 0xc0000100a0, 0xc00111ac00)
	/usr/lib/golang/src/net/http/server.go:3366 +0x8d
net/http.(*http2serverConn).runHandler(0xc0008b9500, 0xc0000100a0, 0xc00111ac00, 0xc0003f52c0)
	/usr/lib/golang/src/net/http/h2_bundle.go:5713 +0x9f
created by net/http.(*http2serverConn).processHeaders
	/usr/lib/golang/src/net/http/h2_bundle.go:5447 +0x4eb
```
At the same time, on the secondary proxy, I see this:
```
goroutine 32086483 [select, 5 minutes]:
net/http.(*http2serverConn).writeDataFromHandler(0xc0024f7e00, 0xc0002b8d10, 0xc0007b2000, 0x53, 0x1000, 0xc000d4fc01, 0x1d, 0x6779ed)
	/usr/lib/golang/src/net/http/h2_bundle.go:4577 +0x23f
net/http.(*http2responseWriterState).writeChunk(0xc001188380, 0xc0007b2000, 0x53, 0x1000, 0xc00170fdb8, 0x1, 0xc000bf0340)
	/usr/lib/golang/src/net/http/h2_bundle.go:6043 +0x475
net/http.http2chunkWriter.Write(0xc001188380, 0xc0007b2000, 0x53, 0x1000, 0xc00144c420, 0x855e20, 0xc0003e0c80)
	/usr/lib/golang/src/net/http/h2_bundle.go:5926 +0x49
bufio.(*Writer).Flush(0xc000bf1340, 0x0, 0x0)
	/usr/lib/golang/src/bufio/bufio.go:593 +0x75
net/http.(*http2responseWriter).Flush(0xc001582088)
	/usr/lib/golang/src/net/http/h2_bundle.go:6122 +0x42
net/http.(*http2responseWriter).handlerDone(0xc001582088)
	/usr/lib/golang/src/net/http/h2_bundle.go:6258 +0x46
net/http.(*http2serverConn).runHandler.func1(0xc001582088, 0xc00170ff67, 0xc0024f7e00)
	/usr/lib/golang/src/net/http/h2_bundle.go:5711 +0x2bf
net/http.(*http2serverConn).runHandler(0xc0024f7e00, 0xc001582088, 0xc0020c2a00, 0xc0003509e0)
	/usr/lib/golang/src/net/http/h2_bundle.go:5715 +0xa5
created by net/http.(*http2serverConn).processHeaders
	/usr/lib/golang/src/net/http/h2_bundle.go:5447 +0x4eb
```
- how can I make this bug report more helpful?
- any idea for sample code to replicate this issue without waiting 10 days each time?
- as a workaround, what timeout/deadline setting at the Transport level would mitigate this issue? (see the sketch below)
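To illustrate that last question, here is the kind of Transport-level mitigation I have in mind on the primary side. This is only a sketch with guessed values, and it assumes a recent golang.org/x/net/http2 (the ReadIdleTimeout/PingTimeout fields don't seem to exist in the h2 bundle of the Go versions listed above): the transport sends a PING frame when no frame has been received for a while, and closes the connection if the secondary doesn't answer, so later requests open a fresh connection instead of hanging forever.

```go
// Sketch of a possible mitigation on the primary proxy: an explicit
// x/net/http2 transport with PING-based health checks (guessed values,
// placeholder hostname and certificate paths).
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"time"

	"golang.org/x/net/http2"
)

func main() {
	secondary, err := url.Parse("https://secondary-1.example.com")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(secondary)

	proxy.Transport = &http2.Transport{
		// No frame received for 30s -> send a PING as a health check.
		ReadIdleTimeout: 30 * time.Second,
		// PING unanswered for 10s -> close the connection, so new
		// requests get a fresh one instead of waiting forever.
		PingTimeout: 10 * time.Second,
	}

	log.Fatal(http.ListenAndServeTLS(":443", "cert.pem", "key.pem", proxy))
}
```

As far as I understand, a per-request deadline (request Context or ResponseHeaderTimeout) would only abort individual requests and would not tear down the stalled connection itself, which is why I'm looking at connection-level health checks rather than request-level timeouts.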