Description
What version of Go are you using (go version
)?
$ go1.13 version go version go1.13.1 darwin/amd64 $ go-tip version go version devel +f6c624a22a Wed Oct 16 15:58:33 2019 +0000 darwin/amd64 $ go1.12 version go version go1.12.10 darwin/amd64 $ go1.11 version go version go1.11.13 darwin/amd64
Does this issue reproduce with the latest release?
Yes: The test hangs with Go 1.13. The test panics with tip.
What operating system and processor architecture are you using (go env
)?
go env
Output
$ go1.13 env GO111MODULE="" GOARCH="amd64" GOBIN="" GOCACHE="/Users/rhys/Library/Caches/go-build" GOENV="/Users/rhys/Library/Application Support/go/env" GOEXE="" GOFLAGS="" GOHOSTARCH="amd64" GOHOSTOS="darwin" GONOPROXY="*" GONOSUMDB="*" GOOS="darwin" GOPATH="/Users/rhys/go" GOPRIVATE="*" GOPROXY="direct" GOROOT="/Users/rhys/go/version/go1.13" GOSUMDB="off" GOTMPDIR="" GOTOOLDIR="/Users/rhys/go/version/go1.13/pkg/tool/darwin_amd64" GCCGO="gccgo" AR="ar" CC="clang" CXX="clang++" CGO_ENABLED="1" GOMOD="" CGO_CFLAGS="-g -O2" CGO_CPPFLAGS="" CGO_CXXFLAGS="-g -O2" CGO_FFLAGS="-g -O2" CGO_LDFLAGS="-g -O2" PKG_CONFIG="pkg-config" GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/49/zmds5zsn75z1283vtzxyfr5hj7yjq4/T/go-build216795429=/tmp/go-build -gno-record-gcc-switches -fno-common"
What did you do?
Using an http.Client
with an http.Transport
configured to use http/2 and with MaxConnsPerHost = 1
, I started a large number of concurrent http/2 requests: more than the server would allow on a single TCP connection.
In the test environment, I did this by having the server delay its responses until I knew a large number of requests were either active on the connection or waiting for permission to use the connection, at which point I allowed the http.Handler
to respond.
What did you expect to see?
I expected that every request I started would eventually be seen by the server, and that the client would receive the server's response.
What did you see instead?
In Go 1.11 where the default behavior is to use a single TCP connection for each http/2 authority, the requests queue up and once I allow the handler to return they are all processed. That's fine in this context.
In Go 1.12 where the http.Transport is correctly able to use more than one TCP connection for each http/2 authority (but doesn't yet know how to use MaxConnsPerHost
), it passes all of the requests on to the server immediately. That's also fine in this context.
In Go 1.13, the server only sees the first 249 requests. The others appear stuck indefinitely. I see that as a bug.
In tip, the Transport panics. That's a second bug.
$ go1.13 test -v ./repro_test.go
=== RUN TestMaxConns
2019/10/16 14:27:43 http: TLS handshake error from 127.0.0.1:50300: remote error: tls: bad certificate
--- FAIL: TestMaxConns (1.15s)
repro_test.go:82: starts=249 finishes=0
repro_test.go:82: starts=249 finishes=0
repro_test.go:82: starts=249 finishes=0
repro_test.go:82: starts=249 finishes=0
repro_test.go:82: starts=249 finishes=0
repro_test.go:82: starts=249 finishes=249
repro_test.go:82: starts=249 finishes=249
repro_test.go:82: starts=249 finishes=249
repro_test.go:82: starts=249 finishes=249
repro_test.go:82: starts=249 finishes=249
repro_test.go:86: HTTP/2 requests started: 249 != 300
repro_test.go:89: HTTP/2 requests completed: 249 != 300
FAIL
FAIL command-line-arguments 2.287s
FAIL
$ go-tip test -v ./repro_test.go
=== RUN TestMaxConns
2019/10/16 14:27:57 http: TLS handshake error from 127.0.0.1:50603: remote error: tls: bad certificate
panic: net/http: internal error: connCount underflow
goroutine 371 [running]:
net/http.(*Transport).decConnsPerHost(0xc0000d63c0, 0x0, 0x0, 0xc0000c6540, 0x5, 0xc0000160d0, 0xf, 0x0)
/usr/local/go/src/net/http/transport.go:1318 +0x604
net/http.(*Transport).roundTrip(0xc0000d63c0, 0xc00024f200, 0x0, 0xc000626ac0, 0x100e518)
/usr/local/go/src/net/http/transport.go:542 +0x762
net/http.(*Transport).RoundTrip(0xc0000d63c0, 0xc00024f200, 0xc0000d63c0, 0x0, 0x0)
/usr/local/go/src/net/http/roundtrip.go:17 +0x35
net/http.send(0xc00024f200, 0x13bd4c0, 0xc0000d63c0, 0x0, 0x0, 0x0, 0xc000010338, 0x241ef00, 0x1, 0x0)
/usr/local/go/src/net/http/client.go:252 +0x43e
net/http.(*Client).send(0xc000096e10, 0xc00024f200, 0x0, 0x0, 0x0, 0xc000010338, 0x0, 0x1, 0x20300000000000)
/usr/local/go/src/net/http/client.go:176 +0xfa
net/http.(*Client).do(0xc000096e10, 0xc00024f200, 0x0, 0x0, 0x0)
/usr/local/go/src/net/http/client.go:692 +0x3c8
net/http.(*Client).Do(...)
/usr/local/go/src/net/http/client.go:560
command-line-arguments.TestMaxConns.func2(0xc00040ca70, 0xc00009e2f8, 0xc00040e700, 0xc000590468, 0xc000086300)
/tmp/repro_test.go:69 +0x1fb
created by command-line-arguments.TestMaxConns
/tmp/repro_test.go:63 +0x451
FAIL command-line-arguments 1.250s
FAIL
And for context, the passing results with Go 1.11 and Go 1.12:
$ go1.11 test -v ./repro_test.go
=== RUN TestMaxConns
2019/10/16 14:27:50 http: TLS handshake error from 127.0.0.1:50307: remote error: tls: bad certificate
--- PASS: TestMaxConns (1.15s)
repro_test.go:82: starts=155 finishes=0
repro_test.go:82: starts=250 finishes=0
repro_test.go:82: starts=250 finishes=0
repro_test.go:82: starts=250 finishes=0
repro_test.go:82: starts=250 finishes=0
repro_test.go:82: starts=300 finishes=300
repro_test.go:82: starts=300 finishes=300
repro_test.go:82: starts=300 finishes=300
repro_test.go:82: starts=300 finishes=300
repro_test.go:82: starts=300 finishes=300
PASS
ok command-line-arguments 2.281s
$ go1.12 test -v ./repro_test.go
=== RUN TestMaxConns
2019/10/16 14:27:35 http: TLS handshake error from 127.0.0.1:49994: remote error: tls: bad certificate
--- PASS: TestMaxConns (1.17s)
repro_test.go:82: starts=84 finishes=0
repro_test.go:82: starts=174 finishes=0
repro_test.go:82: starts=263 finishes=0
repro_test.go:82: starts=300 finishes=0
repro_test.go:82: starts=300 finishes=0
repro_test.go:82: starts=300 finishes=300
repro_test.go:82: starts=300 finishes=300
repro_test.go:82: starts=300 finishes=300
repro_test.go:82: starts=300 finishes=300
repro_test.go:82: starts=300 finishes=300
PASS
ok command-line-arguments 3.505s
package repro
import (
"context"
"crypto/tls"
"net/http"
"net/http/httptest"
"sync"
"sync/atomic"
"testing"
"time"
)
func TestMaxConns(t *testing.T) {
totalRequests := 300
allow := make(chan struct{})
var (
starts int64
finishes int64
)
h := func(w http.ResponseWriter, r *http.Request) {
if !r.ProtoAtLeast(2, 0) {
t.Errorf("Request is not http/2: %q", r.Proto)
return
}
atomic.AddInt64(&starts, 1)
<-allow
}
s := httptest.NewUnstartedServer(http.HandlerFunc(h))
s.TLS = &tls.Config{
NextProtos: []string{"h2"},
}
s.StartTLS()
defer s.Close()
transport := s.Client().Transport.(*http.Transport)
clientConfig := transport.TLSClientConfig
transport.TLSClientConfig = nil
// make a request to trigger HTTP/2 autoconfiguration
resp, err := s.Client().Get(s.URL)
if err == nil {
resp.Body.Close()
}
// now allow the client to connect to the ad-hoc test server
transport.TLSClientConfig.RootCAs = clientConfig.RootCAs
transport.MaxConnsPerHost = 1
ctx := context.Background()
ctx, cancel := context.WithCancel(ctx)
defer cancel()
var wg sync.WaitGroup
for i := 0; i < totalRequests; i++ {
req, err := http.NewRequest("GET", s.URL, nil)
if err != nil {
t.Fatalf("NewRequest: %s", err)
}
wg.Add(1)
go func() {
defer wg.Done()
defer atomic.AddInt64(&finishes, 1)
ctx, cancel := context.WithCancel(ctx)
defer cancel()
req = req.WithContext(ctx)
resp, err := s.Client().Do(req)
if err != nil {
return
}
resp.Body.Close()
}()
}
for i := 0; i < 10; i++ {
if i == 5 {
close(allow)
}
time.Sleep(100 * time.Millisecond)
t.Logf("starts=%d finishes=%d", atomic.LoadInt64(&starts), atomic.LoadInt64(&finishes))
}
if have, want := atomic.LoadInt64(&starts), int64(totalRequests); have != want {
t.Errorf("HTTP/2 requests started: %d != %d", have, want)
}
if have, want := atomic.LoadInt64(&finishes), int64(totalRequests); have != want {
t.Errorf("HTTP/2 requests completed: %d != %d", have, want)
}
}
Activity
julieqiu commentedon Oct 18, 2019
/cc @bradfitz
bradfitz commentedon Oct 18, 2019
@fraenkel, another MaxConnsPerHost issue. (sorry)
(The other is #34978).
fraenkel commentedon Oct 19, 2019
As written, the test case as a data race.
I changed the test case slightly:
ForceAttemptHTTP2
to simplifiy the setupThe failure does still occur but I can get a few successful runs.
There is obviously some book keeping issue.
It is the same for 1.13.3 and tip (46aa835)
fraenkel commentedon Oct 19, 2019
A fix is coming shortly. We cannot blindly decrement the conn count. We need to only decrement if we have removed the idle connection.
fraenkel commentedon Oct 19, 2019
Bah. A simple tweak to our existing test for MaxConnsPerHost has uncovered yet another issue.
Just tweak the loop to 300 and tip will hit the same issue reported. My fix will fix it but now I have
fraenkel commentedon Oct 19, 2019
All I have determined at this point is that the http2 side starts sending back
http.http2noCachedConnError
after some time. The test passes when it doesn't occur which is not very often.gopherbot commentedon Oct 19, 2019
Change https://golang.org/cl/202087 mentions this issue:
net/http: only decrement connection count if we removed a connection
fraenkel commentedon Oct 19, 2019
Part of the issue is the connection coordination between http and http2. When using 300 clients, the first 50+ compete to create the connection on the http side before the http2 side is aware of it. There is some glitch (still investigating) when the http2 side is aware, it actually causes a second connection.
By changing the test case to first do a single client request, the failure rate is greatly reduced but the problem still occurs. However, it shows that we will always reuse the first connection but may dial/tls handshake a second.
I don't think this will ever be perfect but I would at least like to better understand what is causing the second connection. We may need to relax the test case slightly.
fraenkel commentedon Oct 19, 2019
The http2 server imposes a limit of 250 concurrent streams. Once we reach that number which does happen when we stampede with 300 requests, there is a point where a new connection is created.
mchtech commentedon Nov 18, 2019
There is still a certain chance to crash with "internal error: connCount underflow" on golang 1.13.4 when go client enables HTTP2.
The "connCount underflow" error did not appeared when using golang 1.13.3fraenkel commentedon Nov 18, 2019
Can you also provide the stack trace? And the potential scenario if different than reported.
A new issue would be best.
18 remaining items