Skip to content

net/http: Transport race condition by Content-Length == 0 response #41600

Closed
@akihiro

Description

@akihiro

What version of Go are you using (go version)?

$ go version
go version go1.15.2 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/hiroaki-m/.cache/go-build"
GOENV="/home/hiroaki-m/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/hiroaki-m/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/hiroaki-m/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build121864325=/tmp/go-build -gno-record-gcc-switches"

What did you do?

package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type handler struct{}

func (h handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	w.Header().Add("Content-Length", "0")
}

func requestAndCancel(url string) {
	for {
		wait := make(chan struct{})
		done := make(chan struct{})
		ctx, cancel := context.WithCancel(context.Background())
		req, _ := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		go func() {
			close(wait)
			http.DefaultTransport.RoundTrip(req)
			close(done)
		}()
		<-wait
		cancel()
		<-done
	}
}

func main() {
	ts := httptest.NewServer(&handler{})
	defer ts.Close()

	http.DefaultTransport.(*http.Transport).MaxConnsPerHost = 1

	// request and cancel
	for i := 0; i < 8; i++ {
		go requestAndCancel(ts.URL)
	}

	// request only
	for {
		req, _ := http.NewRequest(http.MethodGet, ts.URL, nil)
		if _, err := http.DefaultTransport.RoundTrip(req); err != nil {
			fmt.Printf("An unexpected error has occurred: %#v\n", err)
			break
		}
	}
}

I'm gussing that the keep-alived tcp connection is used simultaneously by multiple goroutine.
Early return to connection pool https://go.googlesource.com/go/+/refs/tags/go1.15.2/src/net/http/transport.go#2089

What did you expect to see?

No error response and loop infinity.

What did you see instead?

$ date -u; go run context_cancel_race.go ; date -u
Thu 24 Sep 2020 03:28:55 AM UTC
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
Thu 24 Sep 2020 03:30:00 AM UTC

Activity

changed the title [-]net/http: Transport race condition by Context-Length == 0 response[/-] [+]net/http: Transport race condition by Content-Length == 0 response[/+] on Sep 24, 2020
fraenkel

fraenkel commented on Sep 25, 2020

@fraenkel
Contributor

Take a look at how many connections you have in TIME_WAIT when it fails.

davecheney

davecheney commented on Sep 25, 2020

@davecheney
Contributor

Looking at your logic there is nothing to prevent this sequence

                // step 1
		go func() {
                         // step 4
			close(wait)
                         // step 5, req is already cancelled.
			http.DefaultTransport.RoundTrip(req)
			close(done)
		}()
                // step 2
		<-wait
                // step 3
		cancel()
		<-done
akihiro

akihiro commented on Sep 25, 2020

@akihiro
Author

Take a look at how many connections you have in TIME_WAIT when it fails.

$ go run context-test.go; ss -atn > case1.log
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
$ grep TIME-WAIT case1.log | wc -l
4096

I set tcp_max_tw_buckets to 65536 for relax limitation of a number of TIME-WAIT sockets.

# echo 65536  >  /proc/sys/net/ipv4/tcp_max_tw_buckets

I test again.

$ cat /proc/sys/net/ipv4/tcp_fin_timeout
60
$ sleep 120 # wait for transition TIME-WAIT to CLOSED 
$ go run context-test.go; ss -atn > case2-1.log
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
$ grep TIME-WAIT case2-1.log | wc -l
8243
$ sleep 120
$ go run context-test.go; ss -atn > case2-2.log
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
$ grep TIME-WAIT case2-2.log | wc -l
1495
$ sleep 120
$ go run context-test.go; ss -atn > case2-3.log
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
$ grep TIME-WAIT case2-3.log | wc -l
14615
$ sleep 120
$ go run context-test.go; ss -atn > case2-4.log
An unexpected error has occurred: &errors.errorString{s:"context canceled"}
$ grep TIME-WAIT case2-4.log | wc -l
9449
fraenkel

fraenkel commented on Sep 26, 2020

@fraenkel
Contributor

Sorry about the wild goose chase. I have narrowed down the issue. Still trying to understand how its happening but I do understand why you get the error.
The connection handed to a request, not just the non-cancelling one is already broken, closed and cancelled. Now the question is why this connection actually exists in the pool.

fraenkel

fraenkel commented on Sep 26, 2020

@fraenkel
Contributor

I am going to have to think about this. There doesn't seem to be an easy fix. The goroutine which reads the response, puts the connection back into the pool and then sends the response back via a channel to roundTrip(). However, the roundTrip() is also watching for contexts that are Done() and will cancel the connection. The connection is already in the idleConn pool. Turns out a request will get this connection before the previous one is actually done so the connection is sometimes broken out of the pool and sometimes healthy but then breaks later.

fraenkel

fraenkel commented on Sep 26, 2020

@fraenkel
Contributor

Here is a minimal testcase to reproduce the issue:

package http_test

import (
	"context"
	"net/http"
	"net/http/httptest"
	"sync"
	"testing"
)

func Test41600(t *testing.T) {
	ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		w.Header().Add("Content-Length", "0")
	}))
	defer ts.Close()

	client := ts.Client()
	transport := client.Transport.(*http.Transport)
	transport.MaxIdleConns = 1
	transport.MaxConnsPerHost = 1

	var wg sync.WaitGroup

	ctx, cancel := context.WithCancel(context.Background())

	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for ctx.Err() == nil {
				reqctx, reqcancel := context.WithCancel(context.Background())
				go reqcancel()
				req, _ := http.NewRequestWithContext(reqctx, http.MethodGet, ts.URL, nil)
				rsp, err := client.Do(req)
				if err == nil {
					defer rsp.Body.Close()
				}
			}
		}()
	}

	for {
		req, _ := http.NewRequest(http.MethodGet, ts.URL, nil)
		if rsp, err := client.Do(req); err != nil {
			t.Errorf("unexpected: %p %v", req, err)
			break
		} else {
			rsp.Body.Close()
		}
	}

	cancel()
	wg.Wait()
}

fraenkel

fraenkel commented on Sep 26, 2020

@fraenkel
Contributor

The problem is easier to solve than I thought. The race is close to what is described but slightly incorrect. This problem can occur with all request but its easiest to reproduce with a HEAD/no response body request.
The readLoop for request 1 will put the connection back into the idle pool. Request 2 will pick it up and cancel it. Request 1's roundTrip will be notified that the connection has closed because it hasn't received the response. Once it puts the connection back, it should no longer care about any connection issue since a response will be coming.

gopherbot

gopherbot commented on Sep 26, 2020

@gopherbot
Contributor

Change https://golang.org/cl/257818 mentions this issue: net/http: ignore connection closes once done with the connection

networkimprov

networkimprov commented on Sep 26, 2020

@networkimprov

cc @odeke-em as possible reviewer :-)

added
NeedsFixThe path to resolution is known, but the work has not been done.
on Sep 28, 2020
added this to the Backlog milestone on Sep 28, 2020
networkimprov

networkimprov commented on Dec 2, 2020

@networkimprov

@gopherbot please backport

The relevant CL is https://golang.org/cl/257818 - net/http: ignore connection closes once done with the connection

26 remaining items

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeNeedsFixThe path to resolution is known, but the work has not been done.

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @davecheney@cagedmantis@akihiro@networkimprov@fraenkel

        Issue actions

          net/http: Transport race condition by Content-Length == 0 response · Issue #41600 · golang/go