Skip to content

net.http: http.Client sometimes does not use a connection produced by a canceled dialer #53627

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
cwmos opened this issue Jun 30, 2022 · 5 comments

Comments

@cwmos
Copy link

cwmos commented Jun 30, 2022

I have a use case where:

  • I want to make many HTTP requests each with a short deadline
  • It may take a long time to dial the HTTP server - longer than the deadline of each HTTP request.

This does not work. Consider the following set of events:

  • I create an http.Client with a custom dialer
  • I start an HTTP request in the http.Client
  • http.Client invokes the custom dialer
  • I cancel the HTTP request
  • The dialer completes successfully producing a net.Conn

After this, if I make a new HTTP request in the http.Client, then the produced net.Conn is not reused. Instead a new dial is performed.

I also made the program below to illustrate the problem. The program dials two times where I would like it to dial just one time. To be precise, I get this output:

Request 0: Starts
Dial 0: Sleeps for 2 seconds!
Request 0: Done with error Get "https://www.google.com/": context deadline exceeded
Request 1: Starts
Dial 1: Sleeps for 2 seconds!
Dial 0: Performs actual dial
Dial 0: Done with success
Dial 1: Performs actual dial
Dial 1: Done with success
Request 1: Got HTTP status 200

Where I would like this output:

Request 0: Starts
Dial 0: Sleeps for 2 seconds!
Request 0: Done with error Get "https://www.google.com/": context deadline exceeded
Request 1: Starts
Dial 0: Performs actual dial
Dial 0: Done with success
Request 1: Got HTTP status 200

The program is here:

package main

import (
	"context"
	"crypto/tls"
	"fmt"
	"io/ioutil"
	"net"
	"net/http"
	"time"
)

func main() {
	// Custom dialer using two seconds to dial
	dial := 0
	myDialer := func(ctx context.Context, network, addr string) (net.Conn, error) {
		d := dial
		dial += 1
		dialer := tls.Dialer{}
		fmt.Printf("Dial %v: Sleeps for 2 seconds!\n", d)
		time.Sleep(2 * time.Second)
		fmt.Printf("Dial %v: Performs actual dial\n", d)
		// NOTE: Uses context.Background() to allow the dial to continue even if the request is aborted.
		ret, err := dialer.DialContext(context.Background(), network, addr)
		if err != nil {
			fmt.Printf("Dial %v: Done with error %v\n", d, err)
		} else {
			fmt.Printf("Dial %v: Done with success\n", d)
		}
		return ret, err
	}

	// HTTP client using custom dialer
	client := http.Client{
		Transport: &http.Transport{
			DialTLSContext: myDialer,
		},
	}

	// Function that makes an HTTP request
	makeRequest := func(ctx context.Context, r int) {
		req, err := http.NewRequestWithContext(ctx, "GET", "https://www.google.com/", nil)
		if err != nil {
			panic(err)
		}

		fmt.Printf("Request %v: Starts\n", r)
		resp, err := client.Do(req)
		if err != nil {
			fmt.Printf("Request %v: Done with error %v\n", r, err)
		} else {
			ioutil.ReadAll(resp.Body)
			resp.Body.Close()
			fmt.Printf("Request %v: Got HTTP status %v\n", r, resp.StatusCode)
		}
	}

	// Make request with 1 second timeout:
	ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
	defer cancel()
	makeRequest(ctx, 0)

	// Make request without timeout
	makeRequest(context.Background(), 1)
}
@cwmos cwmos added the Proposal label Jun 30, 2022
@gopherbot gopherbot added this to the Proposal milestone Jun 30, 2022
@ZekeLu
Copy link
Contributor

ZekeLu commented Jun 30, 2022

I have modified the code to enable http tracing. According to my test, the second request will use the connection from the first dial (see the first highlighted line below). But the second dial will continue because the call to myDialer has already started (see the second highlighted line below).

Your log is a little different from mine. But I think that's because the first dial takes a little longer. I guess the second request still uses the connection from the first dial in your case. Can you run my modified code (attached to the end of this comment) and check the log again?

My environment: go version go1.18.3 linux/amd64

  19:22:20.460226 Request 0: Starts
  19:22:20.460357 **trace 0**GetConn: www.google.com:443
  19:22:20.460377 Dial 0: Sleeps for 2 seconds!
  19:22:21.460536 Request 0: Done with error Get "https://www.google.com/": context deadline exceeded
  19:22:21.460638 Request 1: Starts
  19:22:21.460667 **trace 1**GetConn: www.google.com:443
  19:22:21.460742 Dial 1: Sleeps for 2 seconds!
  19:22:22.460801 Dial 0: Performs actual dial
  19:22:22.588999 Dial 0: Done with success
  19:22:22.589029 **trace 0**TLSHandshakeStart
  19:22:22.589048 **trace 0**TLSHandshakeDone
+ 19:22:22.589077 **trace 1**GotConn: {Conn:0xc0000a5180 Reused:true WasIdle:false IdleTime:0s}
  19:22:22.589119 **trace 1**WroteHeaderField
  19:22:22.589123 **trace 1**WroteHeaderField
  19:22:22.589145 **trace 1**WroteHeaderField
  19:22:22.589146 **trace 1**WroteHeaders
  19:22:22.589169 **trace 1**WroteRequest
  19:22:22.662630 **trace 1**GotFirstResponseByte
  19:22:22.665751 **trace 1**PutIdleConn: <nil>
  19:22:22.665822 Request 1: Got HTTP status 200
+ 19:22:23.461797 Dial 1: Performs actual dial
  19:22:23.584438 Dial 1: Done with success
  19:22:23.584499 **trace 1**TLSHandshakeStart
  19:22:23.584515 **trace 1**TLSHandshakeDone
modified code
package main

import (
	"context"
	"crypto/tls"
	"io/ioutil"
	"log"
	"net"
	"net/http"
	"net/http/httptrace"
	"time"
)

func main() {
	log.SetFlags(log.Lmicroseconds)
	// Custom dialer using two seconds to dial
	dial := 0
	myDialer := func(ctx context.Context, network, addr string) (net.Conn, error) {
		d := dial
		dial += 1
		dialer := tls.Dialer{}
		log.Printf("Dial %v: Sleeps for 2 seconds!\n", d)
		time.Sleep(2 * time.Second)
		log.Printf("Dial %v: Performs actual dial\n", d)
		// NOTE: Uses context.Background() to allow the dial to continue even if the request is aborted.
		ret, err := dialer.DialContext(context.Background(), network, addr)
		if err != nil {
			log.Printf("Dial %v: Done with error %v\n", d, err)
		} else {
			log.Printf("Dial %v: Done with success\n", d)
		}
		return ret, err
	}

	tracer := func(r int) *httptrace.ClientTrace {
		return &httptrace.ClientTrace{
			GetConn: func(hostPort string) {
				log.Printf("**trace %d**GetConn: %s\n", r, hostPort)
			},
			GotConn: func(connInfo httptrace.GotConnInfo) {
				log.Printf("**trace %d**GotConn: %+v\n", r, connInfo)
			},
			PutIdleConn: func(err error) {
				log.Printf("**trace %d**PutIdleConn: %+v\n", r, err)
			},
			GotFirstResponseByte: func() {
				log.Printf("**trace %d**GotFirstResponseByte", r)
			},
			Got100Continue: func() {
				log.Printf("**trace %d**Got100Continue", r)
			},
			DNSStart: func(httptrace.DNSStartInfo) {
				log.Printf("**trace %d**DNSStart", r)
			},
			DNSDone: func(dnsInfo httptrace.DNSDoneInfo) {
				log.Printf("**trace %d**DNS Info: %+v\n", r, dnsInfo)
			},
			ConnectStart: func(network string, addr string) {
				log.Printf("**trace %d**ConnectStart", r)
			},
			ConnectDone: func(network string, addr string, err error) {
				log.Printf("**trace %d**ConnectDone", r)
			},
			TLSHandshakeStart: func() {
				log.Printf("**trace %d**TLSHandshakeStart", r)
			},
			TLSHandshakeDone: func(tls.ConnectionState, error) {
				log.Printf("**trace %d**TLSHandshakeDone", r)
			},
			WroteHeaderField: func(key string, value []string) {
				log.Printf("**trace %d**WroteHeaderField", r)
			},
			WroteHeaders: func() {
				log.Printf("**trace %d**WroteHeaders", r)
			},
			Wait100Continue: func() {
				log.Printf("**trace %d**Wait100Continue", r)
			},
			WroteRequest: func(httptrace.WroteRequestInfo) {
				log.Printf("**trace %d**WroteRequest", r)
			},
		}
	}

	// HTTP client using custom dialer
	client := http.Client{
		Transport: &http.Transport{
			DialTLSContext: myDialer,
		},
	}

	// Function that makes an HTTP request
	makeRequest := func(ctx context.Context, r int) {
		req, err := http.NewRequestWithContext(ctx, "GET", "https://www.google.com/", nil)
		if err != nil {
			panic(err)
		}

		req = req.WithContext(httptrace.WithClientTrace(req.Context(), tracer(r)))

		log.Printf("Request %v: Starts\n", r)
		resp, err := client.Do(req)
		if err != nil {
			log.Printf("Request %v: Done with error %v\n", r, err)
		} else {
			ioutil.ReadAll(resp.Body)
			resp.Body.Close()
			log.Printf("Request %v: Got HTTP status %v\n", r, resp.StatusCode)
		}
	}

	// Make request with 1 second timeout:
	ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
	defer cancel()
	makeRequest(ctx, 0)

	// Make request without timeout
	makeRequest(context.Background(), 1)

	// wait for the second dial
	time.Sleep(2 * time.Second)
}

@ianlancetaylor ianlancetaylor changed the title proposal: net.http: Allow http.Client to use a connection produced by a canceled dialer proposal: net.http: allow http.Client to use a connection produced by a canceled dialer Jun 30, 2022
@ianlancetaylor
Copy link
Contributor

CC @neild @bradfitz

I'm not sure there is a API change here, so I'm not sure that this needs to go through the proposal process.

@cwmos
Copy link
Author

cwmos commented Jun 30, 2022

I tried to run your code.

It turns out that sometimes it works bad similar to what I got in my original report:

23:30:01.657620 Request 0: Starts
23:30:01.658132 **trace 0**GetConn: www.google.com:443
23:30:01.658132 Dial 0: Sleeps for 2 seconds!
23:30:02.670997 Request 0: Done with error Get "https://www.google.com/": context deadline exceeded
23:30:02.670997 Request 1: Starts
23:30:02.670997 **trace 1**GetConn: www.google.com:443
23:30:02.671524 Dial 1: Sleeps for 2 seconds!
23:30:03.665208 Dial 0: Performs actual dial
23:30:03.717689 Dial 0: Done with success
23:30:03.717913 **trace 0**TLSHandshakeStart
23:30:03.717913 **trace 0**TLSHandshakeDone
23:30:04.675651 Dial 1: Performs actual dial
23:30:04.701410 Dial 1: Done with success
23:30:04.701410 **trace 1**TLSHandshakeStart
23:30:04.701952 **trace 1**TLSHandshakeDone
23:30:04.701952 **trace 1**GotConn: {Conn:0xc0001b0000 Reused:false WasIdle:false IdleTime:0s}
23:30:04.701952 **trace 1**WroteHeaderField
23:30:04.701952 **trace 1**WroteHeaderField
23:30:04.702470 **trace 1**WroteHeaderField
23:30:04.702470 **trace 1**WroteHeaders
23:30:04.702470 **trace 1**WroteRequest
23:30:04.743771 **trace 1**GotFirstResponseByte
23:30:04.745453 **trace 1**PutIdleConn: <nil>
23:30:04.745647 Request 1: Got HTTP status 200

And sometimes it works well similar to what you got:

23:31:42.542287 Request 0: Starts
23:31:42.542803 **trace 0**GetConn: www.google.com:443
23:31:42.542803 Dial 0: Sleeps for 2 seconds!
23:31:43.549381 Request 0: Done with error Get "https://www.google.com/": context deadline exceeded
23:31:43.549381 Request 1: Starts
23:31:43.549381 **trace 1**GetConn: www.google.com:443
23:31:43.549381 Dial 1: Sleeps for 2 seconds!
23:31:44.556903 Dial 0: Performs actual dial
23:31:44.613383 Dial 0: Done with success
23:31:44.613383 **trace 0**TLSHandshakeStart
23:31:44.613383 **trace 0**TLSHandshakeDone
23:31:44.613908 **trace 1**GotConn: {Conn:0xc00007ce00 Reused:true WasIdle:false IdleTime:0s}
23:31:44.613908 **trace 1**WroteHeaderField
23:31:44.613908 **trace 1**WroteHeaderField
23:31:44.613908 **trace 1**WroteHeaderField
23:31:44.614412 **trace 1**WroteHeaders
23:31:44.614556 **trace 1**WroteRequest
23:31:44.654053 **trace 1**GotFirstResponseByte
23:31:44.654053 **trace 1**PutIdleConn: <nil>
23:31:44.654053 Request 1: Got HTTP status 200
23:31:45.552938 Dial 1: Performs actual dial
23:31:45.575223 Dial 1: Done with success
23:31:45.575295 **trace 1**TLSHandshakeStart
23:31:45.575295 **trace 1**TLSHandshakeDone

So perhaps you could try to run it a few times?

I am happy to see that it sometimes seems to work as I would have hoped; so I guess this issue can be turned into a bug report rather than a proposal.

I use go1.17.3 windows/amd64.

@cwmos cwmos changed the title proposal: net.http: allow http.Client to use a connection produced by a canceled dialer net.http: http.Client sometimes does not use a connection produced by a canceled dialer Jun 30, 2022
@ZekeLu
Copy link
Contributor

ZekeLu commented Jun 30, 2022

I have updated the code to log the connection returned by dial to make it easy to find out which connection is used.

I tested it with go version go1.17.3 linux/amd64. I can confirm that it sometimes uses the connection from the first dial and sometimes uses the second one.

But with go version go1.18.3 linux/amd64, it always uses the first connection (10 out of 10). So I think we can say that the issue has been fixed in go1.18.3.

The bisect shows that it's fixed by https://go.dev/cl/379034 (for #50657). And the fix has been backported to 1.17.11 and 1.18.3.

updated code to log conn returned by dial
package main

import (
	"context"
	"crypto/tls"
	"fmt"
	"io/ioutil"
	"log"
	"net"
	"net/http"
	"net/http/httptrace"
	"time"
)

func main() {
	log.SetFlags(log.Lmicroseconds)
	// Custom dialer using two seconds to dial
	dial := 0
	myDialer := func(ctx context.Context, network, addr string) (net.Conn, error) {
		d := dial
		dial += 1
		dialer := tls.Dialer{}
		log.Printf("Dial %v: Sleeps for 2 seconds!\n", d)
		time.Sleep(2 * time.Second)
		log.Printf("Dial %v: Performs actual dial\n", d)
		// NOTE: Uses context.Background() to allow the dial to continue even if the request is aborted.
		ret, err := dialer.DialContext(context.Background(), network, addr)
		if err != nil {
			log.Printf("Dial %v: Done with error %v\n", d, err)
		} else {
			c := fmt.Sprintf("%+v", ret)
			log.Printf("Dial %v: Done with success. %s", d, c[:50])
		}
		return ret, err
	}

	tracer := func(r int) *httptrace.ClientTrace {
		return &httptrace.ClientTrace{
			GetConn: func(hostPort string) {
				log.Printf("**trace %d**GetConn: %s\n", r, hostPort)
			},
			GotConn: func(connInfo httptrace.GotConnInfo) {
				c := fmt.Sprintf("%+v", connInfo.Conn)
				log.Printf("**trace %d**GotConn: %s\n", r, c[:50])
			},
			PutIdleConn: func(err error) {
				log.Printf("**trace %d**PutIdleConn: %+v\n", r, err)
			},
			GotFirstResponseByte: func() {
				log.Printf("**trace %d**GotFirstResponseByte", r)
			},
			Got100Continue: func() {
				log.Printf("**trace %d**Got100Continue", r)
			},
			DNSStart: func(httptrace.DNSStartInfo) {
				log.Printf("**trace %d**DNSStart", r)
			},
			DNSDone: func(dnsInfo httptrace.DNSDoneInfo) {
				log.Printf("**trace %d**DNS Info: %+v\n", r, dnsInfo)
			},
			ConnectStart: func(network string, addr string) {
				log.Printf("**trace %d**ConnectStart", r)
			},
			ConnectDone: func(network string, addr string, err error) {
				log.Printf("**trace %d**ConnectDone", r)
			},
			TLSHandshakeStart: func() {
				log.Printf("**trace %d**TLSHandshakeStart", r)
			},
			TLSHandshakeDone: func(tls.ConnectionState, error) {
				log.Printf("**trace %d**TLSHandshakeDone", r)
			},
			WroteHeaderField: func(key string, value []string) {
				log.Printf("**trace %d**WroteHeaderField", r)
			},
			WroteHeaders: func() {
				log.Printf("**trace %d**WroteHeaders", r)
			},
			Wait100Continue: func() {
				log.Printf("**trace %d**Wait100Continue", r)
			},
			WroteRequest: func(httptrace.WroteRequestInfo) {
				log.Printf("**trace %d**WroteRequest", r)
			},
		}
	}

	// HTTP client using custom dialer
	client := http.Client{
		Transport: &http.Transport{
			DialTLSContext: myDialer,
		},
	}

	// Function that makes an HTTP request
	makeRequest := func(ctx context.Context, r int) {
		req, err := http.NewRequestWithContext(ctx, "GET", "https://www.google.com/", nil)
		if err != nil {
			panic(err)
		}

		req = req.WithContext(httptrace.WithClientTrace(req.Context(), tracer(r)))

		log.Printf("Request %v: Starts\n", r)
		resp, err := client.Do(req)
		if err != nil {
			log.Printf("Request %v: Done with error %v\n", r, err)
		} else {
			ioutil.ReadAll(resp.Body)
			resp.Body.Close()
			log.Printf("Request %v: Got HTTP status %v\n", r, resp.StatusCode)
		}
	}

	// Make request with 1 second timeout:
	ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
	defer cancel()
	makeRequest(ctx, 0)

	// Make request without timeout
	makeRequest(context.Background(), 1)

	// wait for the second dial
	time.Sleep(2 * time.Second)
}

@cwmos
Copy link
Author

cwmos commented Jul 1, 2022

Thanks for the quick turnaround! I can confirm that I cannot reproduce the problem with go 1.18.3.

@cwmos cwmos closed this as completed Jul 1, 2022
@golang golang locked and limited conversation to collaborators Jul 1, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants