Description
Issue
Go 1.7's functionality for HTTP/1.1 keep-alive, net/http.Server.ReadTimeout, and net/http.CloseNotifier is not consistent with the documented and/or expected behavior; the three cannot be used together properly. The timer created by setting the server's ReadTimeout is reset only when the server finishes processing each request; it is not paused or reset once the request itself has been read, which causes CloseNotify() to fire without cause.
What did I do
The test below creates a simple HTTP server and then sends 2 requests, each requiring 40ms of processing time, with a 70ms gap between them. The handler responds 200 OK if the 40ms of work completes, and 503 if it receives a CloseNotify signal before finishing that work.
Expected Result
We should receive 2 200 OK messages and not receive a signal from CloseNotify().
The server has a ReadTimeout of 100ms, and there is a 70ms delay between receiving the response to the first request and sending the second, so both requests should be read within the timeout and be able to complete their work.
Actual Result
During the second request, the handler receives a CloseNotify() signal approximately 30ms after the client has sent that request. At this point the server has clearly read the second request and started processing it over a still-alive TCP connection, yet it receives a CloseNotify() signal, which should not happen. An attached Wireshark log shows this as well.
package main

import (
	"io"
	"io/ioutil"
	"net/http"
	"net/http/httptest"
	"sync"
	"testing"
	"time"
)

func TestServerKeepAliveReadTimeoutCloseNotify(t *testing.T) {
	var startMu sync.Mutex
	var start time.Time
	testServer := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		cn := w.(http.CloseNotifier)
		select {
		case <-time.After(40 * time.Millisecond):
			// The 40ms of simulated work completed: respond 200 OK.
			w.WriteHeader(http.StatusOK)
		case <-cn.CloseNotify():
			// CloseNotify fired before the work finished: respond 503.
			startMu.Lock()
			delta := time.Since(start)
			startMu.Unlock()
			t.Errorf("Unexpected CloseNotify received after %s", delta)
			w.WriteHeader(http.StatusServiceUnavailable)
		}
	}))
	testServer.Config.ReadTimeout = 100 * time.Millisecond
	testServer.Start()
	defer testServer.Close()

	// Two keep-alive requests, 70ms apart.
	for i := 0; i < 2; i++ {
		startMu.Lock()
		start = time.Now()
		startMu.Unlock()
		resp, err := http.Get(testServer.URL)
		if err != nil {
			t.Fatal(err)
		}
		// Drain and close the body so the connection can be reused.
		_, err = io.Copy(ioutil.Discard, resp.Body)
		if err != nil {
			t.Fatal(err)
		}
		err = resp.Body.Close()
		if err != nil {
			t.Fatal(err)
		}
		time.Sleep(70 * time.Millisecond)
	}
}
Test output
$ go test notifierIssue_test.go
--- FAIL: TestServerKeepAliveReadTimeoutCloseNotify (0.22s)
notifierIssue_test.go:26: Unexpected CloseNotify received after 30.752781ms
FAIL
FAIL command-line-arguments 0.225s
Potential Fix
Currently the read deadline is only reset here: https://github.com/golang/go/blob/go1.7/src/net/http/server.go#L749, after a request has finished being processed and the server starts waiting for the next one. However, the timeout is never paused or cleared for the period between finishing reading the request and finishing sending the response, as it should be.
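To illustrate the idea outside of net/http's internals, here is a minimal sketch on a raw net.Conn (not the actual server.go change, just the deadline handling I have in mind): the read deadline is armed only while waiting for and reading a request, and cleared before the handler-equivalent work runs.

package main

import (
	"log"
	"net"
	"time"
)

// handleConn sketches the proposed deadline handling on a raw net.Conn:
// the read deadline covers only waiting for and reading a request, and is
// cleared before the (simulated) handler work runs.
func handleConn(c net.Conn, readTimeout time.Duration) {
	defer c.Close()
	buf := make([]byte, 4096)
	for {
		// Arm the deadline while waiting for and reading the next
		// request on the kept-alive connection.
		c.SetReadDeadline(time.Now().Add(readTimeout))
		if _, err := c.Read(buf); err != nil {
			return // includes i/o timeout errors
		}
		// Request read: clear the deadline so handler time is not
		// charged against readTimeout.
		c.SetReadDeadline(time.Time{})
		time.Sleep(40 * time.Millisecond) // stand-in for handler work
		c.Write([]byte("HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"))
	}
}

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		log.Fatal(err)
	}
	log.Println("listening on", ln.Addr())
	for {
		c, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		go handleConn(c, 100*time.Millisecond)
	}
}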
What version of Go are you using (go version)?
go version go1.7 darwin/amd64
What operating system and processor architecture are you using (go env)?
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/4r/1g2c4css78b1yn2q32dmqqm0l05w9h/T/go-build102138663=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
Activity
bradfitz commented on Sep 1, 2016
Your bug report starts off saying "Go 1.7's functionality ...". Is this a regression from Go 1.6? I don't recall anything changing here.
[Title changed from "ReadTimeout Breaks Server CloseNotify Functionality For Keep-Alive Connections" to "net/http: ReadTimeout Breaks Server CloseNotify Functionality For Keep-Alive Connections"]
freeformz commented on Sep 1, 2016
I'm pretty sure this is documented under https://golang.org/pkg/net/http/#CloseNotifier, or am I misunderstanding the docs?
freeformz commented on Sep 1, 2016
For instance, if you instead create the request with http.NewRequest and set req.Close = true (which forces a fresh TCP connection per request), then your test works, which is what is described in the documentation.
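Roughly like this (a sketch of what I mean, dropped in place of the http.Get call in your test):

// Sketch: replaces the http.Get(testServer.URL) call in the test above.
// req.Close forces a fresh TCP connection per request, so keep-alive (and
// its interaction with ReadTimeout) is out of the picture.
req, err := http.NewRequest("GET", testServer.URL, nil)
if err != nil {
	t.Fatal(err)
}
req.Close = true
resp, err := http.DefaultClient.Do(req)
if err != nil {
	t.Fatal(err)
}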
prabrisat1 commented on Sep 1, 2016
As far as I know this behaves the same way in Go 1.6 and 1.7. The oldest version I have seen the same issue with is 1.5.
As for the CloseNotifier documentation, I'm not sure what you're referring to.
From my understanding of the documentation, setting req.Close = true on either the client side or the server side (for the response) closes the TCP connection after a single request-response pair. That is the same as not using keep-alive connections at all, since TCP has to be re-established for the next request. This issue is specifically a problem when using keep-alive connections.
rhysh commented on Sep 1, 2016
What I find confusing here is that the ReadTimeout timer starts when the previous handler returns, and that it's not reset when the first byte of the following request arrives.
It's not clear to me how to correctly use net/http.Server.ReadTimeout without entirely disabling HTTP/1.1 keepalives. The timeout it controls is neither how long it takes for a subsequent request to arrive, nor how long it takes to read that request and body once it starts arriving.
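For now, the only way I see to avoid the interaction entirely is to turn keep-alives off on the server, which defeats the purpose (a sketch, reusing testServer from the report above):

// Sketch: disabling keep-alives server-side sidesteps the problem, at the
// cost of a new TCP connection for every request.
testServer.Config.ReadTimeout = 100 * time.Millisecond
testServer.Config.SetKeepAlivesEnabled(false)
testServer.Start()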
bradfitz commented on Sep 1, 2016
Yeah, ReadTimeout is extremely ill-specified (and implemented).
Related bugs:
Those two should be fixed together.
I haven't re-read this bug enough to understand whether there's a unique problem here, or just a different view on one of the existing ones.
freeformz commented on Sep 2, 2016
Looks like I was confusing pipelining with connection keepalives. :-( Sorry for the noise.
freeformz commented on Sep 3, 2016
After peppering the code with debug output, I see the first request happening, and then, as soon as the first request finishes, server.go (in readRequest) resets the deadline to 100ms in the future.
This makes sense because we're again reading from the socket for the next HTTP request, and both the HTTP client and the server are holding the connection open.
AFAICT this has nothing to do with sending the response, since we reset the timeout after sending the response. The client is just being slow from the perspective of the server.
I think this is effectively working as intended.
prabrisat1 commented on Sep 3, 2016
You're correct that as soon as the server finishes the first request it resets the deadline. That is the problem: the ReadTimeout is reset after the first request has been fully processed, not specifically after the request has been read.
It's much clearer if you take a look at the attached Wireshark logs. The read timeout is not reset after the request is read, but after the request has been read and the handler's work has completed. In the example above, the client receives a 200 OK ~41ms after the process starts, and the ReadTimeout is reset here at ~41ms. The server receives the next request at ~116ms, i.e. ~75ms later. The message is just a GET with no body and takes ~1-2ms to read, a total of ~77ms after the timer was reset, so we should have read a message within the ReadTimeout, right? That isn't true, because the ReadTimeout is limiting the read time plus the time required for the handler to complete, which is why we receive a CloseNotify signal at ~142ms, before the 40ms of work on the request (started at ~117ms) has completed.
Effectively, the current ReadTimeout measures (time spent waiting for a request + time spent reading the request + time spent doing work on the request), when it should really represent either (time waiting for the request + time reading the request) or even just (time spent reading the request) for the name ReadTimeout to actually be applicable.
freeformz commented on Sep 3, 2016
@prabrisat1
But the moment the first request completes is the start of the next request when the TCP connection is reused as part of HTTP/1.1 keep-alives. That is when you have to start the read timeout, otherwise a client can just hold the connection open and starve you of resources. In effect, though, ReadTimeout becomes a RequestTimeout because of it.
Here is the sequence of events as I've seen them:
1. Client issues GET.
2. Server sets the deadline to 100ms in the future as part of readRequest.
3. Handler is fired, which sleeps for 40ms (60ms left).
4. Server sends the response.
5. Server sets the deadline to 100ms in the future as part of readRequest (for the next request, which blocks waiting on I/O from the client).
6. Client sleeps for 70ms (only ~30ms left until the new deadline).
7. Client issues GET.
8. Handler is fired, which sleeps 40ms (boom, because only ~30ms is left).
We could keep extending the deadline in various places, but that is costly from a syscall perspective IIRC. Possibly other issues with that as well, not sure. I defer to @bradfitz.
At the very least this should be documented better. If there is interest in that I can work up a PR that attempts to explain it.
freeformz commented on Sep 3, 2016
PS: We've also run into this issue internally, hence my interest.
prabrisat1 commented on Sep 3, 2016
Definitely agree that the deadline can't be extended. @bradfitz referenced this issue in #14204, which I think is a pretty good solution. Separating the current ReadTimeout into a ReadTimeout and an IdleTimeout makes the naming much clearer and, I think, solves the functional issue as well.
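Once a split like that exists, configuring a server could look roughly like this (a sketch; the IdleTimeout field name assumes the proposal in #14204):

package main

import (
	"log"
	"net/http"
	"time"
)

func main() {
	srv := &http.Server{
		Addr:    ":8080",
		Handler: http.DefaultServeMux,
		// Maximum time to read a single request, including the body.
		ReadTimeout: 5 * time.Second,
		// Maximum time a keep-alive connection may sit idle waiting for
		// the next request; assumes the IdleTimeout field from #14204.
		IdleTimeout: 60 * time.Second,
	}
	log.Fatal(srv.ListenAndServe())
}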
quentinmit commented on Sep 6, 2016
Can this be closed as a dupe of #14204?
Or is this representing the "read timeout is actually response timeout" issue and the other is representing "read timeout is actually idle timeout"?
bradfitz commented on Sep 6, 2016
I'd like to keep both open. They're related, but might be closed separately.
bradfitz commented on Nov 1, 2016
I verified that this was fixed by the changes in #14204, etc. Mostly faf882d and c7e0dda, but a number of other supporting ones.