Description
When the http.Transport has cached persistent connections to a server and the machine is suspended (e.g. laptop lid closing, serverless environments), the monotonic clock might not advance but the wall clock will.
This is a tracking bug to verify how Transport behaves in such cases.
It would be bad if during the machine's time asleep we got a TCP packet to close a connection but we never saw it (due to being asleep), and then upon resume we try to re-use that dead TCP connection, get a write error, and then are unable to retry for whatever reason (non-idempotent or non-replayable request).
Before using a connection we should look at the current wall time and compare it to the wall time of when it was last idle. We might already be doing that (by accident?) but we might also be accidentally using monotonic time, in which case we wouldn't notice the missing chunks of time.
Investigate.
/cc @jadekler
Activity
ianlancetaylor commented on Dec 17, 2018
See also #24595.
barrier errors & no TLS config found hashicorp/vault#6641
chris-vest commented on Jun 12, 2019
@bradfitz Any idea if this is being worked on / investigated?
odeke-em commented on Jun 12, 2019
@chris-vest thanks for the ping! Brad is currently very busy and is also on leave, but I am going to start looking at this issue next week and hopefully have something for Go 1.14.
odeke-em commented on Jun 22, 2019
I think perhaps one of these might be reversed? After a SIGCONT is sent, the monotonic clock usually accumulates more nanoseconds than the wall clock -- I might be mistaken though.
So firstly, pconn.idleTimer can't reap stale requests while it is frozen in time/stopped, so we should also examine a way of checking for stale reaps.
When we go to sleep, there are 2 main cases (or 3 sub-cases):
- pconn.t.CloseIdleTimeout + pconn.idleAt <= time.Now() and immediately close this connection
- if !pc.isReused() check and return connection reset by peer, as reported in the precursory bug code here
- pconn.createTime, or reuse pconn.idleTime, to capture when it was created. This might require exposing an internal runtime time API that compares monotonic vs. wall clock to capture the scenario being reported here.
I spent quite some time digging through the standard library as well as the google-cloud-go work. Basically, these scenarios can be reproduced just by stopping processes with SIGTSTP and then reviving them with SIGCONT after a period. This shows that reused connections that receive ECONNRESET will spawn a new persistConn, and after the server has suddenly closed the TCP connection, the persistConn finally responds with
read tcp [::1]:59102->[::1]:59098: read: connection reset by peer
for example in this repro: https://github.com/odeke-em/bugs/tree/master/golang/29308
Way forward
Perhaps a //go:linkname mono time.mono link and then some mechanism for detecting drift, e.g. in this patch meant to handle the case of clock drift/frozen time:
Results
Before patch
After proposed patch
Please let me know what y'all think.
gopherbot commented on Jun 24, 2019
Change https://golang.org/cl/183557 mentions this issue:
net/http: detect and make persistConn handle time drifts
odeke-em commented on Sep 13, 2019
One other thing I discovered today while thinking out loud about this issue is that perhaps the runtime on getting a SIGCONT can refetch the current time and go update all the previous timers that may have drifted. This might even be the simpler and more correct solution instead of the addition to get time.mono.
odeke-em commented on Nov 1, 2019
Roger that and nice work @bradfitz! Let me try it out right now.
gopherbot commented on Nov 1, 2019
Change https://golang.org/cl/204797 mentions this issue:
net/http: only use wall time in Transport idle conn timeouts
odeke-em commented on Nov 1, 2019
@bradfitz this gist might help automate the repro, for an easier feedback loop when making the change: https://gist.github.com/odeke-em/639f947edded2f86ae34d286fb12f875#file-main-go
net/http: make Transport.IdleConnTimeout consider wall (not monotonic…
mpx commented on Nov 25, 2019
This seems like another example where using BOOTTIME would help? (#24595)
gopherbot commented on Nov 25, 2019
Change https://golang.org/cl/208798 mentions this issue:
http2: make Transport.IdleConnTimeout consider wall (not monotonic) time
gopherbot commented on Nov 27, 2019
Change https://golang.org/cl/209077 mentions this issue:
net/http: update bundled x/net/http2