Description
The default values for Session.config.KeepAliveInterval and Session.config.ConnectionWriteTimeout (30s and 10s, respectively) make it possible for writes to time out in a way that most callers (Hashicorp Nomad included) aren't handling in their readers.
A call to Stream.Read on one side of a connection will hang until the underlying Session is closed if the corresponding Stream.Write call on the other side, the one it is waiting for, returns ErrConnectionWriteTimeout. This happens when there is network congestion between the two sides.
If Session.sendCh (fixed capacity of 64) stays full for at least ConnectionWriteTimeout, but for less than KeepAliveInterval + ConnectionWriteTimeout (which would kill the Session), Stream.Write will return ErrConnectionWriteTimeout. The state of the underlying Session or Stream is not modified. When this happens (or doesn't, heh), the other side's Stream.Read call that's waiting for that write will never return, because there's no timeout for this edge case.
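
To make the failure mode concrete, here is a minimal model of a write that has to hand its frame to a bounded queue within a timeout. It is only a sketch and not yamux's actual code: errWriteTimeout stands in for ErrConnectionWriteTimeout, and send and newCongestedQueue are hypothetical helpers invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errWriteTimeout is a stand-in for yamux's ErrConnectionWriteTimeout.
var errWriteTimeout = errors.New("connection write timeout")

// send models a write that must enqueue its frame within timeout.
// It is a simplified stand-in, not yamux's implementation.
func send(queue chan<- []byte, frame []byte, timeout time.Duration) error {
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case queue <- frame:
		return nil
	case <-timer.C:
		// Nothing is closed or marked as failed here. The caller gets an
		// error, but a peer waiting for this frame is never told.
		return errWriteTimeout
	}
}

// newCongestedQueue returns a full queue that nothing drains, which mimics
// a connection that is too congested to make progress.
func newCongestedQueue(capacity int) chan []byte {
	q := make(chan []byte, capacity)
	for i := 0; i < capacity; i++ {
		q <- []byte{0}
	}
	return q
}

func main() {
	q := newCongestedQueue(64)
	err := send(q, []byte("response"), 50*time.Millisecond)
	fmt.Println("write error:", err)
	fmt.Println("frames still queued:", len(q))
}
```

In this model the write fails while the queue is left exactly as it was, which matches the observation above that neither the Session nor the Stream changes state.
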
Since no keep alive timed out, you can continue to use the Session once the network congestion is resolved, but that Stream.Read call will only return when the Session closes or the response shows up. Since the write call on the other side timed out... well, that's a problem.
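
Until something changes in the library, one caller-side mitigation is to put a deadline on the blocked read. The sketch below shows the pattern against the generic net.Conn interface, using net.Pipe as a stand-in for a stream whose peer never writes. yamux's Stream exposes SetReadDeadline, so the same pattern should carry over, but I haven't verified exactly which error it returns, hence the check via net.Error rather than a specific sentinel. readWithDeadline, isTimeout and demo are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// readWithDeadline performs one Read on conn but gives up after timeout.
func readWithDeadline(conn net.Conn, buf []byte, timeout time.Duration) (int, error) {
	if err := conn.SetReadDeadline(time.Now().Add(timeout)); err != nil {
		return 0, err
	}
	// Clear the deadline afterwards so later reads are not affected.
	defer conn.SetReadDeadline(time.Time{})
	return conn.Read(buf)
}

// isTimeout reports whether err is a network timeout.
func isTimeout(err error) bool {
	var nerr net.Error
	return errors.As(err, &nerr) && nerr.Timeout()
}

// demo reads from a connection whose peer never writes, as if the peer's
// write had timed out, and reports whether the read ended with a timeout.
func demo() bool {
	reader, writer := net.Pipe()
	defer reader.Close()
	defer writer.Close()

	_, err := readWithDeadline(reader, make([]byte, 16), 50*time.Millisecond)
	return isTimeout(err)
}

func main() {
	fmt.Println("read timed out:", demo())
}
```

This only stops the reader from hanging forever. It doesn't tell the reader why the response never arrived, so the caller still has to decide whether to retry or tear the stream down.
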
I can see three possible fixes, some heavier-handed than others:
- If Stream.Write times out, it should implicitly call Stream.Close while returning, so that any blocked calls to Stream.Read on the other side are notified and return.
  - If the call to Stream.Close fails, then Session.Close should get called, which it doesn't look like happens now either.
- A call to Stream.Write that times out should just close the entire Session.
- (The hacky fix) Make the default keep alive interval less than the write timeout.
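
As a rough illustration of the first option (including its fallback), here is a wrapper that closes the stream when a write times out and closes the session if that close fails. It is written against io.WriteCloser and io.Closer rather than the real yamux types, and errWriteTimeout, closeOnTimeout, fakeStream and fakeSession are all hypothetical names used only for this sketch. An actual fix would presumably live inside Stream.Write itself.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// errWriteTimeout is a stand-in for yamux's ErrConnectionWriteTimeout.
var errWriteTimeout = errors.New("connection write timeout")

// closeOnTimeout wraps a stream so that a timed out write closes the
// stream, and closes the whole session if closing the stream fails.
type closeOnTimeout struct {
	stream  io.WriteCloser
	session io.Closer
}

func (w *closeOnTimeout) Write(p []byte) (int, error) {
	n, err := w.stream.Write(p)
	if errors.Is(err, errWriteTimeout) {
		if cerr := w.stream.Close(); cerr != nil {
			// Fallback: the stream could not be closed cleanly.
			_ = w.session.Close()
		}
	}
	// The original write error is always returned to the caller.
	return n, err
}

// fakeStream and fakeSession are test doubles, not yamux types.
type fakeStream struct {
	writeErr, closeErr error
	closed             bool
}

func (s *fakeStream) Write(p []byte) (int, error) {
	if s.writeErr != nil {
		return 0, s.writeErr
	}
	return len(p), nil
}

func (s *fakeStream) Close() error {
	s.closed = true
	return s.closeErr
}

type fakeSession struct{ closed bool }

func (s *fakeSession) Close() error {
	s.closed = true
	return nil
}

func main() {
	stream := &fakeStream{writeErr: errWriteTimeout}
	session := &fakeSession{}
	w := &closeOnTimeout{stream: stream, session: session}

	_, err := w.Write([]byte("response"))
	fmt.Println("write error:", err)
	fmt.Println("stream closed:", stream.closed, "session closed:", session.closed)
}
```

The second option would amount to calling session.Close unconditionally in the timeout branch instead of trying the stream first.
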