net: deadlock in TestNotTemporaryRead via net.withTCPConnPair #29685
Comments
The names used are confusing but I can't see that it matters in this case.
Go expects that calling the
As opposed to what? The Go code will work fine if The stack traces from the trybots suggest that the |
That's the problem. On AIX,
There is nothing similar on AIX. Therefore, once a socket is opened with Using the C program above, I've the following trace on AIX:
The last
If you look at TestDialContextCancelRace, the connect function explicitly returns EINPROGRESS if no error is returned. It's the case on AIX. I don't know if it's possible to change how Anyway, this difference only impacts TestNotTemporaryRead at the moment. Moreover, this bug only happens when a machine is very busy. Increasing the sleep duration and adding another sleep in the |
On GNU/Linux |
Change https://golang.org/cl/158038 mentions this issue: |
Well, I've submitted a CL to fix this test. I'll see later if I can provide a more general fix on that issue. |
On aix/ppc64, if the server closes before the client calls Accept, this test will fail. Increasing the time before the server closes should resolve this timeout.
Updates #29685
Change-Id: Iebb849d694fc9c37cf216ce1f0b8741249b98016
Reviewed-on: https://go-review.googlesource.com/c/158038
Reviewed-by: Ian Lance Taylor <[email protected]>
Another recent timeout in this test, again on the |
Yes, it seems the new sleep time is still not enough... This test isn't fully compatible with AIX behavior which is slightly different if an |
Yep, see
if runtime.GOOS == "aix" {
	testenv.SkipFlaky(t, 29685)
} |
Change https://golang.org/cl/185717 mentions this issue: |
This test sometimes times out when the machine is busy. The reason behind this is still a bit unclear, but it seems to come from the fact that on AIX, once a listen is performed on a socket, every connection will be accepted even before an accept is made (which only occurs when a machine is busy). On Linux, a socket is created as a "passive socket" which seems to wait for the accept before allowing incoming connections.
Updates #29685
Change-Id: I41b053b7d5f5b4420b72d6a217be72e41220d769
Reviewed-on: https://go-review.googlesource.com/c/go/+/185717
Run-TryBot: Clément Chigot <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
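One way to check what a given platform does when a connection arrives before any accept is a small probe like the one below. It is only a sketch, not code from the Go tree: the helper name `dialWithoutAccept` and the 2-second timeout are assumptions made for illustration. It listens on a loopback port, never calls Accept, and reports whether Dial still completes.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// dialWithoutAccept (hypothetical helper) listens on a loopback port,
// deliberately never calls Accept, and reports whether a client Dial
// still completes within the timeout.
func dialWithoutAccept() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer ln.Close()

	// No Accept is ever issued on ln; only the kernel can complete
	// the handshake on the listener's behalf.
	c, err := net.DialTimeout("tcp", ln.Addr().String(), 2*time.Second)
	if err != nil {
		return err
	}
	return c.Close()
}

func main() {
	if err := dialWithoutAccept(); err != nil {
		fmt.Println("dial did not complete without Accept:", err)
		return
	}
	fmt.Println("dial completed before any Accept call")
}
```

Running it on each platform of interest shows whether the kernel completes the handshake from the listen queue on its own, which is the behavior the commit message attributes to AIX.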
This failure mode does not appear to be specific to AIX.
2022-02-09T22:58:35-2bf5ae0/openbsd-arm64-jsing The stack traces in the more recent
|
Change https://go.dev/cl/385314 mentions this issue: |
Change https://go.dev/cl/385754 mentions this issue: |
…lan9
Updates #29685
Change-Id: Id8dca078213942666871ac8ded663326e98427fe
Reviewed-on: https://go-review.googlesource.com/c/go/+/385754
Reviewed-by: Ian Lance Taylor <[email protected]>
Reviewed-by: Emmanuel Odeke <[email protected]>
Trust: Bryan Mills <[email protected]>
Run-TryBot: Bryan Mills <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Hi,
I'm trying to resolve timeouts occurring on aix/ppc64 with net.TestNotTemporaryRead.
https://build.golang.org/log/45540cc03c1d37057e8f725d7f2dd431652ddf4c
https://build.golang.org/log/37d60c3b3cd46cf39d118f84e695049d390da40e
...
This timeout occurs because Accept() seems to be stuck in an infinite loop if the server is already closed. It's only a guess because I can't trigger the bug manually on my local machine. However, similar behavior can easily be reproduced with: https://play.golang.org/p/0IXrHf87i-2. It works on linux/amd64 but times out on aix/ppc64. This might not be the root of this bug, but a possible workaround is to increase the delay on the server.
However, I've several questions:
Accept() and the server doing Dial()? Is it supposed to be the opposite, or does it not matter? This is the case for some other tests of net_test.go.
accept() is made when the server is already closed (but the port is still being listened on)? Should it succeed or not? On aix/ppc64, the accept syscall returns EAGAIN (because of the O_NONBLOCK flag), and on linux/amd64 it succeeds.
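The accept-after-close scenario in the second question can be sketched in Go roughly as follows. This is not the linked playground program (its contents are not reproduced here) and the helper name `acceptAfterPeerClose`, the 100 ms pause, and the 2-second deadline are illustrative assumptions. The listener deadline is there so the probe cannot hang the way the test does.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// acceptAfterPeerClose (hypothetical helper) dials a listener, closes the
// dialing side, pauses, and only then calls Accept. It returns true if
// Accept handed back a connection and false if it timed out instead.
func acceptAfterPeerClose() (bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer ln.Close()

	c, err := net.DialTimeout("tcp", ln.Addr().String(), 2*time.Second)
	if err != nil {
		return false, err
	}
	c.Close() // the dialing side goes away before Accept runs

	time.Sleep(100 * time.Millisecond)

	// Bound the Accept call so a platform that never returns the
	// already-closed connection shows up as a timeout, not a hang.
	if tl, ok := ln.(*net.TCPListener); ok {
		if err := tl.SetDeadline(time.Now().Add(2 * time.Second)); err != nil {
			return false, err
		}
	}

	conn, err := ln.Accept()
	if err != nil {
		var ne net.Error
		if errors.As(err, &ne) && ne.Timeout() {
			return false, nil
		}
		return false, err
	}
	conn.Close()
	return true, nil
}

func main() {
	accepted, err := acceptAfterPeerClose()
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Println("Accept returned a connection:", accepted)
}
```

A `false` result would correspond to the behavior described for aix/ppc64, where the underlying accept keeps returning EAGAIN, while `true` matches what is reported for linux/amd64.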
I've also discovered that the behavior of connect is slightly different on AIX than on Linux (I don't know about other OSes). I've tried with the following C code (taken from #6828): accept_after_connect.c.txt. The first connect doesn't return EINPROGRESS as it does on Linux. It doesn't seem to be a bug, as a connection can result from the listen syscall.
Does Go want EINPROGRESS to be returned? (*netFD).connect will wait with netpoll if it is.
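For readers without the attached C file, a rough Go analogue of the connect check might look like the sketch below. It is an assumption-laden illustration rather than a port of accept_after_connect.c.txt: the helper name `nonblockingConnect` and the returned strings are made up here, and the code uses the `syscall` package directly, so it only builds on Unix-like systems.

```go
//go:build unix

package main

import (
	"fmt"
	"syscall"
)

// nonblockingConnect (hypothetical helper) sets up a loopback listener with
// raw syscalls, issues one non-blocking connect to it, and describes the
// immediate result of that connect call.
func nonblockingConnect() (string, error) {
	lfd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_STREAM, 0)
	if err != nil {
		return "", err
	}
	defer syscall.Close(lfd)

	// Port 0 asks the kernel to pick a free port on 127.0.0.1.
	addr := &syscall.SockaddrInet4{Addr: [4]byte{127, 0, 0, 1}}
	if err := syscall.Bind(lfd, addr); err != nil {
		return "", err
	}
	if err := syscall.Listen(lfd, 1); err != nil {
		return "", err
	}
	// Read back the address, including the port the kernel chose.
	sa, err := syscall.Getsockname(lfd)
	if err != nil {
		return "", err
	}

	cfd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_STREAM, 0)
	if err != nil {
		return "", err
	}
	defer syscall.Close(cfd)
	if err := syscall.SetNonblock(cfd, true); err != nil {
		return "", err
	}

	// The listener never calls accept; only the first return value of
	// connect is of interest here.
	switch err := syscall.Connect(cfd, sa); err {
	case nil:
		return "connected immediately", nil
	case syscall.EINPROGRESS:
		return "EINPROGRESS", nil
	default:
		return "", err
	}
}

func main() {
	outcome, err := nonblockingConnect()
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Println("first connect:", outcome)
}
```

The two outcomes map onto the difference described above: an immediate success is what is reported for AIX, and EINPROGRESS is what is reported for Linux.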