runtime: test timeout with truncated stack trace on linux-ppc64le-power9osu #43175
Comments
Unfortunately there is a known possibility for deadlock in the stack-dumping code. (See #42062 (comment).) The static lock checker has known about this for a while. @prattmic has a fix which is basically to say "if we're going down anyway, it's probably OK to just race; better than a deadlock, anyway." (golang.org/cl/270861). I think this is a duplicate of #42669 as a result.
Ah, so now we have a new data point for this part of the comment!
I think this is a case where the deadlock does hurt diagnostics for us: the test process itself is not in a position to dump everything more forcefully, and the
Change https://golang.org/cl/270861 mentions this issue:
Since we think this is caused by the allglock deadlock fixed in https://golang.org/cl/270861, I'm going to close this for now. We can reopen if we find this still occurring.
tracebackothers is called from fatal throw/panic.

A fatal throw may be taken with allglock held (notably in the allocator when allglock is held), which would cause a deadlock in tracebackothers when we try to take allglock again. Locking allglock here is also often a lock order violation w.r.t. the locks held when throw was called.

Avoid the deadlock and ordering issues by skipping locking altogether. It is OK to miss concurrently created Gs (which are generally avoided by freezetheworld(), and which were possible previously anyways if created after the loop).

Fatal throw/panic freezetheworld(), which should freeze other threads that may be racing to modify allgs. However, freezetheworld() does _not_ guarantee that it stops all other threads, so we can't simply drop the lock.

Fixes #42669
Updates #43175

Change-Id: I657aec46ed35fd5d1b3f1ba25b500128ab26b088
Reviewed-on: https://go-review.googlesource.com/c/go/+/270861
Reviewed-by: Michael Knyszek <[email protected]>
Trust: Michael Pratt <[email protected]>
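For readers unfamiliar with the failure mode, here is a minimal user-level sketch of the idea in the commit message above. It is not the runtime's code: `registry`, `add`, `dumpRace`, and `failWhileLocked` are hypothetical names, `mu` merely stands in for allglock, and the copy-on-write snapshot is just one assumed way to make a lock-free read safe. The sketch shows that a dump path which re-locked `mu` would block forever when entered with `mu` already held, while a dump path that reads an atomically published snapshot still completes.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// registry is a hypothetical stand-in for a list of goroutines.
type registry struct {
	mu    sync.Mutex   // plays the role of allglock; serializes writers
	items atomic.Value // always holds a []string that is never mutated in place
}

func newRegistry() *registry {
	r := &registry{}
	r.items.Store([]string{})
	return r
}

// add appends a name under the lock and publishes a fresh copy of the slice,
// so readers that skip the lock never observe a half-updated slice.
func (r *registry) add(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	old := r.items.Load().([]string)
	next := make([]string, len(old), len(old)+1)
	copy(next, old)
	next = append(next, name)
	r.items.Store(next)
}

// dumpRace returns the current snapshot without taking mu. It may miss
// entries added concurrently, which is acceptable on a fatal path.
func (r *registry) dumpRace() []string {
	return r.items.Load().([]string)
}

// failWhileLocked simulates a fatal error raised while mu is held.
// wouldDeadlock reports whether a dump that called mu.Lock() here
// would have blocked on the lock this goroutine already holds.
func (r *registry) failWhileLocked() (names []string, wouldDeadlock bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.mu.TryLock() {
		r.mu.Unlock() // not expected: sync.Mutex is not reentrant
	} else {
		wouldDeadlock = true
	}
	return r.dumpRace(), wouldDeadlock
}

func main() {
	r := newRegistry()
	r.add("g1")
	r.add("g2")
	names, wouldDeadlock := r.failWhileLocked()
	fmt.Println(names, wouldDeadlock)
}
```

The trade-off mirrors the one described in the thread: the lock-free reader can miss entries created concurrently, but it cannot hang the process that is already going down.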
This looks suspiciously like a deadlock in the runtime's stack-dumping code: the test times out after 3 minutes, but hasn't finished dumping stacks after 4m, and appears to have been interrupted in the middle of a GC.
CC @danscales @mknyszek @aclements
2020-12-14T15:03:28-451b6b3/linux-ppc64le-power9osu