Go programs stored on AFS may crash with SIGBUS
#50545
Comments
I have the feeling this is just #48997 with a different filesystem. Go uses a lot more signals than other runtimes, which might account for the fact that other runtimes don't exhibit this problem. It would be interesting to try to reproduce using a C program that sends itself lots of signals. That might illuminate matters.
+1
This looks like a kernel bug. It seems that the kernel is trying to cancel the I/O operation, but you can’t usefully cancel the handling of a page fault!
Timed out in state WaitingForInfo. Closing. (I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)
I think this is indeed not a bug in Go, but it is a bug in Linux. As a workaround, could Go register all of its signal handlers with
Go does already register all signal handlers with
That makes this either a kernel bug or an AFS bug, then. Probably the former.
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
Attempt to run a Go binary (including the main go binary) from an AFS volume:
What did you expect to see?
The application should run normally, i.e. without a SIGBUS signal.
What did you see instead?
The program (at least one of the tests, in this case) crashes with a SIGBUS. The problem occurs inconsistently, so short workflows are more likely to succeed, but clearing the caches and retrying seems to reproduce it within a couple of attempts:
./go/bin/go test ./go/src/...
Output
More Information
This does not appear to be specific to the go test command or the go executable in general; that just happens to be an easy way to reproduce the problem. The problem is reproducible (with varying degrees of success) with user applications written in Go, although so far I have only managed to trigger the signal with (filesystem) I/O-intensive applications. That seems to be true regardless of where the other files are located (and what filesystem they are on), so long as the application binary itself is on an AFS volume, as noted in go-hep/hep#885.
I believe this may be the same problem mentioned in the comment #37310 (comment), which does not appear to have a dedicated issue. This also appears suspiciously similar to an issue with Go binaries and gcsfuse, see #48997. The workaround from the latter issue appears to work here as well: running with GODEBUG=asyncpreemptoff=1 seems to prevent the SIGBUS error from occurring (both for the main go binary and for compiled user applications).
If I had to guess, I would say that something is being preempted at a point that is unsafe if the binary happens to be on AFS (or apparently gcsfuse), but which turns out to be safe (in practice) on other filesystems. I don't know if that's a Go problem or an AFS (and gcsfuse) problem, but other binaries do not encounter a SIGBUS when running from the same AFS volume (or on gcsfuse, according to that issue), so my initial suspicion is that the problem is on the Go side.