runtime: GC crash on linux-amd64-noopt #17785
Comments
Weird. Something is definitely confused here: the state of the span is _MSpanStack, but the GC is scanning it as if it were a span of 16-byte objects.
Four more on the first page of build.golang.org: https://build.golang.org/log/b3ae09ee6ba8718e621f8548694c317e425057fe
All on the noopt builder so far. That's usually something that's usually
This crash continues, now that an unrelated noopt test failure is no longer hiding it. @aclements, any ideas?
Some statistical observations.

I had assumed this was caused by the hybrid barrier, but it actually started a little before the hybrid barrier went in. The failure probability did significantly increase after the hybrid barrier (from ~1.5% to 18%), but the timing isn't quite right to say the hybrid barrier itself increased the failure probability.

Here are the first few instances of this failure:
2016-10-19T07:09:08-9aed16e/linux-amd64-noopt

All failures are on linux-amd64-noopt. We've observed 68 failures so far. All of these are when running misc/cgo/test. 43 of them are when running misc/cgo/test in the "Testing race detector" section. The rest are when running misc/cgo/test regularly (without the race detector).

There are some slight variations on the failure. They're all roughly:
18 failures have
The following reproduces it with high probability:
cd $GOROOT/src
GO_GCFLAGS="-N -l" ./make.bash
cd ../misc/cgo/test
go test -gcflags '-N -l' -c
GOTRACEBACK=2 GOGC=1 ./test.test -test.short
You can also reproduce it with
All failures are specifically in
The block that's being scanned is the arguments to
The caller probably doesn't matter since it's the arguments map, but here's its declaration and compilation with
_Cfunc_issue7978c
//go:cgo_import_static _cgo_4d9b5135caa8_Cfunc_issue7978c
//go:linkname __cgofn__cgo_4d9b5135caa8_Cfunc_issue7978c _cgo_4d9b5135caa8_Cfunc_issue7978c
var __cgofn__cgo_4d9b5135caa8_Cfunc_issue7978c byte
var _cgo_4d9b5135caa8_Cfunc_issue7978c = unsafe.Pointer(&__cgofn__cgo_4d9b5135caa8_Cfunc_issue7978c)
//go:cgo_unsafe_args
func _Cfunc_issue7978c(p0 *_Ctype_uint32_t) (r1 _Ctype_void) {
	_cgo_runtime_cgocall(_cgo_4d9b5135caa8_Cfunc_issue7978c, uintptr(unsafe.Pointer(&p0)))
	if _Cgo_always_false {
		_Cgo_use(p0)
	}
	return
}
From this we can see that the bad pointer is the
Here's the compilation of runtime.cgocall:
The args map for runtime.cgocall is {0b011, 0b000, 0b011, 0b000}. In this compilation, the failure happened at cgocall+0xb0, and the code path to there uses either map index 0 or 2, both of which have both arguments marked live.
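To make the bitmaps above easier to read, here is a minimal sketch that decodes each one into the argument words a stack scan would treat as live pointers. It assumes bit i corresponds to argument word i and that three words are covered; pointerSlots is a hypothetical helper for illustration, not a runtime function.

```go
package main

import "fmt"

// pointerSlots returns the indices of the argument words whose bit is set
// in bitmap. Assumption of this sketch: bit i describes argument word i.
func pointerSlots(bitmap uint8, nwords int) []int {
	var slots []int
	for i := 0; i < nwords; i++ {
		if bitmap&(1<<uint(i)) != 0 {
			slots = append(slots, i)
		}
	}
	return slots
}

func main() {
	// The four bitmaps quoted in the comment above for runtime.cgocall.
	argsMaps := []uint8{0b011, 0b000, 0b011, 0b000}
	for idx, m := range argsMaps {
		fmt.Printf("map index %d: %03b -> live pointer words %v\n",
			idx, m, pointerSlots(m, 3))
	}
}
```

Under that assumption, map indexes 0 and 2 mark words 0 and 1 as live pointers, and indexes 1 and 3 mark nothing as live, which matches the statement that both arguments are live on the failing code path.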
I think I understand what's happening:
@khr or @ianlancetaylor, any thoughts on the right way to fix this? It's sufficient to add
One might argue that all of cgo callbacks feel like a horrible, horrible hack. I see nothing in your description that mentions noopt. How is that relevant? Would it help to give asmcgocall the correct args map (that its two args are pointers)?
If I understand this correctly, it seems like the key failure is that in
I can easily trigger this on a regular opt build by forcing stack growth in the cgo callback of this test, so I think noopt just enlarged the existing stacks enough to cause a stack growth.
No, I don't think so. The problem is the arguments to cgocall, not asmcgocall. When the GC stack trace runs, it doesn't even see asmcgocall on the stack (since it's starting from the entersyscall-saved PC/SP).
So if I understand this correctly, the nature of cgo calls is that we only see
If that all sounds true I don't see why your earlier suggestion of using
Essentially, yes. We see cgocall twice: in the first stack trace (for stack growth) the arguments are dead, and in the second trace (for GC) they've risen from the dead and are live again. The undead arguments cause the crash.
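The dead-then-live sequence described above can be sketched with a toy model. This is not runtime code: slot, copyStack and staleSlots are hypothetical names, and the model assumes that a stack copy only adjusts pointers in words the liveness bitmap marks live, while a later scan flags live pointers that still refer to the old stack.

```go
package main

import "fmt"

// slot is one stack word in this toy model. If isPtr is set, the word
// points at slot off of the stack identified by stackID.
type slot struct {
	isPtr   bool
	stackID int
	off     int
}

// copyStack mimics stack growth: every word is copied to a larger stack,
// but only words marked live in the bitmap have their pointers rewritten.
func copyStack(old []slot, oldID, newID int, live uint8) []slot {
	grown := make([]slot, len(old)*2)
	copy(grown, old)
	for i := range old {
		isLive := live&(1<<uint(i)) != 0
		if isLive && grown[i].isPtr && grown[i].stackID == oldID {
			grown[i].stackID = newID
		}
	}
	return grown
}

// staleSlots mimics a GC scan: it reports live pointer words that still
// refer to a stack other than the current one.
func staleSlots(stack []slot, curID int, live uint8) []int {
	var bad []int
	for i, s := range stack {
		isLive := live&(1<<uint(i)) != 0
		if isLive && s.isPtr && s.stackID != curID {
			bad = append(bad, i)
		}
	}
	return bad
}

func main() {
	const oldID, newID = 1, 2
	// Word 1 stands in for an argument that points into its own stack.
	stack := []slot{{}, {isPtr: true, stackID: oldID, off: 3}, {}, {}}

	// Growth sees the arguments as dead (0b000), so nothing is adjusted.
	grown := copyStack(stack, oldID, newID, 0b000)
	// The later scan uses a bitmap where they are live again (0b011).
	fmt.Println("stale words, dead then live:", staleSlots(grown, newID, 0b011))

	// With the same bitmap for both traces the pointer is adjusted.
	grown = copyStack(stack, oldID, newID, 0b011)
	fmt.Println("stale words, consistent:", staleSlots(grown, newID, 0b011))
}
```

In the first case the scan reports word 1 as a stale pointer into the old stack; in the second it reports none. The real failure involves the saved syscall PC/SP and the runtime's stack maps, which this model deliberately leaves out.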
Okay. I checked all assignments to syscallsp/pc and this appears to be the only situation where we ever "roll them back".
CL https://golang.org/cl/33710 mentions this issue.
https://build.golang.org/log/05d53c7b89a4539c468ebec21fdfd59ee7ac52b3
linux-amd64-noopt at d1e9104