Description
In rare cases on linux/amd64 race builds we've seen crashes that look like:
fatal: morestack on gsignal
signal: trace/breakpoint trap (core dumped)
The root cause is signal delivery on a sigaltstack allocated very close to the g0 stack. When cgo is enabled, mstart estimates the g0 stack bounds (cgo side), but this is a rough estimate and the g0 stack.lo may actually be beyond the end of the g0 stack.
On signal delivery, adjustSignalStack may then incorrectly determine that the signal was delivered on the g0 stack. Since the overlap is likely to be very close to g0 stack.lo, functions in signal handling have a high probability of "running out of stack space" and calling morestack. Boom.
Here's one example of overlap I captured:
Our SP on sigtrampgo entry: 0x7f99841fe328
sigaltstack from sigcontext: [0x7f99841ef000, 0x7f99841ff000)
g0 stack from gp.m.g0.stack: [0x7f99841fded8, 0x7f99849fdad8)
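To make the overlap concrete, here is a small standalone sketch that plugs the captured addresses into a range check. It is not runtime code: stackRange, contains, and overlap are hypothetical helpers introduced only for illustration. It shows that the SP falls inside both the sigaltstack and the recorded g0 bounds, and how close it sits to g0 stack.lo.

```go
package main

import "fmt"

// stackRange is a half-open address range [lo, hi).
// Hypothetical helper type, not the runtime's own stack struct.
type stackRange struct {
	lo, hi uint64
}

// contains reports whether addr lies inside the range.
func (r stackRange) contains(addr uint64) bool {
	return addr >= r.lo && addr < r.hi
}

// overlap returns the number of bytes shared by a and b (0 if disjoint).
func overlap(a, b stackRange) uint64 {
	lo, hi := a.lo, a.hi
	if b.lo > lo {
		lo = b.lo
	}
	if b.hi < hi {
		hi = b.hi
	}
	if hi <= lo {
		return 0
	}
	return hi - lo
}

func main() {
	// Values copied from the capture above.
	sp := uint64(0x7f99841fe328)
	alt := stackRange{lo: 0x7f99841ef000, hi: 0x7f99841ff000}
	g0 := stackRange{lo: 0x7f99841fded8, hi: 0x7f99849fdad8}

	fmt.Println("SP on sigaltstack:", alt.contains(sp))      // true
	fmt.Println("SP inside recorded g0:", g0.contains(sp))   // true
	fmt.Println("overlap bytes:", overlap(alt, g0))          // 4392
	fmt.Println("SP distance above g0 stack.lo:", sp-g0.lo) // 1104
}
```

With only about 1 KiB between the SP and the recorded g0 stack.lo, any check that treats this SP as being on the g0 stack leaves very little room before a stack-bounds check fails.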
mstart contains a fudge factor of 1024 to try to address this inaccuracy, but checking against pthread_attr_getstack indicates that the mstart SP is actually 9616 bytes below the top of the stack. That figure may be off by 1 page (4096 bytes); I need to double check. Either way, it is more than 1024 bytes.
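The arithmetic can be sketched as follows. This is a simplified model, not the runtime's code: it assumes the estimate has the form lo = sp - size + fudge, and it assumes an 8 MiB thread stack, which is inferred from the captured bounds (hi - lo there equals 8 MiB - 1024). estimateLo is a hypothetical helper, and the 9616-byte gap is the unconfirmed figure quoted above.

```go
package main

import "fmt"

const (
	stackSize = 8 << 20 // assumed 8 MiB thread stack (inferred, not confirmed)
	fudge     = 1024    // fudge factor mentioned in the report
)

// estimateLo models the assumed estimate: lo = sp - size + fudge.
func estimateLo(sp, size, fudgeBytes uint64) uint64 {
	return sp - size + fudgeBytes
}

func main() {
	// g0 stack.hi from the capture, taken as the SP when the estimate was made.
	sp := uint64(0x7f99849fdad8)
	lo := estimateLo(sp, stackSize, fudge)
	fmt.Printf("estimated lo: %#x\n", lo) // 0x7f99841fded8, matching the capture

	// If the SP is really 9616 bytes below the top of the stack,
	// the true low end is (sp + gap) - size.
	gap := uint64(9616)
	trueLo := sp + gap - stackSize
	fmt.Println("estimated lo is below true lo by:", trueLo-lo) // 8592
}
```

Under these assumptions the recorded g0 stack.lo lands 8592 bytes below the real low end of the stack, which is more than enough to reach into a neighboring mapping such as the sigaltstack.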