Description
In reviewing @ianlancetaylor's CL 46003 (addressing #14571), I noticed what I believe to be a rare race-condition that could result in unexpected termination of a Go program.
I attempted to write a reproducer for the race in question, which would cause the program to erroneously exit if a _SigKill
or _SigThrow
signal arrives in between a call to signal.Notify
and signal.Ignore
for that signal. The symptom I was looking for is a termination with the same signal number (for _SigKill
) or a goroutine dump (for _SigThrow
).
Unfortunately, I am unable to reproduce the race in question because the program instead crashes with a SIGSEGV
in runtime.sigfwd
.
sigrace/sigrace.go:
package main
import (
"os"
"os/signal"
"runtime"
"syscall"
)
func main() {
signal.Ignore(syscall.SIGHUP)
go func() {
runtime.LockOSThread()
pid := os.Getpid()
tid := syscall.Gettid()
for {
syscall.Tgkill(pid, tid, syscall.SIGHUP)
}
}()
c := make(chan os.Signal, 1)
for {
signal.Notify(c, syscall.SIGHUP)
signal.Ignore(syscall.SIGHUP)
}
}
$ go version
go version devel +c4e0e81653 Wed Jun 21 15:54:38 2017 +0000 linux/amd64
$ go build sigrace && ./sigrace
Segmentation fault (core dumped)
Investigation with GDB gives the following stack trace:
#0 0x0000000000000001 in ?? ()
#1 0x0000000000452a6d in runtime.sigfwd ()
at /home/bcmills/go/src/runtime/sys_linux_amd64.s:245
#2 0x000000c420053b18 in ?? ()
#3 0x000000000043b665 in runtime.sigfwdgo (sig=1, info=0xc420053d70, ctx=0xc420053c40, ~r3=false)
at /home/bcmills/go/src/runtime/signal_unix.go:606
#4 0x000000000043a83b in runtime.sigtrampgo (sig=1, info=0xc420053d70, ctx=0xc420053c40)
at /home/bcmills/go/src/runtime/signal_unix.go:272
#5 0x0000000000452ac3 in runtime.sigtramp ()
at /home/bcmills/go/src/runtime/sys_linux_amd64.s:265
#6 0x0000000000452bb0 in ?? ()
at /home/bcmills/go/src/runtime/sys_linux_amd64.s:349
#7 0x0000000000000000 in ?? ()
The faulting instruction in runtime.sigfwd
is the CALL
instruction that invokes fwdSig[sig]
(CC @aclements)
Metadata
Metadata
Assignees
Labels
Type
Projects
Status
Activity
bcmills commentedon Jun 21, 2017
If we can fix the
SIGSEGV
, where I'm going with this reproducer is that I think we'll want to either combine theruntime.sig.ignored
andruntime.sig.wanted
arrays, or remove theignored
array entirely and use a single bit per signal to indicate "handled or ignored".aclements commentedon Nov 8, 2017
Ping @bcmills, @ianlancetaylor.
bcmills commentedon Nov 9, 2017
I don't think I'm going to be able to get to this for 1.10.
aclements commentedon Nov 9, 2017
Thanks for the update. Bumping to 1.11.
odeke-em commentedon Jun 11, 2018
How's it going @bcmills? Just a kind ping as we had moved this to Go1.11, perhaps a Go1.12 bump too?
bcmills commentedon Jun 11, 2018
Sadly, yes. I think at this point in the cycle any attempt to mess with signal handling will introduce more risk than it eliminates.