Closed
Description
What version of Go are you using (go version
)?
go1.14.7, also observed the same problem with earlier versions: go1.13.7
Does this issue reproduce with the latest release?
Unknown
What operating system and processor architecture are you using (go env
)?
Linux 4.14.35-1818.3.3.el7uek.x86_64 #2 SMP Mon Sep 24 14:45:01 PDT 2018 x86_64 x86_64 x86_64 GNU/Linux
(Oracle Linux Server release 7.6)
What did you do?
Original issue reported here: hashicorp/consul#8558
The Consul binary appears to be crashing with errors from the runtime, ex:
fatal error: stopg: invalid status
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x0]
runtime: unexpected return pc for runtime.sigpanic called from 0x0
The stack trace from the application shows the line as a select
blocking on a channel from signal.Notify
(source). I'm not sure if that is related to the underlying issue, or has to do with the signal from the runtime.
Full trace
runtime: gp: gp=0xc000450180, goid=0, gp->atomicstatus=0
runtime: g: g=0xc0005b5680, goid=0, g->atomicstatus=0
fatal error: stopg: invalid status
runtime: gp: gp=0xc000450300, goid=0, gp->atomicstatus=0
runtime: g: g=0xc000604180, goid=0, g->atomicstatus=0
fatal error: unexpected signal during runtime execution
fatal error: stopg: invalid status
runtime: gp: gp=0xc000450000, goid=0, gp->atomicstatus=0
runtime: g: g=0xc000566180, goid=0, g->atomicstatus=0
fatal error: stopg: invalid status
runtime stack:
runtime.throw(0x2fac299, 0x15)
/usr/local/go/src/runtime/panic.go:774 +0x72
runtime.scang(0xc000450180, 0xc000060170)
/usr/local/go/src/runtime/proc.go:869 +0x379
runtime.markroot.func1()
/usr/local/go/src/runtime/mgcmark.go:221 +0x6d
runtime.markroot(0xc000060170, 0xc00000001b)
/usr/local/go/src/runtime/mgcmark.go:202 +0x2f3
runtime.gcDrain(0xc000060170, 0x7)
/usr/local/go/src/runtime/mgcmark.go:915 +0x110
runtime.gcBgMarkWorker.func2()
/usr/local/go/src/runtime/mgc.go:1936 +0x16b
runtime.systemstack(0x0)
/usr/local/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
/usr/local/go/src/runtime/proc.go:1146
goroutine 34 [GC worker (idle)]:
runtime.systemstack_switch()
/usr/local/go/src/runtime/asm_amd64.s:330 fp=0xc00028ef60 sp=0xc00028ef58 pc=0x45d6d0
runtime.gcBgMarkWorker(0xc00005ef00)
/usr/local/go/src/runtime/mgc.go:1900 +0x1bd fp=0xc00028efd8 sp=0xc00028ef60 pc=0x41c38d
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1357 +0x1 fp=0xc00028efe0 sp=0xc00028efd8 pc=0x45f621
created by runtime.gcBgMarkStartWorkers
/usr/local/go/src/runtime/mgc.go:1794 +0x77
goroutine 1 [select, 149 minutes]:
github.com/hashicorp/consul/command/agent.(*cmd).run(0xc000651500, 0xc000170020, 0x8, 0x8, 0x0)
/home/circleci/project/consul/command/agent/agent.go:331 +0x13eb
github.com/hashicorp/consul/command/agent.(*cmd).Run(0xc000651500, 0xc000170020, 0x8, 0x8, 0xc0001452a0)
/home/circleci/project/consul/command/agent/agent.go:78 +0x4d
github.com/mitchellh/cli.(*CLI).Run(0xc00031c000, 0xc00031c000, 0x80, 0xc000145ac0)
/go/pkg/mod/github.com/mitchellh/cli@v1.1.0/cli.go:260 +0x1da
main.realMain(0xc00014a058)
/home/circleci/project/consul/main.go:53 +0x393
main.main()
/home/circleci/project/consul/main.go:20 +0x22
goroutine 20 [syscall, 149 minutes]:
os/signal.signal_recv(0x0)
/usr/local/go/src/runtime/sigqueue.go:147 +0x9c
os/signal.loop()
/usr/local/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
/usr/local/go/src/os/signal/signal_unix.go:29 +0x41
goroutine 23 [chan receive]:
github.com/golang/glog.(*loggingT).flushDaemon(0x50186a0)
/go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:882 +0x8b
created by github.com/golang/glog.init.0
/go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:410 +0x26f
goroutine 0 [idle]:
fatal error: unexpected signal during runtime execution
panic during panic
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x452486]
runtime stack:
runtime.throw(0x300b2ed, 0x2a)
/usr/local/go/src/runtime/panic.go:774 +0x72
runtime.sigpanic()
/usr/local/go/src/runtime/signal_unix.go:378 +0x47c
runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc000450000, 0x0, 0x0, 0x64, 0x0, 0x0, 0x0, ...)
/usr/local/go/src/runtime/traceback.go:159 +0x156
runtime.traceback1(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc000450000, 0x0)
/usr/local/go/src/runtime/traceback.go:722 +0xf0
runtime.traceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc000450000)
/usr/local/go/src/runtime/traceback.go:676 +0x52
runtime.tracebackothers(0xc0005b5680)
/usr/local/go/src/runtime/traceback.go:929 +0x1a6
runtime.dopanic_m(0xc0005b5680, 0x42f602, 0xc000aabdf8, 0x1)
/usr/local/go/src/runtime/panic.go:974 +0x29e
runtime.fatalthrow.func1()
/usr/local/go/src/runtime/panic.go:829 +0x5f
runtime.fatalthrow()
/usr/local/go/src/runtime/panic.go:826 +0x57
runtime.throw(0x2fac299, 0x15)
/usr/local/go/src/runtime/panic.go:774 +0x72
runtime.scang(0xc000450180, 0xc000060170)
/usr/local/go/src/runtime/proc.go:869 +0x379
runtime.markroot.func1()
/usr/local/go/src/runtime/mgcmark.go:221 +0x6d
runtime.markroot(0xc000060170, 0xc00000001b)
/usr/local/go/src/runtime/mgcmark.go:202 +0x2f3
runtime.gcDrain(0xc000060170, 0x7)
/usr/local/go/src/runtime/mgcmark.go:915 +0x110
runtime.gcBgMarkWorker.func2()
/usr/local/go/src/runtime/mgc.go:1936 +0x16b
runtime.systemstack(0x0)
/usr/local/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
/usr/local/go/src/runtime/proc.go:1146
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x0]
runtime stack:
runtime: unexpected return pc for runtime.sigpanic called from 0x0
stack: frame={sp:0xc000c61dd8, fp:0xc000c61e08} stack=[0xc000c60000,0xc000c62000)
000000c000c61cd8: 000000c0007bc180 01000000000003e8
000000c000c61ce8: 0000000000000004 000000000000001f
000000c000c61cf8: 0000000000000000 0000000000000000
000000c000c61d08: 0000000000000001 0000000002fdb0bd
000000c000c61d18: 000000c000c61d60 000000000045bfbf <runtime.fatalthrow.func1+95>
I looked around this issue tracker for an existing issue. I saw some related things, but nothing that sounded like exactly the same problem.
One issue mentioned C signal handlers. There should not be any such handlers in this binary.
Metadata
Metadata
Assignees
Type
Projects
Relationships
Development
No branches or pull requests
Activity
davecheney commentedon Aug 28, 2020
Sadly, with the release of go 1.15, go 1.13 is no longer receiving active support.
Can you please try with the latest release of Go can confirm if the problem occurs.
dnephin commentedon Aug 28, 2020
Ah, that's right, thank you. I've asked they try with a more recent release, which uses go1.14.x.
[-][go1.13.15] fatal error: unexpected signal during runtime execution[/-][+]runtime: unexpected signal during runtime execution[/+]dnephin commentedon Sep 8, 2020
More details from hashicorp/consul#8558 (comment). They were able to reproduce with go1.14.7. The full log is attached. They also said it occurs only on hosts with Intel Optane connected in RAM mode.
The stack is a bit different, but the errors look similar
consul-agent-kn-0030_2020.09.02_trace.log
networkimprov commentedon Sep 8, 2020
cc @aclements @randall77 @prattmic
@gopherbot remove WaitingForInfo
ianlancetaylor commentedon Sep 8, 2020
This looks a lot like memory corruption.
fanatl commentedon Sep 9, 2020
Hello to all.
These fatal errors have appeared since version 1.12 and older.
And is repeated only on hosts with Intel optane as RAM.
davecheney commentedon Sep 9, 2020
All the more reason to determine if the binary contains data races or cgo related memory corruption.
networkimprov commentedon Sep 9, 2020
@dnephin have you built the app with the race detector?
dnephin commentedon Sep 9, 2020
I've been working on running the tests with the race detector enabled. We run some packages with
-race
, but not all. We don't build binaries with it enabled. Would that help identify the problem?networkimprov commentedon Sep 9, 2020
A data race could cause memory corruption, so building the app with the race detector may help.
woblerr commentedon Feb 20, 2021
@dnephin any new about this problem?
blanchonvincent commentedon May 16, 2021
@dnephin @woblerr I got the same issues in my code, and it broke since Go 1.14 with the introduction of the goroutine async preemption (I do not have control over the C code I am using, but it looks like it mess up with the signals), so I just disable the async preemption when running my tests/building the binary. Your logs suggests it could be the same issue (crash in the garbage collector that preempt the goroutines). Can you try running your tests with
GODEBUG=asyncpreemptoff=1
likeGODEBUG=asyncpreemptoff=1 go test ./...
to see if it is the cause?gopherbot commentedon Jun 17, 2021
Timed out in state WaitingForInfo. Closing.
(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)
networkimprov commentedon Jun 22, 2021
I previously proposed a way to disable async preemption in all builds of an app; see #40870
It was closed without action, but you might want to comment there to request reconsideration of the idea.