Closed
##### GOMAXPROCS=2 runtime -cpu=1,2,4 -quick
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x46ee1c]
runtime stack:
runtime.throw(0x6a72fd, 0x2a)
/tmp/gobuilder-mips64/go/src/runtime/panic.go:1112 +0x60 fp=0x6fb81e04 sp=0x6fb81df0 pc=0x4461cc
runtime.sigpanic()
/tmp/gobuilder-mips64/go/src/runtime/signal_unix.go:714 +0x3b4 fp=0x6fb81e1c sp=0x6fb81e04 pc=0x463048
runtime.deltimer(0xc80000, 0x0)
/tmp/gobuilder-mips64/go/src/runtime/time.go:317 +0x164 fp=0x6fb81e30 sp=0x6fb81e20 pc=0x46ee1c
time.stopTimer(...)
/tmp/gobuilder-mips64/go/src/runtime/time.go:219
runtime.wakeScavenger()
/tmp/gobuilder-mips64/go/src/runtime/mgcscavenge.go:197 +0xa4 fp=0x6fb81e40 sp=0x6fb81e30 pc=0x42c274
runtime.sysmon()
/tmp/gobuilder-mips64/go/src/runtime/proc.go:5191 +0x428 fp=0x6fb81e94 sp=0x6fb81e40 pc=0x457198
runtime.mstart1()
/tmp/gobuilder-mips64/go/src/runtime/proc.go:1275 +0xf0 fp=0x6fb81ea4 sp=0x6fb81e94 pc=0x44c830
runtime.mstart()
/tmp/gobuilder-mips64/go/src/runtime/proc.go:1240 +0x68 fp=0x6fb81eb8 sp=0x6fb81ea4 pc=0x44c710
…
goroutine 176543 [waiting]:
runtime.systemstack_switch()
/tmp/gobuilder-mips64/go/src/runtime/asm_mipsx.s:159 +0x8 fp=0xcea978 sp=0xcea974 pc=0x48be64
runtime.stopTheWorld(0x6994d3, 0xe)
/tmp/gobuilder-mips64/go/src/runtime/proc.go:984 +0xa0 fp=0xcea998 sp=0xcea978 pc=0x44be60
runtime.ReadMemStats(0xcea9dc)
/tmp/gobuilder-mips64/go/src/runtime/mstats.go:473 +0x48 fp=0xcea9ac sp=0xcea998 pc=0x43e358
testing.AllocsPerRun(0x3e8, 0xcebf9c, 0x0, 0x0)
/tmp/gobuilder-mips64/go/src/testing/allocs.go:28 +0xcc fp=0xcebf78 sp=0xcea9ac pc=0x50c340
runtime_test.TestStringConcatenationAllocs(0xd121c0)
/tmp/gobuilder-mips64/go/src/runtime/malloc_test.go:140 +0x60 fp=0xcebfa4 sp=0xcebf78 pc=0x5c459c
testing.tRunner(0xd121c0, 0x6ae020)
/tmp/gobuilder-mips64/go/src/testing/testing.go:1194 +0x130 fp=0xcebfe4 sp=0xcebfa4 pc=0x5176d0
runtime.goexit()
/tmp/gobuilder-mips64/go/src/runtime/asm_mipsx.s:635 +0x4 fp=0xcebfe4 sp=0xcebfe4 pc=0x48e2ec
created by testing.(*T).Run
/tmp/gobuilder-mips64/go/src/testing/testing.go:1239 +0x294
2021-01-14T21:55:29-eb33002/linux-mips-rtrk
2020-12-17T20:25:45-8fcf318/linux-mips-rtrk
CC @aclements @mknyszek @prattmic @mengzhuo; compare #43625.
bcmills commented on Jan 15, 2021
Both of those failures are during the current-cycle code freeze, as is the one in #43625, but I notice that #43625 was unmarked as release-blocker without further comment.
@golang/release, should this be milestoned to 1.16, given that it may be a regression from (or exposed by) the timer changes in 1.16? (Marking as release-blocker until that is answered, but I'm not particularly invested in the answer.)
mengzhuo commented on Jan 15, 2021
I think this issue is not related to #43625.
It might be related to #35541, since tpp shouldn't be nil
(go/src/runtime/time.go, line 317 at b78b427).
CC @cherrymui
cherrymui commented on Jan 15, 2021
I don't think this is related to #35541, at least not directly. There, the problem is likely a malformed pointer rather than 0. (Of course a malformed pointer could lead to memory corruption, which could lead to anything.)
I'm not familiar enough with the timer code to say for sure that tpp cannot be 0.
This comment, and the fact that MIPS has a weak memory model, make me worried.
prattmic commented on Jan 15, 2021
I've been looking at the same thing. t.pp should be set before the transition to timerModifying (and doesn't change on the transition to timerModifiedLater), and the strongly-ordered CAS should act as a barrier, but I'm still going through to see if we missed a case.
ianlancetaylor commented on Jan 20, 2021
We only have two failing cases to look at, but they are very similar.

In both cases it is running the test GOMAXPROCS=2 runtime -cpu=1,2,4 -quick. The test running is TestStringConcatenationAllocs. It calls testing.AllocsPerRun, which calls runtime.ReadMemStats, which calls stopTheWorld. In both stack traces the goroutine is waiting for a call to systemstack to complete. That call is to stopTheWorldWithSema. We have no stack trace for the call to stopTheWorldWithSema, so we don't know where it is.

In both failures sysmon wakes up while the world is stopping and decides to call wakeScavenger. wakeScavenger calls stopTimer(scavenge.timer), and that crashes because scavenge.timer.pp == 0 while scavenge.timer.status is either timerWaiting or timerModifiedLater. This should be impossible.

testing.AllocsPerRun starts by calling runtime.GOMAXPROCS(1). It seems likely that this is while the test is running with GOMAXPROCS set to 2 or 4. That means that we will have gone through (*p).destroy, which calls moveTimers. This is interesting because if the timer status is timerWaiting, moveTimers sets t.pp = 0 without first adjusting the status. It then calls doaddtimer, which sets t.pp to the new P. I don't see how this could possibly cause a problem, as there are quite a few atomic operations around, but it is a case where we have status == timerWaiting and t.pp = 0, which is the case that crashes.
gopherbot commented on Jan 20, 2021
Change https://golang.org/cl/284775 mentions this issue:
runtime: don't adjust timer pp field in timerWaiting status
mknyszek commented on Jan 20, 2021
@ianlancetaylor I think you've figured it out: sysmon runs without a P and calls deltimer, and that call could happen while the world is stopped, while timers are being moved due to destroy. There is a window of time in moveTimers where t.pp == 0 while the status is timerWaiting, and a concurrent deltimer can assume t.pp != nil after observing timerWaiting, so yeah, I think that's it.

I think your fix is exactly right, since then the deltimer will see timerMoving and will spin.

More broadly, maybe the scavenger really shouldn't use timers, or sysmon really shouldn't be the one kicking it. Sigh.
mknyszek commented on Jan 21, 2021
It turns out that this is also a problem in Go 1.15.
@gopherbot Please open a backport issue for Go 1.15.
gopherbot commented on Jan 21, 2021
Backport issue(s) opened: #43833 (for 1.15).
Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.
dmitshur commented on Jan 21, 2021
@mknyszek Is it known whether it's also a problem for 1.14?
mknyszek commented on Jan 21, 2021
I don't believe so. wakeScavenger and the whole mechanism around it were only added in Go 1.15.
gopherbot commented on Jan 27, 2021
Change https://golang.org/cl/287092 mentions this issue:
[release-branch.go1.15] runtime: don't adjust timer pp field in timerWaiting status
[release-branch.go1.15] runtime: don't adjust timer pp field in timer…
runtime: don't adjust timer pp field in timerWaiting status