Skip to content

runtime: SIGSEGV in runtime.deltimer on linux-mips-rtrk during ReadMemStats #43712

Closed
@bcmills

Description

@bcmills
##### GOMAXPROCS=2 runtime -cpu=1,2,4 -quick
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x46ee1c]

runtime stack:
runtime.throw(0x6a72fd, 0x2a)
	/tmp/gobuilder-mips64/go/src/runtime/panic.go:1112 +0x60 fp=0x6fb81e04 sp=0x6fb81df0 pc=0x4461cc
runtime.sigpanic()
	/tmp/gobuilder-mips64/go/src/runtime/signal_unix.go:714 +0x3b4 fp=0x6fb81e1c sp=0x6fb81e04 pc=0x463048
runtime.deltimer(0xc80000, 0x0)
	/tmp/gobuilder-mips64/go/src/runtime/time.go:317 +0x164 fp=0x6fb81e30 sp=0x6fb81e20 pc=0x46ee1c
time.stopTimer(...)
	/tmp/gobuilder-mips64/go/src/runtime/time.go:219
runtime.wakeScavenger()
	/tmp/gobuilder-mips64/go/src/runtime/mgcscavenge.go:197 +0xa4 fp=0x6fb81e40 sp=0x6fb81e30 pc=0x42c274
runtime.sysmon()
	/tmp/gobuilder-mips64/go/src/runtime/proc.go:5191 +0x428 fp=0x6fb81e94 sp=0x6fb81e40 pc=0x457198
runtime.mstart1()
	/tmp/gobuilder-mips64/go/src/runtime/proc.go:1275 +0xf0 fp=0x6fb81ea4 sp=0x6fb81e94 pc=0x44c830
runtime.mstart()
	/tmp/gobuilder-mips64/go/src/runtime/proc.go:1240 +0x68 fp=0x6fb81eb8 sp=0x6fb81ea4 pc=0x44c710

…

goroutine 176543 [waiting]:
runtime.systemstack_switch()
	/tmp/gobuilder-mips64/go/src/runtime/asm_mipsx.s:159 +0x8 fp=0xcea978 sp=0xcea974 pc=0x48be64
runtime.stopTheWorld(0x6994d3, 0xe)
	/tmp/gobuilder-mips64/go/src/runtime/proc.go:984 +0xa0 fp=0xcea998 sp=0xcea978 pc=0x44be60
runtime.ReadMemStats(0xcea9dc)
	/tmp/gobuilder-mips64/go/src/runtime/mstats.go:473 +0x48 fp=0xcea9ac sp=0xcea998 pc=0x43e358
testing.AllocsPerRun(0x3e8, 0xcebf9c, 0x0, 0x0)
	/tmp/gobuilder-mips64/go/src/testing/allocs.go:28 +0xcc fp=0xcebf78 sp=0xcea9ac pc=0x50c340
runtime_test.TestStringConcatenationAllocs(0xd121c0)
	/tmp/gobuilder-mips64/go/src/runtime/malloc_test.go:140 +0x60 fp=0xcebfa4 sp=0xcebf78 pc=0x5c459c
testing.tRunner(0xd121c0, 0x6ae020)
	/tmp/gobuilder-mips64/go/src/testing/testing.go:1194 +0x130 fp=0xcebfe4 sp=0xcebfa4 pc=0x5176d0
runtime.goexit()
	/tmp/gobuilder-mips64/go/src/runtime/asm_mipsx.s:635 +0x4 fp=0xcebfe4 sp=0xcebfe4 pc=0x48e2ec
created by testing.(*T).Run
	/tmp/gobuilder-mips64/go/src/testing/testing.go:1239 +0x294

2021-01-14T21:55:29-eb33002/linux-mips-rtrk
2020-12-17T20:25:45-8fcf318/linux-mips-rtrk

CC @aclements @mknyszek @prattmic @mengzhuo; compare #43625.

Activity

added
NeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.
on Jan 15, 2021
added this to the Backlog milestone on Jan 15, 2021
bcmills

bcmills commented on Jan 15, 2021

@bcmills
ContributorAuthor

Both of those failures are during the current-cycle code freeze, as is the one in #43625, but I notice that #43625 was unmarked as release-blocker without further comment.

@golang/release, should this be milestoned to 1.16, given that it may be a regression from (or exposed by) the timer changes in 1.16? (Marking as release-blocker until that is answered, but I'm not particularly invested in the answer.)

modified the milestones: Backlog, Go1.16 on Jan 15, 2021
mengzhuo

mengzhuo commented on Jan 15, 2021

@mengzhuo
Contributor

I think this issue is not related to #43625

Might related to #35541 since tpp shouldn't be nil

atomic.Xadd(&tpp.deletedTimers, 1)

CC @cherrymui

cherrymui

cherrymui commented on Jan 15, 2021

@cherrymui
Member

I don't think this is related to #35541, at least not directly. There, the problem is likely a malformed pointer but not 0. (Of course a malformed pointer could lead to memory corruption which could lead to anything.)

I'm not really familiar with the timer code to tell for sure that tpp cannot be 0.

				// Must fetch t.pp before changing status,
				// as cleantimers in another goroutine
				// can clear t.pp of a timerDeleted timer.
				tpp := t.pp.ptr()
				if !atomic.Cas(&t.status, timerModifying, timerDeleted) {

This comment, and MIPS being a weak memory model machine, make me worried.

prattmic

prattmic commented on Jan 15, 2021

@prattmic
Member

I've been looking at the same thing. t.pp should be set before transition to timerModifying (and doesn't change on transition to timerModifiedLater), and the strongly-ordered CAS should act as a barrier, but I'm still going through to see if we missed a case.

ianlancetaylor

ianlancetaylor commented on Jan 20, 2021

@ianlancetaylor
Contributor

We only have two failing cases to look at, but they are very similar.

In both cases it is running the test GOMAXPROCS=2 runtime -cpu=1,2,4 -quick.

The test running is TestStringConcatenationAllocs. It calls testing.AllocsPerRun which calls runtime.ReadMemStats which calls stopTheWorld. In both stack traces the goroutine is waiting for a call to systemstack to complete. That call is to stopTheWorldWithSema. We have no stack trace for the call to stopTheWorldWithSema, so we don't know where it is.

In both failures sysmon wakes up while the world is stopping and decides to call wakeScavenger. wakeScavenger calls stopTimer(scavenge.timer), and that crashes because scavenge.timer.pp == 0 while scavenge.timer.status is either timerWaiting or timerModifiedLater. This should be impossible.

testing.AllocsPerRun starts by calling runtime.GOMAXPROCS(1). It seems likely that this is while the test is running with GOMAXPROCS set to 2 or 4. That means that we will have gone through (*p).destroy which calls moveTimers. This is interesting because if the timer status is timerWaiting, moveTimers sets t.pp = 0 without first adjusting the status. It then calls doaddtimer which sets t.pp to the new P. I don't see how this could possibly cause a problem, as there are quite a few atomic operations around, but it is a case where we have status == timerWaiting and t.pp = 0, which is the case that crashes.

gopherbot

gopherbot commented on Jan 20, 2021

@gopherbot
Contributor

Change https://golang.org/cl/284775 mentions this issue: runtime: don't adjust timer pp field in timerWaiting status

mknyszek

mknyszek commented on Jan 20, 2021

@mknyszek
Contributor

@ianlancetaylor I think you've figured it out: sysmon runs without a P and calls deltimer, and that call could happen while the world is stopped, while timers are being moved due to destroy. Since there is this window of time where t.pp == 0 while in timerWaiting in moveTimers and a deltimer could run concurrently and assume t.pp != nil after observing timerWaiting, yeah I think that's it.

I think your fix is exactly right, since then the deltimer will see timerMoving and will spin.

More broadly, maybe the scavenger really shouldn't use timers, or sysmon really shouldn't be the one kicking it. Sigh.

mknyszek

mknyszek commented on Jan 21, 2021

@mknyszek
Contributor

It turns out that this is also a problem in Go 1.15.

@gopherbot Please open a backport issue for Go 1.15.

gopherbot

gopherbot commented on Jan 21, 2021

@gopherbot
Contributor

Backport issue(s) opened: #43833 (for 1.15).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

added
NeedsFixThe path to resolution is known, but the work has not been done.
and removed
NeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.
on Jan 21, 2021
dmitshur

dmitshur commented on Jan 21, 2021

@dmitshur
Member

@mknyszek Is it known whether it's also a problem for 1.14?

mknyszek

mknyszek commented on Jan 21, 2021

@mknyszek
Contributor

I don't believe so. wakeScavenger and the whole mechanism around it was only added in Go 1.15.

gopherbot

gopherbot commented on Jan 27, 2021

@gopherbot
Contributor

Change https://golang.org/cl/287092 mentions this issue: [release-branch.go1.15] runtime: don't adjust timer pp field in timerWaiting status

locked and limited conversation to collaborators on Jan 27, 2022
added a commit that references this issue on Apr 29, 2024
d2d155d
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @mengzhuo@mknyszek@prattmic@dmitshur@ianlancetaylor

        Issue actions

          runtime: SIGSEGV in runtime.deltimer on linux-mips-rtrk during ReadMemStats · Issue #43712 · golang/go