runtime: TestVDSO failures on linux-arm64-packet #35473

Closed
@bcmills

--- FAIL: TestVDSO (71.53s)
    crash_test.go:95: testprog SignalInVDSO exit status: exit status 2
    crash_test.go:149: output:
        SIGQUIT: quit
        PC=0x6b4c8 m=1 sigcode=0
        
        goroutine 0 [idle]:
        runtime.usleep()
        	/workdir/go/src/runtime/sys_linux_arm64.s:148 +0x48
        runtime.sysmon()
        	/workdir/go/src/runtime/proc.go:4461 +0x9c
        runtime.mstart1()
        	/workdir/go/src/runtime/proc.go:1125 +0xb0
        runtime.mstart()
        	/workdir/go/src/runtime/proc.go:1090 +0x60
        
        goroutine 1 [running]:
        	goroutine running on other thread; stack unavailable
        
        goroutine 18 [sleep]:
        time.Sleep(0x5f5e100)
        	/workdir/go/src/runtime/time.go:247 +0xc0
        runtime/pprof.profileWriter(0x12c880, 0x40001b0018)
        	/workdir/go/src/runtime/pprof/pprof.go:765 +0x60
        created by runtime/pprof.StartCPUProfile
        	/workdir/go/src/runtime/pprof/pprof.go:750 +0x128

2019-11-08T21:27:51-b7d097a/linux-arm64-packet
2019-11-08T18:11:01-1fd3f8b/linux-arm64-packet
2019-11-08T16:20:17-4208dbe/linux-arm64-packet
2019-11-07T20:34:27-4751db9/linux-arm64-packet

See also #33574.

CC @ianlancetaylor @mengzhuo @nyuichi

added the NeedsInvestigation label (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.) on Nov 8, 2019
added this to the Go1.14 milestone on Nov 8, 2019

ianlancetaylor (Contributor) commented on Nov 8, 2019

As far as I can see this means that this loop in runtime/testdata/testprog/vdso.go

	t0 := time.Now()
	t1 := t0
	for t1.Sub(t0) < time.Second {
		t1 = time.Now()
	}

ran for more than one minute. Hmmm.

ianlancetaylor (Contributor) commented on Nov 8, 2019

Given that this just started to appear, I'm naturally suspicious of CL 203461 == 1b0b980.

CC @cherrymui

cherrymui (Member) commented on Nov 10, 2019

I think I have a guess:

  • A goroutine is running in the VDSO. It saved its g on the signal stack before entering the VDSO.
  • A profiling signal arrives. While handling it, the runtime calls nanotime, which saves g on the same signal stack before entering the VDSO and clears it afterwards.
  • While the goroutine is still in the VDSO, a preemption signal arrives. Now sigFetchG fetches a nil G from the signal stack (because it was cleared in the previous step), then calls badsignal, which deadlocks in lockextra (as before CL http://golang.org/cl/202759).

I think we don't want to save G if we're already on the signal stack. Skipping the save in that case seems to make it work: 1000 iterations ran without failure.
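The sequence described above can be replayed with a toy model. This is only a sketch of the reasoning, not runtime code: the types g and m, the fields savedG, inHandler and skipNested, and the functions vdsoCall, profSignal, fetchG and replay are all hypothetical names, and the real logic lives in the runtime's signal handling and arm64 assembly.

```go
package main

import "fmt"

// g stands in for a goroutine descriptor. Hypothetical, for illustration.
type g struct{ name string }

// m models one thread with a single slot on its signal stack where the
// current g is parked while code runs inside the VDSO.
type m struct {
	savedG     *g   // the slot on the signal stack
	inHandler  bool // true while a signal handler is running
	skipNested bool // the proposed fix: leave the slot alone inside a handler
}

// vdsoCall models a VDSO time call: save g in the slot, run inside (which
// represents whatever happens while the thread is in the VDSO), then
// clear the slot on the way out.
func (mp *m) vdsoCall(cur *g, inside func()) {
	touch := !(mp.skipNested && mp.inHandler)
	if touch {
		mp.savedG = cur
	}
	if inside != nil {
		inside()
	}
	if touch {
		mp.savedG = nil
	}
}

// profSignal models a profiling signal whose handler calls nanotime,
// which is itself a VDSO call made on the signal stack.
func (mp *m) profSignal(gsignal *g) {
	mp.inHandler = true
	mp.vdsoCall(gsignal, nil)
	mp.inHandler = false
}

// fetchG models what a later preemption signal finds in the slot.
func (mp *m) fetchG() *g { return mp.savedG }

// replay runs the three steps and returns the g that the preemption
// signal would see.
func replay(fix bool) *g {
	mp := &m{skipNested: fix}
	user := &g{"user"}
	gsignal := &g{"gsignal"}
	var seen *g
	mp.vdsoCall(user, func() { // step 1: goroutine enters the VDSO
		mp.profSignal(gsignal) // step 2: profiling signal calls nanotime
		seen = mp.fetchG()     // step 3: preemption signal fetches g
	})
	return seen
}

func main() {
	fmt.Println("without fix, fetched g is nil:", replay(false) == nil)
	fmt.Println("with fix, fetched g is nil:", replay(true) == nil)
}
```

In this model the nested save and clear wipes the slot the outer call depends on, so the preemption step sees nil; skipping the save when already handling a signal leaves the outer goroutine's g in place.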

gopherbot (Contributor) commented on Nov 11, 2019

Change https://golang.org/cl/206397 mentions this issue: runtime: don't save G during VDSO if we're handling signal

locked and limited conversation to collaborators on Nov 10, 2020
Labels: FrozenDueToAge, NeedsInvestigation, release-blocker