Skip to content

runtime: WIP: fix infinite loop in lockextra on linux/arm #34979

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

nyuichi
Copy link
Contributor

@nyuichi nyuichi commented Oct 18, 2019

This commit fixes issue #34391, which is due to an incorrect patch
merged in CL 192937.

sigtrampgo is modified to record incoming signals in a globally shared
atomic bitmask when the G register is clobbered. When the execution
exits from vdso it checks if there are pending signals and in that
case it re-raises them to its own process.

Fixes #34391


This PR is work in progress.
Following the discussion in #34391, I implemented a patch to fix signal handling during VDSO on arm.
However, my patch is still incomplete due to the following concerns

  1. Some tests are still failing on my arm machine (raspberry pi3). Specifically os/signal and TestSIGUSR1InVDSO in runtime fail because of stack overflows:
...
fatal: morestack on g0
FAIL    os/signal       0.092s
...
--- FAIL: TestSIGUSR1InVDSO (20.67s)
    crash_test.go:95: testprog SIGUSR1InVDSO exit status: exit status 2
    crash_test.go:160: output:
        runtime: newstack sp=0xc09c24 stack=[0xc2a000, 0xc2a800]
                morebuf={pc:0x4fb44 sp:0xc09c24 lr:0x0}
                sched={pc:0x4f9ec sp:0xc09c24 lr:0x4fb44 ctxt:0x0}
        runtime: gp=0xc000e0, goid=1, gp->status=0x2
         runtime: split stack overflow: 0xc09c24 < 0xc2a000
        fatal error: runtime: split stack overflow

        runtime stack:
        runtime.throw(0xfc0a0, 0x1d)
                /home/pi/go/src/runtime/panic.go:774 +0x5c
        runtime.newstack()
                /home/pi/go/src/runtime/stack.go:1008 +0x728
        runtime.morestack()
                /home/pi/go/src/runtime/asm_arm.s:420 +0x60

        goroutine 1 [running]:
        runtime.sigAddPending(0xa)
                /home/pi/go/src/runtime/signal_unix.go:293 +0x6c fp=0xc09c24 sp=0xc09c24 pc=0x4f9ec
        runtime.sigtrampgo(0xa, 0xc09c88, 0xc09d08)
                /home/pi/go/src/runtime/signal_unix.go:351 +0x90 fp=0xc09c78 sp=0xc09c24 pc=0x4fb44
        runtime: unexpected return pc for runtime.sigtramp called from 0x7ed37668
        stack: frame={sp:0xc09c78, fp:0xc09c88} stack=[0xc2a000,0xc2a800)

        runtime.sigtramp(0x0, 0x0, 0x7721, 0x3e8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
                /home/pi/go/src/runtime/sys_linux_arm.s:467 +0x28 fp=0xc09c88 sp=0xc09c78 pc=0x67310

        goroutine 6 [runnable]:
        sync.(*RWMutex).RUnlock(0xc1415c)
                /home/pi/go/src/sync/rwmutex.go:62 +0x58
        os.(*Process).signal(0xc14150, 0x117738, 0x115d30, 0x0, 0x0)
                /home/pi/go/src/os/exec_unix.go:77 +0x140
        os.(*Process).Signal(...)
                /home/pi/go/src/os/exec.go:131
        main.signalSIGUSR1InVDSO.func1(0xc14150, 0xc1c100)
                /home/pi/go/src/runtime/testdata/testprog/vdso.go:66 +0x30
        created by main.signalSIGUSR1InVDSO
                /home/pi/go/src/runtime/testdata/testprog/vdso.go:64 +0x84


        wanted:
        success
FAIL
FAIL    runtime 274.555s
  1. Perfomance is pretty bad. As found in the above code snippet, runtime test takes about 4.5 minutes to finish. Overall time of ./all.bash is more than 1 hour. This should be caused by heavy contention on the new signal pending queue introduced in my patch.

  2. Assembly code is incomplete. I am unfamiliar with the calling convention of Go on arm and I think my assembly code to call sigClearPending from vdso callers can be incorrect or at least improvable.

Could you possibly review this and give some comments? (cc: @ianlancetaylor)

@googlebot googlebot added the cla: yes Used by googlebot to label PRs as having a valid CLA. The text of this label should not change. label Oct 18, 2019
@gopherbot
Copy link
Contributor

This PR (HEAD: 1c401a7) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/201899 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot
Copy link
Contributor

Message from Brad Fitzpatrick:

Patch Set 1:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/201899.
After addressing review feedback, remember to publish your drafts!

This commit fixes issue golang#34391, which is due to an incorrect patch
merged in CL 192937.

sigtrampgo is modified to record incoming signals in a globally shared
atomic bitmask when the G register is clobbered. When the execution
exits from vdso it checks if there are pending signals and in that
case it re-raises them to its own process.
@gopherbot
Copy link
Contributor

This PR (HEAD: 9aeee0f) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/201899 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot
Copy link
Contributor

Message from Ian Lance Taylor:

Patch Set 3:

(7 comments)


Please don’t reply on this GitHub thread. Visit golang.org/cl/201899.
After addressing review feedback, remember to publish your drafts!

@nyuichi
Copy link
Contributor Author

nyuichi commented Oct 25, 2019

It seems that CL 202759 indeed fixed the issue on my 32bit arm environment. I am closing this.

@nyuichi nyuichi closed this Oct 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Used by googlebot to label PRs as having a valid CLA. The text of this label should not change.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

runtime: infinite loop in lockextra on linux/arm
3 participants