Description
What version of Go are you using (go version)?
$ go version 1.21.1, cross compiling on a Linux host to CGO_ENABLED=0 GOOS=freebsd GOARCH=amd64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
FreeBSD, amd64
What did you do?
Unfortunately I can't share the original binary, and I'm having trouble getting the problem to reproduce in a smaller program. I apologize; I know that's not ideal and that this might be intractable without one. I'm still working on a smaller repro.
We have a server process that runs on FreeBSD/amd64 and does not use cgo. It was recently upgraded to Go 1.21.1, and I enabled PGO to see what would happen (gathering a profile from the existing Go 1.20.7 production deployment).
The produced binary panics immediately on startup. Go 1.21.1 without PGO works fine on FreeBSD/amd64, Go 1.20.7 with PGO and the same profile works fine on FreeBSD/amd64, and Go 1.21.1 with PGO works fine on both Linux/amd64 and macOS/arm64.
Backtrace follows:
(gdb) run
Starting program: /var/svm/f
[New LWP 363928 of process 68756]
[New LWP 363929 of process 68756]
[New LWP 363930 of process 68756]
[New LWP 363931 of process 68756]
fatal: morestack on g0

Thread 4 received signal SIGTRAP, Trace/breakpoint trap
warning: could not convert 'si_code' from the host encoding (ISO-8859-1) to UTF-32.
This normally should not happen, please file a bug report.
. Breakpoint.
[Switching to LWP 363930 of process 68756]
0x0000000000478186 in ?? ()
(gdb) bt
#0 0x0000000000478186 in ?? ()
#1 0x0000000000476585 in runtime.morestack () at .goroot/1.21.1/src/runtime/asm_amd64.s:560
#2 0x000000000043a490 in runtime.netpoll (delay=, ~r0=...) at .goroot/1.21.1/src/runtime/netpoll_kqueue.go:121
#3 0x000000000044633f in runtime.findRunnable (gp=, inheritTime=, tryWakeP=) at .goroot/1.21.1/src/runtime/proc.go:3191
#4 0x0000000000448c56 in runtime.schedule () at .goroot/1.21.1/src/runtime/proc.go:3582
#5 runtime.park_m (gp=0xc000006ea0) at .goroot/1.21.1/src/runtime/proc.go:3745
#6 0x000000000047648e in runtime.mcall () at .goroot/1.21.1/src/runtime/asm_amd64.s:458
#7 0x0000000000000000 in ?? ()
(gdb) thread apply all bt

Thread 5 (LWP 363931 of process 68756):
#0 runtime.sys_umtx_op () at .goroot/1.21.1/src/runtime/sys_freebsd_amd64.s:57
#1 0x000000000043a8b3 in runtime.futexsleep1 (addr=, val=0x0, ns=) at .goroot/1.21.1/src/runtime/os_freebsd.go:174
#2 0x000000000040c0fe in runtime.notesleep.futexsleep.func1 () at .goroot/1.21.1/src/runtime/os_freebsd.go:162
#3 0x000000000040c067 in runtime.futexsleep (ns=0xffffffffffffffff, addr=, val=) at .goroot/1.21.1/src/runtime/os_freebsd.go:161
#4 runtime.notesleep (n=0xc000100150) at .goroot/1.21.1/src/runtime/lock_futex.go:160
#5 0x0000000000444bca in runtime.mPark () at .goroot/1.21.1/src/runtime/proc.go:1632
#6 runtime.stopm () at .goroot/1.21.1/src/runtime/proc.go:2536
#7 0x000000000044667e in runtime.findRunnable (gp=, inheritTime=, tryWakeP=) at .goroot/1.21.1/src/runtime/proc.go:3229
#8 0x0000000000448c56 in runtime.schedule () at .goroot/1.21.1/src/runtime/proc.go:3582
#9 runtime.park_m (gp=0xc0001024e0) at .goroot/1.21.1/src/runtime/proc.go:3745
#10 0x000000000047648e in runtime.mcall () at .goroot/1.21.1/src/runtime/asm_amd64.s:458
#11 0x0000000000000000 in ?? ()

Thread 4 (LWP 363930 of process 68756):
#0 0x0000000000478186 in ?? ()
#1 0x0000000000476585 in runtime.morestack () at .goroot/1.21.1/src/runtime/asm_amd64.s:560
#2 0x000000000043a490 in runtime.netpoll (delay=, ~r0=...) at .goroot/1.21.1/src/runtime/netpoll_kqueue.go:121
#3 0x000000000044633f in runtime.findRunnable (gp=, inheritTime=, tryWakeP=) at .goroot/1.21.1/src/runtime/proc.go:3191
#4 0x0000000000448c56 in runtime.schedule () at .goroot/1.21.1/src/runtime/proc.go:3582
#5 runtime.park_m (gp=0xc000006ea0) at .goroot/1.21.1/src/runtime/proc.go:3745
#6 0x000000000047648e in runtime.mcall () at .goroot/1.21.1/src/runtime/asm_amd64.s:458
#7 0x0000000000000000 in ?? ()

Thread 3 (LWP 363929 of process 68756):
#0 runtime.sys_umtx_op () at .goroot/1.21.1/src/runtime/sys_freebsd_amd64.s:57
#1 0x000000000043a8b3 in runtime.futexsleep1 (addr=, val=0x0, ns=) at .goroot/1.21.1/src/runtime/os_freebsd.go:174
#2 0x000000000040c0fe in runtime.notesleep.futexsleep.func1 () at .goroot/1.21.1/src/runtime/os_freebsd.go:162
#3 0x000000000040c067 in runtime.futexsleep (ns=0xffffffffffffffff, addr=, val=) at .goroot/1.21.1/src/runtime/os_freebsd.go:161
#4 runtime.notesleep (n=0xc000080550) at .goroot/1.21.1/src/runtime/lock_futex.go:160
#5 0x0000000000444bca in runtime.mPark () at .goroot/1.21.1/src/runtime/proc.go:1632
#6 runtime.stopm () at .goroot/1.21.1/src/runtime/proc.go:2536
#7 0x000000000044570a in runtime.startlockedm (gp=) at .goroot/1.21.1/src/runtime/proc.go:2808
#8 0x0000000000448c13 in runtime.schedule () at .goroot/1.21.1/src/runtime/proc.go:3628
#9 runtime.park_m (gp=0xc000006d00) at .goroot/1.21.1/src/runtime/proc.go:3745
#10 0x000000000047648e in runtime.mcall () at .goroot/1.21.1/src/runtime/asm_amd64.s:458
#11 0x0000000000000000 in ?? ()

Thread 2 (LWP 363928 of process 68756):
#0 runtime.usleep () at .goroot/1.21.1/src/runtime/sys_freebsd_amd64.s:477
#1 0x000000000044d84b in runtime.sysmon () at .goroot/1.21.1/src/runtime/proc.go:5528
#2 0x00000000004434d3 in runtime.mstart1 () at .goroot/1.21.1/src/runtime/proc.go:1600
#3 0x0000000000443416 in runtime.mstart0 () at .goroot/1.21.1/src/runtime/proc.go:1557
#4 0x0000000000476405 in runtime.mstart () at .goroot/1.21.1/src/runtime/asm_amd64.s:394
#5 0x0000000000479aae in runtime.thr_start () at .goroot/1.21.1/src/runtime/sys_freebsd_amd64.s:86
#6 0x0000000000000000 in ?? ()

Thread 1 (LWP 101416 of process 68756):
#0 0x000000000042e4b1 in runtime.(*mheap).initSpan (h=0xc3d1a0 , s=0x8477665f0, typ=0x0, spanclass=0x4b, base=, npages=0x2) at .goroot/1.21.1/src/runtime/mheap.go:1404
#1 0x000000000042e1f3 in runtime.(*mheap).allocSpan (h=0xc3d1a0 , npages=0x2, typ=0x0, spanclass=0x4b, s=) at .goroot/1.21.1/src/runtime/mheap.go:1344
#2 0x0000000000419a7f in runtime.(*mcentral).grow.(*mheap).alloc.func1 () at .goroot/1.21.1/src/runtime/mheap.go:968
#3 0x000000000047650a in runtime.systemstack () at .goroot/1.21.1/src/runtime/asm_amd64.s:509
#4 0x00007fffffffe9c8 in ?? ()
#5 0x000000000047a93f in runtime.newproc (fn=0x47638f ) at :1
#6 0x0000000000476405 in runtime.mstart () at .goroot/1.21.1/src/runtime/asm_amd64.s:394
#7 0x000000000047638f in runtime.rt0_go () at .goroot/1.21.1/src/runtime/asm_amd64.s:358
#8 0x0000000000000001 in ?? ()
#9 0x00007fffffffea18 in ?? ()
#10 0x0000000000000000 in ?? ()
Nothing there looks like user code to me. On some runs I do see a few things starting to get runtime.doInit()'d in the stacks (some compiled regex and so on), but this seems to panic very early. While I try to get a smaller repro, are there any things in the stack that jump out, or any suggestions on how to debug this?
Activity
Title changed from "'fatal: morestack on g0' on FreeBSD amd64 with PGO" to "cmd/compile: 'fatal: morestack on g0' on FreeBSD amd64 with PGO"
cherrymui commented on Sep 7, 2023
Thanks for the report!
This is interesting. The stack looks totally valid, not sure why it calls morestack... Could you print the SP at frame 2 (the runtime.netpoll frame, and perhaps other frames as well) in GDB, and also dump the contents of the G structure pointed to by the R14 register (something like x/10a $r14)? Thanks.
elindsey commented on Sep 7, 2023
I forgot to save the core last time, so this is a new execution but same backtrace. Let me know if that got all the info you were looking for!
GH was interpreting some of the <> as html tags, even in a pre block - so I put it in a gist. https://gist.github.com/elindsey/3959c40c20360d41a49f0bd3e6b5074b
cherrymui commented on Sep 7, 2023
Thanks! The SP and stack look quite valid.
This is g.stack.lo and g.stack.hi, i.e. the stack bounds. The stack is 8 KB in size, which matches https://cs.opensource.google/go/go/+/master:src/runtime/proc.go;l=1941 (as this is a non-cgo program). An 8 KB g0 stack looks rather small to me. Maybe with PGO the stack frames are larger and that just pushes it over the limit... Maybe we should increase the g0 stack size a bit...
cherrymui commented on Sep 7, 2023
@elindsey could you check whether just increasing the g0 stack size to 16 KB fixes the issue? That is, apply this patch
And rebuild the program with the same profile. Thanks.
elindsey commented on Sep 7, 2023
Bumping the stack size to 16KB did fix it - I'm no longer getting the crash on startup. 🙂
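To make the diagnosis above concrete, here is a toy model of why slightly larger frames can overflow an 8 KB stack but fit comfortably in 16 KB. It is not runtime code: the `stack` type only mirrors the idea of g.stack.lo and g.stack.hi, and the guard size and every frame size are made-up numbers chosen for illustration, not measurements from the crashing binary.

```go
package main

import "fmt"

// stack models a fixed-size stack occupying [lo, hi), growing down from hi.
type stack struct{ lo, hi uintptr }

func (s stack) size() uintptr { return s.hi - s.lo }

// fits reports whether a call chain with the given frame sizes (in bytes)
// can run on s while always leaving at least guard bytes above s.lo.
// On an ordinary goroutine a failed check would grow the stack; g0 cannot
// grow, which is what "fatal: morestack on g0" reports.
func fits(s stack, guard uintptr, frames []uintptr) bool {
	sp := s.hi
	limit := s.lo + guard
	for _, f := range frames {
		if sp < limit+f { // same as sp-f < limit, without underflow
			return false
		}
		sp -= f
	}
	return true
}

func main() {
	const guard = 928 // illustrative guard size (assumption)

	// Hypothetical frame sizes for a scheduler call chain on g0.
	baseline := []uintptr{512, 1024, 4096, 1024} // without PGO
	withPGO := []uintptr{512, 1536, 4608, 1024}  // larger frames after inlining

	small := stack{lo: 0x10000, hi: 0x10000 + 8*1024}
	large := stack{lo: 0x10000, hi: 0x10000 + 16*1024}

	fmt.Println("8 KB, baseline:", fits(small, guard, baseline))
	fmt.Println("8 KB, with PGO:", fits(small, guard, withPGO))
	fmt.Println("16 KB, with PGO:", fits(large, guard, withPGO))
}
```

In this model the baseline chain uses 6656 bytes and the larger one 7680 bytes, so only the second crosses the guard on the 8 KB stack. That matches the shape of the failure reported here, where the same code worked without PGO and stopped crashing once the g0 stack was doubled.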
gopherbot commented on Sep 8, 2023
Change https://go.dev/cl/526995 mentions this issue:
runtime: increase g0 stack size in non-cgo case
cherrymui commented on Sep 8, 2023
@elindsey thanks for confirming!
Since this issue and #62120 are similar with the same fix, I'll use a single backport issue for both. See #62537. Thanks.
gopherbot commented on Sep 8, 2023
Change https://go.dev/cl/527055 mentions this issue:
[release-branch.go1.21] runtime: increase g0 stack size in non-cgo case
elindsey commented on Sep 8, 2023
Thank you very much @cherrymui!