cmd/compile: 'fatal: morestack on g0' on FreeBSD amd64 with PGO #62489

Closed
@elindsey

What version of Go are you using (go version)?

$ go version
1.21.1, cross-compiling on a Linux host with CGO_ENABLED=0 GOOS=freebsd GOARCH=amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

FreeBSD, amd64

What did you do?

Unfortunately I can't share the original binary, and I'm having trouble reproducing the problem in a smaller program. I apologize; I know that's not ideal and the issue might be intractable without a reproducer. I'm still working on a smaller repro.

We have a server process that runs on FreeBSD/amd64 and does not use cgo. It was recently upgraded to Go 1.21.1, and I enabled PGO to see what would happen, using a profile gathered from the existing Go 1.20.7 production deployment.

The produced binary panics immediately on startup. 1.21.1 without PGO works fine on FreeBSD/amd64, go 1.20.7 with PGO and the same profile on FreeBSD/amd64 works fine, and 1.21.1 with PGO works fine on both Linux/amd64 and macOS/arm64.

Backtrace follows:

(gdb) run
Starting program: /var/svm/f 
[New LWP 363928 of process 68756]
[New LWP 363929 of process 68756]
[New LWP 363930 of process 68756]
[New LWP 363931 of process 68756]
fatal: morestack on g0

Thread 4 received signal SIGTRAP, Trace/breakpoint trap.
warning: could not convert 'si_code' from the host encoding (ISO-8859-1) to UTF-32.
This normally should not happen, please file a bug report.
Breakpoint.
[Switching to LWP 363930 of process 68756]
0x0000000000478186 in ?? ()
(gdb) bt
#0  0x0000000000478186 in ?? ()
#1  0x0000000000476585 in runtime.morestack () at .goroot/1.21.1/src/runtime/asm_amd64.s:560
#2  0x000000000043a490 in runtime.netpoll (delay=, ~r0=...)
    at .goroot/1.21.1/src/runtime/netpoll_kqueue.go:121
#3  0x000000000044633f in runtime.findRunnable (gp=, inheritTime=, 
    tryWakeP=) at .goroot/1.21.1/src/runtime/proc.go:3191
#4  0x0000000000448c56 in runtime.schedule () at .goroot/1.21.1/src/runtime/proc.go:3582
#5  runtime.park_m (gp=0xc000006ea0) at .goroot/1.21.1/src/runtime/proc.go:3745
#6  0x000000000047648e in runtime.mcall () at .goroot/1.21.1/src/runtime/asm_amd64.s:458
#7  0x0000000000000000 in ?? ()
(gdb) thread apply all bt

Thread 5 (LWP 363931 of process 68756):
#0  runtime.sys_umtx_op () at .goroot/1.21.1/src/runtime/sys_freebsd_amd64.s:57
#1  0x000000000043a8b3 in runtime.futexsleep1 (addr=, val=0x0, ns=) at .goroot/1.21.1/src/runtime/os_freebsd.go:174
#2  0x000000000040c0fe in runtime.notesleep.futexsleep.func1 () at .goroot/1.21.1/src/runtime/os_freebsd.go:162
#3  0x000000000040c067 in runtime.futexsleep (ns=0xffffffffffffffff, addr=, val=) at .goroot/1.21.1/src/runtime/os_freebsd.go:161
#4  runtime.notesleep (n=0xc000100150) at .goroot/1.21.1/src/runtime/lock_futex.go:160
#5  0x0000000000444bca in runtime.mPark () at .goroot/1.21.1/src/runtime/proc.go:1632
#6  runtime.stopm () at .goroot/1.21.1/src/runtime/proc.go:2536
#7  0x000000000044667e in runtime.findRunnable (gp=, inheritTime=, tryWakeP=) at .goroot/1.21.1/src/runtime/proc.go:3229
#8  0x0000000000448c56 in runtime.schedule () at .goroot/1.21.1/src/runtime/proc.go:3582
#9  runtime.park_m (gp=0xc0001024e0) at .goroot/1.21.1/src/runtime/proc.go:3745
#10 0x000000000047648e in runtime.mcall () at .goroot/1.21.1/src/runtime/asm_amd64.s:458
#11 0x0000000000000000 in ?? ()

Thread 4 (LWP 363930 of process 68756):
#0  0x0000000000478186 in ?? ()
#1  0x0000000000476585 in runtime.morestack () at .goroot/1.21.1/src/runtime/asm_amd64.s:560
#2  0x000000000043a490 in runtime.netpoll (delay=, ~r0=...) at .goroot/1.21.1/src/runtime/netpoll_kqueue.go:121
#3  0x000000000044633f in runtime.findRunnable (gp=, inheritTime=, tryWakeP=) at .goroot/1.21.1/src/runtime/proc.go:3191
#4  0x0000000000448c56 in runtime.schedule () at .goroot/1.21.1/src/runtime/proc.go:3582
#5  runtime.park_m (gp=0xc000006ea0) at .goroot/1.21.1/src/runtime/proc.go:3745
#6  0x000000000047648e in runtime.mcall () at .goroot/1.21.1/src/runtime/asm_amd64.s:458
#7  0x0000000000000000 in ?? ()

Thread 3 (LWP 363929 of process 68756):
#0  runtime.sys_umtx_op () at .goroot/1.21.1/src/runtime/sys_freebsd_amd64.s:57
#1  0x000000000043a8b3 in runtime.futexsleep1 (addr=, val=0x0, ns=) at .goroot/1.21.1/src/runtime/os_freebsd.go:174
#2  0x000000000040c0fe in runtime.notesleep.futexsleep.func1 () at .goroot/1.21.1/src/runtime/os_freebsd.go:162
#3  0x000000000040c067 in runtime.futexsleep (ns=0xffffffffffffffff, addr=, val=) at .goroot/1.21.1/src/runtime/os_freebsd.go:161
#4  runtime.notesleep (n=0xc000080550) at .goroot/1.21.1/src/runtime/lock_futex.go:160
#5  0x0000000000444bca in runtime.mPark () at .goroot/1.21.1/src/runtime/proc.go:1632
#6  runtime.stopm () at .goroot/1.21.1/src/runtime/proc.go:2536
#7  0x000000000044570a in runtime.startlockedm (gp=) at .goroot/1.21.1/src/runtime/proc.go:2808
#8  0x0000000000448c13 in runtime.schedule () at .goroot/1.21.1/src/runtime/proc.go:3628
#9  runtime.park_m (gp=0xc000006d00) at .goroot/1.21.1/src/runtime/proc.go:3745
#10 0x000000000047648e in runtime.mcall () at .goroot/1.21.1/src/runtime/asm_amd64.s:458
#11 0x0000000000000000 in ?? ()

Thread 2 (LWP 363928 of process 68756):
#0  runtime.usleep () at .goroot/1.21.1/src/runtime/sys_freebsd_amd64.s:477
#1  0x000000000044d84b in runtime.sysmon () at .goroot/1.21.1/src/runtime/proc.go:5528
#2  0x00000000004434d3 in runtime.mstart1 () at .goroot/1.21.1/src/runtime/proc.go:1600
#3  0x0000000000443416 in runtime.mstart0 () at .goroot/1.21.1/src/runtime/proc.go:1557
#4  0x0000000000476405 in runtime.mstart () at .goroot/1.21.1/src/runtime/asm_amd64.s:394
#5  0x0000000000479aae in runtime.thr_start () at .goroot/1.21.1/src/runtime/sys_freebsd_amd64.s:86
#6  0x0000000000000000 in ?? ()

Thread 1 (LWP 101416 of process 68756):
#0  0x000000000042e4b1 in runtime.(*mheap).initSpan (h=0xc3d1a0 , s=0x8477665f0, typ=0x0, spanclass=0x4b, base=, npages=0x2) at .goroot/1.21.1/src/runtime/mheap.go:1404
#1  0x000000000042e1f3 in runtime.(*mheap).allocSpan (h=0xc3d1a0 , npages=0x2, typ=0x0, spanclass=0x4b, s=) at .goroot/1.21.1/src/runtime/mheap.go:1344
#2  0x0000000000419a7f in runtime.(*mcentral).grow.(*mheap).alloc.func1 () at .goroot/1.21.1/src/runtime/mheap.go:968
#3  0x000000000047650a in runtime.systemstack () at .goroot/1.21.1/src/runtime/asm_amd64.s:509
#4  0x00007fffffffe9c8 in ?? ()
#5  0x000000000047a93f in runtime.newproc (fn=0x47638f ) at :1
#6  0x0000000000476405 in runtime.mstart () at .goroot/1.21.1/src/runtime/asm_amd64.s:394
#7  0x000000000047638f in runtime.rt0_go () at .goroot/1.21.1/src/runtime/asm_amd64.s:358
#8  0x0000000000000001 in ?? ()
#9  0x00007fffffffea18 in ?? ()
#10 0x0000000000000000 in ?? ()

Nothing there looks like user code to me. On some runs I do see a few things starting to get runtime.doInit()'d in the stacks (some compiled regex and so on), but this seems to panic very early. While I try to get a smaller repro, are there any things in the stack that jump out, or any suggestions on how to debug this?

Activity

changed the title [-]'fatal: morestack on g0' on FreeBSD amd64 with PGO[/-] [+]cmd/compile: 'fatal: morestack on g0' on FreeBSD amd64 with PGO[/+] on Sep 7, 2023
cherrymui (Member) commented on Sep 7, 2023

Thanks for the report!

(gdb) bt
#0  0x0000000000478186 in ?? ()
#1  0x0000000000476585 in runtime.morestack () at .goroot/1.21.1/src/runtime/asm_amd64.s:560
#2  0x000000000043a490 in runtime.netpoll (delay=, ~r0=...)
    at .goroot/1.21.1/src/runtime/netpoll_kqueue.go:121
#3  0x000000000044633f in runtime.findRunnable (gp=, inheritTime=, 
    tryWakeP=) at .goroot/1.21.1/src/runtime/proc.go:3191
#4  0x0000000000448c56 in runtime.schedule () at .goroot/1.21.1/src/runtime/proc.go:3582
#5  runtime.park_m (gp=0xc000006ea0) at .goroot/1.21.1/src/runtime/proc.go:3745
#6  0x000000000047648e in runtime.mcall () at .goroot/1.21.1/src/runtime/asm_amd64.s:458
#7  0x0000000000000000 in ?? ()

This is interesting. The stack looks totally valid, so I'm not sure why it calls morestack... Could you print the SP at frame 2 (the runtime.netpoll frame, and perhaps other frames as well) in GDB, and also dump the contents of the G structure pointed to by the R14 register (something like x/10a $r14)? Thanks.

added the NeedsInvestigation label (someone must examine and confirm this is a valid issue and not a duplicate of an existing one) on Sep 7, 2023
added this to the Go1.22 milestone on Sep 7, 2023
elindsey (Author) commented on Sep 7, 2023

I forgot to save the core last time, so this is a new execution but same backtrace. Let me know if that got all the info you were looking for!

GH was interpreting some of the <> as html tags, even in a pre block - so I put it in a gist. https://gist.github.com/elindsey/3959c40c20360d41a49f0bd3e6b5074b

cherrymui (Member) commented on Sep 7, 2023

Thanks! The SP and stack look quite valid.

(gdb) x/10a $r14
0xc0000071e0:	0xc00008a000	0xc00008c000

These are g.stack.lo and g.stack.hi, i.e. the stack bounds. The stack is 8 KB in size, which matches https://cs.opensource.google/go/go/+/master:src/runtime/proc.go;l=1941 (as this is a non-cgo program). An 8 KB g0 stack looks rather small to me. Maybe with PGO the stack frames are larger and that just pushes it over the limit... Maybe we should increase the g0 stack size a bit...
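The 8 KB figure follows directly from the two words in the dump. As a quick check, the subtraction can be sketched in Go; the stackBounds type is a hypothetical stand-in for the first two fields of the runtime's g structure, not the runtime's own definition.

```go
package main

import "fmt"

// stackBounds is a hypothetical stand-in for g.stack: the low and high
// addresses of a goroutine stack, as read from the first two words at $r14.
type stackBounds struct {
	lo, hi uint64
}

// size returns the number of bytes between the two bounds.
func (s stackBounds) size() uint64 {
	return s.hi - s.lo
}

func main() {
	// Values copied from the gdb output above.
	g0 := stackBounds{lo: 0xc00008a000, hi: 0xc00008c000}
	fmt.Printf("g0 stack: %#x bytes = %d KB\n", g0.size(), g0.size()/1024)
	// prints "g0 stack: 0x2000 bytes = 8 KB"
}
```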

cherrymui (Member) commented on Sep 7, 2023

@elindsey could you check whether just increasing the g0 stack size to 16 KB fixes the issue? That is, apply this patch:

diff --git a/src/runtime/proc.go b/src/runtime/proc.go
index 9fd200ea32..afb33c1e8b 100644
--- a/src/runtime/proc.go
+++ b/src/runtime/proc.go
@@ -1543,7 +1543,7 @@ func mstart0() {
 		// but is somewhat arbitrary.
 		size := gp.stack.hi
 		if size == 0 {
-			size = 8192 * sys.StackGuardMultiplier
+			size = 16384 * sys.StackGuardMultiplier
 		}
 		gp.stack.hi = uintptr(noescape(unsafe.Pointer(&size)))
 		gp.stack.lo = gp.stack.hi - size + 1024
@@ -1939,7 +1939,7 @@ func allocm(pp *p, fn func(), id int64) *m {
 	if iscgo || mStackIsSystemAllocated() {
 		mp.g0 = malg(-1)
 	} else {
-		mp.g0 = malg(8192 * sys.StackGuardMultiplier)
+		mp.g0 = malg(16384 * sys.StackGuardMultiplier)
 	}
 	mp.g0.m = mp

Then rebuild the program with the same profile. Thanks.
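To make the effect of the first hunk concrete, here is a small sketch of the bounds arithmetic it contains (stack.lo = stack.hi - size + 1024). The function name estimatedG0Bounds, the constant loAdjust, and the example address are hypothetical, and multiplier is assumed to be 1 as a stand-in for sys.StackGuardMultiplier.

```go
package main

import "fmt"

// loAdjust is the constant 1024 that the mstart0 hunk adds when it
// computes stack.lo from stack.hi and the assumed size.
const loAdjust = 1024

// estimatedG0Bounds (hypothetical name) mirrors that arithmetic: given a high
// address and an assumed size, it returns the low bound and the resulting span.
func estimatedG0Bounds(hi, size uint64) (lo, span uint64) {
	lo = hi - size + loAdjust
	return lo, hi - lo
}

func main() {
	const multiplier = 1      // assumption: stands in for sys.StackGuardMultiplier
	const hi = 0x7fffffffe000 // hypothetical address, for illustration only
	for _, base := range []uint64{8192, 16384} {
		lo, span := estimatedG0Bounds(hi, base*multiplier)
		fmt.Printf("size=%5d lo=%#x span=%d bytes\n", base*multiplier, lo, span)
	}
}
```

Under these assumptions the span between the estimated bounds grows from 7168 to 15360 bytes. The second hunk (allocm) simply passes the larger size to malg, so no bounds arithmetic is sketched for it.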

elindsey (Author) commented on Sep 7, 2023

Bumping the stack size to 16KB did fix it - I'm no longer getting the crash on startup. 🙂

gopherbot (Contributor) commented on Sep 8, 2023

Change https://go.dev/cl/526995 mentions this issue: runtime: increase g0 stack size in non-cgo case

self-assigned this
on Sep 8, 2023
cherrymui (Member) commented on Sep 8, 2023

@elindsey thanks for confirming!

Since this issue and #62120 are similar with the same fix, I'll use a single backport issue for both. See #62537. Thanks.

gopherbot (Contributor) commented on Sep 8, 2023

Change https://go.dev/cl/527055 mentions this issue: [release-branch.go1.21] runtime: increase g0 stack size in non-cgo case

elindsey (Author) commented on Sep 8, 2023

Thank you very much @cherrymui!

added a commit that references this issue on Feb 7, 2024
c6d550a
locked and limited conversation to collaborators on Sep 13, 2024
Labels

FrozenDueToAge
NeedsInvestigation: Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
compiler/runtime: Issues related to the Go compiler and/or runtime.

Participants

@elindsey, @bcmills, @gopherbot, @cherrymui