Description
Go version
go version go1.24.2 linux/amd64
Output of go env in your module/workspace:
AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='0'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOAMD64='v1'
GOARCH='amd64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/home/arch/.cache/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/home/arch/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -m64 -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1459677643=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/home/arch/test/go.mod'
GOMODCACHE='/home/arch/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/arch/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/arch/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.24.2'
GOWORK=''
PKG_CONFIG='pkg-config'
What did you do?
Run the following Go program on a recent Linux kernel (6.13):
package main

import (
	"runtime"
	"time"

	"golang.org/x/sys/unix"
)

const i = 16 * 1024             // Getrandom calls per locked goroutine
const bs = 1024                 // buffer size passed to Getrandom
const sl = 1 * time.Millisecond // delay between spawning locked goroutines

func main() {
	for {
		time.Sleep(sl)
		go func() {
			// Lock this goroutine to its OS thread and never unlock,
			// so the thread is destroyed when the goroutine exits.
			runtime.LockOSThread()
			b := make([]byte, bs)
			for range i {
				_, err := unix.Getrandom(b, 0)
				if err != nil {
					panic(err)
				}
			}
		}()
		b := make([]byte, bs)
		_, err := unix.Getrandom(b, 0)
		if err != nil {
			panic(err)
		}
	}
}
I am not sure, but I suspect that having a 6.11+ kernel (where getrandom is optimized to use the vDSO) is important. It's also possible that amd64 is important, but I haven't tried other architectures on 6.11+, so I'm not sure.
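I haven't verified the vDSO angle directly. One hedged way to test it would be to swap the unix.Getrandom calls in the repro for a raw getrandom(2) syscall, which (as far as I understand) bypasses the runtime's vDSO-based vgetrandom path that the stack trace below goes through; if the crash disappears with that change, it would support the kernel-version theory. Sketch only, untested:

import (
	"unsafe"

	"golang.org/x/sys/unix"
)

// rawGetrandom issues getrandom(2) directly instead of calling
// golang.org/x/sys/unix.Getrandom (which, per the stack trace below, is
// routed through runtime.vgetrandom in Go 1.24). Assumes a non-empty
// buffer; offered only as a way to probe the hypothesis.
func rawGetrandom(b []byte) (int, error) {
	n, _, errno := unix.Syscall(unix.SYS_GETRANDOM,
		uintptr(unsafe.Pointer(&b[0])), uintptr(len(b)), 0)
	if errno != 0 {
		return 0, errno
	}
	return int(n), nil
}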
This is my full uname -a output in case it's helpful:
Linux ip-172-31-34-47 6.13.8-1-ec2 #1 SMP PREEMPT_DYNAMIC Mon, 24 Mar 2025 21:00:24 +0000 x86_64 GNU/Linux
I did not build/run it in any special way, just:
go build -o main main.go && ./main
The machine I ran on had 4 cores, which might be relevant for triggering it quickly while also avoiding thread exhaustion, as pointed out here.
- It's possible that others may need to use taskset/GOMAXPROCS or adjust some of the constants in the repro code to hit it consistently; a rough sketch of what that could look like is below.
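For example (a guess only, untested elsewhere; 4 just matches the core count of the machine I reproduced on), the pinning could also be done from inside the repro program rather than via taskset or the GOMAXPROCS environment variable:

func init() {
	// Hypothetical tweak: cap the number of Ps to roughly match a
	// 4-core machine. The right value for other machines is unknown.
	runtime.GOMAXPROCS(4)
}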
What did you see happen?
After roughly 5 seconds, it outputs Segmentation fault (core dumped), with the following core dump output:
PID: 854360 (main)
UID: 1000 (arch)
GID: 1000 (arch)
Signal: 11 (SEGV)
Timestamp: Wed 2025-04-02 20:09:55 UTC (10s ago)
Command Line: ./main
Executable: /home/arch/test/main
Control Group: /user.slice/user-1000.slice/session-25.scope
Unit: session-25.scope
Slice: user-1000.slice
Session: 25
Owner UID: 1000 (arch)
Boot ID: 8235bd622064418bb4c88fcfb47876ec
Machine ID: 51b81e352cd5447891aebaad822ce91e
Hostname: ip-172-31-34-47
Storage: /var/lib/systemd/coredump/core.main.1000.8235bd622064418bb4c88fcfb47876ec.854360.1743624595000000.zst (present)
Size on Disk: 144.4K
Message: Process 854360 (main) of user 1000 dumped core.
Stack trace of thread 854441:
#0 0x0000000000410238 runtime.mallocgcSmallNoscan (/home/arch/test/main + 0x10238)
#1 0x0000000000463bd9 runtime.mallocgc (/home/arch/test/main + 0x63bd9)
#2 0x0000000000466149 runtime.growslice (/home/arch/test/main + 0x66149)
#3 0x00000000004613d6 runtime.vgetrandomPutState (/home/arch/test/main + 0x613d6)
#4 0x000000000043a265 runtime.mdestroy (/home/arch/test/main + 0x3a265)
#5 0x0000000000439f1f runtime.mstart0 (/home/arch/test/main + 0x39f1f)
#6 0x0000000000468b65 runtime.mstart (/home/arch/test/main + 0x68b65)
#7 0x000000000046c8ef runtime.clone (/home/arch/test/main + 0x6c8ef)
Stack trace of thread 854440:
#0 0x00007efcdb520411 n/a (linux-vdso.so.1 + 0x1411)
#1 0x000000000046ca18 runtime.vgetrandom1 (/home/arch/test/main + 0x6ca18)
#2 0x000000c00019eb48 n/a (n/a + 0x0)
#3 0x0000000000467cd5 runtime.vgetrandom (/home/arch/test/main + 0x67cd5)
#4 0x00000000004738a6 golang.org/x/sys/unix.Getrandom (/home/arch/test/main + 0x738a6)
#5 0x0000000000473ea9 main.main.func1 (/home/arch/test/main + 0x73ea9)
#6 0x000000000046aa81 runtime.goexit (/home/arch/test/main + 0x6aa81)
Stack trace of thread 854361:
#0 0x000000000046c277 runtime.usleep (/home/arch/test/main + 0x6c277)
#1 0x0000000000443585 runtime.sysmon (/home/arch/test/main + 0x43585)
#2 0x0000000000439fd3 runtime.mstart1 (/home/arch/test/main + 0x39fd3)
#3 0x0000000000439f15 runtime.mstart0 (/home/arch/test/main + 0x39f15)
#4 0x0000000000468b65 runtime.mstart (/home/arch/test/main + 0x68b65)
#5 0x000000000046c8ef runtime.clone (/home/arch/test/main + 0x6c8ef)
#6 0x000000c000020000 n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
What did you expect to see?
For it to not crash.
For more context: Dagger and Docker have both been unable to update from Go 1.23 to any version of Go 1.24 due to periodic segmentation faults. Multiple stack traces shared by other users/debuggers have shown crashes involving runtime.vgetrandomPutState and runtime.growslice, matching what I repro'd in isolation above.
I took a look at the relevant lines from those stack traces and came up with the theory that:
- eb6f2c2 is the culprit
- It involved specific code paths followed when:
  - A goroutine's P is being destroyed due to runtime.LockOSThread being held at goexit
  - The vgetrandomAlloc.states slice was appended to such that it triggered growslice and thus tried to malloc, but at a point of the m/p lifecycle where that's not allowed (or just doesn't work for some other reason)
- The use of runtime.LockOSThread is particularly relevant since it potentially explains why dagger/docker hit this so quickly but seemingly no other reports have surfaced; dagger/docker are some of the rare users of that API (due to doing container-y things)
I am very, very far from a Go runtime expert, so I'm not at all sure the above is correct, but it led me to the repro code above, which does indeed seem to consistently trigger this, whether by coincidence or not 🤷‍♂️
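To make the suspected sequence concrete, here is a minimal sketch (my own reading of the theory above, not a confirmed reduction) of the pattern I believe matters: a goroutine locks its OS thread, makes a getrandom call so the thread picks up vgetrandom state, and then exits without unlocking, which tears the thread down and runs the vgetrandomPutState path seen in the crash:

package main

import (
	"runtime"
	"time"

	"golang.org/x/sys/unix"
)

func main() {
	for {
		go func() {
			// Lock to the OS thread and never unlock, so the thread is
			// destroyed (runtime.mdestroy -> runtime.vgetrandomPutState)
			// when this goroutine exits.
			runtime.LockOSThread()

			// A getrandom call routed through the runtime's vDSO path
			// gives the thread vgetrandom state that must be handed back
			// when the thread dies; handing it back appends to a global
			// slice, and growing that slice allocates, which is where the
			// stack traces above crash.
			b := make([]byte, 1)
			if _, err := unix.Getrandom(b, 0); err != nil {
				panic(err)
			}
		}()
		time.Sleep(time.Millisecond)
	}
}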
cc @zx2c4