runtime: segmentation fault from vgetrandomPutState and runtime.growslice w/ runtime.LockOSThread #73141

Closed
@sipsma

Description

Go version

go version go1.24.2 linux/amd64

Output of go env in your module/workspace:

AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='0'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOAMD64='v1'
GOARCH='amd64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/home/arch/.cache/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/home/arch/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -m64 -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1459677643=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/home/arch/test/go.mod'
GOMODCACHE='/home/arch/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/arch/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/arch/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.24.2'
GOWORK=''
PKG_CONFIG='pkg-config'

What did you do?

Run the following Go program on a recent Linux kernel (6.13):

package main

import (
    "runtime"
    "time"

    "golang.org/x/sys/unix"
)

const i = 16 * 1024             // getrandom calls per spawned goroutine
const bs = 1024                 // buffer size passed to getrandom
const sl = 1 * time.Millisecond // delay between spawning goroutines

func main() {
    for {
        time.Sleep(sl)
        go func() {
            // Lock this goroutine to its OS thread and never unlock it.
            // Per the runtime.LockOSThread docs, when a locked goroutine
            // exits without unlocking, its OS thread is terminated, which
            // drives the runtime through the thread-destruction path.
            runtime.LockOSThread()
            b := make([]byte, bs)
            for range i {
                _, err := unix.Getrandom(b, 0)
                if err != nil {
                    panic(err)
                }
            }
        }()
        b := make([]byte, bs)
        _, err := unix.Getrandom(b, 0)
        if err != nil {
            panic(err)
        }
    }
}

I am not sure, but I suspect that a 6.11+ kernel (where getrandom is implemented via the vDSO) is important.

It's also possible that amd64 is important, but I haven't tried other architectures on 6.11+, so I'm not sure.

This is my full uname -a output in case it's helpful:

Linux ip-172-31-34-47 6.13.8-1-ec2 #1 SMP PREEMPT_DYNAMIC Mon, 24 Mar 2025 21:00:24 +0000 x86_64 GNU/Linux
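
If it helps others check quickly, a rough version gate might look like the following (a hypothetical helper, not part of the repro; it just parses the same release string uname shows above, it does not probe the vDSO itself):

package main

import (
    "fmt"

    "golang.org/x/sys/unix"
)

// kernelHasVDSOGetrandom guesses whether the running kernel is new enough
// (6.11+) to expose getrandom via the vDSO on x86-64.
func kernelHasVDSOGetrandom() bool {
    var u unix.Utsname
    if err := unix.Uname(&u); err != nil {
        return false
    }
    release := unix.ByteSliceToString(u.Release[:]) // e.g. "6.13.8-1-ec2"
    var major, minor int
    if _, err := fmt.Sscanf(release, "%d.%d", &major, &minor); err != nil {
        return false
    }
    return major > 6 || (major == 6 && minor >= 11)
}

func main() {
    fmt.Println("kernel 6.11+ (vDSO getrandom likely):", kernelHasVDSOGetrandom())
}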

I did not build/run it in any special way, just:

go build -o main main.go && ./main

The machine I ran on had 4 cores, which might be relevant for triggering it quickly while also avoiding thread exhaustion, as pointed out here.

  • It's possible that others may need to use taskset/GOMAXPROCS, or adjust some of the constants in the repro code, to hit it consistently (example invocations below).
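
For example (untested variations on my run; adjust to your core count):

GOMAXPROCS=4 ./main
taskset -c 0-3 ./main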

What did you see happen?

After roughly 5 seconds, it exits with Segmentation fault (core dumped), and the core dump shows:

           PID: 854360 (main)
           UID: 1000 (arch)
           GID: 1000 (arch)
        Signal: 11 (SEGV)
     Timestamp: Wed 2025-04-02 20:09:55 UTC (10s ago)
  Command Line: ./main
    Executable: /home/arch/test/main
 Control Group: /user.slice/user-1000.slice/session-25.scope
          Unit: session-25.scope
         Slice: user-1000.slice
       Session: 25
     Owner UID: 1000 (arch)
       Boot ID: 8235bd622064418bb4c88fcfb47876ec
    Machine ID: 51b81e352cd5447891aebaad822ce91e
      Hostname: ip-172-31-34-47
       Storage: /var/lib/systemd/coredump/core.main.1000.8235bd622064418bb4c88fcfb47876ec.854360.1743624595000000.zst (present)
  Size on Disk: 144.4K
       Message: Process 854360 (main) of user 1000 dumped core.

                Stack trace of thread 854441:
                #0  0x0000000000410238 runtime.mallocgcSmallNoscan (/home/arch/test/main + 0x10238)
                #1  0x0000000000463bd9 runtime.mallocgc (/home/arch/test/main + 0x63bd9)
                #2  0x0000000000466149 runtime.growslice (/home/arch/test/main + 0x66149)
                #3  0x00000000004613d6 runtime.vgetrandomPutState (/home/arch/test/main + 0x613d6)
                #4  0x000000000043a265 runtime.mdestroy (/home/arch/test/main + 0x3a265)
                #5  0x0000000000439f1f runtime.mstart0 (/home/arch/test/main + 0x39f1f)
                #6  0x0000000000468b65 runtime.mstart (/home/arch/test/main + 0x68b65)
                #7  0x000000000046c8ef runtime.clone (/home/arch/test/main + 0x6c8ef)

                Stack trace of thread 854440:
                #0  0x00007efcdb520411 n/a (linux-vdso.so.1 + 0x1411)
                #1  0x000000000046ca18 runtime.vgetrandom1 (/home/arch/test/main + 0x6ca18)
                #2  0x000000c00019eb48 n/a (n/a + 0x0)
                #3  0x0000000000467cd5 runtime.vgetrandom (/home/arch/test/main + 0x67cd5)
                #4  0x00000000004738a6 golang.org/x/sys/unix.Getrandom (/home/arch/test/main + 0x738a6)
                #5  0x0000000000473ea9 main.main.func1 (/home/arch/test/main + 0x73ea9)
                #6  0x000000000046aa81 runtime.goexit (/home/arch/test/main + 0x6aa81)

                Stack trace of thread 854361:
                #0  0x000000000046c277 runtime.usleep (/home/arch/test/main + 0x6c277)
                #1  0x0000000000443585 runtime.sysmon (/home/arch/test/main + 0x43585)
                #2  0x0000000000439fd3 runtime.mstart1 (/home/arch/test/main + 0x39fd3)
                #3  0x0000000000439f15 runtime.mstart0 (/home/arch/test/main + 0x39f15)
                #4  0x0000000000468b65 runtime.mstart (/home/arch/test/main + 0x68b65)
                #5  0x000000000046c8ef runtime.clone (/home/arch/test/main + 0x6c8ef)
                #6  0x000000c000020000 n/a (n/a + 0x0)
                ELF object binary architecture: AMD x86-64

What did you expect to see?

For it to not crash.


For more context:

Dagger and Docker have both been unable to update from Go 1.23 to any version of Go 1.24 due to periodic segmentation faults.

Multiple stack traces shared by other users/debuggers show crashes involving runtime.vgetrandomPutState and runtime.growslice, matching what I reproduced in isolation above.

I took a look at the relevant lines from those stack traces and formed the theory that:

  1. eb6f2c2 is the culprit.
  2. It involves the specific code path followed when:
    • A goroutine's thread (M) is being destroyed because runtime.LockOSThread was still held at goroutine exit.
    • The vgetrandomAlloc.states slice is appended to in a way that triggers growslice, and thus mallocgc, at a point in the M/P lifecycle where that's not allowed (or just doesn't work for some other reason); see the sketch below.
    • The use of runtime.LockOSThread is particularly relevant, since it may explain why Dagger/Docker hit this so quickly while seemingly no other reports have surfaced; they are among the rare users of that API (due to doing container-y things).
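
To make the second point concrete, here is a simplified user-space analogue of what I believe the put-state path does (the names mirror the stack trace; this is an illustration of the theory, not the actual runtime source):

package main

import "sync"

// Simplified analogue of the runtime's vgetrandom state pool. Each OS
// thread owns an opaque state; when a thread dies (mdestroy), its state
// is appended back onto a shared slice.
var vgetrandomAlloc struct {
    mu     sync.Mutex
    states []uintptr
}

func vgetrandomPutState(state uintptr) {
    vgetrandomAlloc.mu.Lock()
    // If this append outgrows the backing array, growslice runs and calls
    // mallocgc. In the real runtime this happens inside mdestroy, after
    // the dying thread has released its P, where heap allocation is not
    // permitted -- which would explain the runtime.mallocgcSmallNoscan
    // frame at the top of the crash stack.
    vgetrandomAlloc.states = append(vgetrandomAlloc.states, state)
    vgetrandomAlloc.mu.Unlock()
}

func main() {
    for s := uintptr(1); s <= 8; s++ {
        vgetrandomPutState(s)
    }
}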

I am very, very far from a Go runtime expert, so I'm not at all sure the above is correct, but it led me to the repro code above, which does seem to consistently trigger the crash, whether by coincidence or not 🤷‍♂️

cc @zx2c4

Metadata

Labels

  • Critical: A critical problem that affects the availability or correctness of production systems built using Go.
  • NeedsFix: The path to resolution is known, but the work has not been done.
  • compiler/runtime: Issues related to the Go compiler and/or runtime.
