Description
Go version
go version go1.23.4 linux/amd64
Output of go env in your module/workspace:
GO111MODULE=''
GOARCH='mipsle'
GOBIN=''
GOCACHE='/workspace/.cache/go-build/acap-acap/mipsle'
GOENV='/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/gomodcache'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.23.4'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/.config/go/telemetry'
GCCGO='gccgo'
GOMIPS='softfloat'
AR='mipsisa32r2el-axis-linux-gnu-ar'
CC='mipsisa32r2el-axis-linux-gnu-gcc -mel -mabi=32 -msoft-float -mno-synci --sysroot=/opt/axis/acapsdk/2.12.0/sysroots/mips32r2el-nf-poky-linux'
CXX='mipsisa32r2el-axis-linux-gnu-g++ -mel -mabi=32 -msoft-float -march=34kc -msynci --sysroot=/opt/axis/acapsdk/2.12.0/sysroots/mips32r2el-nf-poky-linux'
CGO_ENABLED='1'
GOMOD='/workspace/go.mod'
GOWORK=''
CGO_CFLAGS='-Os -pipe -feliminate-unused-debug-types -mel -mabi=32 -msoft-float -mno-synci --sysroot=/opt/axis/acapsdk/2.12.0/sysroots/mips32r2el-nf-poky-linux -I/opt/axis/acapsdk/2.12.0//sysroots/mips32r2el-nf-poky-linux/usr/include/axsdk -I/opt/axis/acapsdk/2.12.0//sysroots/mips32r2el-nf-poky-linux/usr/include/glib-2.0 -I/opt/axis/acapsdk/2.12.0//sysroots/mips32r2el-nf-poky-linux/usr/lib/glib-2.0/include'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='--sysroot=/opt/axis/acapsdk/2.12.0//sysroots/mips32r2el-nf-poky-linux -Wl,-O1 -Wl,--as-needed'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-msoft-float -mno-synci --sysroot=/opt/axis/acapsdk/2.12.0/sysroots/mips32r2el-nf-poky-linux -I . -fPIC -mabi=32 -march=mips32 -msoft-float -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build4227944803=/tmp/go-build -gno-record-gcc-switches'
Background
At Axis Communications we make network cameras and also software that runs in these cameras. Axis cameras run Linux and have been made with mipsle, armv7hf, and arm64 hardware. I'm part of a team that builds a server in Go that can run on computers and on Axis cameras. We build this server for Linux, Windows, and macOS and for amd64, x86, arm64, armv7hf, and mipsle hardware. This server software is installed and running in thousands of instances.
In the same department there's another team that maintains an earlier generation of a solution that accomplishes a similar thing. Its code is independent of the server my team builds. This earlier solution also runs as a server, on Linux in Axis cameras with arm64, armv7hf, and mipsle hardware.
Since both of these pieces of software run as servers, they're supposed to run all the time and respond to requests. Both server binaries are normally compressed with UPX on mips hardware. We have built and deployed these servers for several years and have never seen the issue reported in this bug report. The source code is proprietary and can't be shared.
What did you do?
We upgraded the version of Go that we build our server software with from 1.22.9 to 1.23.4.
What did you see happen?
A while after updating to Go 1.23 and deploying our software, we started getting reports that our server had problems in Axis cameras with mips hardware. The problems we have observed for both servers are:
- Stopped sending data to cloud services.
- Stopped logging to syslog.
- Stopped responding to requests from cloud services.
And for the server we make in our team, also:
- Stopped responding to local APIs (the earlier generation server doesn't have a local API).
- When running top, our server uses 70% to 90% CPU (I'm not sure if this has been checked for the older generation server).
We now know that it takes several hours to a few days for these problems to happen after the process starts.
These problems have only been observed when running on Linux in an Axis camera with mipsle hardware and when building with Go 1.23.x, not with any other combination of OS, hardware, or Go version.
As a next step we built a firmware version with strace and gdb, and built the server software with debug info and without compressing the binary with UPX. After a while the problems happened and we examined the process with strace and gdb.
strace output:
strace: Process 19765 attached with 11 threads
[pid 19787] futex(0x2080ad8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 19778] futex(0x2080fd8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 19775] futex(0x1749bec, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 19774] futex(0x174a1e0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 19773] futex(0x2048fd8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 19765] futex(0x173ec90, FUTEX_WAIT_PRIVATE, 0, NULL
gdb output:
(gdb) thread apply all backtrace
Thread 11 (LWP 19897):
#0 internal/runtime/atomic.spinLock () at internal/runtime/atomic/atomic_mipsx.s:287
#1 0x004c11e8 in internal/runtime/atomic.lockAndCheck (addr=0x203e260) at internal/runtime/atomic/atomic_mipsx.go:45
#2 0x004c1db8 in internal/runtime/atomic.Load64 (addr=0x203e260, val=<optimized out>) at internal/runtime/atomic/atomic_mipsx.go:91
#3 0x004c178c in internal/runtime/atomic.(*Uint64).Load (u=0x203e260, ~r0=<optimized out>) at internal/runtime/atomic/types.go:309
#4 0x00432704 in runtime.(*limiterEvent).stop (e=0x203e260, typ=1 '\001', now=119587286975287) at runtime/mgclimit.go:452
#5 0x0042fe34 in runtime.gcBgMarkWorker (ready=0x248a340) at runtime/mgc.go:1518
#6 0x0042fafc in runtime.gcBgMarkStartWorkers.gowrap1 () at runtime/mgc.go:1328
#7 0x004bac70 in runtime.goexit () at runtime/asm_mipsx.s:664
Thread 10 (LWP 19793):
#0 internal/runtime/atomic.spinLock () at internal/runtime/atomic/atomic_mipsx.s:287
#1 0x004c11e8 in internal/runtime/atomic.lockAndCheck (addr=0x203b1c8) at internal/runtime/atomic/atomic_mipsx.go:45
#2 0x004c1db8 in internal/runtime/atomic.Load64 (addr=0x203b1c8, val=<optimized out>) at internal/runtime/atomic/atomic_mipsx.go:91
#3 0x004c1324 in internal/runtime/atomic.(*Int64).Load (i=0x203b1c8, ~r0=<optimized out>) at internal/runtime/atomic/types.go:74
#4 0x00491278 in runtime.(*timers).wakeTime (ts=0x203b1a0, ~r0=<optimized out>) at runtime/time.go:877
#5 0x00491388 in runtime.(*timers).check (ts=0x203b1a0, now=0, rnow=<optimized out>, pollUntil=<optimized out>, ran=<optimized out>) at runtime/time.go:899
#6 0x00469f70 in runtime.findRunnable (gp=<optimized out>, inheritTime=<optimized out>, tryWakeP=<optimized out>) at runtime/proc.go:3270
#7 0x0046c568 in runtime.schedule () at runtime/proc.go:3995
#8 0x0046cc58 in runtime.goschedImpl (gp=0x22fc7e8, preempted=true) at runtime/proc.go:4136
#9 0x0046cdbc in runtime.gopreempt_m (gp=0x22fc7e8) at runtime/proc.go:4153
#10 0x004b86fc in runtime.mcall () at runtime/asm_mipsx.s:141
#11 0x00000000 in ?? ()
Backtrace stopped: frame did not save the PC
Thread 9 (LWP 19789):
#0 internal/runtime/atomic.spinLock () at internal/runtime/atomic/atomic_mipsx.s:287
#1 0x004c11e8 in internal/runtime/atomic.lockAndCheck (addr=0x174b240 <runtime.gcController+160>) at internal/runtime/atomic/atomic_mipsx.go:45
#2 0x004c1c48 in internal/runtime/atomic.Xadd64 (addr=0x174b240 <runtime.gcController+160>, delta=2068, new=<optimized out>) at internal/runtime/atomic/atomic_mipsx.go:55
#3 0x004c145c in internal/runtime/atomic.(*Int64).Add (i=0x174b240 <runtime.gcController+160>, delta=2068, ~r0=<optimized out>) at internal/runtime/atomic/types.go:109
#4 0x00435f78 in runtime.gcDrain (gcw=0x203bc74, flags=2) at runtime/mgcmark.go:1236
#5 0x00435bc4 in runtime.gcDrainMarkWorkerDedicated (gcw=0x203bc74, untilPreempt=false) at runtime/mgcmark.go:1112
#6 0x0043028c in runtime.gcBgMarkWorker.func2 () at runtime/mgc.go:1504
#7 0x004b87a8 in runtime.systemstack () at runtime/asm_mipsx.s:188
#8 0x004b8688 in runtime.mstart () at runtime/asm_mipsx.s:89
Thread 8 (LWP 19787):
#0 runtime.futex () at runtime/sys_linux_mipsx.s:380
#1 0x0045a0a8 in runtime.futexsleep (addr=0x2080ad8, val=0, ns=-1) at runtime/os_linux.go:69
#2 0x0041b39c in runtime.notesleep (n=0x2080ad8) at runtime/lock_futex.go:170
#3 0x00466fec in runtime.mPark () at runtime/proc.go:1865
#4 0x00468fe8 in runtime.stopm () at runtime/proc.go:2885
#5 0x00469ee8 in runtime.findRunnable (gp=<optimized out>, inheritTime=<optimized out>, tryWakeP=<optimized out>) at runtime/proc.go:3622
#6 0x0046c568 in runtime.schedule () at runtime/proc.go:3995
#7 0x0046cab4 in runtime.park_m (gp=0x23c8128) at runtime/proc.go:4102
#8 0x004b86fc in runtime.mcall () at runtime/asm_mipsx.s:141
#9 0x00000000 in ?? ()
Backtrace stopped: frame did not save the PC
Thread 7 (LWP 19778):
#0 runtime.futex () at runtime/sys_linux_mipsx.s:380
#1 0x0045a0a8 in runtime.futexsleep (addr=0x2080fd8, val=0, ns=-1) at runtime/os_linux.go:69
#2 0x0041b39c in runtime.notesleep (n=0x2080fd8) at runtime/lock_futex.go:170
#3 0x00466fec in runtime.mPark () at runtime/proc.go:1865
#4 0x00468fe8 in runtime.stopm () at runtime/proc.go:2885
#5 0x00469ee8 in runtime.findRunnable (gp=<optimized out>, inheritTime=<optimized out>, tryWakeP=<optimized out>) at runtime/proc.go:3622
#6 0x0046c568 in runtime.schedule () at runtime/proc.go:3995
#7 0x0046d290 in runtime.goexit0 (gp=0x23caea8) at runtime/proc.go:4268
#8 0x004b86fc in runtime.mcall () at runtime/asm_mipsx.s:141
#9 0x00000000 in ?? ()
Backtrace stopped: frame did not save the PC
Thread 6 (LWP 19775):
#0 runtime.futex () at runtime/sys_linux_mipsx.s:380
#1 0x0045a0a8 in runtime.futexsleep (addr=0x1749bec <runtime.newmHandoff+12>, val=0, ns=-1) at runtime/os_linux.go:69
#2 0x0041b39c in runtime.notesleep (n=0x1749bec <runtime.newmHandoff+12>) at runtime/lock_futex.go:170
#3 0x00468e88 in runtime.templateThread () at runtime/proc.go:2863
#4 0x00466ec4 in runtime.mstart1 () at runtime/proc.go:1834
#5 0x00466dd8 in runtime.mstart0 () at runtime/proc.go:1791
#6 0x004b8688 in runtime.mstart () at runtime/asm_mipsx.s:89
Thread 5 (LWP 19774):
#0 runtime.futex () at runtime/sys_linux_mipsx.s:380
#1 0x0045a0a8 in runtime.futexsleep (addr=0x174a1e0 <runtime.sig>, val=0, ns=-1) at runtime/os_linux.go:69
#2 0x0041b6c4 in runtime.notetsleep_internal (n=0x174a1e0 <runtime.sig>, ns=-1, ~r0=<optimized out>) at runtime/lock_futex.go:193
#3 0x0041b81c in runtime.notetsleepg (n=0x174a1e0 <runtime.sig>, ns=-1, ~r0=<optimized out>) at runtime/lock_futex.go:247
#4 0x004b4410 in os/signal.signal_recv (~r0=<optimized out>) at runtime/sigqueue.go:152
#5 0x009644a8 in os/signal.loop () at os/signal/signal_unix.go:23
#6 0x004bac70 in runtime.goexit () at runtime/asm_mipsx.s:664
Thread 4 (LWP 19773):
#0 runtime.futex () at runtime/sys_linux_mipsx.s:380
#1 0x0045a0a8 in runtime.futexsleep (addr=0x2048fd8, val=0, ns=-1) at runtime/os_linux.go:69
#2 0x0041b39c in runtime.notesleep (n=0x2048fd8) at runtime/lock_futex.go:170
#3 0x00466fec in runtime.mPark () at runtime/proc.go:1865
#4 0x00468fe8 in runtime.stopm () at runtime/proc.go:2885
#5 0x00469ee8 in runtime.findRunnable (gp=<optimized out>, inheritTime=<optimized out>, tryWakeP=<optimized out>) at runtime/proc.go:3622
#6 0x0046c568 in runtime.schedule () at runtime/proc.go:3995
#7 0x0046cab4 in runtime.park_m (gp=0x2082fc8) at runtime/proc.go:4102
#8 0x004b86fc in runtime.mcall () at runtime/asm_mipsx.s:141
#9 0x00000000 in ?? ()
Backtrace stopped: frame did not save the PC
Thread 3 (LWP 19771):
#0 internal/runtime/atomic.spinLock () at internal/runtime/atomic/atomic_mipsx.s:287
#1 0x004c11e8 in internal/runtime/atomic.lockAndCheck (addr=0x174b240 <runtime.gcController+160>) at internal/runtime/atomic/atomic_mipsx.go:45
#2 0x004c1c48 in internal/runtime/atomic.Xadd64 (addr=0x174b240 <runtime.gcController+160>, delta=1392, new=<optimized out>) at internal/runtime/atomic/atomic_mipsx.go:55
#3 0x004c145c in internal/runtime/atomic.(*Int64).Add (i=0x174b240 <runtime.gcController+160>, delta=1392, ~r0=<optimized out>) at internal/runtime/atomic/types.go:109
#4 0x00436110 in runtime.gcDrain (gcw=0x203cf74, flags=7) at runtime/mgcmark.go:1256
#5 0x00435b54 in runtime.gcDrainMarkWorkerIdle (gcw=0x203cf74) at runtime/mgcmark.go:1102
#6 0x004301bc in runtime.gcBgMarkWorker.func2 () at runtime/mgc.go:1508
#7 0x004b87a8 in runtime.systemstack () at runtime/asm_mipsx.s:188
#8 0x004b8688 in runtime.mstart () at runtime/asm_mipsx.s:89
Thread 2 (LWP 19770):
#0 internal/runtime/atomic.spinLock () at internal/runtime/atomic/atomic_mipsx.s:287
#1 0x004c11e8 in internal/runtime/atomic.lockAndCheck (addr=0x1740d68 <runtime.sched+8>) at internal/runtime/atomic/atomic_mipsx.go:45
#2 0x004c1db8 in internal/runtime/atomic.Load64 (addr=0x1740d68 <runtime.sched+8>, val=<optimized out>) at internal/runtime/atomic/atomic_mipsx.go:91
#3 0x004c1324 in internal/runtime/atomic.(*Int64).Load (i=0x1740d68 <runtime.sched+8>, ~r0=<optimized out>) at internal/runtime/atomic/types.go:74
#4 0x00472c28 in runtime.sysmon () at runtime/proc.go:6124
#5 0x00466ec4 in runtime.mstart1 () at runtime/proc.go:1834
#6 0x00466dd8 in runtime.mstart0 () at runtime/proc.go:1791
#7 0x004b8688 in runtime.mstart () at runtime/asm_mipsx.s:89
Thread 1 (LWP 19765):
#0 runtime.futex () at runtime/sys_linux_mipsx.s:380
#1 0x0045a0a8 in runtime.futexsleep (addr=0x173ec90 <runtime.m0+208>, val=0, ns=-1) at runtime/os_linux.go:69
#2 0x0041b39c in runtime.notesleep (n=0x173ec90 <runtime.m0+208>) at runtime/lock_futex.go:170
#3 0x00466fec in runtime.mPark () at runtime/proc.go:1865
#4 0x004698e4 in runtime.stoplockedm () at runtime/proc.go:3140
#5 0x0046c4c4 in runtime.schedule () at runtime/proc.go:3974
#6 0x0046cab4 in runtime.park_m (gp=0x2082a28) at runtime/proc.go:4102
#7 0x004b86fc in runtime.mcall () at runtime/asm_mipsx.s:141
#8 0x00000000 in ?? ()
Backtrace stopped: frame did not save the PC
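In the backtraces above, five of the eleven threads (11, 10, 9, 3, and 2) sit in spinLock, reached through lockAndCheck from Load64 or Xadd64, while the other six are parked in futex. The frames suggest that on 32-bit mips the runtime emulates 64-bit atomics by taking a spin lock around a plain 64-bit access. The following is a minimal user-space sketch of that pattern, not the runtime's actual code: globalLock, xadd64, and load64 are hypothetical names, and guarding every address with one shared lock is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// globalLock is a hypothetical single lock word shared by all emulated
// 64-bit operations: 0 means free, 1 means held.
var globalLock uint32

// spinLock busy-waits until it can flip the lock word from 0 to 1.
// The sketch yields between attempts; low-level runtime code cannot do that.
func spinLock(state *uint32) {
	for !atomic.CompareAndSwapUint32(state, 0, 1) {
		runtime.Gosched()
	}
}

// spinUnlock releases the lock by storing 0.
func spinUnlock(state *uint32) {
	atomic.StoreUint32(state, 0)
}

// xadd64 emulates an atomic 64-bit add: a plain read-modify-write
// performed while holding the lock. It returns the new value.
func xadd64(addr *uint64, delta int64) uint64 {
	spinLock(&globalLock)
	n := *addr + uint64(delta)
	*addr = n
	spinUnlock(&globalLock)
	return n
}

// load64 emulates an atomic 64-bit load under the same lock.
func load64(addr *uint64) uint64 {
	spinLock(&globalLock)
	v := *addr
	spinUnlock(&globalLock)
	return v
}

func main() {
	var counter uint64
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				xadd64(&counter, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(load64(&counter)) // prints 4000
}
```

In a scheme like this, a lock word that is left set (or never observed as cleared) would make every later caller spin forever, which would be consistent with the high CPU usage and the unresponsive process described above.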
Only one of the servers has been examined with strace and gdb (the one that my team makes). For the other (the older solution) we have only observed that the symptoms seem similar from the outside.
After reproducing the problem when building with Go 1.23.4 we rolled back the Go version to 1.22 (we used 1.22.11) for both servers and haven't noticed any problems in production.
We have also reproduced the problem using Go 1.23.0 in a test environment.
What did you expect to see?
Our process not hanging.
Other relevant info
We build the two servers with inlining disabled, except for a few select packages where we have noticed that it makes a big difference in performance on mips. The purpose of disabling inlining is to get a smaller executable.
gcflags for mipsle:
-gcflags="all=-l" -gcflags="crypto/...=-l=false" -gcflags="vendor/golang.org/x/crypto/...=-l=false" -gcflags="math/...=-l=false"
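The flags above rely on go build accepting repeated -gcflags options with package patterns, where the latest matching pattern on the command line wins. As a rough sketch, the invocation could be assembled from Go like this. The buildArgs helper and the "." package path are hypothetical; the environment values are copied from the go env output in this report. The program only prints the command and does not run it.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// buildArgs returns the "go build" arguments for the given package path.
// Inlining is disabled for all packages first, then re-enabled for the
// crypto and math packages by later, more specific patterns.
func buildArgs(pkg string) []string {
	return []string{
		"build",
		"-gcflags=all=-l",
		"-gcflags=crypto/...=-l=false",
		"-gcflags=vendor/golang.org/x/crypto/...=-l=false",
		"-gcflags=math/...=-l=false",
		pkg,
	}
}

func main() {
	cmd := exec.Command("go", buildArgs(".")...)
	// Cross-compilation target taken from the go env output above.
	cmd.Env = append(os.Environ(), "GOOS=linux", "GOARCH=mipsle", "GOMIPS=softfloat")
	fmt.Println(cmd.String())
}
```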
Speculation/guesses
Since two independent pieces of software display the same kind of problems on mips hardware when built with Go 1.23, but not with earlier Go versions, my guess is that there is a bug in the Go runtime for mips that was introduced during the development of Go 1.23.
We have looked at the git diff comparing the 1.22.0 and 1.23.0 tags and looked at the changes containing the string mips. We found two commits that look like candidates to look more closely at:
9623a35 runtime/internal/atomic: add mips operators for And/Or
ff0bc46 runtime: add crash stack support for mips/mipsle
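Since the source code can't be shared, a small standalone stress program might be useful when testing candidate commits. The sketch below hammers 64-bit atomics from many goroutines while producing garbage to keep the garbage collector busy, since the stuck threads in the backtraces are in GC and scheduler code using 64-bit atomics. It is only an assumption that this exercises the same code path on mipsle, and it is not known whether it triggers the hang. The stress function name and the worker and iteration counts are hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// stress starts the given number of goroutines. Each performs iters
// 64-bit atomic adds and loads and periodically allocates, so that GC
// mark workers run concurrently. It returns the final counter value.
func stress(workers, iters int) int64 {
	var counter atomic.Int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			var sink []byte
			for i := 0; i < iters; i++ {
				counter.Add(1)
				_ = counter.Load()
				if i%1024 == 0 {
					sink = make([]byte, 4096) // garbage for the collector
				}
			}
			_ = sink
		}()
	}
	wg.Wait()
	return counter.Load()
}

func main() {
	// Small numbers for a quick run; a real attempt at reproducing a hang
	// that takes hours to appear would loop for much longer.
	n := stress(runtime.NumCPU()*2, 200000)
	fmt.Println("completed", n, "atomic adds")
}
```

If the program stops making progress on the target hardware, attaching gdb as described above would show whether the threads are again stuck in spinLock.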
seankhliao commented on Feb 6, 2025
cc @golang/mips
gabyhelp commented on Feb 6, 2025
Related Issues
hansgylling commented on Feb 7, 2025
The initial issue had the wrong values for go env. I realized why after I made the issue, but have fixed it now.
Title changed from "runtime: Process hangs when built with Go 1.23 for mips hardware" to "runtime: process hangs when built with Go 1.23 for mips hardware"
seankhliao commented on Feb 9, 2025
Given that you've identified potential commits, is it possible for you to build Go with either of the commits reverted and test it?
hallgren commented on Feb 12, 2025
I rebuilt the application based on go1.23.6 with 9623a35 reverted (branch).
After some hours the running application stopped responding.
gdb output:
mknyszek commented on Feb 12, 2025
In triage: this is a long shot, but in these weird cases it's always good to rule out async preemption (not that anything changed there recently either, though it could be kernel related). Try GODEBUG=asyncpreemptoff=1, maybe.
mknyszek commented on Feb 12, 2025
We're at a loss here. How hard would it be to try bisecting between Go 1.22 and Go 1.23?