Description
I've noticed a recurring pattern in cmd/go
test failures on NetBSD builders, and I believe that it indicates a bug in either os.Process
or the kernel itself on that platform.
The cmd/go
tests start processes running “in the background” using os/exec
, and use a waitOrStop
function to terminate any remaining processes at the conclusion of each test.
The waitOrStop
function starts a goroutine, then blocks on cmd.Wait()
. The background goroutine blocks until either the call to cmd.Wait
completes or the Context
is canceled, then sends a signal to the process. If that signal fails with os: process already finished
, then we assume that the process actually has already finished, and the background goroutine simply blocks until the call to cmd.Wait
(inevitably) returns.
That all happens here:
Lines 1164 to 1206 in a05a7d4
What I'm seeing on (some of?) the NetBSD builders is that after cmd.Process.Signal
fails with process already finished
, the call to cmd.Wait
continues to block, seemingly forever.
The relevant goroutine traces are:
goroutine 1536 [syscall, 8 minutes]:
syscall.Syscall6(0x42d7, 0x9117f20, 0x0, 0x8f1a5a0, 0x0, 0x0, 0x0)
/tmp/workdir/go/src/syscall/asm_unix_386.s:44 +0x5 fp=0x9117ecc sp=0x9117ec8 pc=0x80bc1e5
syscall.wait4(0x42d7, 0x9117f20, 0x0, 0x8f1a5a0)
/tmp/workdir/go/src/syscall/zsyscall_netbsd_386.go:35 +0x5b fp=0x9117f04 sp=0x9117ecc pc=0x80b8f3b
syscall.Wait4(0x42d7, 0x9117f44, 0x0, 0x8f1a5a0)
/tmp/workdir/go/src/syscall/syscall_bsd.go:145 +0x3b fp=0x9117f28 sp=0x9117f04 pc=0x80b6abb
os.(*Process).wait(0x8e98150)
/tmp/workdir/go/src/os/exec_unix.go:44 +0xf2 fp=0x9117f60 sp=0x9117f28 pc=0x80e2762
os.(*Process).Wait(...)
/tmp/workdir/go/src/os/exec.go:132
os/exec.(*Cmd).Wait(0x8c7bc30)
/tmp/workdir/go/src/os/exec/exec.go:507 +0x4d fp=0x9117f9c sp=0x9117f60 pc=0x817eedd
cmd/go_test.waitOrStop({0x874073c, 0x8dc2300}, 0x8c7bc30, {0x873f89c, 0x869bd38}, 0x6472661e3)
/tmp/workdir/go/src/cmd/go/script_test.go:1203 +0x11a fp=0x9117fc0 sp=0x9117f9c pc=0x85800ea
cmd/go_test.(*testScript).startBackground.func1()
/tmp/workdir/go/src/cmd/go/script_test.go:1143 +0x5f fp=0x9117ff0 sp=0x9117fc0 pc=0x857ff8f
runtime.goexit()
/tmp/workdir/go/src/runtime/asm_386.s:1311 +0x1 fp=0x9117ff4 sp=0x9117ff0 pc=0x80ab241
created by cmd/go_test.(*testScript).startBackground
/tmp/workdir/go/src/cmd/go/script_test.go:1142 +0x390
goroutine 1537 [chan send, 8 minutes]:
runtime.gopark(0x869b798, 0x8dbc430, 0xf, 0x16, 0x2)
/tmp/workdir/go/src/runtime/proc.go:366 +0xf6 fp=0x90e8f04 sp=0x90e8ef0 pc=0x807f376
runtime.chansend(0x8dbc400, 0x87302f0, 0x1, 0x8580314)
/tmp/workdir/go/src/runtime/chan.go:258 +0x2f4 fp=0x90e8f48 sp=0x90e8f04 pc=0x804db54
runtime.chansend1(0x8dbc400, 0x87302f0)
/tmp/workdir/go/src/runtime/chan.go:144 +0x24 fp=0x90e8f60 sp=0x90e8f48 pc=0x804d854
cmd/go_test.waitOrStop.func1()
/tmp/workdir/go/src/cmd/go/script_test.go:1176 +0x184 fp=0x90e8ff0 sp=0x90e8f60 pc=0x8580314
runtime.goexit()
/tmp/workdir/go/src/runtime/asm_386.s:1311 +0x1 fp=0x90e8ff4 sp=0x90e8ff0 pc=0x80ab241
created by cmd/go_test.waitOrStop
/tmp/workdir/go/src/cmd/go/script_test.go:1165 +0x10e
Note that goroutine 1537 is blocked at script_test.go:1176
, which is the send on errc
after cmd.Process.Signal
fails with os: process already finished
.
Goroutine 1536 is blocked at the call to cmd.Wait
, which is itself blocked on syscall.Wait4
.
The failure rate with these symptoms is fairly high: something on the order of 20 failures per month.
greplogs --dashboard -md -l -e '(?m)panic: test timed out.*(?:.*\n)*.*\[syscall, .* minutes\]:\n(?:.+\n\t.+\n)*syscall\.Wait.*\n\t.+\n(?:.+\n\t.+\n)*cmd/go_test\.waitOrStop'
2021-10-04T22:46:23-17674e2/netbsd-386-9_0
2021-10-04T18:15:09-9f8d558/netbsd-386-9_0
2021-09-30T19:56:06-eb9f090/netbsd-386-9_0
2021-09-29T15:23:27-aeb4fba/netbsd-amd64-9_0
2021-09-28T17:18:36-ff7b041/netbsd-amd64-9_0
2021-09-28T15:26:21-583eeaa/netbsd-386-9_0
2021-09-27T18:57:20-3d795ea/netbsd-amd64-9_0
2021-09-22T16:24:17-74ba70b/netbsd-amd64-9_0
2021-09-22T15:00:53-91c2318/netbsd-amd64-9_0
2021-09-21T20:39:31-48cf96c/netbsd-386-9_0
2021-09-20T23:04:13-d7e3e44/netbsd-arm64-bsiegert
2021-09-20T00:13:47-a83a558/netbsd-amd64-9_0
2021-09-17T19:32:44-74e384f/netbsd-amd64-9_0
2021-09-16T23:57:40-8d2a9c3/netbsd-386-9_0
2021-09-16T19:38:19-bcdc61d/netbsd-amd64-9_0
2021-09-10T17:11:39-5a4b9f9/netbsd-amd64-9_0
2021-09-09T16:32:28-a53e3d5/netbsd-amd64-9_0
2021-09-08T11:57:03-9295723/netbsd-386-9_0
2021-09-07T03:56:13-6226020/netbsd-386-9_0
2021-09-04T10:58:11-5ec298d/netbsd-386-9_0
2021-08-31T16:43:46-6815235/netbsd-386-9_0
2021-08-30T22:07:49-b06cfe9/netbsd-386-9_0
2021-08-27T05:13:44-2c60a99/netbsd-386-9_0
2021-08-24T22:23:12-54cdef1/netbsd-386-9_0
2021-08-23T21:22:58-8157960/netbsd-386-9_0
2021-08-23T21:22:58-8157960/netbsd-arm64-bsiegert
2021-08-22T21:43:43-1958582/netbsd-386-9_0
2021-08-20T03:25:17-c92c2c9/netbsd-386-9_0
2021-08-19T20:50:13-65074a4/netbsd-amd64-9_0
2021-08-18T21:19:22-c2bd9ee/netbsd-386-9_0
2021-08-18T20:11:28-165ebd8/netbsd-386-9_0
2021-08-17T16:22:15-cf12b0d/netbsd-386-9_0
2021-08-17T15:00:04-3001b0a/netbsd-386-9_0
2021-08-17T04:37:32-a304273/netbsd-386-9_0
2021-08-17T01:29:37-1951afc/netbsd-386-9_0
2021-08-16T18:44:38-56a919f/netbsd-386-9_0
2021-08-16T18:44:32-ff36d11/netbsd-amd64-9_0
2021-08-16T13:38:52-a192ef8/netbsd-386-9_0
2021-08-12T17:43:16-39634e7/netbsd-386-9_0
2021-08-09T20:06:35-f1dce31/netbsd-386-9_0
2021-08-06T16:51:12-70546f6/netbsd-386-9_0
2021-07-28T03:27:13-b39e0f4/netbsd-386-9_0
2021-07-15T20:39:22-0941dbc/netbsd-amd64-9_0
2021-06-29T16:57:13-3463852/netbsd-386-9_0
2021-06-28T20:51:30-956c81b/netbsd-386-9_0
2021-06-24T03:45:33-a9bb382/netbsd-386-9_0
2021-06-15T20:59:42-d77f4c0/netbsd-amd64-9_0
2021-06-11T20:31:30-16b5d76/netbsd-386-9_0
2021-06-09T17:11:44-df35ade/netbsd-386-9_0
2021-06-09T15:09:13-139e935/netbsd-386-9_0
2021-05-21T17:43:46-4fda54c/netbsd-arm64-bsiegert
2021-05-21T17:35:47-8876b9b/netbsd-386-9_0
2021-05-17T16:02:12-b1aff42/netbsd-386-9_0
2021-05-12T15:23:09-0388670/netbsd-386-9_0
2021-05-12T02:04:57-1a0ea1a/netbsd-386-9_0
2021-05-11T18:22:54-9b84814/netbsd-386-9_0
2021-05-10T13:16:56-2870259/netbsd-386-9_0
2021-05-06T16:00:55-6c591f7/netbsd-386-9_0
2020-09-19T05:13:19-ccf581f/netbsd-386-9_0