runtime/trace: TestTraceCPUProfile failures #55317
Found new dashboard test flakes for:
2022-08-18 13:16 linux-arm-aws go@38edd9bd runtime/trace.TestTraceCPUProfile (log)
2022-09-02 19:09 windows-arm64-10 go@55ca6a20 runtime/trace.TestTraceCPUProfile (log)
As a first pass: in the failure at go@38edd9b, the goroutine created on src/runtime/trace/trace.go:128 (lines 128 to 136 at 38edd9b) is the one making the relevant call. The failure at go@55ca6a2 points at line 162 at 55ca6a2.
Found new dashboard test flakes for:
2023-10-14 22:30 linux-arm-aws go@bc9dc8d4 runtime/trace.TestTraceCPUProfile (log)
Found new dashboard test flakes for:
2023-10-30 17:08 linux-arm-aws go@ecda959b runtime/trace.TestTraceCPUProfile (log)
Found new dashboard test flakes for:
2023-11-10 15:50 windows-arm64-11 go@9d914015 TestTraceCPUProfile (log)
2023-11-10 15:51 windows-arm64-11 go@3b303fa9 TestTraceCPUProfile (log)
2023-11-10 16:18 windows-arm64-11 go@abf84221 TestTraceCPUProfile (log)
2023-11-13 22:12 freebsd-arm64-dmgk go@773039ed TestTraceCPUProfile (log)
2023-11-14 16:45 linux-386-longtest go@e14b96cb internal/trace/v2.TestTraceCPUProfile (log)
2023-11-15 20:08 freebsd-arm64-dmgk go@2380862c TestTraceCPUProfile (log)
2023-11-16 19:26 openbsd-386-72 go@d573a8a8 runtime/trace.TestTraceCPUProfile (log)
This latest set of failures seems related to the v2 tracer; those on arm64 are in the code around line 908 at e14b96c. CC @mknyszek for those.

The failure at d573a8a is at go/src/internal/trace/parser.go line 269, events = append(events, ev), where ev is a type rawEvent struct {int; byte; []uint64; []string}, which on GOARCH=386 would be 8 words of 4 bytes each, 32 bytes in total. The failed allocation is runtime.growslice(0xac500000, 0x2c301, 0x2c300, 0x1, 0x81e4340), growing that slice beyond 180992 entries. That's 5.8 MB (growing to 0x375 pages of 8 kiB each, 7.3 MB), which doesn't seem large by itself, but maybe 180k+ entries in the test trace is unusually large? I don't have a baseline for that. There are nice log lines in mheap.grow when the call to mheap.sysAlloc fails, but the crash here is in sysMap (which throws directly in sysMapOS), so the log doesn't include context on the total heap size.
Compare:
Found new dashboard test flakes for:
2023-11-20 15:10 windows-amd64-longtest go@0709f1bb internal/trace/v2.TestTraceCPUProfile (log)
Sorry for not replying sooner. The linux-386-longtest failure was fixed by https://go.dev/cl/542335. The new failure is fun -- that may be an actual failure at runtime from the new tracer. I'll try to reproduce. As a side note, the runtime/trace tests barely use the new tracer at all.
Oh! No, that new failure is not real, actually. I put in a low-quality, optimistic "deadlock detector", but it turns out there was no deadlock in this case at all. The test just didn't expect the additional output. I'll put the "deadlock detector" behind a constant. It's for debugging anyway.
Change https://go.dev/cl/544235 mentions this issue: |
The v2 execution tracer has a rudimentary deadlock detector, but it's based on an arbitrary threshold that can actually get hit even if there's no deadlock. This ends up breaking tests sometimes, and it would be bad if this just appeared in production logs. Put this 'deadlock detector' behind a flag.

For #55317.

Change-Id: I286f0c05b3ac9600f4f2f9696065cac8bbd25f00
Reviewed-on: https://go-review.googlesource.com/c/go/+/544235
Reviewed-by: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
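In sketch form, "behind a flag" means the heuristic can never fire in normal builds. The names below (debugDeadlock, reportIfStuck, the threshold value) are illustrative, not the runtime's actual identifiers:

```go
package main

import "fmt"

// debugDeadlock gates the heuristic entirely; false in normal builds,
// so the compiler can drop the check altogether.
const debugDeadlock = false

// reportIfStuck returns a warning only when the debug constant is on
// and an arbitrary wait threshold is exceeded. With the constant off,
// a slow but healthy program can never trip it, so no spurious output
// reaches tests or production logs.
func reportIfStuck(waitedMillis int) string {
	const thresholdMillis = 1000 // arbitrary; can be exceeded with no real deadlock
	if debugDeadlock && waitedMillis > thresholdMillis {
		return fmt.Sprintf("possible deadlock: waited %dms", waitedMillis)
	}
	return ""
}

func main() {
	// Even a wait well past the threshold produces nothing with the flag off.
	fmt.Printf("%q\n", reportIfStuck(5000)) // ""
}
```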
Found new dashboard test flakes for:
2023-11-21 22:23 freebsd-arm64-dmgk go@ff05cdbd TestTraceCPUProfile (log)
Found new dashboard test flakes for:
2023-11-21 17:24 freebsd-arm64-dmgk go@0cb45bac TestTraceCPUProfile (log)
Found new dashboard test flakes for:
2023-11-23 17:33 freebsd-arm64-dmgk go@0c7e5d3b internal/trace/v2.TestTraceCPUProfile (log)
Found new dashboard test flakes for:
2023-12-01 19:58 netbsd-386-9_3 go@40f6fbf1 internal/trace/v2.TestTraceCPUProfile (log)
Found new dashboard test flakes for:
2023-12-01 21:24 linux-ppc64-sid-buildlet go@ecb9d9b9 internal/trace/v2.TestTraceCPUProfile (log)
Found new dashboard test flakes for:
2023-12-01 19:30 netbsd-arm64-bsiegert go@70c7fb75 internal/trace/v2.TestTraceCPUProfile (log)
Yeah, you're right. I frequently forget that other users might run …

I think I can just make the test more resilient to lost samples. It seems like in all these failures at least some of the samples show up in the trace. I figure we can just make the test pass if at least one of these samples shows up in the trace. This only verifies that the functionality works at a baseline, not how well it works, but given that the functionality is best-effort anyway, I think that's about as good as we can make it for now.
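The relaxed check described above might look like this sketch (a hypothetical helper, not the actual test code): the test passes as long as at least one profiled stack also appears among the trace's sample events.

```go
package main

import (
	"errors"
	"fmt"
)

// atLeastOneSample reports success if any stack seen by the CPU profiler
// also shows up in the trace. Delivery of samples into the trace is
// best-effort, so requiring every sample would make the test flaky.
func atLeastOneSample(profileStacks, traceStacks map[string]int) error {
	for stack := range profileStacks {
		if traceStacks[stack] > 0 {
			return nil // baseline functionality confirmed
		}
	}
	return errors.New("no CPU profile sample appeared in the trace")
}

func main() {
	profile := map[string]int{"main.cpuHog": 10, "runtime.mallocgc": 3}
	trace := map[string]int{"main.cpuHog": 4} // some samples were lost
	fmt.Println(atLeastOneSample(profile, trace)) // nil: at least one matched
}
```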
Taking a step back, I think this actually represents a regression in behavior vs. the old tracer, but it was really hard to tell because of how best-effort this functionality is. Specifically, the old tracer was subtly ensuring that there was a CPU profile buffer flush sometime close to when a trace would end, because the trace reader would get woken up a whole bunch. In the new tracer the CPU profile buffer is translated into the execution trace via a separate goroutine. This has in general been positive for understanding the best-effort-ness of this behavior, but what we lost was the property that a flush was likely to happen close in time to when the trace ended.

My proposal to restore this behavior is just to wake the trace CPU profile reader goroutine before we're about to change generations. It does not ensure that a flush happens, but it makes one far more likely. It's still best-effort, but a bit more predictable, and it will provide more understandable results.

Overall, I'm not really that happy with the fact that this is still so best-effort. Early on in the trace design I tried to write this trace CPU profile reader goroutine to actually block on the CPU profile buffer getting filled, making it much more precise. But because of the problems on Darwin with CPU profiles we encountered earlier this year, I had to work around that. In the future it would be good to consider blocking in … again.

I'll send a CL with the better-effort wakeup, which should hopefully resolve most, if not all, of the failures seen in this issue.
Change https://go.dev/cl/555495 mentions this issue: |
@prattmic convinced me to try to do the right thing here, and eliminate all parts of the tracer making this CPU profile -> trace process best-effort. https://go.dev/cl/555495 implements this. I was originally skeptical given that we're pretty late in the release cycle, but I figure at the very least we can have this on hand and decide whether it's worth submitting now. If it's not, I can land it for next cycle and send the wakeup patch (which is, like, 5 lines of code), which will certainly not make the situation any worse.
Found new dashboard test flakes for:
2024-01-12 16:19 freebsd-arm64-dmgk go@e58e8139 internal/trace/v2.TestTraceCPUProfile (log)
Change https://go.dev/cl/557838 mentions this issue: |
…U profile in a trace

Currently the new execution tracer's handling of CPU profile samples is very best-effort. The same CPU profile buffer is used across generations, leading to a high probability that CPU samples will bleed across generations. Also, the CPU profile buffer (not the trace buffer the samples get written into) isn't guaranteed to be flushed when we close out a generation, nor when tracing stops. This has led to test failures, but can more generally just lead to lost samples.

In general, lost samples are considered OK. The CPU profile buffer is only read from every 100 ms, so if it fills up too much before then, old samples will get overwritten. The tests already account for this, and in that sense the CPU profile samples are already best-effort. But with actual CPU profiles, this is really the only condition under which samples are dropped.

This CL aims to align CPU profiles better with traces by eliminating all best-effort parts of the implementation aside from the possibility of dropped samples from a full buffer.

To achieve this, this CL adds a second CPU profile buffer and has the SIGPROF handler pick which CPU profile buffer to use based on the generation, much like every other part of the tracer. The SIGPROF handler then reads the trace generation, but not before ensuring it can't change: it grabs its own thread's trace seqlock. It's possible that a SIGPROF signal lands while this seqlock is already held by the thread. Luckily this is detectable, and the SIGPROF handler can simply elide the locking if this happens (the tracer will already wait until all threads exit their seqlock critical section).

Now that there are two CPU profile buffers written to, the read side needs to change. Instead of calling traceAcquire/traceRelease for every single CPU sample event, the trace CPU profile reader goroutine holds this conceptual lock over the entirety of flushing a buffer. This means it can pick the CPU profile buffer for the current generation to flush.

With all this machinery in place, we're now at a point where all CPU profile samples get divided into either the previous generation or the current generation. This is good, since it means that we're able to emit profile samples into the correct generation, avoiding surprises in the final trace. All that's missing is to flush the CPU profile buffer from the previous generation, once the runtime has moved on from that generation. That is, when the generation counter updates, there may yet be CPU profile samples sitting in the last generation's buffer. So, traceCPUFlush now first flushes the CPU profile buffer, followed by any trace buffers containing CPU profile samples. The end result of all this is that no sample gets left behind unless it gets overwritten in the CPU profile buffer in the first place. CPU profile samples in the trace will now also get attributed to the right generation, since the SIGPROF handler now participates in the tracer's synchronization across trace generations.

Fixes #55317.

Change-Id: I47719fad164c544eef0bb12f99c8f3c15358e344
Reviewed-on: https://go-review.googlesource.com/c/go/+/555495
Auto-Submit: Michael Knyszek <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
(cherry picked from commit f5e475e)
Reviewed-on: https://go-review.googlesource.com/c/go/+/557838
Reviewed-by: Cherry Mui <[email protected]>
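The generation-based buffer selection the message describes can be sketched like this (hypothetical types and methods; the real code in CL 555495 is considerably more involved, with seqlocks and per-thread state):

```go
package main

import "fmt"

// cpuProfBuffers holds one sample buffer per generation parity, so
// samples from adjacent generations never mix in a single buffer.
type cpuProfBuffers struct {
	bufs [2][]uint64
}

// writerBuf is what a SIGPROF-side writer would use: it picks the buffer
// for the generation it observed. (In the real design this happens under
// the thread's trace seqlock, so the generation can't change mid-write.)
func (b *cpuProfBuffers) writerBuf(gen uint64) *[]uint64 {
	return &b.bufs[gen%2]
}

// flushPrevious drains the buffer of the generation just ended, which is
// the step that ensures no sample is stranded when the generation counter
// advances.
func (b *cpuProfBuffers) flushPrevious(newGen uint64) []uint64 {
	prev := &b.bufs[(newGen-1)%2]
	out := *prev
	*prev = nil
	return out
}

func main() {
	var b cpuProfBuffers
	buf := b.writerBuf(1) // a sample recorded during generation 1
	*buf = append(*buf, 0xbeef)
	// The generation advances to 2; generation 1's samples still get flushed.
	fmt.Println(len(b.flushPrevious(2))) // 1
}
```

Because the writer indexes by the generation it actually observed, a late sample from generation N lands in N's buffer even if generation N+1 has begun, and flushPrevious picks it up at the boundary.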
Found new dashboard test flakes for:
2024-02-02 16:26 openbsd-386-72 go@2fdad8af runtime/trace.TestTraceCPUProfile (log)
Found new dashboard test flakes for:
2024-04-22 15:29 netbsd-arm64-bsiegert go@734fd7a9 internal/trace/v2.TestTraceCPUProfile (log)
This hasn't happened in a while, and even then, only on secondary ports. (I think flaky failures there are now captured in newer watchflakes bugs anyway.) Closing.
Issue created automatically to collect these failures.
Example (log):
— watchflakes