Skip to content

runtime/pprof: TestLabelSystemstack due to sample with no location #51550

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
bcmills opened this issue Mar 8, 2022 · 30 comments
Open

runtime/pprof: TestLabelSystemstack due to sample with no location #51550

bcmills opened this issue Mar 8, 2022 · 30 comments
Assignees
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Mar 8, 2022

#!watchflakes
post <- pkg == "runtime/pprof" && test == "TestLabelSystemstack"

This one is a doozy. The failure message comes from here:
https://cs.opensource.google/go/go/+/master:src/runtime/pprof/pprof_test.go;l=1535;drc=18c2033ba587ce63fc9f2d6f52b8bb2e395c561f

That seems to imply that the sample was labeled but its stack was empty(!!), which does seem to be the case in the dump of collected profiles:

        1: labels: map[key:[value]]

(attn @prattmic, CC @golang/runtime)

--- FAIL: TestLabelSystemstack (0.32s)
    pprof_test.go:524: total 85 CPU profile samples collected:
        1: 0x866eef (_ZN6__tsanL14MemoryRangeSetEPNS_11ThreadStateEyyyy.isra.39.part.40:0) 0x8e1ec0 (runtime._System:4432) labels: map[]
        
        2: 0x86a6bc (_ZN6__tsan7MetaMap9FreeRangeEPNS_9ProcessorEyy:0) 0x8e1ec0 (runtime._System:4432) labels: map[key:[value]]
        
        1: 0x8d2f04 (runtime.stdcall1:1090) 0x8d23a9 (runtime.semawakeup:871) 0x8a96ad (runtime.notewakeup:161) 0x8dce48 (runtime.startm:2324) 0x8ded1e (runtime.injectglist.func1:3076 runtime.injectglist:3100) 0x8bf5b6 (runtime.wakeScavenger:222) 0x8bf6d6 (runtime.bgscavenge.func1:268) 0x8f6d21 (runtime.runOneTimer:867) 0x8f6ad2 (runtime.runtimer:775) 0x8df2cf (runtime.checkTimers:3286) 0x8de484 (runtime.stealWork:2868) 0x8dda35 (runtime.findrunnable:2599) 0x8defd8 (runtime.schedule:3187) 0x8df52c (runtime.park_m:3336) 0x905f89 (runtime.mcall:425) labels: map[]
        
        13: 0x907ee6 (runtime.procyield:733) 0x8a9475 (runtime.lock2:69) 0x8bf1f1 (runtime.lockWithRank:22 runtime.lock:36 runtime.setGCPercent.func1:1255) 0x90600d (runtime.systemstack:469) 0x9020a5 (runtime/debug.setGCPercent:1254) 0xa8de99 (runtime/debug.SetGCPercent:92 runtime/pprof.labelHog:1552) 0xa8e152 (runtime/pprof.parallelLabelHog.func1:1565) labels: map[key:[value]]
        
        1: 0x866ef7 (_ZN6__tsanL14MemoryRangeSetEPNS_11ThreadStateEyyyy.isra.39.part.40:0) 0x8e1ec0 (runtime._System:4432) labels: map[]
        
        1: 0x8bf0d0 (runtime.(*gcControllerState).commit:1089) 0x8bf195 (runtime.(*gcControllerState).setGCPercent:1246) 0x8bf204 (runtime.setGCPercent.func1:1256) 0x90600d (runtime.systemstack:469) 0x9020a5 (runtime/debug.setGCPercent:1254) 0xa8de99 (runtime/debug.SetGCPercent:92 runtime/pprof.labelHog:1552) 0xa8e152 (runtime/pprof.parallelLabelHog.func1:1565) labels: map[key:[value]]
        
        5: 0x8a9402 (runtime.lock2:61) 0x8bf1f1 (runtime.lockWithRank:22 runtime.lock:36 runtime.setGCPercent.func1:1255) 0x90600d (runtime.systemstack:469) 0x9020a5 (runtime/debug.setGCPercent:1254) 0xa8de99 (runtime/debug.SetGCPercent:92 runtime/pprof.labelHog:1552) 0xa8e152 (runtime/pprof.parallelLabelHog.func1:1565) labels: map[key:[value]]
        
        27: 0x8d2f84 (runtime.stdcall2:1099) 0x8d204e (runtime.semasleep:819) 0x8a94f7 (runtime.lock2:89) 0x8bf1f1 (runtime.lockWithRank:22 runtime.lock:36 runtime.setGCPercent.func1:1255) 0x90600d (runtime.systemstack:469) labels: map[key:[value]]
        
        4: 0x8a9419 (runtime.lock2:63) 0x8bf1f1 (runtime.lockWithRank:22 runtime.lock:36 runtime.setGCPercent.func1:1255) 0x90600d (runtime.systemstack:469) 0x9020a5 (runtime/debug.setGCPercent:1254) 0xa8de99 (runtime/debug.SetGCPercent:92 runtime/pprof.labelHog:1552) 0xa8e152 (runtime/pprof.parallelLabelHog.func1:1565) labels: map[key:[value]]
        
        11: 0x8e1f00 (runtime._ExternalCode:4433) 0x8e1ec0 (runtime._System:4432) labels: map[key:[value]]
        
        10: 0x8d2f04 (runtime.stdcall1:1090) 0x8d23a9 (runtime.semawakeup:871) 0x8a95f5 (runtime.unlock2:117) 0x8bf238 (runtime.unlockWithRank:31 runtime.unlock:97 runtime.setGCPercent.func1:1259) 0x90600d (runtime.systemstack:469) labels: map[key:[value]]
        
        1: 0x8d3004 (runtime.stdcall3:1108) 0x8b5ba4 (runtime.sysUnused:33) 0x8c098a (runtime.(*pageAlloc).scavengeRangeLocked:775) 0x8c07cd (runtime.(*pageAlloc).scavengeOneFast:726) 0x8c0324 (runtime.(*pageAlloc).scavengeOne:637) 0x8bfcfc (runtime.(*pageAlloc).scavenge.func1:454) 0x90600d (runtime.systemstack:469) labels: map[]
        
        3: 0x907ee4 (runtime.procyield:732) 0x8a9475 (runtime.lock2:69) 0x8bf1f1 (runtime.lockWithRank:22 runtime.lock:36 runtime.setGCPercent.func1:1255) 0x90600d (runtime.systemstack:469) 0x9020a5 (runtime/debug.setGCPercent:1254) 0xa8de99 (runtime/debug.SetGCPercent:92 runtime/pprof.labelHog:1552) 0xa8e152 (runtime/pprof.parallelLabelHog.func1:1565) labels: map[key:[value]]
        
        1: 0x8d2037 (runtime.semasleep:819) 0x8a94f7 (runtime.lock2:89) 0x8bf1f1 (runtime.lockWithRank:22 runtime.lock:36 runtime.setGCPercent.func1:1255) 0x90600d (runtime.systemstack:469) 0x9020a5 (runtime/debug.setGCPercent:1254) 0xa8de99 (runtime/debug.SetGCPercent:92 runtime/pprof.labelHog:1552) 0xa8e152 (runtime/pprof.parallelLabelHog.func1:1565) labels: map[key:[value]]
        
        1: 0xa8de9a (runtime/pprof.labelHog:1549) 0xa8e152 (runtime/pprof.parallelLabelHog.func1:1565) labels: map[key:[value]]
        
        1: 0x8d2f04 (runtime.stdcall1:1090) 0x8d23a9 (runtime.semawakeup:871) 0x8a95f5 (runtime.unlock2:117) 0x8c07e7 (runtime.unlockWithRank:31 runtime.unlock:97 runtime.(*pageAlloc).scavengeOneFast:727) 0x8c0324 (runtime.(*pageAlloc).scavengeOne:637) 0x8bfcfc (runtime.(*pageAlloc).scavenge.func1:454) 0x90600d (runtime.systemstack:469) labels: map[]
        
        1: labels: map[key:[value]]
        
        1: 0x89a3a8 (.text$_ZN6__tsan14DenseSlabAllocINS_10ClockBlockELy65536ELy1024EE6RefillEPNS_19DenseSlabAllocCacheE:0) 0x8e1ec0 (runtime._System:4432) labels: map[]
        
    pprof_test.go:595: runtime.systemstack;key=value: 64
    pprof_test.go:1535: Sample labeled got true want false: 
FAIL
FAIL	runtime/pprof	19.664s

greplogs --dashboard -md -l -e '(?ms)FAIL: TestLabelSystemstack.*Sample labeled got true want false:\s*$'

2022-03-05T08:36:13-55a60ca/windows-amd64-race

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 8, 2022
@bcmills bcmills added this to the Go1.19 milestone Mar 8, 2022
@prattmic
Copy link
Member

prattmic commented Mar 8, 2022

This is indeed interesting. I suspect that a sample with no location is not a new issue, just that this test is one of few to care.

Unfortunately, from this information, we can't tell if this is an issue with the runtime internals failing to collect even a single stack frame, or with proto encoding failing to emit locations from frames.

For this test, the best workaround is to skip samples with no locations. But I would like a reproducer with which to investigate the underlying issue, and notably I'd like to know if this is limited to windows. So for now I'd like to leave the test enabled to gather more data. I am also running a more strict version of this test (fail on any samples with no location) on a windows-amd64-race gomote overnight to see if I get anything.

@prattmic prattmic changed the title runtime/pprof: TestLabelSystemstack failure with " runtime/pprof: TestLabelSystemstack due to sample with no location Mar 8, 2022
@bcmills bcmills added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. OS-Windows and removed release-blocker labels Mar 8, 2022
@gopherbot
Copy link
Contributor

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

Repository owner moved this from Todo to Done in Go Compiler / Runtime Apr 8, 2022
@bcmills
Copy link
Contributor Author

bcmills commented Apr 11, 2022

No hits since then, but no reason to suspect that it is fixed either.

greplogs --dashboard -md -l -e '(?ms)FAIL: TestLabelSystemstack.*Sample labeled got true want false:\s*$' --since=2022-03-08

@bcmills bcmills reopened this Apr 11, 2022
@bcmills bcmills added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Apr 11, 2022
@cuonglm
Copy link
Member

cuonglm commented Sep 2, 2022

Just hit this once today in https://storage.googleapis.com/go-build-log/d8188e3c/linux-amd64-nounified_2330af42.log when testing CL https://go-review.googlesource.com/c/go/+/426156. Maybe the CL is the culprit, but not sure either, so post it here and re-run the trybot to see if the problem disappear.

@cuonglm
Copy link
Member

cuonglm commented Sep 2, 2022

Just hit this once today in https://storage.googleapis.com/go-build-log/d8188e3c/linux-amd64-nounified_2330af42.log when testing CL https://go-review.googlesource.com/c/go/+/426156. Maybe the CL is the culprit, but not sure either, so post it here and re-run the trybot to see if the problem disappear.

Re-run trybot successfully, so sounds like the test still flaky somehow.

@bcmills
Copy link
Contributor Author

bcmills commented Sep 6, 2022

greplogs -l -e '(?ms)FAIL: TestLabelSystemstack.*Sample labeled got true want false:\s*$' --since=2022-03-08
2022-09-06T15:49:33-e3885c4/linux-amd64-longtest
2022-04-29T02:01:27-e7c56fe/windows-amd64-longtest

@bcmills bcmills reopened this Sep 6, 2022
Repository owner moved this from Done to In Progress in Go Compiler / Runtime Sep 6, 2022
@bcmills bcmills removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Sep 6, 2022
Repository owner moved this from In Progress to Done in Go Compiler / Runtime Sep 6, 2022
@bcmills bcmills modified the milestones: Go1.19, Go1.20 Sep 6, 2022
@bcmills bcmills reopened this Sep 6, 2022
Repository owner moved this from Done to In Progress in Go Compiler / Runtime Sep 6, 2022
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/460695 mentions this issue: runtime/pprof: document possibility of empty stacks

@bcmills bcmills added the Testing An issue that has been verified to require only test changes, not just a test failure. label Jan 5, 2023
gopherbot pushed a commit that referenced this issue Jan 9, 2023
I spent quite a while determining the cause of empty stacks in
profiles and reasoning out why this is okay. There isn't a great place
to record this knowledge, but a documentation comment on
appendLocsForStack is better than nothing.

Updates #51550.

Change-Id: I2eefc6ea31f1af885885c3d96199319f45edb4ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/460695
Reviewed-by: Felix Geisendörfer <[email protected]>
Run-TryBot: Austin Clements <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
@github-project-automation github-project-automation bot moved this from Todo to Done in Go Compiler / Runtime Jan 9, 2023
@gopherbot
Copy link
Contributor

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "runtime/pprof" && test == "TestLabelSystemstack"
2023-01-23 15:51 openbsd-amd64-71 go@e22bd234 runtime/pprof.TestLabelSystemstack (log)
SIGQUIT: quit
PC=0x24a24c2ba m=9 sigcode=0

rax    0x4
rbx    0x25ea93920
rcx    0x24a24c2ba
rdx    0x0
rdi    0x3
rsi    0x0
rbp    0x25ea93870
...
	/tmp/workdir/go/src/runtime/chan.go:583 +0x49d fp=0xc000070d10 sp=0xc000070c80 pc=0x4063dd
runtime.chanrecv1(0x0?, 0xc000018148?)
	/tmp/workdir/go/src/runtime/chan.go:442 +0x18 fp=0xc000070d38 sp=0xc000070d10 pc=0x405f18
runtime/pprof.StopCPUProfile()
	/tmp/workdir/go/src/runtime/pprof/pprof.go:839 +0xa5 fp=0xc000070d78 sp=0xc000070d38 pc=0x544dc5
runtime/pprof.testCPUProfile(0xc00028c4e0, 0xc0000ac3b0?, 0xc000070ea8)
	/tmp/workdir/go/src/runtime/pprof/pprof_test.go:466 +0x1df fp=0xc000070e10 sp=0xc000070d78 pc=0x55325f
runtime/pprof.TestLabelSystemstack(0xc00028c4e0)
	/tmp/workdir/go/src/runtime/pprof/pprof_test.go:1860 +0xc5 fp=0xc000070f70 sp=0xc000070e10 pc=0x55c225
testing.tRunner(0xc00028c4e0, 0x5c0270)

watchflakes

@gopherbot gopherbot reopened this Jan 25, 2023
@github-project-automation github-project-automation bot moved this from Done to In Progress in Go Compiler / Runtime Jan 25, 2023
@cherrymui
Copy link
Member

The failure is #57585 .

@github-project-automation github-project-automation bot moved this from In Progress to Done in Go Compiler / Runtime Jan 25, 2023
@gopherbot gopherbot reopened this Feb 24, 2023
@gopherbot
Copy link
Contributor

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "runtime/pprof" && test == "TestLabelSystemstack"
2023-02-14 17:25 darwin-amd64-10_15 go@5c3e11bd runtime/pprof.TestLabelSystemstack (log)
--- FAIL: TestLabelSystemstack (0.24s)
    pprof_test.go:433: uname -a: Darwin Gophers-iMac-Pro.local 19.6.0 Darwin Kernel Version 19.6.0: Thu Oct 29 22:56:45 PDT 2020; root:xnu-6153.141.2.2~1/RELEASE_X86_64 x86_64
    pprof_test.go:541: total 18 CPU profile samples collected:
        1: 0x1054e6b (runtime.usleep:250) 0x10450c4 (runtime.runqgrab:5985) 0x104521c (runtime.runqsteal:6020) 0x103ded1 (runtime.stealWork:2961) 0x103d318 (runtime.findRunnable:2670) 0x103e83d (runtime.schedule:3214) 0x103ed6c (runtime.park_m:3363) 0x1066402 (runtime.mcall:448) labels: map[]

        1: 0x1055726 (runtime.libcCall:49) 0x1055553 (runtime.pthread_cond_wait:450) 0x10325ac (runtime.semasleep:66) 0x100b324 (runtime.notesleep:181) 0x103c36b (runtime.mPark:1457 runtime.stopm:2247) 0x103ce49 (runtime.gcstopm:2505) 0x103d06e (runtime.findRunnable:2571) 0x103e83d (runtime.schedule:3214) 0x103ed6c (runtime.park_m:3363) 0x1066402 (runtime.mcall:448) labels: map[]

        1: 0x1055373 (runtime.kevent:401) 0x103218b (runtime.netpoll:127) 0x103d28f (runtime.findRunnable:2647) 0x103e83d (runtime.schedule:3214) 0x103ed6c (runtime.park_m:3363) 0x1066402 (runtime.mcall:448) labels: map[]

        1: 0x1055726 (runtime.libcCall:49) 0x1055553 (runtime.pthread_cond_wait:450) 0x10325ac (runtime.semasleep:66) 0x100b324 (runtime.notesleep:181) 0x103c36b (runtime.mPark:1457 runtime.stopm:2247) 0x103da07 (runtime.findRunnable:2874) 0x103e83d (runtime.schedule:3214) 0x103ed6c (runtime.park_m:3363) 0x1066402 (runtime.mcall:448) labels: map[]
...
        1: 0x1054e6b (runtime.usleep:250) 0x100b00a (runtime.osyield:351 runtime.lock2:70) 0x10210d5 (runtime.lockWithRank:22 runtime.lock:35 runtime.setGCPercent.func1:1329) 0x1066488 (runtime.systemstack:492) labels: map[key:[value]]

        1: 0x100b1dc (runtime.unlock2:126) 0x1021132 (runtime.unlockWithRank:32 runtime.unlock:96 runtime.setGCPercent.func1:1332) 0x1066488 (runtime.systemstack:492) 0x1062b25 (runtime/debug.setGCPercent:1328) 0x11503b1 (runtime/debug.SetGCPercent:96 runtime/pprof.labelHog:1904) 0x115059b (runtime/pprof.parallelLabelHog.func1:1917) labels: map[key:[value]]

        1: 0x1055726 (runtime.libcCall:49) 0x1054fbe (runtime.nanotime1:285) 0x103a30f (runtime.nanotime:19 runtime.casgstatus:1017) 0x103edde (runtime.goschedImpl:3372) 0x103eff0 (runtime.gosched_m:3386) 0x1066402 (runtime.mcall:448) labels: map[key:[value]]

        1: labels: map[key:[value]]

    pprof_test.go:612: runtime.systemstack;key=value: 7
    pprof_test.go:1887: Sample labeled got true want false: 

watchflakes

@bcmills
Copy link
Contributor Author

bcmills commented Feb 24, 2023

The above failure was on release-branch.go1.19. How do we feel about backporting the fix?

@prattmic
Copy link
Member

prattmic commented Mar 8, 2023

@gopherbot Please backport to 1.19. This is a fix for a flaky test, which is also flaky on 1.19.

@gopherbot
Copy link
Contributor

Backport issue(s) opened: #58939 (for 1.19).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/474618 mentions this issue: [release-branch.go1.19] runtime/pprof: improve output of TestLabelSystemstack

gopherbot pushed a commit that referenced this issue Mar 15, 2023
…temstack

The current output of TestLabelSystemstack is a bit cryptic. This CL
improves various messages and hopefully simplifies the logic in the
test.

Simplifying the logic leads to three changes in possible outcomes,
which I verified by running the logic before and after this change
through all 2^4 possibilities (https://go.dev/play/p/bnfb-OQCT4j):

1. If a sample both must be labeled and must not be labeled, the test
now reports that explicitly rather than giving other confusing output.

2. If a sample must not be labeled but is, the current logic will
print two identical error messages. The new logic prints only one.

3. If the test finds no frames at all that it recognizes, but the
sample is labeled, it will currently print a confusing "Sample labeled
got true want false" message. The new logic prints nothing. We've seen
this triggered by empty stacks in profiles.

Fixes #51550. This bug was caused by case 3 above, where it was
triggered by a profile label on an empty stack. It's valid for empty
stacks to appear in a profile if we sample a goroutine just as it's
exiting (and that goroutine may have a profile label), so the test
shouldn't fail in this case.

For #58939.

Change-Id: I1593ec4ac33eced5bb89572a3ba7623e56f2fb3d
Reviewed-on: https://go-review.googlesource.com/c/go/+/460516
Run-TryBot: Austin Clements <[email protected]>
Reviewed-by: Felix Geisendörfer <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
(cherry picked from commit d9f23cf)
Reviewed-on: https://go-review.googlesource.com/c/go/+/474618
Run-TryBot: Michael Pratt <[email protected]>
Reviewed-by: Cherry Mui <[email protected]>
@gopherbot gopherbot reopened this Apr 18, 2024
@gopherbot
Copy link
Contributor

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "runtime/pprof" && test == "TestLabelSystemstack"
2024-04-03 15:35 go1.21-linux-arm release-branch.go1.21@d8392e69 runtime/pprof.TestLabelSystemstack [ABORT] (log)
=== RUN   TestLabelSystemstack
SIGQUIT: quit
PC=0x8b78c m=0 sigcode=0

goroutine 0 [idle]:
runtime.futex()
	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/sys_linux_arm.s:416 +0x1c fp=0xffab15a8 sp=0xffab15a8 pc=0x8b78c
runtime.futexsleep(0x3080d0, 0x0, 0xffffffffffffffff)
	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/os_linux.go:69 +0x60 fp=0xffab15d0 sp=0xffab15a8 pc=0x4b670
runtime.notesleep(0x3080d0)
...
r9      0x308750
r10     0x307b48
fp      0x6
ip      0x4
sp      0xffab15a8
lr      0x4b670
pc      0x8b78c
cpsr    0xa0800010
fault   0x0
*** Test killed with quit: ran too long (10m0s).

watchflakes

@dmitshur dmitshur modified the milestones: Go1.20, Backlog May 30, 2024
@dmitshur dmitshur moved this from Done to Active in Test Flakes May 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
Status: Active
Development

No branches or pull requests

9 participants