Closed
Just saw this TestGdbPython flake on linux-amd64:
https://storage.googleapis.com/go-build-log/149e81d1/linux-amd64_fd28318f.log
--- FAIL: TestGdbPython (3.49s)
runtime-gdb_test.go:59: gdb version 7.7
runtime-gdb_test.go:193: gdb output: Loading Go Runtime support.
Loaded Script
Yes /tmp/workdir/go/src/runtime/runtime-gdb.py
Breakpoint 1 at 0x47c190: file /tmp/workdir/go/src/fmt/print.go, line 263.
Breakpoint 1, fmt.Println (a=..., err=..., n=<optimized out>) at /tmp/workdir/go/src/fmt/print.go:263
263 func Println(a ...interface{}) (n int, err error) {
BEGIN info goroutines
* 1 running runtime.systemstack_switch
* 2 running runtime.forcegchelper
3 waiting runtime.gopark
4 runnable runtime.runfinq
END
#1 0x00000000004828a0 in main.main () at /tmp/go-build461513624/main.go:14
14 fmt.Println("hi")
BEGIN print mapvar
$1 = map[string]string = {["ghi"] = "jkl", ["abc"] = "def"}
END
BEGIN print strvar
$2 = "abc"
END
BEGIN info locals
mapvar = map[string]string = {["ghi"] = "jkl", ["abc"] = "def"}
slicevar = []string = {"def"}
strvar = "abc"
END
#0 fmt.Println (a=..., err=..., n=<optimized out>) at /tmp/workdir/go/src/fmt/print.go:263
263 func Println(a ...interface{}) (n int, err error) {
BEGIN goroutine 1 bt
#0 fmt.Println (a=..., err=..., n=<optimized out>) at /tmp/workdir/go/src/fmt/print.go:263
#1 0x00000000004828a0 in main.main () at /tmp/go-build461513624/main.go:14
END
BEGIN goroutine 2 bt
No such goroutine: 2
END
Breakpoint 2 at 0x4828cd: file /tmp/go-build461513624/main.go, line 18.
hi
Breakpoint 2, main.main () at /tmp/go-build461513624/main.go:19
19 } // END_OF_PROGRAM
BEGIN goroutine 1 bt at the end
#0 main.main () at /tmp/go-build461513624/main.go:19
END
runtime-gdb_test.go:258: goroutine 2 bt failed: No such goroutine: 2
FAIL
FAIL runtime 28.524s
/cc @aclements
Activity
aclements commented on Apr 3, 2018
@hyangah, is this similar to the bug you fixed recently about getting the state of goroutines?
hyangah commented on Apr 4, 2018
@aclements do you mean https://go-review.googlesource.com/c/go/+/49691?
I don't know. Maybe.
When the test passes, the gdb output should look like the following.
Note the difference in the output of 'info goroutines' (2 running goroutines vs. 1 running goroutine).
Is there any way to reliably reproduce the failure case?
I tried using gomote to run the test on the linux-amd64 buildlet, but failed to reproduce the failure with -count=1000 (took ~450s). I couldn't reproduce it on my own Linux machine either. I tried counts larger than 1000 in gomote, and the run was killed with SIGQUIT.
bcmills commented on Aug 27, 2019
Here's a flake with very similar output on linux-ppc64le-buildlet: https://build.golang.org/log/f5c124d74c9e6a71da5614b8b13db9328ec08910
Title changed from "runtime: TestGdbPython flake on linux-amd64" to "runtime: TestGdbPython flaky on linux"
gopherbot commented on Mar 31, 2020
Change https://golang.org/cl/226558 mentions this issue:
test: deflaking measures for runtime gdb test
josharian commented on Apr 14, 2020
This happened again: https://storage.googleapis.com/go-build-log/72c918bb/linux-amd64-race_026d95f9.log
Still no goroutine 2. Looks like there actually is a goroutine 2, doing bgsweep. Does gdb prevent us from doing bt on a runtime goroutine?
laboger commented on Apr 14, 2020
In the log:
I think each #0 is the top (bottom?) of the stack for a goroutine. Note that at the top it still says "No such goroutine: 2". The bgsweep is part of the stack for another goroutine, and based on what was shown above, that was goroutine 3.
It still seems that goroutine 2 has exited by the time the second bt is attempted.
josharian commented on Apr 14, 2020
What then is it even trying to test? Perhaps we should just delete the bt 2? We cannot reliably identify goroutines just by their number. I guess the alternative is to do some python scripting to parse ‘bt all’, identify the goroutine of interest, and bt it. Or we could parse ‘bt all’ and backtrace all live goroutines.
I’m mostly inclined to delete ‘bt 2’. Opinions?
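For reference, the "parse the output and backtrace all live goroutines" alternative might look roughly like the sketch below. It is written in Go rather than Python, it parses the 'info goroutines' listing rather than 'bt all', and liveGoroutines is a hypothetical helper, not something in runtime-gdb_test.go. It assumes the BEGIN/END markers and row layout seen in the log at the top of this issue.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// goroutineLine matches one row of "info goroutines" output, for example
// "* 1 running runtime.systemstack_switch" or "3 waiting runtime.gopark".
// The row layout is an assumption based on the log in this issue.
var goroutineLine = regexp.MustCompile(`^\*?\s*(\d+)\s+(\S+)\s+(\S+)`)

// liveGoroutines is a hypothetical helper. It returns the goroutine IDs
// listed between "BEGIN info goroutines" and the following "END" line.
func liveGoroutines(gdbOutput string) []int {
	var ids []int
	inBlock := false
	sc := bufio.NewScanner(strings.NewReader(gdbOutput))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "BEGIN info goroutines":
			inBlock = true
		case line == "END":
			inBlock = false
		case inBlock:
			if m := goroutineLine.FindStringSubmatch(line); m != nil {
				if id, err := strconv.Atoi(m[1]); err == nil {
					ids = append(ids, id)
				}
			}
		}
	}
	return ids
}

func main() {
	// Sample taken from the failing log above.
	out := "BEGIN info goroutines\n" +
		"* 1 running runtime.systemstack_switch\n" +
		"* 2 running runtime.forcegchelper\n" +
		"3 waiting runtime.gopark\n" +
		"4 runnable runtime.runfinq\n" +
		"END\n"
	// Emit one gdb command per goroutine that was listed.
	for _, id := range liveGoroutines(out) {
		fmt.Printf("goroutine %d bt\n", id)
	}
}
```

Even this would not be fully reliable: a goroutine can still exit between the listing and the backtrace, which is the same race the test is hitting now.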
thanm commented on Apr 14, 2020
Agree on the analysis; I am fine with deleting the bt 2. Parsing "bt all" or "info goroutines" to find a specific goroutine seems like overkill.
laboger commented on Apr 14, 2020
My assumption is that the purpose of the 'bt 2' was just to test the backtrace output. If you leave in the 'bt all' and remove 'bt 2' that should test it? I honestly don't know why goroutines would come and go but if it is gone that's not an error with gdb python but an expectation of the test.
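To illustrate the distinction between "the goroutine is gone" and "the backtrace is broken", here is a minimal Go sketch that classifies one BEGIN/END section of the gdb output. The helpers section and goroutineGone are hypothetical names, not part of runtime-gdb_test.go, and the section markers and the "No such goroutine" message are taken from the log at the top of this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// section is a hypothetical helper. It returns the lines between
// "BEGIN <name>" and the next "END" line; ok is false if the section
// is missing or never closed.
func section(gdbOutput, name string) (body []string, ok bool) {
	in := false
	for _, line := range strings.Split(gdbOutput, "\n") {
		line = strings.TrimSpace(line)
		switch {
		case !in && line == "BEGIN "+name:
			in = true
		case in && line == "END":
			return body, true
		case in:
			body = append(body, line)
		}
	}
	return nil, false
}

// goroutineGone reports whether the section holds gdb's
// "No such goroutine" message instead of stack frames.
func goroutineGone(body []string) bool {
	for _, line := range body {
		if strings.HasPrefix(line, "No such goroutine") {
			return true
		}
	}
	return false
}

func main() {
	// Sample taken from the failing log above.
	out := "BEGIN goroutine 2 bt\nNo such goroutine: 2\nEND\n"
	body, ok := section(out, "goroutine 2 bt")
	switch {
	case !ok:
		fmt.Println("section missing")
	case goroutineGone(body):
		fmt.Println("goroutine exited before the backtrace; not a gdb python error")
	default:
		fmt.Println("backtrace present")
	}
}
```

This only shows how the two cases could be told apart; simply removing 'bt 2' avoids needing the distinction at all.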
bcmills commented on May 11, 2020
Still flaky after CL 226558, unfortunately:
2020-05-08T00:07:39-f0cea84/linux-386-387
josharian commented on May 11, 2020
Yep. See the last few comments above, which include a plan for moving forward. I’m AFK now but feel free to send a CL. Should be a simple one.
gopherbot commented on May 14, 2020
Change https://golang.org/cl/233942 mentions this issue:
runtime: remove flaky "goroutine 2 bt" from gdb test