runtime: TestGdbPython flaky on linux #24616

Closed

Description

@bradfitz

Just saw this TestGdbPython flake on linux-amd64:

https://storage.googleapis.com/go-build-log/149e81d1/linux-amd64_fd28318f.log

--- FAIL: TestGdbPython (3.49s)
	runtime-gdb_test.go:59: gdb version 7.7
	runtime-gdb_test.go:193: gdb output: Loading Go Runtime support.
		Loaded  Script                                                                 
		Yes     /tmp/workdir/go/src/runtime/runtime-gdb.py                             
		Breakpoint 1 at 0x47c190: file /tmp/workdir/go/src/fmt/print.go, line 263.
		
		Breakpoint 1, fmt.Println (a=..., err=..., n=<optimized out>) at /tmp/workdir/go/src/fmt/print.go:263
		263	func Println(a ...interface{}) (n int, err error) {
		BEGIN info goroutines
		* 1 running  runtime.systemstack_switch
		* 2 running  runtime.forcegchelper
		  3 waiting  runtime.gopark
		  4 runnable runtime.runfinq
		END
		#1  0x00000000004828a0 in main.main () at /tmp/go-build461513624/main.go:14
		14		fmt.Println("hi")
		BEGIN print mapvar
		$1 = map[string]string = {["ghi"] = "jkl", ["abc"] = "def"}
		END
		BEGIN print strvar
		$2 = "abc"
		END
		BEGIN info locals
		mapvar = map[string]string = {["ghi"] = "jkl", ["abc"] = "def"}
		slicevar =  []string = {"def"}
		strvar = "abc"
		END
		#0  fmt.Println (a=..., err=..., n=<optimized out>) at /tmp/workdir/go/src/fmt/print.go:263
		263	func Println(a ...interface{}) (n int, err error) {
		BEGIN goroutine 1 bt
		#0  fmt.Println (a=..., err=..., n=<optimized out>) at /tmp/workdir/go/src/fmt/print.go:263
		#1  0x00000000004828a0 in main.main () at /tmp/go-build461513624/main.go:14
		END
		BEGIN goroutine 2 bt
		No such goroutine:  2
		END
		Breakpoint 2 at 0x4828cd: file /tmp/go-build461513624/main.go, line 18.
		hi
		
		Breakpoint 2, main.main () at /tmp/go-build461513624/main.go:19
		19	}  // END_OF_PROGRAM
		BEGIN goroutine 1 bt at the end
		#0  main.main () at /tmp/go-build461513624/main.go:19
		END
		
	runtime-gdb_test.go:258: goroutine 2 bt failed: No such goroutine:  2
FAIL
FAIL	runtime	28.524s

/cc @aclements

Activity

added labels on Mar 30, 2018:
Testing: An issue that has been verified to require only test changes, not just a test failure.
NeedsFix: The path to resolution is known, but the work has not been done.
added this to the Go1.11 milestone on Mar 30, 2018

aclements (Member) commented on Apr 3, 2018

@hyangah, is this similar to the bug you fixed recently about getting the state of goroutines?

hyangah (Contributor) commented on Apr 4, 2018

@aclements do you mean https://go-review.googlesource.com/c/go/+/49691?
I don't know. Maybe.

When the test passes, the gdb output looks like the following:

--- PASS: TestGdbPython (0.44s)
	runtime-gdb_test.go:59: gdb version 7.7
	runtime-gdb_test.go:193: gdb output: Loading Go Runtime support.
		Loaded  Script                                                                 
		Yes     /tmp/workdir/go/src/runtime/runtime-gdb.py                             
		Breakpoint 1 at 0x47c1c0: file /tmp/workdir/go/src/fmt/print.go, line 263.
		
		Breakpoint 1, fmt.Println (a=..., err=..., n=<optimized out>) at /tmp/workdir/go/src/fmt/print.go:263
		263	func Println(a ...interface{}) (n int, err error) {
		BEGIN info goroutines
		* 1 running  runtime.systemstack_switch
		  2 waiting  runtime.gopark
		  17 waiting  runtime.gopark
		  33 runnable runtime.runfinq
		END
		#1  0x00000000004828d0 in main.main () at /tmp/go-build668958384/main.go:14
		14		fmt.Println("hi")
		BEGIN print mapvar
		$1 = map[string]string = {["abc"] = "def", ["ghi"] = "jkl"}
		END
		BEGIN print strvar
		$2 = "abc"
		END
		BEGIN info locals
		mapvar = map[string]string = {["abc"] = "def", ["ghi"] = "jkl"}
		slicevar =  []string = {"def"}
		strvar = "abc"
		END
		#0  fmt.Println (a=..., err=..., n=<optimized out>) at /tmp/workdir/go/src/fmt/print.go:263
		263	func Println(a ...interface{}) (n int, err error) {
		BEGIN goroutine 1 bt
		#0  fmt.Println (a=..., err=..., n=<optimized out>) at /tmp/workdir/go/src/fmt/print.go:263
		#1  0x00000000004828d0 in main.main () at /tmp/go-build668958384/main.go:14
		END
		BEGIN goroutine 2 bt
		#0  runtime.gopark (lock=0x528af0 , reason="force gc (idle)", traceEv=20 '\024', traceskip=1, unlockf=) at /tmp/workdir/go/src/runtime/proc.go:292
		#1  0x0000000000428c5e in runtime.goparkunlock (lock=<optimized out>, reason=..., traceEv=<optimized out>, traceskip=<optimized out>) at /tmp/workdir/go/src/runtime/proc.go:297
		#2  0x00000000004289ea in runtime.forcegchelper () at /tmp/workdir/go/src/runtime/proc.go:248
		#3  0x000000000044ee01 in runtime.goexit () at /tmp/workdir/go/src/runtime/asm_amd64.s:1385
		#4  0x0000000000000000 in ?? ()
		END
		Breakpoint 2 at 0x4828fd: file /tmp/go-build668958384/main.go, line 18.
		hi
		
		Breakpoint 2, main.main () at /tmp/go-build668958384/main.go:19
		19	}  // END_OF_PROGRAM
		BEGIN goroutine 1 bt at the end
		#0  main.main () at /tmp/go-build668958384/main.go:19
		END

Note the difference in the output of 'info goroutines' (2 running goroutines in the failing run vs. 1 running goroutine in the passing run).
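
To make that comparison mechanical, here is a minimal sketch that parses 'info goroutines' output and counts goroutines by status. The type `goroutineInfo` and the functions `parseInfoGoroutines` and `countStatus` are hypothetical names for illustration and are not part of runtime-gdb_test.go; the line format is assumed from the two logs above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// goroutineInfo is a hypothetical record for one line of "info goroutines".
type goroutineInfo struct {
	ID       int
	Current  bool   // the line was prefixed with "*"
	Status   string // e.g. "running", "waiting", "runnable"
	Function string
}

// parseInfoGoroutines parses lines such as
// "* 1 running  runtime.systemstack_switch".
// Lines that do not have that shape (BEGIN/END markers, blanks) are skipped.
func parseInfoGoroutines(out string) []goroutineInfo {
	var gs []goroutineInfo
	for _, line := range strings.Split(out, "\n") {
		fields := strings.Fields(line)
		current := false
		if len(fields) > 0 && fields[0] == "*" {
			current = true
			fields = fields[1:]
		}
		if len(fields) < 3 {
			continue
		}
		id, err := strconv.Atoi(fields[0])
		if err != nil {
			continue
		}
		gs = append(gs, goroutineInfo{ID: id, Current: current, Status: fields[1], Function: fields[2]})
	}
	return gs
}

// countStatus reports how many goroutines have the given status.
func countStatus(gs []goroutineInfo, status string) int {
	n := 0
	for _, g := range gs {
		if g.Status == status {
			n++
		}
	}
	return n
}

func main() {
	// Goroutine listings copied from the failing and passing logs above.
	failing := "* 1 running  runtime.systemstack_switch\n* 2 running  runtime.forcegchelper\n  3 waiting  runtime.gopark\n  4 runnable runtime.runfinq\n"
	passing := "* 1 running  runtime.systemstack_switch\n  2 waiting  runtime.gopark\n  17 waiting  runtime.gopark\n  33 runnable runtime.runfinq\n"
	fmt.Println("failing run, running goroutines:", countStatus(parseInfoGoroutines(failing), "running"))
	fmt.Println("passing run, running goroutines:", countStatus(parseInfoGoroutines(passing), "running"))
}
```

In both logs, goroutine 2 is listed either way; what differs is whether it is marked as running when the breakpoint hits.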

Is there any way to reliably reproduce the failure case?
I used gomote to run the test on the linux-amd64 buildlet, but failed to reproduce the failure with -count=1000 (took ~450s). I could not reproduce it on my own Linux machine either. When I tried a count larger than 1000 in gomote, the run was killed with SIGQUIT.

modified the milestones: Go1.11, Go1.12 on Jul 7, 2018
modified the milestones: Go1.12, Go1.13 on Feb 12, 2019
modified the milestones: Go1.13, Go1.14 on Jul 8, 2019

bcmills (Contributor) commented on Aug 27, 2019

Here's a flake with very similar output on linux-ppc64le-buildlet:
https://build.golang.org/log/f5c124d74c9e6a71da5614b8b13db9328ec08910

changed the title from "runtime: TestGdbPython flake on linux-amd64" to "runtime: TestGdbPython flaky on linux" on Aug 27, 2019

20 remaining items

gopherbot (Contributor) commented on Mar 31, 2020

Change https://golang.org/cl/226558 mentions this issue: test: deflaking measures for runtime gdb test

josharian (Contributor) commented on Apr 14, 2020

This happened again: https://storage.googleapis.com/go-build-log/72c918bb/linux-amd64-race_026d95f9.log

Still no goroutine 2. Looks like there actually is a goroutine 2, doing bgsweep. Does gdb prevent us from doing bt on a runtime goroutine?

laboger (Contributor) commented on Apr 14, 2020

In the log:

BEGIN goroutine all bt
        #0  main.main () at /workdir/tmp/go-build562481541/main.go:18
        No such goroutine:  2
        #0  runtime.gopark (unlockf={void (runtime.g *, void *, bool *)} 0xc00002b7a8, lock=0x56bf20 <runtime.sweep>, reason=12 '\f', traceEv=20 '\024', traceskip=1) at /workdir/go/src/runtime/proc.go:307
        #1  0x000000000041fcee in runtime.goparkunlock (lock=<optimized out>, reason=<optimized out>, traceEv=<optimized out>, traceskip=<optimized out>) at /workdir/go/src/runtime/proc.go:312
        #2  runtime.bgsweep (c=0xc000014070) at /workdir/go/src/runtime/mgcsweep.go:71
        #3  0x000000000045f8e1 in runtime.goexit () at /workdir/go/src/runtime/asm_amd64.s:1374
        #4  0x000000c000014070 in ?? ()
        #5  0x0000000000000000 in ?? ()
        #0  runtime.gopark (unlockf={void (runtime.g *, void *, bool *)} 0xc00002bf78, lock=0x56bee0 <runtime.scavenge>, reason=13 '\r', traceEv=20 '\024', traceskip=1) at /workdir/go/src/runtime/proc.go:307
        #1  0x000000000041e3c2 in runtime.goparkunlock (lock=<optimized out>, reason=<optimized out>, traceEv=<optimized out>, traceskip=<optimized out>) at /workdir/go/src/runtime/proc.go:312
        #2  runtime.bgscavenge (c=0xc000014070) at /workdir/go/src/runtime/mgcscavenge.go:238
        #3  0x000000000045f8e1 in runtime.goexit () at /workdir/go/src/runtime/asm_amd64.s:1374
        #4  0x000000c000014070 in ?? ()
        #5  0x0000000000000000 in ?? ()
        #0  runtime.runfinq () at /workdir/go/src/runtime/mfinal.go:161
        #1  0x000000000045f8e1 in runtime.goexit () at /workdir/go/src/runtime/asm_amd64.s:1374
        #2  0x0000000000000000 in ?? ()
        #0  main.main.func1 () at /workdir/tmp/go-build562481541/main.go:7
        #1  0x000000000045f8e1 in runtime.goexit () at /workdir/go/src/runtime/asm_amd64.s:1374
        #2  0x0000000000000000 in ?? ()
        END

I think each #0 is the top (innermost frame) of the stack for a goroutine. Note that near the top it still says "No such goroutine:  2". The bgsweep frames are part of the stack for another goroutine, and based on what was shown above, that was goroutine 3.

It still seems that goroutine 2 has exited by the time the second bt is attempted.
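
As a rough illustration of that reading of the log, the sketch below groups 'goroutine all bt' output into one stack per "#0" line and collects everything else (such as the "No such goroutine" message) separately. `splitBacktraces` is a hypothetical helper, not something from the runtime tests, and the sample is an abridged version of the log above.

```go
package main

import (
	"fmt"
	"strings"
)

// splitBacktraces groups frame lines into one slice per goroutine, starting a
// new group at every "#0" frame. Lines that are not frames, such as
// "No such goroutine:  2", are returned separately in other.
func splitBacktraces(out string) (stacks [][]string, other []string) {
	for _, line := range strings.Split(out, "\n") {
		line = strings.TrimSpace(line)
		switch {
		case line == "":
			continue
		case strings.HasPrefix(line, "#0 "):
			stacks = append(stacks, []string{line})
		case strings.HasPrefix(line, "#") && len(stacks) > 0:
			last := len(stacks) - 1
			stacks[last] = append(stacks[last], line)
		default:
			other = append(other, line)
		}
	}
	return stacks, other
}

func main() {
	// Abridged from the log above: arguments and addresses are omitted.
	sample := `#0  main.main () at main.go:18
No such goroutine:  2
#0  runtime.gopark () at proc.go:307
#1  runtime.goparkunlock () at proc.go:312
#2  runtime.bgsweep () at mgcsweep.go:71
#0  runtime.runfinq () at mfinal.go:161
#1  runtime.goexit () at asm_amd64.s:1374`
	stacks, other := splitBacktraces(sample)
	for i, s := range stacks {
		fmt.Printf("stack %d: %d frames, innermost: %s\n", i, len(s), s[0])
	}
	fmt.Println("non-frame lines:", other)
}
```

Note that this output does not label stacks with goroutine IDs, so mapping a stack back to a goroutine number relies on the order shown by 'info goroutines'.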

josharian (Contributor) commented on Apr 14, 2020

What then is it even trying to test? Perhaps we should just delete the bt 2? We cannot reliably identify goroutines just by their number. I guess the alternative is to do some python scripting to parse ‘bt all’, identify the goroutine of interest, and bt it. Or we could parse ‘bt all’ and backtrace all live goroutines.

I’m mostly inclined to delete ‘bt 2’. Opinions?

thanm (Contributor) commented on Apr 14, 2020

Agree on the analysis; I am fine with deleting the bt 2. Parsing "bt all" or "info goroutines" to find a specific goroutine seems like overkill.

laboger (Contributor) commented on Apr 14, 2020

My assumption is that the purpose of the 'bt 2' was just to test the backtrace output. If you leave in the 'bt all' and remove the 'bt 2', that should still test it. I honestly don't know why goroutines would come and go, but if goroutine 2 is gone, that's not an error in gdb python but a wrong expectation in the test.
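
One way to picture a check that relies only on 'bt all' is sketched below: it verifies that frames are numbered consecutively within each stack and merely counts "No such goroutine" lines instead of failing on them. This is an assumption about what such a check could look like, not the code in runtime-gdb_test.go; `checkBacktraces` and `frameRE` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// frameRE matches the frame number at the start of a gdb backtrace line.
var frameRE = regexp.MustCompile(`^#(\d+)\s`)

// checkBacktraces verifies that frames are numbered consecutively, restarting
// at #0 for each goroutine. "No such goroutine" lines are tolerated and
// counted in missing rather than treated as errors.
func checkBacktraces(out string) (stacks, missing int, err error) {
	next := 0 // frame number expected next within the current stack
	for i, line := range strings.Split(out, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		if strings.HasPrefix(line, "No such goroutine") {
			missing++
			continue
		}
		m := frameRE.FindStringSubmatch(line)
		if m == nil {
			return stacks, missing, fmt.Errorf("line %d: not a frame: %q", i, line)
		}
		n, convErr := strconv.Atoi(m[1])
		if convErr != nil {
			return stacks, missing, fmt.Errorf("line %d: bad frame number: %v", i, convErr)
		}
		switch {
		case n == 0:
			stacks++
			next = 1
		case stacks > 0 && n == next:
			next++
		default:
			return stacks, missing, fmt.Errorf("line %d: frame #%d out of order", i, n)
		}
	}
	return stacks, missing, nil
}

func main() {
	// Abridged sample in the shape of the 'goroutine all bt' output above.
	out := "#0  main.main ()\nNo such goroutine:  2\n#0  runtime.gopark ()\n#1  runtime.goparkunlock ()\n#2  runtime.bgsweep ()\n"
	stacks, missing, err := checkBacktraces(out)
	fmt.Println(stacks, missing, err)
}
```

With this shape, a goroutine that cannot be backtraced shows up in the `missing` count, and the caller decides whether that matters.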

bcmills (Contributor) commented on May 11, 2020

Still flaky after CL 226558, unfortunately:
2020-05-08T00:07:39-f0cea84/linux-386-387

--- FAIL: TestGdbPython (0.62s)
    runtime-gdb_test.go:71: gdb version 7.7
    runtime-gdb_test.go:249: gdb output: Loading Go Runtime support.
        Loaded  Script                                                                 
        Yes     /workdir/go/src/runtime/runtime-gdb.py                                 
        Breakpoint 1 at 0x80cc592: file /workdir/tmp/go-build244785334/main.go, line 16.
        hi
        
        Breakpoint 1, main.main () at /workdir/tmp/go-build244785334/main.go:18
        18		gslice = slicevar
        BEGIN info goroutines
        * 1 running  syscall.Syscall
        * 2 running  runtime.gopark
          3 waiting  runtime.gopark
          4 waiting  runtime.gopark
          5 runnable runtime.runfinq
          6 runnable main.main.func1
        END
        BEGIN print mapvar
        $1 = map[string]string = {["abc"] = "def", ["ghi"] = "jkl"}
        END
        BEGIN print strvar
        $2 = "abc"
        END
        BEGIN info locals
        mapvar = map[string]string = {["abc"] = "def", ["ghi"] = "jkl"}
        strvar = "abc"
        slicevar =  []string
        END
        BEGIN goroutine 1 bt
        #0  main.main () at /workdir/tmp/go-build244785334/main.go:18
        END
        BEGIN goroutine 2 bt
        No such goroutine:  2
        END
        BEGIN goroutine all bt
        #0  main.main () at /workdir/tmp/go-build244785334/main.go:18
        No such goroutine:  2
        #0  runtime.gopark (unlockf={void (runtime.g *, void *, bool *)} 0x84277d4, lock=0x816eb40 <runtime.sweep>, reason=12 '\f', traceEv=20 '\024', traceskip=1) at /workdir/go/src/runtime/proc.go:307
        #1  0x0806584f in runtime.goparkunlock (lock=<optimized out>, reason=<optimized out>, traceEv=<optimized out>, traceskip=<optimized out>) at /workdir/go/src/runtime/proc.go:312
        #2  runtime.bgsweep (c=0x844c000) at /workdir/go/src/runtime/mgcsweep.go:163
        #3  0x0809df41 in runtime.goexit () at /workdir/go/src/runtime/asm_386.s:1333
        #4  0x0844c000 in ?? ()
        #0  runtime.gopark (unlockf={void (runtime.g *, void *, bool *)} 0x8427f9c, lock=0x816eae0 <runtime.scavenge>, reason=13 '\r', traceEv=20 '\024', traceskip=1) at /workdir/go/src/runtime/proc.go:307
        #1  0x08063b2e in runtime.goparkunlock (lock=<optimized out>, reason=<optimized out>, traceEv=<optimized out>, traceskip=<optimized out>) at /workdir/go/src/runtime/proc.go:312
        #2  runtime.bgscavenge (c=0x844c000) at /workdir/go/src/runtime/mgcscavenge.go:260
        #3  0x0809df41 in runtime.goexit () at /workdir/go/src/runtime/asm_386.s:1333
        #4  0x0844c000 in ?? ()
        #0  runtime.runfinq () at /workdir/go/src/runtime/mfinal.go:161
        #1  0x0809df41 in runtime.goexit () at /workdir/go/src/runtime/asm_386.s:1333
        #2  0x00000000 in ?? ()
        #0  0x080cc610 in main.main.func1 ()
        #1  0x0809df41 in runtime.goexit () at /workdir/go/src/runtime/asm_386.s:1333
        #2  0x00000000 in ?? ()
        END
        No breakpoint at main.go:15.
        Breakpoint 2 at 0x80cc5ba: file /workdir/tmp/go-build244785334/main.go, line 19.
        
        Breakpoint 2, main.main () at /workdir/tmp/go-build244785334/main.go:20
        20	}  // END_OF_PROGRAM
        BEGIN goroutine 1 bt at the end
        #0  main.main () at /workdir/tmp/go-build244785334/main.go:20
        END
        
    runtime-gdb_test.go:100: malformed backtrace at line 0: No such goroutine:  2
FAIL
FAIL	runtime	31.765s

josharian (Contributor) commented on May 11, 2020

Yep. See the last few comments above, which include a plan for moving forward. I’m AFK now but feel free to send a CL. Should be a simple one.

gopherbot (Contributor) commented on May 14, 2020

Change https://golang.org/cl/233942 mentions this issue: runtime: remove flaky "goroutine 2 bt" from gdb test

locked and limited conversation to collaborators on May 14, 2021

Metadata

Assignees: No one assigned

Labels:
FrozenDueToAge
NeedsFix: The path to resolution is known, but the work has not been done.
OS-Linux
Testing: An issue that has been verified to require only test changes, not just a test failure.
help wanted

Type: No type
Projects: No projects
Relationships: None yet
Development: No branches or pull requests
Participants: @bradfitz, @josharian, @rsc, @andybons, @aclements
