Skip to content

cmd/link: TestDWARF*/testprog is flaky on darwin/amd64 #32218

Open
@josharian

Description

@josharian

At b84e0bc, on darwin/amd64:

$ go test -run=TestDWARF/testprog -count=10 cmd/link
--- FAIL: TestDWARFiOS (0.53s)
    --- FAIL: TestDWARFiOS/testprog#03 (2.03s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.50s)
    --- FAIL: TestDWARFiOS/testprog#03 (1.78s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.52s)
    --- FAIL: TestDWARFiOS/testprog#03 (1.96s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.54s)
    --- FAIL: TestDWARFiOS/testprog#03 (1.83s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.53s)
    --- FAIL: TestDWARFiOS/testprog#03 (1.98s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.54s)
    --- FAIL: TestDWARFiOS/testprog#02 (1.99s)
        dwarf_test.go:96: exit status 255
FAIL
FAIL	cmd/link	47.577s
FAIL

I've also seen

--- FAIL: TestDWARF (0.22s)
    --- FAIL: TestDWARF/testprog (3.00s)
        dwarf_test.go:96: exit status 255
FAIL

The failures are intermittent.

cc @dr2chase

Activity

added
NeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.
on May 24, 2019
added this to the Go1.13 milestone on May 24, 2019
cherrymui

cherrymui commented on May 24, 2019

@cherrymui
Member

Happened for me as well, with exactly same error. I commented on CL https://go-review.googlesource.com/c/go/+/170377

odeke-em

odeke-em commented on Jun 23, 2019

@odeke-em
Member

Running on c11f6c4

go version devel +c11f6c4929 Fri Jun 21 21:24:47 2019 +0000 darwin/amd64

uname

uname -a
Darwin Emmanuels-MacBook-Pro-2.local 18.5.0 Darwin Kernel Version 18.5.0: Mon Mar 11 20:40:32 PDT 2019; root:xnu-4903.251.3~3/RELEASE_X86_64 x86_64

On the first try I get a failure but this time spectacularly with a 1.2MB log whose beginning is something like

--- FAIL: TestDWARFiOS (0.54s)
    --- FAIL: TestDWARFiOS/testprogcgo#01 (2.33s)
        dwarf_test.go:96: symbols testprogcgo.exe: exit status 255: /var/folders/v3/7z434qpx5v3bw0wh8h2myfpw0000gn/T/go-link-TestDWARF414534277/testprogcgo.exe [arm64, 0.004386 seconds]:
                A829102B-EE24-329C-874B-9BAAE2337C79 /private/var/folders/v3/7z434qpx5v3bw0wh8h2myfpw0000gn/T/go-link-TestDWARF414534277/testprogcgo.exe [AOUT, PIE, FaultedFromDisk]
                    0x0000000000000000 (0x100000000) __PAGEZERO SEGMENT
                    0x0000000100000000 ( 0xbc000) __TEXT SEGMENT
                        0x0000000100000000 (  0x4c90) MACH_HEADER
                        0x0000000100004c90 ( 0xb67a4) __TEXT __text
                            0x0000000100004c90 (    0x68) runtime.text [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100004cf8 (    0x40) internal/cpu.Initialize [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100004d38 (   0x620) internal/cpu.processOptions [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005358 (    0x50) internal/cpu.indexByte [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x00000001000053a8 (   0x680) internal/cpu.doinit [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005a28 (    0x10) type..hash.internal/cpu.CacheLinePad [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005a38 (    0x10) type..eq.internal/cpu.CacheLinePad [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005a48 (    0x50) type..hash.internal/cpu.arm64 [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005a98 (    0x60) type..eq.internal/cpu.arm64 [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005af8 (    0x70) type..hash.internal/cpu.option [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005b68 (    0xa0) type..eq.internal/cpu.option [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005c08 (    0x50) type..hash.internal/cpu.x86 [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005c58 (    0x50) type..eq.internal/cpu.x86 [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005ca8 (    0x70) type..hash.[24]internal/cpu.option [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
....
<FRAMES_INTENTIONALLY_ELIDED_BY_MYSELF_SINCE_IT_IS_A_LARGE_FILE>
                            0x00000001001e3d20 (    0x10) syscall.envOnce [NameNList, MangledNameNList, NList]
                            0x00000001001e3d30 (    0x10) time.localOnce [NameNList, MangledNameNList, NList]
                            0x00000001001e3d40 (    0x10) runtime.writeBarrier [NameNList, MangledNameNList, NList]
                            0x00000001001e3d50 (    0x20) syscall.ForkLock [NameNList, MangledNameNList, NList]
                            0x00000001001e3d70 (    0x30) syscall.envLock [NameNList, MangledNameNList, NList]
                            0x00000001001e3da0 (    0x20) runtime.hashkey [NameNList, MangledNameNList, NList]
                            0x00000001001e3dc0 (    0x40) runtime.newmHandoff [NameNList, MangledNameNList, NList]
                            0x00000001001e3e00 (    0x40) runtime.sig [NameNList, MangledNameNList, NList]
                            0x00000001001e3e40 (    0x40) runtime.execLock [NameNList, MangledNameNList, NList]
                            0x00000001001e3e80 (    0x40) runtime.debug [NameNList, MangledNameNList, NList]
                            0x00000001001e3ec0 (    0x40) runtime.finptrmask [NameNList, MangledNameNList, NList]
                            0x00000001001e3f00 (    0x80) runtime.aeskeysched [NameNList, MangledNameNList, NList]
                            0x00000001001e3f80 (    0x80) runtime.handlingSig [NameNList, MangledNameNList, NList]
                            0x00000001001e4000 (    0xa0) internal/cpu.X86 [NameNList, MangledNameNList, NList]
                            0x00000001001e40a0 (    0xa0) internal/cpu.ARM64 [NameNList, MangledNameNList, NList]
                            0x00000001001e4140 (    0xa0) runtime.gcController [NameNList, MangledNameNList, NList]
                            0x00000001001e41e0 (   0x100) runtime.fwdSig [NameNList, MangledNameNList, NList]
                            0x00000001001e42e0 (   0x200) runtime.printBacklog [NameNList, MangledNameNList, NList]
                            0x00000001001e44e0 (   0x400) runtime.zeroVal [NameNList, MangledNameNList, NList]
                            0x00000001001e48e0 (  0x1758) runtime.memstats [NameNList, MangledNameNList, NList]
                    0x00000001001e8000 ( 0x7c000) __LINKEDIT SEGMENT
            symbols[65885]: [fatal]  child process status could not be determined; exit code unavailable.
FAIL
exit status 1
FAIL    cmd/link        47.568s

with the output at https://gist.github.com/odeke-em/85bb51ded80d9239a97c435f875b04c5

aclements

aclements commented on Jun 27, 2019

@aclements
Member

Some notes:

As far as I can tell, this has never happened on the dashboard based on greplogs -E 'dwarf_test.*(exit status 255|symbols)' -dashboard.

CL 171819 added better logging for this (previously we didn't print the symbols output). That's why the original report just has "exit status 255", but @odeke-em's report has much more information.

Unfortunately, it seems like the symbols tool may just be flaky. Just searching around for that error reveals other people seeing it occasionally in various macOS development tools outside the context of Go.

Could someone with a Mac try just running symbols (or maybe the test) in a loop to see how often it flakes?

dr2chase

dr2chase commented on Jun 27, 2019

@dr2chase
Contributor

It may require Darwin 18.5, which is one back from current.
I've tried running it in a loop with luck at seeing the bug, will try some more.
(Maybe a busy Mac helps?)

dr2chase

dr2chase commented on Jun 27, 2019

@dr2chase
Contributor

About 2000 repetitions on Darwin 18.6 (aka Mojave, macOS 10.14.5), with no failure, on a busy-ish machine. Still running.

added
TestingAn issue that has been verified to require only test changes, not just a test failure.
on Jul 16, 2019
ianlancetaylor

ianlancetaylor commented on Aug 2, 2019

@ianlancetaylor
Contributor

In the output above the symbols program completes, but then reports an error symbols[65885]: [fatal] child process status could not be determined; exit code unavailable. I can see the error string if I run strings on /usr/bin/symbols, so it's definitely coming from there. I can't find any reports of this error on the Interwebs (the top search result for me is in fact this issue). I don't know why the symbols program would start a child process at all. I don't know why it would be unable to determine the status of that child process. I can't find the source code of the symbols program; I don't know if it is available at all.

Moving to 1.14 but if we can't figure out a reproducer, or get the source code, I doubt there is much that we can do.

modified the milestones: Go1.13, Go1.14 on Aug 2, 2019
ianlancetaylor

ianlancetaylor commented on Nov 15, 2019

@ianlancetaylor
Contributor

Is this still happening for anyone?

9 remaining items

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    NeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.OS-DarwinTestingAn issue that has been verified to require only test changes, not just a test failure.

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @josharian@9nut@dmitshur@dr2chase@aclements

        Issue actions

          cmd/link: TestDWARF*/testprog is flaky on darwin/amd64 · Issue #32218 · golang/go