cmd/dist: SIGSEGV during bootstrap on aix-ppc64 #58857

Closed
@dmitshur

Description

Starting with CL 462035 (CC @randall77), the aix-ppc64 builder is failing during bootstrap:

Building Go cmd/dist using /ramdisk8GB/workdir-host-aix-ppc64-osuosl/go1.4. (go1.17.13 aix/ppc64)
Building Go toolchain1 using /ramdisk8GB/workdir-host-aix-ppc64-osuosl/go1.4.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
unexpected fault address 0x200732d00
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x32 addr=0x200732d00 pc=0x10004e748]

goroutine 1 [running, locked to thread]:
runtime.throw({0x100482525?, 0x0?})
	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/panic.go:1075 +0x40 fp=0xa00010000064d70 sp=0xa00010000064d30 pc=0x10003c1b0
runtime.sigpanic()
	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/signal_unix.go:851 +0x1e8 fp=0xa00010000064db8 sp=0xa00010000064d70 pc=0x100055f78
runtime.doInit1(0x200732d00?)
	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/proc.go:6470 +0x38 fp=0xa00010000064f18 sp=0xa00010000064dd8 pc=0x10004e748
runtime.doInit(...)
	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/proc.go:6465
runtime.main()
	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/proc.go:197 +0x160 fp=0xa00010000064fc0 sp=0xa00010000064f18 pc=0x10003eb20
runtime.goexit()
	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/asm_ppc64x.s:902 +0x4 fp=0xa00010000064fc0 sp=0xa00010000064fc0 pc=0x100074784
go tool dist: FAILED: /ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/pkg/tool/aix_ppc64/go_bootstrap install cmd/asm cmd/cgo cmd/compile cmd/link: exit status 2

(https://build.golang.org/log/24c90ab924d95afb99d89eda9035e2d24b727755)

CC @golang/aix, @golang/ppc64.

Activity

added the NeedsInvestigation label (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.) on Mar 4, 2023
added this to the Backlog milestone on Mar 4, 2023
randall77 (Contributor) commented on Mar 4, 2023

This looks like some sort of problem with xcoff/aix. The code appears to be doing the right thing. It's trying to read a field of internal/bytealg..inittask, just a random symbol in the data section of the go_bootstrap binary. That should work.

Could an aix person take a look?

pmur (Contributor) commented on Mar 6, 2023

@ayappanec can you investigate this?

ayappanec commented on Mar 7, 2023

Looking into it.

changed the title from cmd/dist: "unexpected fault address 0x200732d00" during bootstrap to cmd/dist: SIGSEGV during bootstrap on aix-ppc64 on Mar 7, 2023
ayappanec commented on Mar 8, 2023

Thread 1 received signal SIGSEGV, Segmentation fault.
[Switching to process 14287352]
0x000000010004e748 in runtime.doInit1 (t=0x20072fc00) at /opt/freeware/lib/golang/src/runtime/proc.go:6470
(gdb) where
#0 0x000000010004e748 in runtime.doInit1 (t=0x20072fc00) at /opt/freeware/lib/golang/src/runtime/proc.go:6470
#1 0x000000010003eb20 in runtime.doInit (ts=...) at /opt/freeware/lib/golang/src/runtime/proc.go:6465
#2 runtime.main () at /opt/freeware/lib/golang/src/runtime/proc.go:197
#3 0x0000000100074784 in runtime.goexit () at /opt/freeware/lib/golang/src/runtime/asm_ppc64x.s:902

Looks like the initTask is not properly loaded.
It's failing at
func doInit1(t *initTask) {
switch t.state {

pmur (Contributor) commented on Mar 9, 2023

I looked into this a little myself. The issue stems from pointers placed in go:runtime.inittasks (in the text section) to things like internal/bytealg..inittask (in the data section).

My understanding is that AIX can't relocate addresses inside the text section, so those symbols need to live in the data section. I am curious how AIX avoids this issue with other rodata-like symbols (I assume there are others).

randall77 (Contributor) commented on Mar 17, 2023

We could put both go:runtime.inittasks and all the *..inittask symbols in the data section if that would help.
I think you would just need to modify inittask.go:141 to put the slice backing store in SDATA instead of SRODATA.
Can you see if that helps?

We certainly have pointers in the other direction, from read-write to read-only. For example, var s string = "foo" would have s stored in DATA (which is read-write) and its backing store pointer points into SSTRING (which is read-only). Those presumably work, so I'm a bit mystified why the other direction doesn't work.
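
A minimal sketch of that read-write to read-only pointer, assuming Go 1.20 or later for unsafe.StringData. The helper name headerAndBacking is made up for illustration. It prints the address of the string header (in writable data) and the address of the bytes it points to; the actual values depend on the platform and the linker.

```go
package main

import (
	"fmt"
	"unsafe"
)

// s is a package-level variable, so its string header (pointer + length)
// lives in a writable data section of the binary.
var s string = "foo"

// headerAndBacking (hypothetical helper) returns the address of the string
// header and the address of the bytes that the header points to.
func headerAndBacking(p *string) (header, backing uintptr) {
	header = uintptr(unsafe.Pointer(p))
	backing = uintptr(unsafe.Pointer(unsafe.StringData(*p)))
	return header, backing
}

func main() {
	h, b := headerAndBacking(&s)
	fmt.Printf("header at %#x, backing bytes at %#x\n", h, b)
}
```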

I'm confused because it was faulting at 0x200732d00. That's exactly the address where nm reports the internal/bytealg..inittask symbol lives, which seems right according to what the code is supposed to be doing. Is the binary loaded into some other address range, and that's why that address isn't accessible? If so, would it be the loader, not the linker, that can't relocate the reference from text into data?

(The original CL has been reverted for other reasons, so to try it at tip you have to patch the CL back in.)
(I hope to resubmit some time next week.)

pmur (Contributor) commented on Mar 17, 2023

Moving these to SDATA should resolve the issue on AIX, though it breaks for other reasons.

Building Go cmd/dist using /home/murp/go-aix-ppc64-bootstrap/. (go1.20rc1 aix/ppc64)
Building Go toolchain1 using /home/murp/go-aix-ppc64-bootstrap/.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
runtime.gcdata: missing Go type information for global symbol go:runtime.inittasks: size 16
runtime.gcdata: missing Go type information for global symbol go:main.inittasks: size 792

As for the nm output: the linker "places" the data section at offset 0x200000000, but that is not really where it lives at runtime. It gets loaded into an arbitrary place (I don't know if it is truly random). Thus, AIX always has to indirect through the TOC pointer to figure out where anything in the data section lives. As far as I know, the loader cannot perform those TOC-based relocations to pointers in the text section.
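
The arithmetic behind that can be sketched as follows. This is only an illustration: 0x200000000 and 0x200732d00 come from this thread, while the runtime base 0x110000000 and the function name runtimeAddr are hypothetical. An unrelocated pointer keeps the link-time value, which is why the fault address matched what nm printed.

```go
package main

import "fmt"

// linkDataBase is where the linker "places" the data section (from the thread).
const linkDataBase uint64 = 0x200000000

// runtimeAddr maps a link-time data address to the address it would have
// once the data section is loaded at runtimeDataBase.
func runtimeAddr(linkAddr, runtimeDataBase uint64) uint64 {
	return runtimeDataBase + (linkAddr - linkDataBase)
}

func main() {
	const symLink = 0x200732d00  // nm address of internal/bytealg..inittask
	const loadedAt = 0x110000000 // hypothetical runtime base of the data section

	// A relocated pointer would hold this value; an unrelocated one in the
	// text section would still hold symLink and fault when dereferenced.
	fmt.Printf("link-time %#x -> runtime %#x\n", uint64(symLink), runtimeAddr(symLink, loadedAt))
}
```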

I suspect there aren't many, if any, RODATA things which have pointers into the DATA section. If there are, I don't know how they are handled. xcoff linking should probably generate errors if it encounters address relocations from rodata into data.

randall77 (Contributor) commented on Mar 17, 2023

runtime.gcdata: missing Go type information for global symbol go:runtime.inittasks: size 16

That looks like the GC wants to know where the pointers are.
Try SNOPTRDATA instead; all the pointers point to static targets, so the GC doesn't need to see them.

pmur (Contributor) commented on Mar 17, 2023

Putting those into SNOPTRDATA seems to get things going again on AIX. Thanks.

randall77 (Contributor) commented on Mar 18, 2023

Excellent, I will update my patch when I resubmit.
Thanks for the help.

randall77 (Contributor) commented on Mar 23, 2023

New CL is mailed with the SRODATA->SNOPTRDATA fix for aix, so closing.

added the NeedsFix label (The path to resolution is known, but the work has not been done.) and removed the NeedsInvestigation label (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.) on May 22, 2023
modified the milestones: Backlog, Go1.21 on May 22, 2023
locked and limited conversation to collaborators on May 21, 2024