Starting with CL 462035 (CC @randall77), the aix-ppc64 builder is failing during bootstrap:
Building Go cmd/dist using /ramdisk8GB/workdir-host-aix-ppc64-osuosl/go1.4. (go1.17.13 aix/ppc64)
Building Go toolchain1 using /ramdisk8GB/workdir-host-aix-ppc64-osuosl/go1.4.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
unexpected fault address 0x200732d00
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x32 addr=0x200732d00 pc=0x10004e748]
goroutine 1 [running, locked to thread]:
runtime.throw({0x100482525?, 0x0?})
/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/panic.go:1075 +0x40 fp=0xa00010000064d70 sp=0xa00010000064d30 pc=0x10003c1b0
runtime.sigpanic()
/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/signal_unix.go:851 +0x1e8 fp=0xa00010000064db8 sp=0xa00010000064d70 pc=0x100055f78
runtime.doInit1(0x200732d00?)
/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/proc.go:6470 +0x38 fp=0xa00010000064f18 sp=0xa00010000064dd8 pc=0x10004e748
runtime.doInit(...)
/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/proc.go:6465
runtime.main()
/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/proc.go:197 +0x160 fp=0xa00010000064fc0 sp=0xa00010000064f18 pc=0x10003eb20
runtime.goexit()
/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/asm_ppc64x.s:902 +0x4 fp=0xa00010000064fc0 sp=0xa00010000064fc0 pc=0x100074784
go tool dist: FAILED: /ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/pkg/tool/aix_ppc64/go_bootstrap install cmd/asm cmd/cgo cmd/compile cmd/link: exit status 2
(https://build.golang.org/log/24c90ab924d95afb99d89eda9035e2d24b727755)
CC @golang/aix, @golang/ppc64.
randall77 commented on Mar 4, 2023
This looks like some sort of problem with xcoff/aix. The code appears to be doing the right thing. It's trying to read a field of `internal/bytealg..inittask`, just a random symbol in the data section of the `go_bootstrap` binary. That should work.
Could an aix person take a look?
pmur commented on Mar 6, 2023
@ayappanec can you investigate this?
ayappanec commented on Mar 7, 2023
Looking into it.
Title changed from `cmd/dist: "unexpected fault address 0x200732d00" during bootstrap` to `cmd/dist: SIGSEGV during bootstrap on aix-ppc64`
ayappanec commented on Mar 8, 2023
Thread 1 received signal SIGSEGV, Segmentation fault.
[Switching to process 14287352]
0x000000010004e748 in runtime.doInit1 (t=0x20072fc00) at /opt/freeware/lib/golang/src/runtime/proc.go:6470
(gdb) where
#0 0x000000010004e748 in runtime.doInit1 (t=0x20072fc00) at /opt/freeware/lib/golang/src/runtime/proc.go:6470
#1 0x000000010003eb20 in runtime.doInit (ts=...) at /opt/freeware/lib/golang/src/runtime/proc.go:6465
#2 runtime.main () at /opt/freeware/lib/golang/src/runtime/proc.go:197
#3 0x0000000100074784 in runtime.goexit () at /opt/freeware/lib/golang/src/runtime/asm_ppc64x.s:902
Looks like the initTask is not properly loaded.
It's failing at
func doInit1(t *initTask) {
switch t.state {
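To illustrate why the fault lands on that exact line, here is a minimal sketch of the control flow. The `task` type, the `runInit` function, and the 0/1/2 state encoding are simplified assumptions for illustration, not the runtime's actual definitions. The point is only that reading `t.state` is the first memory access through the pointer, so a pointer to an unmapped address faults before any init function runs.

```go
package main

import (
	"errors"
	"fmt"
)

// task is a simplified, hypothetical stand-in for the runtime's initTask.
// The state encoding below is an assumption made for this sketch.
type task struct {
	state uint32   // 0 = not started, 1 = in progress, 2 = done
	fns   []func() // init functions to run, in order
}

// runInit mirrors the shape of doInit1: its first action is to load
// t.state, so an invalid t faults here, before any init function runs.
func runInit(t *task) error {
	switch t.state {
	case 2: // already initialized
		return nil
	case 1: // initialization already in progress
		return errors.New("recursive call during initialization")
	}
	t.state = 1
	for _, f := range t.fns {
		f()
	}
	t.state = 2
	return nil
}

func main() {
	ran := 0
	t := &task{fns: []func(){func() { ran++ }}}
	fmt.Println(runInit(t), t.state, ran)
	fmt.Println(runInit(t), t.state, ran) // second call does nothing
}
```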
pmur commented on Mar 9, 2023
I looked into this a little myself. The issue stems from pointers placed in go:runtime.inittasks (in the text section) to things like internal/bytealg..inittask (in the data section).
My understanding is that AIX can't relocate addresses inside the text section, so those symbols need to live in the data section. I am curious how AIX avoids this issue with other rodata-like symbols (I assume there are others).
randall77 commented on Mar 17, 2023
We could put both `go:runtime.inittasks` and all the `*..inittask` symbols in the data section if that would help.
I think you would just need to modify `inittask.go:141` to put the slice backing store in `SDATA` instead of `SRODATA`.
Can you see if that helps?
We certainly have pointers in the other direction, from read-write to read-only. For example, `var s string = "foo"` would have `s` stored in `DATA` (which is read-write) and its backing store pointer points into `SSTRING` (which is read-only). Those presumably work, so I'm a bit mystified why the other direction doesn't work.
I'm confused because it was faulting at 0x200732d00. That's the exact same address that `nm` reports the `internal/bytealg..inittask` symbol lives. That seems exactly right according to what the code is supposed to be doing. Is the binary loaded into some other address range, and that's why that address isn't accessible? That would then be the loader, not the linker, that can't relocate the reference from text into data?
(The original CL has been reverted for other reasons, so to try it at tip you have to patch the CL back in.)
(I hope to resubmit some time next week.)
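The `var s string = "foo"` case above can be observed from user code. The following is a small sketch, assuming Go 1.20 or later for `unsafe.StringData`; the helper name `stringParts` is made up for this example. It prints the address of the string header and the address of the bytes it points to. Which sections those addresses fall in depends on the toolchain and platform, so the program only shows that they are two distinct locations joined by a pointer.

```go
package main

import (
	"fmt"
	"unsafe"
)

// s is a statically initialized package-level string: the header
// (pointer + length) and the bytes "foo" are stored separately.
var s string = "foo"

// stringParts returns the address of the string header and the
// address of the first byte of its backing store.
func stringParts(p *string) (header, data uintptr) {
	header = uintptr(unsafe.Pointer(p))
	data = uintptr(unsafe.Pointer(unsafe.StringData(*p)))
	return header, data
}

func main() {
	h, d := stringParts(&s)
	fmt.Printf("header at %#x, bytes at %#x\n", h, d)
}
```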
pmur commented on Mar 17, 2023
Moving these to SDATA should resolve the issue on AIX, though it breaks for other reasons.
As for the `nm` output: the linker "places" the data section at offset 0x200000000, but that is not really where it lives at runtime. It gets loaded into an arbitrary place (I don't know if it is truly random). Thus, AIX always has to indirect through the TOC pointer to figure out where anything in the data section lives. As far as I know, the loader cannot perform those TOC-based relocations to pointers in the text section.
I suspect there aren't many, if any, RODATA things which have pointers into the DATA section. If there are, I don't know how they are handled. xcoff linking should probably generate errors if it encounters address relocations from rodata into data.
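As a rough model of the explanation above, the sketch below treats the data segment as laid out at one base address by the linker and mapped at another by the loader. The `segment` type, the load base `0x110000000`, and the segment size are hypothetical values chosen for illustration; only the link base `0x200000000` and the faulting address `0x200732d00` come from this thread. A pointer that keeps its link-time value falls outside the mapped range, while one adjusted by the load offset does not.

```go
package main

import "fmt"

// segment models a data segment that the linker laid out at linkBase
// but the loader mapped at loadBase. loadBase and size are made-up
// values for this sketch.
type segment struct {
	linkBase uint64
	loadBase uint64
	size     uint64
}

// relocate converts a link-time address into its run-time address.
func (s segment) relocate(linkAddr uint64) uint64 {
	return linkAddr - s.linkBase + s.loadBase
}

// mapped reports whether addr falls inside the segment as loaded.
func (s segment) mapped(addr uint64) bool {
	return addr >= s.loadBase && addr < s.loadBase+s.size
}

func main() {
	data := segment{linkBase: 0x200000000, loadBase: 0x110000000, size: 0x1000000}
	const linkAddr = 0x200732d00 // address nm reports for the inittask symbol

	// A pointer stored without relocation keeps the link-time value.
	fmt.Printf("unrelocated %#x mapped: %v\n", uint64(linkAddr), data.mapped(linkAddr))

	// A relocated pointer is shifted by the load offset.
	r := data.relocate(linkAddr)
	fmt.Printf("relocated   %#x mapped: %v\n", r, data.mapped(r))
}
```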
randall77 commented on Mar 17, 2023
That looks like the GC is wanting to know where the pointers are.
Try `SNOPTRDATA` instead, all the pointers point to static targets so the GC doesn't need to see them.
pmur commented on Mar 17, 2023
Putting those into `SNOPTRDATA` seems to get things going again on AIX. Thanks.
randall77 commented on Mar 18, 2023
Excellent, I will update my patch when I resubmit.
Thanks for the help.
randall77 commented on Mar 23, 2023
New CL is mailed with the SRODATA->SNOPTRDATA fix for aix, so closing.