What version of Go are you using (go version)?
master, as of the build:
$ go version devel +dd150176c3 Fri Jul 3 03:31:29 2020 +0000
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/steeve/Library/Caches/go-build"
GOENV="/Users/steeve/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/steeve/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/steeve/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/Users/steeve/code/github.com/znly/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/steeve/code/github.com/znly/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/steeve/code/github.com/znly/go/src/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/bs/51dlb_nn5k35xq9qfsxv9wc00000gr/T/go-build842228435=/tmp/go-build -gno-record-gcc-switches -fno-common"
What did you do?
Following @cherrymui's comment on #39524, I tried to find out why many of our backtraces on iOS stop at runtime.asmcgocall.
Since I wanted to reproduce it on my computer, and lldb manages to backtrace properly there, I figured I'd give libunwind a try, since this is what iOS uses when a program crashes.
Unfortunately, libunwind didn't manage to walk the stack past the cgo-generated _Cfunc_ functions.
Given this program:
package main

/*
#cgo CFLAGS: -O0
#include <libunwind.h>
#include <stdio.h>

void backtrace() {
    unw_cursor_t cursor;
    unw_context_t context;

    // Initialize cursor to current frame for local unwinding.
    unw_getcontext(&context);
    unw_init_local(&cursor, &context);

    // Unwind frames one by one, going up the frame stack.
    while (unw_step(&cursor) > 0) {
        unw_word_t offset, pc;
        unw_get_reg(&cursor, UNW_REG_IP, &pc);
        if (pc == 0) {
            break;
        }
        printf("0x%llx:", pc);

        char sym[256];
        if (unw_get_proc_name(&cursor, sym, sizeof(sym), &offset) == 0) {
            printf(" (%s+0x%llx)\n", sym, offset);
        } else {
            printf(" -- error: unable to obtain symbol name for this frame\n");
        }
    }
}

void two() {
    printf("two\n");
    backtrace();
}

void one() {
    printf("one\n");
    two();
}
*/
import "C"

//go:noinline
func goone() {
	C.one()
}

func main() {
	goone()
}
It prints:
one
two
0x40617fe: (two+0x1e)
0x406182e: (one+0x1e)
0x406168b: (_cgo_7c45d1c2feef_Cfunc_one+0x1b)
I also tried Go(1) -> C(1) -> Go(2) -> C(2) followed by a backtrace, and it only unwinds C(2).
I also tried giving asmcgocall a 16-byte stack frame, hoping that the generated frame pointer would help, but it didn't.
What did you expect to see?
The complete backtrace.
What did you see instead?
A backtrace for C functions only.
Activity
cherrymui commented on Jul 4, 2020
This is different from #39524. We switch stacks at Go/C boundaries. Go code runs on goroutine stacks (typically small), whereas C code runs on system stacks (typically large). Since they are not on the same stack, I would not expect any stack unwinding tool to work. Not sure if there is anything we could do.
Maybe we could use the frame pointer to "fake" it? Not sure this is a good idea...
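To make the frame-pointer idea concrete, here is a toy model in Go. It is not runtime code. It assumes a simplified layout in which each frame records only a function name and the address of its caller's frame (the saved frame pointer). The `frame` type, the `walk` and `buildStacks` helpers, and all addresses are hypothetical. In this model, a walker that follows saved frame pointers stops on the system stack unless the outermost C frame's saved pointer refers back to a frame on the goroutine stack.

```go
package main

import "fmt"

// frame is a toy stand-in for a stack frame: a function name plus the
// address of the caller's frame (the saved frame pointer). 0 ends the chain.
type frame struct {
	name     string
	callerFP uintptr
}

// walk follows saved frame pointers starting at fp and returns the function
// names it visits, innermost first.
func walk(mem map[uintptr]frame, fp uintptr) []string {
	var names []string
	for fp != 0 {
		f, ok := mem[fp]
		if !ok {
			break // the pointer does not refer to a known frame
		}
		names = append(names, f.name)
		fp = f.callerFP
	}
	return names
}

// buildStacks lays out a simplified goroutine stack and system stack at
// made-up addresses and returns the memory plus the innermost frame pointer.
// When linked is true, the outermost C frame's saved frame pointer refers to
// the asmcgocall frame on the goroutine stack.
func buildStacks(linked bool) (map[uintptr]frame, uintptr) {
	const (
		goMain, goOne, goAsmcgocall = 0xc000300, 0xc000200, 0xc000100
		cWrapper, cOne, cTwo        = 0x7000300, 0x7000200, 0x7000100
	)
	var boundary uintptr // 0: the chain ends on the system stack
	if linked {
		boundary = goAsmcgocall
	}
	mem := map[uintptr]frame{
		goMain:       {"main.main", 0},
		goOne:        {"main.goone", goMain},
		goAsmcgocall: {"runtime.asmcgocall", goOne},
		cWrapper:     {"_cgo_Cfunc_one", boundary},
		cOne:         {"one", cWrapper},
		cTwo:         {"two", cOne},
	}
	return mem, cTwo
}

func main() {
	for _, linked := range []bool{false, true} {
		mem, fp := buildStacks(linked)
		fmt.Println(linked, walk(mem, fp))
	}
}
```

With `linked` false the walk yields only the three C-side frames, which matches the shape of the libunwind output in the report. With `linked` true it continues through `runtime.asmcgocall` to `main.main`. Whether a real unwinder would accept such a link is a separate question that this sketch does not answer.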
steeve commented on Jul 4, 2020
Indeed. Thinking about it, however, it doesn't feel that hard to do: either by locally modifying asmcgocall or, more ambitiously, via go:systemstack. Do you think that could work, at least in theory?
steeve commented on Jul 4, 2020
Also, weirdly enough, lldb is able to do it, without DWARF.
steeve commented on Jul 5, 2020
I am also realizing this is indeed very different from #39524. But in some way, since runtime.libcCall uses asmcgocall, it could also allow unwinding of goroutines blocked in things like pthread functions.
ianlancetaylor commented on Jul 5, 2020
If you want to unwind from C++ back into Go you may want to try github.com/ianlancetaylor/cgosymbolizer. Although that will only help from the Go side, not the C++ side.
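As a rough illustration of what "the Go side" means here, the sketch below collects a traceback with `runtime.Callers` and `runtime.CallersFrames` from the standard library. The helper names `goFrames`, `inner`, and `outer` are made up for the example. It uses no cgo, so it shows only Go frames. The assumption, not demonstrated here, is that with cgosymbolizer imported for its side effects (`import _ "github.com/ianlancetaylor/cgosymbolizer"`) the same kind of Go-side traceback would also include symbolized C frames.

```go
package main

import (
	"fmt"
	"runtime"
)

// goFrames returns the names of the functions on the calling goroutine's
// stack, innermost first, starting at the caller of goFrames.
func goFrames() []string {
	pcs := make([]uintptr, 32)
	// Skip 2 frames: runtime.Callers itself and goFrames.
	n := runtime.Callers(2, pcs)
	frames := runtime.CallersFrames(pcs[:n])

	var names []string
	for {
		f, more := frames.Next()
		if f.Function != "" {
			names = append(names, f.Function)
		}
		if !more {
			break
		}
	}
	return names
}

//go:noinline
func inner() []string { return goFrames() }

//go:noinline
func outer() []string { return inner() }

func main() {
	for _, name := range outer() {
		fmt.Println(name)
	}
}
```

This traceback is produced by the Go runtime, so it does nothing for an external unwinder such as libunwind that walks the stack from the C side.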
In principle we could hand-write unwind information for asmcgocall. The unwind information is basically DWARF, and it should be powerful enough to express what asmcgocall does.
steeve commented on Jul 5, 2020
@ianlancetaylor thank you. The issue, on the iOS side, is that unwinding is done locally, on the device (presumably with libunwind), without DWARF. DWARF is only added later to symbolicate the crashes.
That said, it could be useful for Android (which uses breakpad with minidumps)
@cherrymui I tried that forsaken piece of code too, in order to call the backtrace function without cgo, and alas, the unwinding still somehow stops before it. This is based on the rustgo article:
cherrymui commented on Jul 6, 2020
@steeve Sorry, I'm not sure exactly what you're planning to do, and why go:systemstack is relevant here. go:systemstack only enforces that the marked function must run on the system stack (i.e. cannot run on a goroutine stack). It doesn't change how stack switches work.
Also, on what architecture? You mentioned iOS (presumably ARM64), but also AMD64 in your go env.
That said, does CL https://go-review.googlesource.com/c/go/+/241080 make any difference (on ARM64)? Thanks.
steeve commented on Jul 6, 2020
@cherrymui Thank you for the CL, I wasn't hoping for as much. Will definitely try and let you know.
My ultimate target is indeed iOS (and Android, to an extent).
Before trying to fix it on iOS though, I figured it'd be easier to reproduce on my computer (darwin/amd64) and, since iOS uses libunwind, to investigate it myself.
My other experiment, in which I tried to call the function directly on the Go stack, was meant to narrow down whether the frame pointer was trashed by the stack switch itself.
gopherbot commented on Jul 6, 2020
Change https://golang.org/cl/241158 mentions this issue:
runtime: adjust frame pointer on stack copy on ARM64
ianlancetaylor commented on Jul 6, 2020
@steeve libunwind unwinds the stack using the unwind information, which is not DWARF but is approximately the same format as a subset of DWARF. That's what I was referring to when I suggested that we could write unwind information for asmcgocall. (You can see the horrible details at https://www.airs.com/blog/archives/460.)
steeve commented on Jul 9, 2020
I just tried your CL @cherrymui on a real device and, unfortunately, when I pause inside Xcode, I only see the stack up to asmcgocall.
In my case I put in a time.Sleep(), and the backtrace in Xcode's lldb was:
Note that on amd64, lldb manages to unwind properly.
qmuntal commented on Dec 23, 2022
The lack of a frame pointer in asmcgocall also breaks stack unwinding in WinDbg (using the prototype from #57302).
I've tried with gdb, and it's also broken:
@steeve curiously, WinDbg and gdb can unwind the stack if the asmcgocall stack frame is increased to 16 bytes, which makes the assembler introduce a frame pointer:
qmuntal commented on Dec 23, 2022
@ianlancetaylor @cherrymui asmcgocall does not have a frame pointer, even though it is a non-leaf function, due to condition 2 in this code:
go/src/cmd/internal/obj/x86/obj6.go
Lines 609 to 626 in 1ba7341
asmcgocall seems to do well with a frame pointer, so it does not fit in the scary-runtime-internals category (at least on Windows; I have yet to try other OSes). IMO we should get rid of that heuristic and just use NOFRAME if a runtime function would be messed up by a frame pointer, just as is already done on arm and arm64.
gopherbot commented on Dec 23, 2022
Change https://go.dev/cl/459395 mentions this issue:
runtime: use explicit NOFRAME on windows/amd64
cherrymui commented on Dec 27, 2022
I think that is a good direction. Thanks for looking into it. Are we sure what matters are all assembly functions? We don't have explicit NOFRAME control for compiled functions.
I'm still not sure about the stack transition in asmcgocall being "broken". Technically, it is running on two stacks -- the C functions run on a different stack. So if we are unwinding the physical stack, you shouldn't see both Go and C functions. You could argue we're expecting to unwind the logical stack. That is a reasonable argument. But I don't think a decision has been made on whether the unwinding should follow the physical stack or the logical stack. Further, at least some debuggers are not happy if the stack pointer suddenly changes direction. I don't think it is a good idea if the unwinding only works when the C stack is at a higher address than the Go stack, given that we don't generally control where the stacks are in the address space.
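The concern about address ordering can be sketched with a toy model in Go. Everything here is hypothetical: the `frame` type, the helpers, the base addresses, and in particular the sanity check in `strictWalk`, which assumes an unwinder that gives up when a caller's frame is not at a higher address than its callee's. Real debuggers may apply a different check. The sketch only shows that the same logical call chain can pass or fail depending on where the two stacks happen to sit.

```go
package main

import "fmt"

// frame is a toy stack frame identified by a name and an address.
type frame struct {
	name string
	addr uintptr
}

// logicalChain returns the logical call chain, innermost first, with the C
// frames placed at cStack and the Go frames at goStack (made-up bases).
func logicalChain(cStack, goStack uintptr) []frame {
	return []frame{
		{"two", cStack + 0x10},
		{"one", cStack + 0x30},
		{"runtime.asmcgocall", goStack + 0x10},
		{"main.goone", goStack + 0x30},
		{"main.main", goStack + 0x50},
	}
}

// strictWalk mimics an unwinder that treats the chain as corrupt when a
// caller's frame is not at a higher address than its callee's. This check
// is an assumption made for the example.
func strictWalk(chain []frame) []string {
	var names []string
	for i, f := range chain {
		if i > 0 && f.addr <= chain[i-1].addr {
			break
		}
		names = append(names, f.name)
	}
	return names
}

func main() {
	// C stack at lower addresses than the Go stack.
	fmt.Println(strictWalk(logicalChain(0x7000, 0xc000)))
	// C stack at higher addresses than the Go stack.
	fmt.Println(strictWalk(logicalChain(0xc000, 0x7000)))
}
```

In this model the first layout walks all five frames and the second stops after the two C frames, even though the logical chain is identical.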
For a (not quite accurate) analogy, what does the debugger do for C longjmp or setcontext or other stack transitions? Does it unwind through it? What machinery does it use?
Thanks.
ianlancetaylor commented on Dec 27, 2022
The C longjmp and setcontext functions change the stack entirely and irrevocably (modulo another call to longjmp or setcontext, of course). They are unlike asmcgocall, which switches to a temporary stack for the duration of a function call and then returns to the original stack. So I don't think the same issues arise. For the C functions you get either the original call stack or the new call stack.
Note that we do support unwinding the cgo stack to the Go stack via the runtime.SetCgoTraceback function. However, that only supports tracebacks that use Go functions (runtime.Callers, runtime.Stack). It does not help with libunwind.
What would work for libunwind is for us to write unwind information for asmcgocall that tells libunwind how to load the stack and PC of the calling frame. This is feasible, as the unwind information can use arbitrary expressions, but complicated.
cherrymui commented on Dec 27, 2022
I agree that longjmp is not really a good analogy because it could be an actual context switch (it could also be used to implement a temporary stack transition like asmcgocall, but the tools would never know). I don't think there is anything in C that does a temporary stack switch?
I think the Go traceback API mostly shows only the "user frames", e.g. we hide compiler-generated wrappers. So hiding the asmcgocall stack transition when SetCgoTraceback is set also seems reasonable. But I'm not sure we should blindly hide stack transitions for other low-level unwinders.
qmuntal commented on Dec 27, 2022
I don't expect Go to hide wrappers and stack transitions from external unwinders, if that is possible at all.
I do expect Go to facilitate unwinding stack transitions, even from C to Go, and vice versa. This is certainly doable using Windows' SEH if a frame pointer is set in the function prologue and the linker emits the appropriate metadata.
qmuntal commented on Mar 3, 2023
runtime: adjust frame pointer on stack copy on ARM64