runtime: address space conflict at startup using buildmode=c-shared #16936
@sh4m1l65, if you are interested in debugging this, the thing to do would be to try to make this failure dump /proc/self/maps as it dies. You'd do that by adding code something like this to runtime/mem_linux.go and rebuilding the runtime and your binary:
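As a rough illustration of what that debugging aid does (this is a standalone sketch, not the actual runtime patch — code inside runtime/mem_linux.go could not use the os package and would have to issue raw read syscalls instead), dumping /proc/self/maps from ordinary Go code looks like this:

```go
package main

import (
	"fmt"
	"os"
)

// dumpMaps prints the process's virtual memory layout to stderr.
// When an mmap at startup fails with an address space conflict,
// this shows exactly which existing mapping it collided with.
func dumpMaps() {
	data, err := os.ReadFile("/proc/self/maps")
	if err != nil {
		// Non-Linux systems have no /proc/self/maps.
		fmt.Fprintln(os.Stderr, "cannot read maps:", err)
		return
	}
	os.Stderr.Write(data)
}

func main() {
	dumpMaps()
}
```

In the real failure path the dump would be triggered just before the runtime throws, so the mappings reflect the state at the moment of the conflict.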
Thanks.
I met exactly the same issue.

What version of Go are you using (go version)?

What operating system and processor architecture are you using (go env)?

What did you do?

I'm writing a Python module using cgo. The code is quite simple; since the stack backtrace doesn't point to any method in my code (neither the C code nor the Go code), I believe it's an issue in the Go runtime. One of the crashed stack backtraces:

I also have a couple of global variables declared:

gSegmentersLock sync.Locker = &sync.Mutex{}
gSegmenterCounter int = 0
gSegmenters map[int]*segment.Segmenter = make(map[int]*segment.Segmenter)
// The strings used by C
CStringSegmenterObjectReleasedErrorMessage = C.CString("Segmenter object has already released")

The Python class is defined as usual.

This error doesn't happen every time. In one of my tests, I started a Hadoop job with 6000 map tasks (on 9 nodes); each task imports the Python module and creates objects (that is, the Go method is called at least once per task), with at most 19 map tasks running on one node at the same time. In the end, 136 tasks failed because of this error. The error seems to happen at the end of each task, which suggests it always occurs while the process is finalizing. Each node has 1 * 12 * 2-core CPUs and 128G memory, and no memory limit is configured. The Hadoop dashboard shows that resources are absolutely sufficient.

OS is

The build command is

I can stably reproduce this error.
Timed out in state WaitingForInfo. Closing. (I am just a bot, though. Please reopen if this is a mistake or you have the requested information.)
Change https://golang.org/cl/85887 mentions this issue: |
This replaces the contiguous heap arena mapping with a potentially sparse mapping that can support heap mappings anywhere in the address space. This has several advantages over the current approach:

* There is no longer any limit on the size of the Go heap. (Currently it's limited to 512GB.) Hence, this fixes #10460.

* It eliminates many failure modes of heap initialization and growing. In particular, it eliminates any possibility of panicking with an address space conflict. This can happen for many reasons and even causes a low but steady rate of TSAN test failures because of conflicts with the TSAN runtime. See #16936 and #11993.

* It eliminates the notion of "non-reserved" heap, which was added because creating huge address space reservations (particularly on 64-bit) led to huge process VSIZE. This was at best confusing and at worst conflicted badly with ulimit -v. However, the non-reserved heap logic is complicated, can race with other mappings in non-pure Go binaries (e.g., #18976), and requires that the entire heap be either reserved or non-reserved. We currently maintain the latter property, but it's quite difficult to convince yourself of that, and hence difficult to keep correct. This logic is still present, but will be removed in the next CL.

* It fixes problems on 32-bit where skipping over parts of the address space leads to mapping huge (and never-to-be-used) metadata structures. See #19831.

This also completely rewrites and significantly simplifies mheap.sysAlloc, which has been a source of many bugs. E.g., #21044, #20259, #18651, and #13143 (and maybe #23222).

This change also makes it possible to allocate individual objects larger than 512GB. As a result, a few tests that expected huge allocations to fail needed to be changed to make even larger allocations. However, at the moment attempting to allocate a humongous object may cause the program to freeze for several minutes on Linux as we fall back to probing every page with addrspace_free. That logic (and this failure mode) will be removed in the next CL.

Fixes #10460. Fixes #22204 (since it rewrites the code involved).

This slightly slows down compilebench and the x/benchmarks garbage benchmark.

name       old time/op  new time/op  delta
Template    184ms ± 1%   185ms ± 1%    ~     (p=0.065 n=10+9)
Unicode    86.9ms ± 3%  86.3ms ± 1%    ~     (p=0.631 n=10+10)
GoTypes     599ms ± 0%   602ms ± 0%  +0.56%  (p=0.000 n=10+9)
Compiler    2.87s ± 1%   2.89s ± 1%  +0.51%  (p=0.002 n=9+10)
SSA         7.29s ± 1%   7.25s ± 1%    ~     (p=0.182 n=10+9)
Flate       118ms ± 2%   118ms ± 1%    ~     (p=0.113 n=9+9)
GoParser    147ms ± 1%   148ms ± 1%  +1.07%  (p=0.003 n=9+10)
Reflect     401ms ± 1%   404ms ± 1%  +0.71%  (p=0.003 n=10+9)
Tar         175ms ± 1%   175ms ± 1%    ~     (p=0.604 n=9+10)
XML         209ms ± 1%   210ms ± 1%    ~     (p=0.052 n=10+10)
(https://perf.golang.org/search?q=upload:20171231.4)

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.23ms ± 1%  2.25ms ± 1%  +0.84%  (p=0.000 n=19+19)
(https://perf.golang.org/search?q=upload:20171231.3)

Relative to the start of the sparse heap changes (starting at and including "runtime: fix various contiguous bitmap assumptions"), overall slowdown is roughly 1% on GC-intensive benchmarks:

name       old time/op  new time/op  delta
Template    183ms ± 1%   185ms ± 1%  +1.32%  (p=0.000 n=9+9)
Unicode    84.9ms ± 2%  86.3ms ± 1%  +1.65%  (p=0.000 n=9+10)
GoTypes     595ms ± 1%   602ms ± 0%  +1.19%  (p=0.000 n=9+9)
Compiler    2.86s ± 0%   2.89s ± 1%  +0.91%  (p=0.000 n=9+10)
SSA         7.19s ± 0%   7.25s ± 1%  +0.75%  (p=0.000 n=8+9)
Flate       117ms ± 1%   118ms ± 1%  +1.10%  (p=0.000 n=10+9)
GoParser    146ms ± 2%   148ms ± 1%  +1.48%  (p=0.002 n=10+10)
Reflect     398ms ± 1%   404ms ± 1%  +1.51%  (p=0.000 n=10+9)
Tar         173ms ± 1%   175ms ± 1%  +1.17%  (p=0.000 n=10+10)
XML         208ms ± 1%   210ms ± 1%  +0.62%  (p=0.011 n=10+10)
[Geo mean]  369ms        373ms       +1.17%
(https://perf.golang.org/search?q=upload:20180101.2)

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.22ms ± 1%  2.25ms ± 1%  +1.51%  (p=0.000 n=20+19)
(https://perf.golang.org/search?q=upload:20180101.3)

Change-Id: I5daf4cfec24b252e5a57001f0a6c03f22479d0f0
Reviewed-on: https://go-review.googlesource.com/85887
Run-TryBot: Austin Clements <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Rick Hudson <[email protected]>
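The core idea of the sparse design is that heap metadata is found by index arithmetic on the address itself, so an arena can live anywhere and its slot is populated lazily. A toy sketch of that idea — the constants and names here are illustrative, not the runtime's actual values or code:

```go
package main

import "fmt"

// Illustrative constants: pretend each arena covers 64 MB.
const (
	heapArenaBytes = 64 << 20 // 2^26
	arenaCount     = 1 << 22  // slots covering a large address range
)

// heapArena stands in for per-arena metadata (bitmaps, span map, ...).
type heapArena struct{ allocated uintptr }

// arenas is a sparse table: entries stay nil until an arena is
// actually mapped, so the heap needs no contiguous reservation.
var arenas [arenaCount]*heapArena

// arenaIndex returns the table slot covering address p.
func arenaIndex(p uintptr) uintptr { return p / heapArenaBytes }

func main() {
	p := uintptr(0x00c000010000) // a typical Go heap address
	i := arenaIndex(p)
	if arenas[i] == nil { // allocate metadata lazily, on first use
		arenas[i] = &heapArena{}
	}
	fmt.Println(i, arenas[i] != nil)
}
```

Because lookup is pure arithmetic plus a nil check, a mapping granted by the OS at any address can be adopted into the heap, which is exactly why address space conflicts stop being fatal.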
The failure occurred in a shared library (go build -buildmode=c-shared) that is loaded as a ulogd plugin. So, obviously, Go is not in complete control of its runtime situation. Still, I believe this stack trace represents an issue with the embedded Go runtime and not the host application.

Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?

go version go1.6 linux/amd64

What operating system and processor architecture are you using (go env)?

What did you do?
I may be able to obtain clearance from my employer to share binary code and/or source for the application in which this panic appeared. I don't yet have such clearance.
At a high level, this is a shared library that is compiled against Linux ulogd sources (http://www.netfilter.org/projects/ulogd/) to produce a plugin for the ulogd host app. The plugin receives callbacks from the host application and composes log messages sent through dropsonde (https://github.com/cloudfoundry/dropsonde) to be collected via Cloud Foundry's Loggregator services.
Because the panic stack trace does not refer to any code outside of the Go runtime, it is difficult for me to see which allocation provoked the allocator failure.
What did you expect to see?
Normally (across thousands of instances of restarting the host application that loads the Go shared library) the app runs fine. The stack trace comes from the only failure of this kind ever witnessed.
What did you see instead?
stack trace follows: