Description
What version of Go are you using (go version
)?
$ go version 1.19.2
Does this issue reproduce with the latest release?
y
What operating system and processor architecture are you using (go env
)?
go env
Output
$ go env (encountered on Linux but I assume it shows up everywhere)
What did you do?
I saw a profile that showed 85% of CPU time in calls to runtime.Stack()
, and 81% of total CPU time in calls to printlock
and printunlock
.
What did you expect to see?
I expected calls that were independent of each other, writing into buffers rather than to shared streams, on different CPUs, to proceed without lock contention.
What did you see instead?
Every single write in any part of the traceback process goes through print
(and friends) ultimately, which means that it uses printlock
and printunlock
. This is true even when g.writebuf
is non-nil and the different g
s will be writing into different buffers. As a result, the stack tracebacks block each other, and since they're both making large numbers of individual calls to print, that results in heavy lock contention. (To complicate things, there's cases where a non-nil g.writebuf still doesn't cause the writes to go into the buffer, as when gp.m
is dying.)
This looks like a mess to fix. It's "easy" to fix by changing the code to thread a write target through it, except that you absolutely don't want any of this code dealing with stuff like io.Writer
or other higher-level abstractions. I also don't know whether it's actually completely safe to have multiple copies of print
running, against different write buffers, on different g
s. In the case where you really are writing to a stream, you probably don't want a buffer. You don't want to have two copies of all of this code, identical except for their output. But having two calls to runtime.Stack() generate hundreds of contended locks seems pretty bad too...
Metadata
Metadata
Assignees
Labels
Type
Projects
Status
Activity
heschi commentedon Oct 25, 2022
cc @golang/runtime
mknyszek commentedon Oct 26, 2022
In triage this doesn't seem like high priority to look at.
Stack
could probably avoid theprintlock
but that might be tricky today. This might be more doable if we clean upgentraceback
in general (CC @aclements).mknyszek commentedon Oct 26, 2022
(In other words, effectively blocking this on #54466.)
prattmic commentedon Oct 26, 2022
Note that using runtime.Callers should avoid this lock contention.
josharian commentedon Oct 26, 2022
There was also some discussion in the Gophers Slack about one possible hack fix:
(and the same for printunlock)
ianlancetaylor commentedon Oct 26, 2022
That's not the only reason
printlock
exists. It also ensures that if two goroutines callprint
in parallel with multiple arguments then the output will not be interleaved.josharian commentedon Oct 26, 2022
Right. But if I read the code correctly (big if), when gp.writebuf is non-nil, we are always non-concurrent.
ianlancetaylor commentedon Oct 26, 2022
Perhaps you are not aware that the compiler generates calls to
printlock
. When the compiler seesprint(a, b, c)
, it generates code likejosharian commentedon Oct 26, 2022
I am indeed quite aware of that. :)
In those cases, however, gp.writebuf is nil (the output goes straight to stderr). gp.writebuf is only non-nil for runtime.Stack and a runtime tests.
ianlancetaylor commentedon Oct 26, 2022
Sure, but my point was that if two different goroutines call
print
in parallel with multiple arguments, their output will not be interleaved. And that still seems true to me. Am I missing something?(It's true that the output from
print
can be interleaved with other output that the program might be writing to standard error. That seems harder to avoid, though as you imply it could be done by settingwritebuf
.)josharian commentedon Oct 27, 2022
I hope so? :P
The idea of the patch above was that we effectively disable
printlock
andprintunlock
only on the relevant G when its writebuf is non-nil. writebuf is non-nil only when runtime.Stack is actively using its (systemstack's) G to print a stack. All other goroutines that are printing will have nil writebufs, and the locking will proceed as usual.ianlancetaylor commentedon Oct 27, 2022
Argh, sorry, I get it now. Completely misunderstood. Apologies.
seebs commentedon Oct 27, 2022
I did note that runtime.Callers produces part of this data without locking, but there's no way to get the argument info, if you want it, and also there's no easy way to just get the pretty-printed output. So, it solves the performance problem but it loses information; so far as I can tell, there's nothing available to a user that can produce the call stack with args that isn't this.
So far as I can tell,
writebuf
can only be non-nil during a call toruntime.Stack()
, and during one specific test that also assigns to it for some reason. There's no other uses in the code.So I think in practice the "just dummy out locks when writebuf is non-nil" behavior would be correct, but I'd feel a bit nervous about it being brittle. I'm also not 100% sure I know what it would do in the case where you request tracebacks for all goroutines, but I assume it works. (I also note that if you request tracebacks for all goroutines, the world is stopped, so, good news, no lock contention, bad news, you already took every performance hit.)
That said, I'm certainly not about to volunteer to try to make the traceback stuff work differently, that looks like a really hard problem.
don't obtain stack traces on rbf.Tx creation
merge: featurebase merge updates for 2022-10-28 (#2188)
don't obtain stack traces on rbf.Tx creation