Commit f5e475e

mknyszek authored and gopherbot committed
runtime: make a much better effort to emit CPU profile in a trace
Currently the new execution tracer's handling of CPU profile samples is very best-effort. The same CPU profile buffer is used across generations, leading to a high probability that CPU samples will bleed across generations. Also, the CPU profile buffer (not the trace buffer the samples get written into) isn't guaranteed to be flushed when we close out a generation, nor when tracing stops. This has led to test failures, but can more generally just lead to lost samples.

In general, lost samples are considered OK. The CPU profile buffer is only read from every 100 ms, so if it fills up too much before then, old samples will get overwritten. The tests already account for this, and in that sense the CPU profile samples are already best-effort. But with actual CPU profiles, this is really the only condition under which samples are dropped. This CL aims to align CPU profiles better with traces by eliminating all best-effort parts of the implementation aside from the possibility of dropped samples from a full buffer.

To achieve this, this CL adds a second CPU profile buffer and has the SIGPROF handler pick which CPU profile buffer to use based on the generation, much like every other part of the tracer. The SIGPROF handler then reads the trace generation, but not before ensuring it can't change: it grabs its own thread's trace seqlock. It's possible that a SIGPROF signal lands while this seqlock is already held by the thread. Luckily this is detectable, and the SIGPROF handler can simply elide the locking if this happens (the tracer will already wait until all threads exit their seqlock critical section).

Now that there are two CPU profile buffers written to, the read side needs to change. Instead of calling traceAcquire/traceRelease for every single CPU sample event, the trace CPU profile reader goroutine holds this conceptual lock over the entirety of flushing a buffer. This means it can pick the CPU profile buffer for the current generation to flush.

With all this machinery in place, we're now at a point where all CPU profile samples get divided into either the previous generation or the current generation. This is good, since it means that we're able to emit profile samples into the correct generation, avoiding surprises in the final trace. All that's missing is to flush the CPU profile buffer from the previous generation, once the runtime has moved on from that generation. That is, when the generation counter updates, there may yet be CPU profile samples sitting in the last generation's buffer. So, traceCPUFlush now first flushes the CPU profile buffer, followed by any trace buffers containing CPU profile samples.

The end result of all this is that no sample gets left behind unless it gets overwritten in the CPU profile buffer in the first place. CPU profile samples in the trace will now also get attributed to the right generation, since the SIGPROF handler now participates in the tracer's synchronization across trace generations.

Fixes #55317.

Change-Id: I47719fad164c544eef0bb12f99c8f3c15358e344
Reviewed-on: https://go-review.googlesource.com/c/go/+/555495
Auto-Submit: Michael Knyszek <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
1 parent 7cb98c1 commit f5e475e
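
The commit message above combines a few ideas: a pair of buffers selected by gen%2, a per-thread seqlock that pins the generation while a sample is written, and an advance step that waits for every thread to leave its seqlock critical section before flushing the old generation's buffer. Below is a minimal, self-contained sketch of that pattern, not the runtime implementation: the names (writer, record, advance), the mutex-protected slices standing in for profBuf, and ordinary function calls standing in for the SIGPROF handler are all invented for illustration.

// seqlock_sketch.go: illustrative only; not Go runtime code.
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

var gen atomic.Uint64 // current generation; 0 means "tracing off"

// writer models one thread's trace state: the seqlock is odd while the
// thread is inside a critical section that observed gen.
type writer struct {
	seqlock atomic.Uint64
}

var (
	bufMu   [2]sync.Mutex
	buffers [2][]uint64 // one buffer for gen%2, one for 1-gen%2
)

// record appends a sample to the buffer of the generation observed under the
// seqlock. Holding the seqlock is what lets the advancer know when it is safe
// to flush the old generation's buffer.
func (w *writer) record(sample uint64) {
	// Acquire the seqlock only if this thread doesn't already hold it
	// (the real SIGPROF handler elides the acquire in that case).
	locked := false
	if w.seqlock.Load()%2 == 0 {
		w.seqlock.Add(1)
		locked = true
	}
	if g := gen.Load(); g != 0 {
		i := g % 2
		bufMu[i].Lock()
		buffers[i] = append(buffers[i], sample)
		bufMu[i].Unlock()
	}
	if locked {
		w.seqlock.Add(1)
	}
}

// advance moves to the next generation, waits until every writer has been seen
// outside its seqlock critical section, and only then drains the old buffer.
func advance(writers []*writer) {
	old := gen.Add(1) - 1
	for _, w := range writers {
		for w.seqlock.Load()%2 != 0 {
			runtime.Gosched() // writer is mid-record; it may still target the old buffer
		}
	}
	bufMu[old%2].Lock()
	fmt.Printf("gen %d: flushed %d samples\n", old, len(buffers[old%2]))
	buffers[old%2] = buffers[old%2][:0]
	bufMu[old%2].Unlock()
}

func main() {
	gen.Store(1)
	w := &writer{}
	for i := uint64(0); i < 5; i++ {
		w.record(i)
	}
	advance([]*writer{w})
}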

File tree

3 files changed (+77, -26 lines)


src/runtime/trace2.go (+5, -4)
@@ -71,7 +71,8 @@ var trace struct {
     stringTab [2]traceStringTable // maps strings to unique ids

     // cpuLogRead accepts CPU profile samples from the signal handler where
-    // they're generated. It uses a three-word header to hold the IDs of the P, G,
+    // they're generated. There are two profBufs here: one for gen%2, one for
+    // 1-gen%2. These profBufs use a three-word header to hold the IDs of the P, G,
     // and M (respectively) that were active at the time of the sample. Because
     // profBuf uses a record with all zeros in its header to indicate overflow,
     // we make sure to make the P field always non-zero: The ID of a real P will
@@ -82,9 +83,9 @@ var trace struct {
     // when sampling g0.
     //
     // Initialization and teardown of these fields is protected by traceAdvanceSema.
-    cpuLogRead  *profBuf
-    signalLock  atomic.Uint32           // protects use of the following member, only usable in signal handlers
-    cpuLogWrite atomic.Pointer[profBuf] // copy of cpuLogRead for use in signal handlers, set without signalLock
+    cpuLogRead  [2]*profBuf
+    signalLock  atomic.Uint32              // protects use of the following member, only usable in signal handlers
+    cpuLogWrite [2]atomic.Pointer[profBuf] // copy of cpuLogRead for use in signal handlers, set without signalLock
     cpuSleep    *wakeableSleep
     cpuLogDone  <-chan struct{}
     cpuBuf      [2]*traceBuf

src/runtime/trace2cpu.go (+66, -21)
@@ -16,16 +16,18 @@ func traceInitReadCPU() {
         throw("traceInitReadCPU called with trace enabled")
     }
     // Create new profBuf for CPU samples that will be emitted as events.
-    profBuf := newProfBuf(3, profBufWordCount, profBufTagCount) // after the timestamp, header is [pp.id, gp.goid, mp.procid]
-    trace.cpuLogRead = profBuf
+    // Format: after the timestamp, header is [pp.id, gp.goid, mp.procid].
+    trace.cpuLogRead[0] = newProfBuf(3, profBufWordCount, profBufTagCount)
+    trace.cpuLogRead[1] = newProfBuf(3, profBufWordCount, profBufTagCount)
     // We must not acquire trace.signalLock outside of a signal handler: a
     // profiling signal may arrive at any time and try to acquire it, leading to
     // deadlock. Because we can't use that lock to protect updates to
     // trace.cpuLogWrite (only use of the structure it references), reads and
     // writes of the pointer must be atomic. (And although this field is never
     // the sole pointer to the profBuf value, it's best to allow a write barrier
     // here.)
-    trace.cpuLogWrite.Store(profBuf)
+    trace.cpuLogWrite[0].Store(trace.cpuLogRead[0])
+    trace.cpuLogWrite[1].Store(trace.cpuLogRead[1])
 }

 // traceStartReadCPU creates a goroutine to start reading CPU profile
@@ -52,7 +54,15 @@ func traceStartReadCPU() {
         // we would still want to do a goroutine-level sleep in between
         // reads to avoid frequent wakeups.
         trace.cpuSleep.sleep(100_000_000)
-        if !traceReadCPU(trace.cpuLogRead) {
+
+        tl := traceAcquire()
+        if !tl.ok() {
+            // Tracing disabled.
+            break
+        }
+        keepGoing := traceReadCPU(tl.gen)
+        traceRelease(tl)
+        if !keepGoing {
             break
         }
     }
@@ -76,29 +86,36 @@ func traceStopReadCPU() {
     //
     // Wake the goroutine so it can observe that their the buffer is
     // closed an exit.
-    trace.cpuLogWrite.Store(nil)
-    trace.cpuLogRead.close()
+    trace.cpuLogWrite[0].Store(nil)
+    trace.cpuLogWrite[1].Store(nil)
+    trace.cpuLogRead[0].close()
+    trace.cpuLogRead[1].close()
     trace.cpuSleep.wake()

     // Wait until the logger goroutine exits.
     <-trace.cpuLogDone

     // Clear state for the next trace.
     trace.cpuLogDone = nil
-    trace.cpuLogRead = nil
+    trace.cpuLogRead[0] = nil
+    trace.cpuLogRead[1] = nil
     trace.cpuSleep.close()
 }

-// traceReadCPU attempts to read from the provided profBuf and write
+// traceReadCPU attempts to read from the provided profBuf[gen%2] and write
 // into the trace. Returns true if there might be more to read or false
 // if the profBuf is closed or the caller should otherwise stop reading.
 //
+// The caller is responsible for ensuring that gen does not change. Either
+// the caller must be in a traceAcquire/traceRelease block, or must be calling
+// with traceAdvanceSema held.
+//
 // No more than one goroutine may be in traceReadCPU for the same
 // profBuf at a time.
-func traceReadCPU(pb *profBuf) bool {
+func traceReadCPU(gen uintptr) bool {
     var pcBuf [traceStackSize]uintptr

-    data, tags, eof := pb.read(profBufNonBlocking)
+    data, tags, eof := trace.cpuLogRead[gen%2].read(profBufNonBlocking)
     for len(data) > 0 {
         if len(data) < 4 || data[0] > uint64(len(data)) {
             break // truncated profile
@@ -147,12 +164,7 @@ func traceReadCPU(pb *profBuf) bool {
         }

         // Write out a trace event.
-        tl := traceAcquire()
-        if !tl.ok() {
-            // Tracing disabled, exit without continuing.
-            return false
-        }
-        w := unsafeTraceWriter(tl.gen, trace.cpuBuf[tl.gen%2])
+        w := unsafeTraceWriter(gen, trace.cpuBuf[gen%2])

         // Ensure we have a place to write to.
         var flushed bool
@@ -163,7 +175,7 @@ func traceReadCPU(pb *profBuf) bool {
         }

         // Add the stack to the table.
-        stackID := trace.stackTab[tl.gen%2].put(pcBuf[:nstk])
+        stackID := trace.stackTab[gen%2].put(pcBuf[:nstk])

         // Write out the CPU sample.
         w.byte(byte(traceEvCPUSample))
@@ -173,8 +185,7 @@ func traceReadCPU(pb *profBuf) bool {
         w.varint(goid)
         w.varint(stackID)

-        trace.cpuBuf[tl.gen%2] = w.traceBuf
-        traceRelease(tl)
+        trace.cpuBuf[gen%2] = w.traceBuf
     }
     return !eof
 }
@@ -187,6 +198,10 @@ func traceReadCPU(pb *profBuf) bool {
 //
 //go:systemstack
 func traceCPUFlush(gen uintptr) {
+    // Read everything out of the last gen's CPU profile buffer.
+    traceReadCPU(gen)
+
+    // Flush any remaining trace buffers containing CPU samples.
     if buf := trace.cpuBuf[gen%2]; buf != nil {
         lock(&trace.lock)
         traceBufFlush(buf, gen)
@@ -197,13 +212,38 @@ func traceCPUFlush(gen uintptr) {

 // traceCPUSample writes a CPU profile sample stack to the execution tracer's
 // profiling buffer. It is called from a signal handler, so is limited in what
-// it can do.
+// it can do. mp must be the thread that is currently stopped in a signal.
 func traceCPUSample(gp *g, mp *m, pp *p, stk []uintptr) {
     if !traceEnabled() {
         // Tracing is usually turned off; don't spend time acquiring the signal
         // lock unless it's active.
         return
     }
+    if mp == nil {
+        // Drop samples that don't have an identifiable thread. We can't render
+        // this in any useful way anyway.
+        return
+    }
+
+    // We're going to conditionally write to one of two buffers based on the
+    // generation. To make sure we write to the correct one, we need to make
+    // sure this thread's trace seqlock is held. If it already is, then we're
+    // in the tracer and we can just take advantage of that. If it isn't, then
+    // we need to acquire it and read the generation.
+    locked := false
+    if mp.trace.seqlock.Load()%2 == 0 {
+        mp.trace.seqlock.Add(1)
+        locked = true
+    }
+    gen := trace.gen.Load()
+    if gen == 0 {
+        // Tracing is disabled, as it turns out. Release the seqlock if necessary
+        // and exit.
+        if locked {
+            mp.trace.seqlock.Add(1)
+        }
+        return
+    }

     now := traceClockNow()
     // The "header" here is the ID of the M that was running the profiled code,
@@ -231,12 +271,17 @@ func traceCPUSample(gp *g, mp *m, pp *p, stk []uintptr) {
         osyield()
     }

-    if log := trace.cpuLogWrite.Load(); log != nil {
+    if log := trace.cpuLogWrite[gen%2].Load(); log != nil {
         // Note: we don't pass a tag pointer here (how should profiling tags
         // interact with the execution tracer?), but if we did we'd need to be
         // careful about write barriers. See the long comment in profBuf.write.
         log.write(nil, int64(now), hdr[:], stk)
     }

     trace.signalLock.Store(0)
+
+    // Release the seqlock if we acquired it earlier.
+    if locked {
+        mp.trace.seqlock.Add(1)
+    }
 }
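
On the read side, the change above has the reader goroutine hold the conceptual trace lock for an entire buffer flush rather than per sample event, so the generation it reads cannot move under it. The following is a rough, standalone sketch of that loop shape only; acquire, release, and drain are hypothetical stand-ins and not runtime APIs.

// readerloop_sketch.go: illustrative only; not Go runtime code.
package main

import (
	"fmt"
	"time"
)

// traceLock is a stand-in for the runtime's trace locker, carrying the
// generation observed at acquire time.
type traceLock struct{ gen uint64 }

func acquire() (traceLock, bool) { return traceLock{gen: 1}, true } // pretend tracing is on
func release(traceLock)          {}

// drain pretends to flush the profile buffer for gen%2 into the trace and
// reports whether the reader should keep going.
func drain(gen uint64) bool {
	fmt.Println("drained buffer", gen%2)
	return false
}

func main() {
	for {
		time.Sleep(10 * time.Millisecond) // stand-in for the 100 ms wakeable sleep

		tl, ok := acquire()
		if !ok {
			break // tracing disabled
		}
		// The generation cannot advance while we hold the lock, so every
		// sample drained here is attributed to tl.gen, never a later one.
		keepGoing := drain(tl.gen)
		release(tl)
		if !keepGoing {
			break
		}
	}
}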

src/runtime/trace2runtime.go (+6, -1)
@@ -192,7 +192,12 @@ func traceAcquireEnabled() traceLocker {
     // Prevent preemption.
     mp := acquirem()

-    // Acquire the trace seqlock.
+    // Acquire the trace seqlock. This prevents traceAdvance from moving forward
+    // until all Ms are observed to be outside of their seqlock critical section.
+    //
+    // Note: The seqlock is mutated here and also in traceCPUSample. If you update
+    // usage of the seqlock here, make sure to also look at what traceCPUSample is
+    // doing.
     seq := mp.trace.seqlock.Add(1)
     if debugTraceReentrancy && seq%2 != 1 {
         throw("bad use of trace.seqlock or tracer is reentrant")
