Commit 2c02834

danscales authored and randall77 committed
design: add 34481-opencoded-defers.md
This change adds a design document for reducing the cost of defers significantly by essentially inlining defer calls on normal exits in most cases.

Updates golang/go#34481

Change-Id: If64787e591864ab7843503b8f09c4d6dd6a7a535
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/196964
Reviewed-by: Keith Randall <[email protected]>
1 parent 79a6465 commit 2c02834

1 file changed: 328 additions, 0 deletions
design/34481-opencoded-defers.md

@@ -0,0 +1,328 @@
# Proposal: Low-cost defers through inline code, and extra funcdata to manage the panic case

Author(s): Dan Scales, Keith Randall, and Austin Clements
(with input from many others, including Russ Cox and Cherry Zhang)

Last updated: 2019-09-23

Discussion at https://golang.org/issue/34481

General defer performance discussion at https://golang.org/issue/14939.

## Abstract

As of Go 1.13, most `defer` operations take about 35ns (reduced from about 50ns
in Go 1.12).
In contrast, a direct call takes about 6ns.
This gap incentivizes engineers to eliminate `defer` operations from hot code
paths, which takes away time from more productive tasks, leads to less
maintainable code (e.g., if a `panic` is later introduced, the "optimization" is
no longer correct), and discourages people from using a language feature when it
would otherwise be an appropriate solution to a problem.

We propose a way to make most `defer`s no more expensive than open-coding the
call, hence eliminating the incentives to shy away from using this language
feature to its fullest extent.


## Background

Go 1.13 implements the `defer` statement by calling into the runtime to push a
"defer object" onto the defer chain.
Then, in any function that contains `defer` statements, the compiler inserts a
call to `runtime.deferreturn` at every function exit point to unwind that
function's defers.
Both of these cause overhead: the defer object must be populated with function
call information (function pointer, arguments, closure information, etc.) when
it is added to the chain, and `deferreturn` must find the right defers to
unwind, copy out the call information, and invoke the deferred calls.
Furthermore, this inhibits compiler optimizations like inlining, since the
deferred functions are called from the runtime using the defer information.

When a function panics, the runtime runs the deferred calls on the defer chain
until one of these calls `recover` or it exhausts the chain, resulting in a
fatal panic.
The stack itself is *not* unwound unless a deferred call recovers.
This has the important property that examining the stack from a deferred call
run during a panic will include the panicking frame, even if the defer was
pushed by an ancestor of the panicking frame.

In general, this defer chain is necessary since a function can defer an
unbounded or dynamic number of calls that must all run when it returns.
For example, a `defer` statement can appear in a loop or an `if` block.
This also means that, in general, the defer objects must be heap-allocated,
though the runtime uses an allocation pool to reduce the cost of the allocation.

This is notably different from exception handling in C++ or Java, where the
applicable set of `catch` or `finally` blocks can be determined statically at
every program counter in a function.
In these languages, the non-exception case is typically inlined and exception
handling is driven by a side table giving the locations of the `catch` and
`finally` blocks that apply to each PC.

However, while Go's `defer` mechanism permits unbounded calls, the vast majority
of functions that use `defer` invoke each `defer` statement at most once, and do
not invoke `defer` in a loop.
Go 1.13 adds an [optimization](https://golang.org/cl/171758) to stack-allocate
defer objects in this case, but they must still be pushed onto and popped from
the defer chain.
This applies to 363 out of the 370 static defer sites in the `cmd/go` binary and
speeds up this case by 30% relative to heap-allocated defer objects.

This proposal combines this insight with the insights used in C++ and Java to
make the non-panic case of most `defer` operations no more expensive than the
manually open-coded case, while retaining correct `panic` handling.


## Requirements

While the common case of defer handling is simple enough, it can interact in
often non-obvious ways with things like recursive panics, recover, and stack
traces.
Here we attempt to enumerate the requirements that any new defer implementation
should likely satisfy, in addition to those in the language specification for
[Defer statements](https://golang.org/ref/spec#Defer_statements) and [Handling
panics](https://golang.org/ref/spec#Handling_panics).

1. Executing a `defer` statement logically pushes a deferred function call onto
   a per-goroutine stack.
   Deferred calls are always executed starting from the top of this stack (hence
   in the reverse order of the execution of `defer` statements).
   Furthermore, each execution of a `defer` statement corresponds to exactly one
   deferred call (except in the case of program termination, where a deferred
   function may not be called at all).

2. Defer calls are executed in one of two ways.
   Whenever a function call returns normally, the runtime starts popping and
   executing all existing defer calls for that stack frame only (in reverse order
   of original execution).
   Separately, whenever a panic (or a call to `Goexit`) occurs, the runtime starts
   popping and executing all existing defer calls for the entire defer stack.
   The execution of any defer call may be interrupted by a panic within the
   execution of the defer call.

3. A program may have multiple outstanding panics, since a recursive (second)
   panic may occur during any of the defer calls being executed during the
   processing of the first panic.
   A previous panic is "aborted" if the processing of defers by the new panic
   reaches the frame where the previous panic was processing defers when the new
   panic happened.
   When a defer call that did a successful `recover` (one that applies to a
   panic) returns, the stack is immediately unwound to the frame which contains
   the defer that did the recover call, and any remaining defers in that frame
   are executed.
   Normal execution continues in the preceding frame (but note that normal
   execution may actually be continuing a defer call for an outer panic).
   Any panic that has not been recovered or aborted must still appear on the
   caller stack.
   Note that the first panic may never continue its defer processing, if the
   second panic actually successfully runs all defer calls, but the original
   panic must appear on the stack during all the processing by the second panic.

4. When a defer call is executed because a function is returning normally
   (whether there are any outstanding panics or not), the call site of a
   deferred call must appear to be the function that invoked `defer` to push
   that function on the defer stack, at the line where that function is
   returning.
   A consequence of this is that, if the runtime is executing deferred calls in
   panic mode and a deferred call recovers, it must unwind the stack immediately
   after that deferred call returns and before executing another deferred call.

5. When a defer call is executed because of an explicit panic, the call stack of
   a deferred function must include `runtime.gopanic` and the frame that
   panicked (and its callers) immediately below the deferred function call.
   As mentioned, the call stack must also include any outstanding previous
   panics.
   If a defer call is executed because of a run-time panic, the same condition
   applies, except that `runtime.gopanic` does not necessarily need to be on the
   stack.
   (In the current gc-go implementation, `runtime.gopanic` does appear on
   the stack even for run-time panics.)

## Proposal

We propose optimizing deferred calls in functions where every `defer` is
executed at most once (specifically, a `defer` may be on a conditional path, but
is never in a loop in the control-flow graph).
In this optimization, the compiler assigns a bit for every `defer` site to
indicate whether that defer has been reached or not.
The `defer` statement itself simply sets the corresponding bit and stores all
necessary arguments in specific stack slots.
Then, at every exit point of the function, the compiler open-codes each deferred
call, protected by (and clearing) each corresponding bit.

For example, the following:

```go
defer f1(a)
if cond {
	defer f2(b)
}
body...
```

would compile to

```go
deferBits |= 1<<0
tmpF1 = f1
tmpA = a
if cond {
	deferBits |= 1<<1
	tmpF2 = f2
	tmpB = b
}
body...
exit:
if deferBits & (1<<1) != 0 {
	deferBits &^= 1<<1
	tmpF2(tmpB)
}
if deferBits & (1<<0) != 0 {
	deferBits &^= 1<<0
	tmpF1(tmpA)
}
```

In order to ensure that the value of `deferBits` and all the tmp variables are
available in case of a panic, these variables must be allocated explicit stack
slots and the stores to `deferBits` and the tmp variables (`tmpF1`, `tmpA`, etc.)
must write the values into these stack slots.
In addition, the updates to `deferBits` in the defer exit code must explicitly
store the `deferBits` value to the corresponding stack slot.
This will ensure that panic processing can determine exactly which defers have
been executed so far.

However, the defer exit code can still be optimized significantly in many cases.
We can refer directly to the `deferBits` and `tmpA` "values" (in the SSA sense),
and these accesses can therefore be optimized in terms of using existing values
in registers, propagating constants, etc.
Also, if the defers were called unconditionally, then constant propagation may
in some cases eliminate the checks on `deferBits` (because the value of
`deferBits` is known statically at the exit point).

If there are multiple exits (returns) from the function, we can either duplicate
the defer exit code at each exit, or we can have one copy of the defer exit code
that is shared among all (or most) of the exits.
Note that any sharing of defer exit code may lead to less specific line
numbers (which don't indicate the exact exit location) if the user happens to
look at the call stack while in a call made by the defer exit code.

## Panic processing

Because no actual defer records have been created, panic processing is quite
different and somewhat more complex in this approach.
When generating the code for a function, the compiler also emits an extra set of
`FUNCDATA` information that records information about each of the open-coded
defers.
For each open-coded defer, the compiler emits `FUNCDATA` that specifies the
exact stack locations that store the function pointer and each of the arguments.
It also emits the location of the stack slot containing `deferBits`.
Since stack frames can get arbitrarily large, the compiler uses a varint
encoding for the stack slot offsets.

In addition, for all functions with open-coded defers, the compiler adds a small
segment of code that does a call to `runtime.deferreturn` and then returns.
This code segment is not reachable by the main code of the function, but is used
to unwind the stack properly when a panic is successfully recovered.

To handle a `panic`, the runtime conceptually walks the defer chain in parallel
with the stack in order to interleave execution of pushed defers with defers in
open-coded frames.
When the runtime encounters an open-coded frame `F` executing function `f`, it
executes the following steps.

1. The runtime reads the funcdata for function `f` that contains the open-defer
   information.

2. Using the information about the location in frame `F` of the stack slot for
   `deferBits`, the runtime loads the current value of `deferBits` for this
   frame.
   The runtime processes each of the active defers, as specified by the value of
   `deferBits`, in reverse order.

3. For each active defer, the runtime loads the function pointer for the defer
   call from the appropriate stack slot.
   It also builds up an argument frame by copying each of the defer arguments
   from its specified stack slot to the appropriate location in the argument
   frame.
   It then updates `deferBits` in its stack slot after masking off the bit for
   the current defer.
   Then it uses the function pointer and argument frame to call the deferred
   function.

4. If the defer call returns normally without doing a recovery, then the runtime
   continues executing active defer calls for frame `F` until all active defer
   calls have finished.

5. If any defer call returns normally but has done a successful recover, then
   the runtime stops processing defers in the current frame.
   There may or may not be any remaining defers to process.
   The runtime then arranges to jump to the `deferreturn` code segment and
   unwind the stack to frame `F`, by simultaneously setting the PC to the
   address of the `deferreturn` segment and setting the SP to the appropriate
   value for frame `F`.
   The `deferreturn` code segment then calls back into the runtime.
   The runtime can now process any remaining active defers from frame `F`.
   But for these defers, the stack has been appropriately unwound and the defer
   appears to be called directly from function `f`.
   When all defers for the frame have finished, the `deferreturn` call finishes
   and the code segment returns from frame `F` to continue execution.

If a deferred call in step 3 itself panics, the runtime starts its normal panic
processing again.
For any frame with open-coded defers that has already run some defers, the
`deferBits` value at the specified stack slot will always accurately reflect the
remaining defers that need to be run.

## Rationale

One other approach that we extensively considered (and prototyped) also has
inlined defer code for the normal case, but actually executes the defer exit code
directly even in the panic case.
Executing the defer exit code in the panic case requires duplication of stack
frame `F` and some complex runtime code to start execution of the defer exit code
using this new duplicated frame and to regain control when the defer exit code
completes.
The required runtime code for this approach is much more architecture-dependent
and seems to be much more complex (and possibly fragile).


## Compatibility

This proposal does not change any user-facing APIs, and hence satisfies the
[compatibility guidelines](https://golang.org/doc/go1compat).

## Implementation

An implementation is mostly complete.
The change is [here](https://go-review.googlesource.com/c/go/+/190098/6).
Comments on the design or implementation are very welcome.

Some miscellaneous implementation details:

1. We need to restrict the number of defers in a function to the size of the
   `deferBits` bitmask.
   To minimize code size, we currently make `deferBits` 8 bits wide, and don't do
   open-coded defers if there are more than 8 defers in a function.

2. The `deferBits` variable and defer argument variables (such as `tmpA`) must be
   declared (via `OpVarDef`) in the entry block, since the unconditional defer
   exit code at the bottom of the function will access them, so these variables
   are live throughout the entire function.
   (And, of course, they can be accessed by panic processing at any point within
   the function that might cause a panic.)
   For any defer argument stack slots that are pointers (or contain pointers),
   we must initialize those stack slots to zero in the entry block.
   The initialization is required for garbage collection, which doesn't know
   which of these defer arguments are active (i.e. which of the defer sites have
   been reached, but the corresponding defer call has not yet happened).

3. Because the `deferreturn` code segment is disconnected from the rest of the
   function, it would not normally indicate that any stack slots are live.
   However, we want the liveness information at the `deferreturn` call to
   indicate that all of the stack slots associated with defers (which may
   include pointers to variables accessed by closures) and all of the return
   values are live.
   We must explicitly set the liveness for the `deferreturn` call to be the same
   as the liveness at the first defer call on the defer exit path.
