|
| 1 | +# Proposal: Low-cost defers through inline code, and extra funcdata to manage the panic case |
| 2 | + |
| 3 | +Author(s): Dan Scales, Keith Randall, and Austin Clements |
| 4 | +(with input from many others, including Russ Cox and Cherry Zhang) |
| 5 | + |
| 6 | +Last updated: 2019-09-23 |
| 7 | + |
| 8 | +Discussion at https://golang.org/issue/34481 |
| 9 | + |
| 10 | +General defer performance discussion at https://golang.org/issue/14939. |
| 11 | + |
| 12 | +## Abstract |
| 13 | + |
| 14 | +As of Go 1.13, most `defer` operations take about 35ns (reduced from about 50ns |
| 15 | +in Go 1.12). |
| 16 | +In contrast, a direct call takes about 6ns. |
| 17 | +This gap incentivizes engineers to eliminate `defer` operations from hot code |
| 18 | +paths, which takes away time from more productive tasks, leads to less |
| 19 | +maintainable code (e.g., if a `panic` is later introduced, the "optimization" is |
| 20 | +no longer correct), and discourages people from using a language feature when it |
| 21 | +would otherwise be an appropriate solution to a problem. |
| 22 | + |
| 23 | +We propose a way to make most `defer`s no more expensive than open-coding the |
| 24 | +call, hence eliminating the incentives to shy away from using this language |
| 25 | +feature to its fullest extent. |
| 26 | + |
| 27 | + |
| 28 | +## Background |
| 29 | + |
| 30 | +Go 1.13 implements the `defer` statement by calling into the runtime to push a |
| 31 | +"defer object" onto the defer chain. |
| 32 | +Then, in any function that contains `defer` statements, the compiler inserts a |
| 33 | +call to `runtime.deferreturn` at every function exit point to unwind that |
| 34 | +function's defers. |
| 35 | +Both of these cause overhead: the defer object must be populated with function |
| 36 | +call information (function pointer, arguments, closure information, etc.) when |
| 37 | +it is added to the chain, and `deferreturn` must find the right defers to |
| 38 | +unwind, copy out the call information, and invoke the deferred calls. |
| 39 | +Furthermore, this inhibits compiler optimizations like inlining, since the defer |
| 40 | +functions are called from the runtime using the defer information. |
| 41 | + |
| 42 | +When a function panics, the runtime runs the deferred calls on the defer chain |
| 43 | +until one of these calls `recover` or it exhausts the chain, resulting in a |
| 44 | +fatal panic. |
| 45 | +The stack itself is *not* unwound unless a deferred call recovers. |
| 46 | +This has the important property that examining the stack from a deferred call |
| 47 | +run during a panic will include the panicking frame, even if the defer was |
| 48 | +pushed by an ancestor of the panicking frame. |
| 49 | + |
| 50 | +In general, this defer chain is necessary since a function can defer an |
| 51 | +unbounded or dynamic number of calls that must all run when it returns. |
| 52 | +For example, a `defer` statement can appear in a loop or an `if` block. |
| 53 | +This also means that, in general, the defer objects must be heap-allocated, |
| 54 | +though the runtime uses an allocation pool to reduce the cost of the allocation. |
| 55 | + |
| 56 | +This is notably different from exception handling in C++ or Java, where the |
| 57 | +applicable set of `catch` or `finally` blocks can be determined statically at |
| 58 | +every program counter in a function. |
| 59 | +In these languages, the non-exception case is typically inlined and exception |
| 60 | +handling is driven by a side table giving the locations of the `catch` and |
| 61 | +`finally` blocks that apply to each PC. |
| 62 | + |
| 63 | +However, while Go's `defer` mechanism permits unbounded calls, the vast majority |
| 64 | +of functions that use `defer` invoke each `defer` statement at most once, and do |
| 65 | +not invoke `defer` in a loop. |
| 66 | +Go 1.13 adds an [optimization](https://golang.org/cl/171758) to stack-allocate |
| 67 | +defer objects in this case, but they must still be pushed and popped from the |
| 68 | +defer chain. |
| 69 | +This applies to 363 out of the 370 static defer sites in the `cmd/go` binary and |
| 70 | +speeds up this case by 30% relative to heap-allocated defer objects. |
| 71 | + |
| 72 | +This proposal combines this insight with the insights used in C++ and Java to |
| 73 | +make the non-panic case of most `defer` operations no more expensive than the |
| 74 | +manually open-coded case, while retaining correct `panic` handling. |
| 75 | + |
| 76 | + |
| 77 | +## Requirements |
| 78 | + |
| 79 | +While the common case of defer handling is simple enough, it can interact in |
| 80 | +often non-obvious ways with things like recursive panics, recover, and stack |
| 81 | +traces. |
| 82 | +Here we attempt to enumerate the requirements that any new defer implementation |
| 83 | +should likely satisfy, in addition to those in the language specification for |
| 84 | +[Defer statements](https://golang.org/ref/spec#Defer_statements) and [Handling |
| 85 | +panics](https://golang.org/ref/spec#Handling_panics). |
| 86 | + |
| 87 | +1. Executing a `defer` statement logically pushes a deferred function call onto |
| 88 | + a per-goroutine stack. |
| 89 | + Deferred calls are always executed starting from the top of this stack (hence |
| 90 | + in the reverse order of the execution of `defer` statements). |
| 91 | + Furthermore, each execution of a `defer` statement corresponds to exactly one |
| 92 | + deferred call (except in the case of program termination, where a deferred |
| 93 | + function may not be called at all). |
| 94 | + |
| 95 | +2. Defer calls are executed in one of two ways. |
| 96 | + Whenever a function call returns normally, the runtime starts popping and |
| 97 | + executing all existing defer calls for that stack frame only (in reverse order |
| 98 | + of original execution). |
| 99 | + Separately, whenever a panic (or a call to `runtime.Goexit`) occurs, the runtime starts |
| 100 | + popping and executing all existing defer calls for the entire defer stack. |
| 101 | + The execution of any defer call may be interrupted by a panic within the |
| 102 | + execution of the defer call. |
| 103 | + |
| 104 | +3. A program may have multiple outstanding panics, since a recursive (second) |
| 105 | + panic may occur during any of the defer calls being executed during the |
| 106 | + processing of the first panic. |
| 107 | + A previous panic is “aborted” if the processing of defers by the new panic |
| 108 | + reaches the frame where the previous panic was processing defers when the new |
| 109 | + panic happened. |
| 110 | + When a defer call that successfully called `recover` for a panic returns, |
| 111 | + the stack is immediately unwound to the frame that contains the defer that |
| 112 | + called `recover`, and any remaining defers in that frame are |
| 113 | + executed. |
| 114 | + Normal execution continues in the preceding frame (but note that normal |
| 115 | + execution may actually be continuing a defer call for an outer panic). |
| 116 | + Any panic that has not been recovered or aborted must still appear on the |
| 117 | + caller stack. |
| 118 | + Note that the first panic may never continue its defer processing, if the |
| 119 | + second panic actually successfully runs all defer calls, but the original |
| 120 | + panic must appear on the stack during all the processing by the second panic. |
| 121 | + |
| 122 | +4. When a defer call is executed because a function is returning normally |
| 123 | + (whether there are any outstanding panics or not), the call site of a |
| 124 | + deferred call must appear to be the function that invoked `defer` to push |
| 125 | + that function on the defer stack, at the line where that function is |
| 126 | + returning. |
| 127 | + A consequence of this is that, if the runtime is executing deferred calls in |
| 128 | + panic mode and a deferred call recovers, it must unwind the stack immediately |
| 129 | + after that deferred call returns and before executing another deferred call. |
| 130 | + |
| 131 | +5. When a defer call is executed because of an explicit panic, the call stack of |
| 132 | + a deferred function must include `runtime.gopanic` and the frame that |
| 133 | + panicked (and its callers) immediately below the deferred function call. |
| 134 | + As mentioned, the call stack must also include any outstanding previous |
| 135 | + panics. |
| 136 | + If a defer call is executed because of a run-time panic, the same condition |
| 137 | + applies, except that `runtime.gopanic` does not necessarily need to be on the |
| 138 | + stack. |
| 139 | + (In the current gc-go implementation, `runtime.gopanic` does appear on |
| 140 | + the stack even for run-time panics.) |
| 141 | + |
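
The ordering and recovery rules above can be observed from ordinary Go code. The following is a small illustrative program, not part of the proposal; the function `run` and the trace strings are invented for the example. It shows that deferred calls run in reverse order, that a successful `recover` lets the remaining defers in the recovering frame run, and that execution then continues in the caller.

```go
package main

import "fmt"

// run returns a trace of the order in which calls execute when a callee
// panics and a deferred call in its caller recovers.
func run() []string {
	var trace []string
	log := func(s string) { trace = append(trace, s) }

	inner := func() {
		defer log("inner defer")
		panic("boom")
	}
	middle := func() {
		defer log("middle: remaining defer")
		defer func() {
			if r := recover(); r != nil {
				log("middle: recovered " + fmt.Sprint(r))
			}
		}()
		inner()
		log("middle: not reached") // skipped because inner panics
	}

	middle()
	log("caller continues")
	return trace
}

func main() {
	for _, line := range run() {
		fmt.Println(line)
	}
}
```
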
| 142 | +## Proposal |
| 143 | + |
| 144 | +We propose optimizing deferred calls in functions where every `defer` is |
| 145 | +executed at most once (specifically, a `defer` may be on a conditional path, but |
| 146 | +is never in a loop in the control-flow graph). |
| 147 | +In this optimization, the compiler assigns a bit for every `defer` site to |
| 148 | +indicate whether that defer has been reached or not. |
| 149 | +The `defer` statement itself simply sets the corresponding bit and stores all |
| 150 | +necessary arguments in specific stack slots. |
| 151 | +Then, at every exit point of the function, the compiler open-codes each deferred |
| 152 | +call, protected by (and clearing) each corresponding bit. |
| 153 | + |
| 154 | +For example, the following: |
| 155 | + |
| 156 | +```go |
| 157 | +defer f1(a) |
| 158 | +if cond { |
| 159 | + defer f2(b) |
| 160 | +} |
| 161 | +body... |
| 162 | +``` |
| 163 | + |
| 164 | +would compile to |
| 165 | + |
| 166 | +```go |
| 167 | +deferBits |= 1<<0 |
| 168 | +tmpF1 = f1 |
| 169 | +tmpA = a |
| 170 | +if cond { |
| 171 | + deferBits |= 1<<1 |
| 172 | + tmpF2 = f2 |
| 173 | + tmpB = b |
| 174 | +} |
| 175 | +body... |
| 176 | +exit: |
| 177 | +if deferBits & (1<<1) != 0 { |
| 178 | + deferBits &^= 1<<1 |
| 179 | + tmpF2(tmpB) |
| 180 | +} |
| 181 | +if deferBits & (1<<0) != 0 { |
| 182 | + deferBits &^= 1<<0 |
| 183 | + tmpF1(tmpA) |
| 184 | +} |
| 185 | +``` |
| 186 | + |
| 187 | +In order to ensure that the value of `deferBits` and all the tmp variables are |
| 188 | +available in case of a panic, these variables must be allocated explicit stack |
| 189 | +slots and the stores to `deferBits` and the tmp variables (`tmpF1`, `tmpA`, etc.) |
| 190 | +must write the values into these stack slots. |
| 191 | +In addition, the updates to `deferBits` in the defer exit code must explicitly |
| 192 | +store the `deferBits` value to the corresponding stack slot. |
| 193 | +This will ensure that panic processing can determine exactly which defers have |
| 194 | +been executed so far. |
| 195 | + |
| 196 | +However, the defer exit code can still be optimized significantly in many cases. |
| 197 | +We can refer directly to the `deferBits` and `tmpA` ‘values’ (in the SSA sense), |
| 198 | +and these accesses can therefore be optimized in terms of using existing values |
| 199 | +in registers, propagating constants, etc. |
| 200 | +Also, if the `defer` statements are executed unconditionally, then constant propagation may |
| 201 | +in some cases eliminate the checks on `deferBits` (because the value of |
| 202 | +`deferBits` is known statically at the exit point). |
| 203 | + |
| 204 | +If there are multiple exits (returns) from the function, we can either duplicate |
| 205 | +the defer exit code at each exit, or we can have one copy of the defer exit code |
| 206 | +that is shared among all (or most) of the exits. |
| 207 | +Note that any sharing of defer exit code may lead to less specific line |
| 208 | +numbers (which don’t indicate the exact exit location) if the user happens to |
| 209 | +look at the call stack while in a call made by the defer exit code. |
| 210 | + |
| 211 | +## Panic processing |
| 212 | + |
| 213 | +Because no actual defer records have been created, panic processing is quite |
| 214 | +different and somewhat more complex in this approach. |
| 215 | +When generating the code for a function, the compiler also emits an extra set of |
| 216 | +`FUNCDATA` information that records information about each of the open-coded |
| 217 | +defers. |
| 218 | +For each open-coded defer, the compiler emits `FUNCDATA` that specifies the |
| 219 | +exact stack locations that store the function pointer and each of the arguments. |
| 220 | +It also emits the location of the stack slot containing `deferBits`. |
| 221 | +Since stack frames can get arbitrarily large, the compiler uses a varint |
| 222 | +encoding for the stack slot offsets. |
| 223 | + |
| 224 | +In addition, for all functions with open-coded defers, the compiler adds a small |
| 225 | +segment of code that does a call to `runtime.deferreturn` and then returns. |
| 226 | +This code segment is not reachable by the main code of the function, but is used |
| 227 | +to unwind the stack properly when a panic is successfully recovered. |
| 228 | + |
| 229 | +To handle a `panic`, the runtime conceptually walks the defer chain in parallel |
| 230 | +with the stack in order to interleave execution of pushed defers with defers in |
| 231 | +open-coded frames. |
| 232 | +When the runtime encounters an open-coded frame `F` executing function `f`, it |
| 233 | +executes the following steps. |
| 234 | + |
| 235 | +1. The runtime reads the funcdata for function `f` that contains the open-defer |
| 236 | + information. |
| 237 | + |
| 238 | +2. Using the information about the location in frame `F` of the stack slot for |
| 239 | + `deferBits`, the runtime loads the current value of `deferBits` for this |
| 240 | + frame. |
| 241 | + The runtime processes each of the active defers, as specified by the value of |
| 242 | + `deferBits`, in reverse order. |
| 243 | + |
| 244 | +3. For each active defer, the runtime loads the function pointer for the defer |
| 245 | + call from the appropriate stack slot. |
| 246 | + It also builds up an argument frame by copying each of the defer arguments |
| 247 | + from its specified stack slot to the appropriate location in the argument |
| 248 | + frame. |
| 249 | + It then updates `deferBits` in its stack slot after masking off the bit for |
| 250 | + the current defer. |
| 251 | + Then it uses the function pointer and argument frame to call the deferred |
| 252 | + function. |
| 253 | + |
| 254 | +4. If the defer call returns normally without doing a recovery, then the runtime |
| 255 | + continues executing active defer calls for frame `F` until all active defer |
| 256 | + calls have finished. |
| 257 | + |
| 258 | +5. If any defer call returns normally but has done a successful recover, then |
| 259 | + the runtime stops processing defers in the current frame. |
| 260 | + There may or may not be any remaining defers to process. |
| 261 | + The runtime then arranges to jump to the `deferreturn` code segment and |
| 262 | + unwind the stack to frame `F`, by simultaneously setting the PC to the |
| 263 | + address of the `deferreturn` segment and setting the SP to the appropriate |
| 264 | + value for frame `F`. |
| 265 | + The `deferreturn` code segment then calls back into the runtime. |
| 266 | + The runtime can now process any remaining active defers from frame `F`. |
| 267 | + But for these defers, the stack has been appropriately unwound and the defer |
| 268 | + appears to be called directly from function `f`. |
| 269 | + When all defers for the frame have finished, `deferreturn` finishes and the |
| 270 | + code segment returns from frame `F` to continue execution. |
| 271 | + |
| 272 | +If a deferred call in step 3 itself panics, the runtime starts its normal panic |
| 273 | +processing again. |
| 274 | +For any frame with open-coded defers that has already run some defers, the |
| 275 | +`deferBits` value at the specified stack slot will always accurately reflect the |
| 276 | +remaining defers that need to be run. |
| 277 | + |
| 278 | +## Rationale |
| 279 | + |
| 280 | +One other approach that we extensively considered (and prototyped) also has |
| 281 | +inlined defer code for the normal case, but actually executes the defer exit code |
| 282 | +directly even in the panic case. |
| 283 | +Executing the defer exit code in the panic case requires duplication of stack |
| 284 | +frame `F` and some complex runtime code to start execution of the defer exit code |
| 285 | +using this new duplicated frame and to regain control when the defer exit code |
| 286 | +completes. |
| 287 | +The required runtime code for this approach is much more architecture-dependent |
| 288 | +and seems to be much more complex (and possibly fragile). |
| 289 | + |
| 290 | + |
| 291 | +## Compatibility |
| 292 | + |
| 293 | +This proposal does not change any user-facing APIs, and hence satisfies the [compatibility |
| 294 | +guidelines](https://golang.org/doc/go1compat). |
| 295 | + |
| 296 | +## Implementation |
| 297 | + |
| 298 | +An implementation is mostly complete. |
| 299 | +The change is [here](https://go-review.googlesource.com/c/go/+/190098/6). |
| 300 | +Comments on the design or implementation are very welcome. |
| 301 | + |
| 302 | +Some miscellaneous implementation details: |
| 303 | + |
| 304 | +1. We need to restrict the number of defers in a function to the size of the |
| 305 | + `deferBits` bitmask. |
| 306 | + To minimize code size, we currently make `deferBits` 8 bits wide, and don’t do |
| 307 | + open-coded defers if there are more than 8 defers in a function. |
| 308 | + |
| 309 | +2. The `deferBits` variable and defer argument variables (such as `tmpA`) must be |
| 310 | + declared (via OpVarDef) in the entry block, since the unconditional defer |
| 311 | + exit code at the bottom of the function will access them, so these variables |
| 312 | + are live throughout the entire function. |
| 313 | + (And, of course, they can be accessed by panic processing at any point within |
| 314 | + the function that might cause a panic.) |
| 315 | + For any defer argument stack slots that are pointers (or contain pointers), |
| 316 | + we must initialize those stack slots to zero in the entry block. |
| 317 | + The initialization is required for garbage collection, which doesn’t know |
| 318 | + which of these defer arguments are active (i.e., which of the defer sites have |
| 319 | + been reached, but the corresponding defer call has not yet happened). |
| 320 | + |
| 321 | +3. Because the `deferreturn` code segment is disconnected from the rest of the |
| 322 | + function, it would not normally indicate that any stack slots are live. |
| 323 | + However, we want the liveness information at the `deferreturn` call to |
| 324 | + indicate that all of the stack slots associated with defers (which may |
| 325 | + include pointers to variables accessed by closures) and all of the return |
| 326 | + values are live. |
| 327 | + We must explicitly set the liveness for the `deferreturn` call to be the same |
| 328 | + as the liveness at the first defer call on the defer exit path. |
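
The eligibility rule from the first implementation detail, combined with the "never in a loop" condition from the Proposal section, can be summarized as a small predicate. This is a hypothetical sketch: `deferSite`, `maxOpenDefers`, and `canOpenCode` are names invented here, and the real compiler makes this decision on its own internal representation.

```go
package main

import "fmt"

// maxOpenDefers is the width of the 8-bit deferBits mask described above.
const maxOpenDefers = 8

// deferSite is a hypothetical summary of one defer statement in a function.
type deferSite struct {
	inLoop bool // true if the statement sits on a cycle in the control-flow graph
}

// canOpenCode reports whether every defer site fits in deferBits and runs at
// most once per call, which are the conditions for open-coding.
func canOpenCode(sites []deferSite) bool {
	if len(sites) > maxOpenDefers {
		return false
	}
	for _, s := range sites {
		if s.inLoop {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(canOpenCode([]deferSite{{}, {}}))             // true
	fmt.Println(canOpenCode([]deferSite{{}, {inLoop: true}})) // false
	fmt.Println(canOpenCode(make([]deferSite, 9)))            // false
}
```
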