[BEAM] Critical sections #114

# Critical sections

When two contexts (e.g. interrupt handlers) running at different priorities need
to access the same static variable, some form of mutual exclusion is required for
memory safety. Mutual exclusion can be implemented using a critical section in
the lower priority context, where the critical section prevents the start of the
higher priority handler (preemption). We use compiler fences or the "memory"
clobber to prevent the compiler from misoptimizing these critical sections, but
are they enough?

Note that:

- The programs in this document target the [Basic Embedded Abstract Machine
  (BEAM)][beam]. Please become familiar with the linked specification before you
  read the rest of this document.

[beam]: https://github.com/rust-lang/unsafe-code-guidelines/pull/111

- In these programs we assume that [rust-lang/rfcs#2585][rfc2585] has been
  accepted and implemented.

[rfc2585]: https://github.com/rust-lang/rfcs/pull/2585

## Disable all interrupts (global mask)

Consider this program where a critical section is created by temporarily
disabling *all* interrupts.

``` rust
#![no_std]

static mut X: Type = Type::default();

#[no_mangle]
unsafe fn main() -> ! {
    unsafe {
        asm!("ENABLE_INTERRUPTS" : : : : "volatile");
    }

    loop {
        // .. any safe code ..

        unsafe {
            // start of critical section
            asm!("DISABLE_INTERRUPTS" : : : "memory" : "volatile");
            //                               ^^^^^^^^
        }

        // `INTERRUPT0` can *not* preempt this block
        // (because all interrupts are disabled)
        {
            let x: &mut Type = unsafe {
                &mut X
            };

            // .. any safe code ..
        }

        unsafe {
            // end of critical section
            asm!("ENABLE_INTERRUPTS" : : : "memory" : "volatile");
            //                              ^^^^^^^^
        }

        // .. any safe code ..
    }
}

#[no_mangle]
unsafe fn INTERRUPT0() {
    let x: &mut Type = unsafe {
        &mut X
    };

    // .. any safe code ..
}
```

Note that "any safe code" can *not* call `main` or `INTERRUPT0` (because they
are `unsafe` functions), use `asm!`, or access registers.

**Claim**: This program is well-defined / sound if and only if `Type`
implements the `Send` trait.

Example that shows that the bound is required: `type Type = Rc<u8>` could
result in an unsound program (data race between `main` and `INTERRUPT0`).

"Why are the memory clobbers required?" Without them the compiler can reorder
`main`'s operations on `X` to outside the critical section, leading to a data
race.
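
To make the role of the `Send` bound concrete, here is a minimal sketch, not
part of the original document, of how this critical section could be packaged
as a safe API. `disable_interrupts` / `enable_interrupts` are assumed helpers
standing in for the `asm!` blocks above. The wrapper hands out `&mut T` to code
that may run in either `main` or an interrupt handler, which is exactly the
situation that calls for `T: Send`:

``` rust
#![no_std]

use core::cell::UnsafeCell;

extern "C" {
    // Assumed to be provided elsewhere, e.g. implemented with the volatile
    // `asm!` blocks shown above.
    fn disable_interrupts();
    fn enable_interrupts();
}

/// A value that may only be accessed with all interrupts disabled.
pub struct InterruptLocked<T>(UnsafeCell<T>);

// Sharing a value between `main` and interrupt handlers is analogous to
// sharing it between threads, hence the `T: Send` bound (cf. `Mutex<T>`).
unsafe impl<T: Send> Sync for InterruptLocked<T> {}

impl<T> InterruptLocked<T> {
    pub const fn new(value: T) -> Self {
        InterruptLocked(UnsafeCell::new(value))
    }

    /// Runs `f` inside a critical section with all interrupts disabled.
    ///
    /// NOTE: a real implementation would also have to rule out nested calls
    /// (e.g. via a critical-section token) to avoid aliasing the `&mut T`;
    /// that detail is omitted from this sketch.
    pub fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        unsafe { disable_interrupts() }; // start of critical section
        let r = f(unsafe { &mut *self.0.get() });
        unsafe { enable_interrupts() }; // end of critical section
        r
    }
}
```

With `type Type = Rc<u8>`, `InterruptLocked<Rc<u8>>` would not implement
`Sync` and therefore could not be placed in a `static`, which is the
compile-time manifestation of the data race described above.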

## Interrupt masking

Consider this program that creates a critical section by masking a single
interrupt (individual mask).

> Aside: it's also possible to implement a critical section by raising the
> running priority, but the implementation of that kind of critical section is
> very similar to this one (volatile write + compiler fence) so we won't
> include it in this document.

``` rust
#![no_std]

use core::{cell::UnsafeCell, ptr, sync::atomic::{self, Ordering}};

extern "C" {
    static MASK_INTERRUPT: UnsafeCell<u8>;
    static UNMASK_INTERRUPT: UnsafeCell<u8>;
}

static mut X: Type = Type::default();

#[no_mangle]
unsafe fn main() -> ! {
    unsafe {
        asm!("ENABLE_INTERRUPTS" : : : : "volatile");
    }

    loop {
        // .. any safe code ..

        unsafe {
            // start of critical section
            ptr::write_volatile(MASK_INTERRUPT.get(), 1 << 0);
        }

        atomic::compiler_fence(Ordering::SeqCst);

        // `INTERRUPT0` can *not* preempt this block
        // (because it's masked)
        {
            let x: &mut Type = unsafe {
                &mut X
            };

            // .. any safe code ..
        }

        atomic::compiler_fence(Ordering::SeqCst);

        unsafe {
            // end of critical section
            ptr::write_volatile(UNMASK_INTERRUPT.get(), 1 << 0);
        }

        // .. any safe code ..
    }
}

#[no_mangle]
unsafe fn INTERRUPT0() {
    let x: &mut Type = unsafe {
        &mut X
    };

    // .. any safe code ..
}
```

**Claim**: This program is well-defined / sound if and only if `Type`
implements the `Send` trait.

Example that shows that the bound is required: `type Type = Rc<u8>` could
result in an unsound program (data race between `main` and `INTERRUPT0`).

"Why are the compiler fences required?" Without them the compiler can reorder
`main`'s operations on `X` to outside the critical section, leading to a data
race.
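
For reference, the mask / fence / unmask sequence can be factored into a small
helper so it cannot be written out of order at each use site. This is a sketch
added here, not part of the original program; it assumes the same
`MASK_INTERRUPT` / `UNMASK_INTERRUPT` registers declared above:

``` rust
use core::{cell::UnsafeCell, ptr, sync::atomic::{self, Ordering}};

extern "C" {
    static MASK_INTERRUPT: UnsafeCell<u8>;
    static UNMASK_INTERRUPT: UnsafeCell<u8>;
}

/// Runs `f` with interrupt `n` masked.
unsafe fn with_interrupt_masked<R>(n: u8, f: impl FnOnce() -> R) -> R {
    unsafe {
        // start of critical section: mask interrupt `n`
        ptr::write_volatile(MASK_INTERRUPT.get(), 1 << n);
    }
    atomic::compiler_fence(Ordering::SeqCst);

    let r = f();

    atomic::compiler_fence(Ordering::SeqCst);
    unsafe {
        // end of critical section: unmask interrupt `n`
        ptr::write_volatile(UNMASK_INTERRUPT.get(), 1 << n);
    }
    r
}
```

With such a helper, the critical section in `main` above reduces to a single
`with_interrupt_masked(0, || { /* access X */ })` call.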

## Questions

- Can these programs be misoptimized by the compiler? In particular, the
  compiler fences in the second program prevent memory operations on `X` from
  being reordered to outside the critical section, but AFAIK they don't tell
  the compiler that `X` may change outside the critical section -- could the
  program cache the value of `X` on the stack? That would change the semantics
  of the program.

- I have observed that an `asm!("")` expression with *no* clobbers prevents
  operations on `static mut` variables from being merged and reordered. See
  example below. Is this intended behavior?

> **Review comment:** I don't think that this is intended. LLVM may apply
> stronger constraints than necessary on asm, but that is not something that we
> should rely on.

> **Review comment:** Rather than rely on this I would actually like to see
> instructions like […] (At some point I thought that the […]

> **Review comment:** That's probably related to this issue: LLVM is weird and
> expects the […] But also note that Clang (but not GCC) treats […]

``` rust
#[no_mangle]
static mut X: u32 = 0;

#[no_mangle]
unsafe fn INTERRUPT0() {
    X += 1;

    asm!("");

    X += 2;
}
```

Produces this machine code (sorry for the ARM assembly):

``` armasm
INTERRUPT0:
  movw r0, #0
  movt r0, #8192
  ldr r1, [r0]
  adds r1, #1
  str r1, [r0] ; X += 1
  ldr r1, [r0]
  adds r1, #2
  str r1, [r0] ; X += 2
  bx lr
```

This is the corresponding (post-optimization) LLVM IR:

``` llvm
; Function Attrs: nounwind
define void @INTERRUPT0() unnamed_addr #1 !dbg !1268 {
start:
  %0 = load i32, i32* bitcast (<{ [4 x i8] }>* @X to i32*), align 4, !dbg !1269
  %1 = add i32 %0, 1, !dbg !1269
  store i32 %1, i32* bitcast (<{ [4 x i8] }>* @X to i32*), align 4, !dbg !1269
  tail call void asm sideeffect "", ""() #5, !dbg !1270, !srcloc !1271
  %2 = load i32, i32* bitcast (<{ [4 x i8] }>* @X to i32*), align 4, !dbg !1272
  %3 = add i32 %2, 2, !dbg !1272
  store i32 %3, i32* bitcast (<{ [4 x i8] }>* @X to i32*), align 4, !dbg !1272
  ret void, !dbg !1273
}
```

## Other comments

(Feel free to disregard this section completely; it's about better
optimizations rather than misoptimizations.)

`atomic::compiler_fence` and the "memory" clobber are coarse-grained and they
can prevent optimization of memory accesses that don't need to be synchronized.
For example, this Rust code

``` rust
use core::{ptr, sync::atomic::{self, Ordering}};

static mut X: u32 = 0;

unsafe fn main() -> ! {
    let mut y = 0;

    // this could be part of a critical section
    atomic::compiler_fence(Ordering::SeqCst);

    y += 1;

    // prevent the compiler from optimizing away `y`
    unsafe {
        ptr::read_volatile(&y);
    }

    loop {}
}
```

produces this machine code:

``` armasm
main:
  sub sp, #4
  movs r0, #0
  str r0, [sp] ; y = 0
  ldr r0, [sp]
  adds r0, #1
  str r0, [sp] ; y += 1
  ldr r0, [sp] ; ptr::read_volatile
  b #-4 <main+0xe>
```

Without the compiler fence, `y = 0` and `y += 1` would have been optimized into
`y = 1`, resulting in shorter machine code:

``` armasm
main:
  sub sp, #4
  movs r0, #1
  str r0, [sp] ; y = 1
  ldr r0, [sp] ; ptr::read_volatile
  b #-4 <main+0x8>
```

I wish we had an `atomic::compiler_fence_val(&x, ORDERING)` function that
prevented only memory operations on `x` from being reordered -- though I don't
know if LLVM supports that kind of fine-grained compiler fence -- as that would
result in better optimizations.
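
As an illustration of the shape such an API might take, here is a hypothetical
sketch; this function does not exist in `core`, and the name and signature are
invented for this example. Lacking a per-location primitive, the stand-in body
simply falls back to the coarse-grained fence:

``` rust
use core::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical: a compiler fence scoped to a single memory location,
/// intended to prevent only operations on `*val` from being reordered
/// across the fence.
pub fn compiler_fence_val<T>(val: &T, order: Ordering) {
    // No such fine-grained primitive is exposed today, so this stand-in
    // conservatively issues a full compiler fence (and ignores `val`).
    let _ = val;
    compiler_fence(order);
}
```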

> **Review comment:** Actually that is the whole point of a compiler fence: it
> tells the compiler that other threads may have modified shared data (i.e.
> globals + any locals whose address has been made available globally) and that
> it should reload those at the point of the fence.

> **Review comment:** It also flushes writes made before the fence, which is
> why `"memory"` is required on the second `asm!` (that re-enables interrupts).

> **Review comment:** @Amanieu is that documented somewhere? That's not what I
> understood from the `atomic::compiler_fence` documentation; that doc only
> talks about preventing the compiler from reordering memory operations across
> the fence; it says nothing about forcing shared / static variables to be
> reloaded.
>
> I can certainly see the behavior you mention arising from
> `asm!("" : : : "memory")`, where the compiler has to assume that the assembly
> block may modify any memory, but the text you have commented on is asking
> about the program that exclusively uses `atomic::compiler_fence`. Or do you
> mean to say that I should consider `asm!("" : : : "memory")` and
> `atomic::compiler_fence` to be equivalent? IME, they do have observable
> differences on optimizations.

> **Review comment:** If the source code loads a variable twice but the emitted
> assembly only loads it once, the compiler has effectively reordered the second
> load up to the same point as the first load.
>
> `atomic::compiler_fence` emits an LLVM `fence` instruction with `syncscope`
> set to `singlethread`. As mentioned in a different thread, it's also used for
> C's `atomic_signal_fence`, which is made specifically for the interrupts use
> case. (It says "signal handler", but from the compiler's perspective, a Unix
> signal handler in userland and an interrupt handler on bare metal have
> basically the same semantics; they both interrupt execution at an arbitrary
> instruction and move control flow to somewhere else.)
>
> I think `asm!("" : : : "memory")` should be largely if not entirely equivalent
> to `atomic::compiler_fence` with either `AcqRel` or `SeqCst` (as opposed to
> `Acquire` and `Release`, which only create one-sided barriers). I'd be
> interested to see what code creates differences in assembly output between the
> two. Since they are different LLVM constructs, it's possible that LLVM's
> optimizer could just coincidentally generate different code (in particular, it
> might be being overly conservative with `asm!`), but it's also possible that
> I'm forgetting something.

> **Review comment:** @japaric The RFC that added these describes their
> semantics as equivalent to C11's `atomic_signal_fence`. The current docs do
> not make that clear, but I'd argue that this is a documentation bug.

> **Review comment:** @comex That clarifies things; thanks.
>
> (Thinking of #112) Is removing a store operation considered a re-ordering?
> `compiler_fence` prevents memory operations on stack variables from being
> merged / reordered. The "memory" clobber seems to have no effect on stack
> variables. See code below:
>
> vs
>
> If you replace the stack variable with a static variable then both the
> compiler fence and the "memory" clobber prevent merging / re-ordering of
> memory operations on the static variable.
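
The two code blocks referenced by "See code below" are not preserved in this
capture. The following is a sketch, an assumption based on the surrounding
description rather than the original code, of the kind of comparison being
described: a stack variable `x` incremented twice, with a `compiler_fence` in
one version and an `asm!` "memory" clobber in the other, plus a volatile read
to keep the variable alive:

``` rust
use core::{ptr, sync::atomic::{self, Ordering}};

fn with_compiler_fence() -> u32 {
    let mut x = 0;
    x += 1;
    // per the comment above: the fence keeps the two operations on the stack
    // variable `x` from being merged / reordered
    atomic::compiler_fence(Ordering::SeqCst);
    x += 2;
    unsafe { ptr::read_volatile(&x) }
}

fn with_memory_clobber() -> u32 {
    let mut x = 0;
    x += 1;
    // per the comment above: the "memory" clobber seems to have no effect on
    // the (unleaked) stack variable `x`
    unsafe { asm!("" : : : "memory") };
    x += 2;
    unsafe { ptr::read_volatile(&x) }
}
```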

> **Review comment:** Not sure what you are asking? It is not, but once you
> reordered enough to get two adjacent stores you can obviously remove the
> first.

> **Review comment:** To me this looks like an arbitrary LLVM limitation and not
> a fundamental difference. All clobbers and orderings aside, the compiler can
> in both cases assume that `x` is unobserved by the outside world because its
> address is not leaked to anything. So IMO in both cases it would be legal to
> optimize away all the writes, and just leave the volatile read.