[BEAM] Critical sections #114

New file: `active_discussion/embedded/critical-section.md` (290 additions)

# Critical sections

When two contexts (e.g. interrupt handlers) running at different priorities need
to access the same static variable, some form of mutual exclusion is required
for memory safety. Mutual exclusion can be implemented with a critical section
in the lower priority context, where the critical section prevents the higher
priority handler from starting (preemption). We use compiler fences or the
`"memory"` clobber to prevent the compiler from misoptimizing these critical
sections, but are they enough?

Note that:

- The programs in this document target the [Basic Embedded Abstract Machine
(BEAM)][beam]. Please become familiar with the linked specification before you
read the rest of this document.

[beam]: https://github.com/rust-lang/unsafe-code-guidelines/pull/111

- In these programs we assume that [rust-lang/rfcs#2585][rfc2585] has been
accepted and implemented.

[rfc2585]: https://github.com/rust-lang/rfcs/pull/2585

## Disable all interrupts (global mask)

Consider this program where a critical section is created by temporarily
disabling *all* interrupts.

``` rust
#![no_std]

static mut X: Type = Type::default();

#[no_mangle]
unsafe fn main() -> ! {
    unsafe {
        asm!("ENABLE_INTERRUPTS" : : : : "volatile");
    }

    loop {
        // .. any safe code ..

        unsafe {
            // start of critical section
            asm!("DISABLE_INTERRUPTS" : : : "memory" : "volatile");
            //                              ^^^^^^^^
        }

        // `INTERRUPT0` can *not* preempt this block
        // (because all interrupts are disabled)
        {
            let x: &mut Type = unsafe {
                &mut X
            };

            // .. any safe code ..
        }

        unsafe {
            // end of critical section
            asm!("ENABLE_INTERRUPTS" : : : "memory" : "volatile");
            //                             ^^^^^^^^
        }

        // .. any safe code ..
    }
}

#[no_mangle]
unsafe fn INTERRUPT0() {
    let x: &mut Type = unsafe {
        &mut X
    };

    // .. any safe code ..
}
```

Note that "any safe code" can *not* call `main` or `INTERRUPT0` (because they
are `unsafe` functions), use `asm!` or access registers.

**Claim**: This program is well-defined / sound if and only if `Type`
implements the `Send` trait.

Example that shows that the bound is required: `type Type = Rc<u8>` could
result in an unsound program (data race between `main` and `INTERRUPT0`).
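
To make the counterexample concrete, here is a sketch (not part of the original
text; `main_loop_body` is a hypothetical stand-in for the body of `main`'s
loop, `Option<Rc<u8>>` stands in for `Type` so that the static is
const-constructible, and an allocator is assumed):

``` rust
// Sketch: how safe code can break the program when `Type` is not `Send`.
extern crate alloc;

use alloc::rc::Rc;

static mut X: Option<Rc<u8>> = None; // stand-in for `Type = Rc<u8>`

unsafe fn main_loop_body() {
    // inside the critical section: the exclusive access itself is fine
    let clone = {
        let x: &mut Option<Rc<u8>> = unsafe { &mut X };
        x.get_or_insert_with(|| Rc::new(0)).clone()
    };
    // ENABLE_INTERRUPTS happens here, ending the critical section

    // outside the critical section: `drop` performs a *non-atomic*
    // read-modify-write of the shared reference count; if `INTERRUPT0`
    // preempts it mid-update and clones / drops its own handle, the count
    // is corrupted -- the data race the claim warns about
    drop(clone);
}
```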

"Why are the memory clobbers required?" Without them the compiler can reorder
`main`'s operations on `X` to outside the critical section leading to a data
race.
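
As an aside, this pattern is what a closure-based wrapper would package. A
minimal sketch under this document's BEAM assumptions (`free` is a name chosen
here, not an existing API):

``` rust
// Sketch: a closure-based critical section for the global-mask case.
// NOTE: not nesting-aware -- it unconditionally re-enables interrupts on
// exit, so it is only correct if interrupts were enabled on entry.
fn free<R>(f: impl FnOnce() -> R) -> R {
    unsafe {
        // start of critical section
        asm!("DISABLE_INTERRUPTS" : : : "memory" : "volatile");
    }

    let r = f();

    unsafe {
        // end of critical section
        asm!("ENABLE_INTERRUPTS" : : : "memory" : "volatile");
    }

    r
}
```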

## Interrupt masking

Consider this program that creates a critical section by masking a single
interrupt (individual mask).

> Aside: it's also possible to implement a critical section by raising the
> running priority but the implementation of that kind of critical section is
> very similar to this one (volatile write + compiler fence) so we won't include
> it in this document.

``` rust
#![no_std]

use core::{cell::UnsafeCell, ptr, sync::atomic::{self, Ordering}};

extern "C" {
    static MASK_INTERRUPT: UnsafeCell<u8>;
    static UNMASK_INTERRUPT: UnsafeCell<u8>;
}

static mut X: Type = Type::default();

#[no_mangle]
unsafe fn main() -> ! {
    unsafe {
        asm!("ENABLE_INTERRUPTS" : : : : "volatile");
    }

    loop {
        // .. any safe code ..

        unsafe {
            // start of critical section
            ptr::write_volatile(MASK_INTERRUPT.get(), 1 << 0);
        }

        atomic::compiler_fence(Ordering::SeqCst);

        // `INTERRUPT0` can *not* preempt this block
        // (because it's masked)
        {
            let x: &mut Type = unsafe {
                &mut X
            };

            // .. any safe code ..
        }

        atomic::compiler_fence(Ordering::SeqCst);

        unsafe {
            // end of critical section
            ptr::write_volatile(UNMASK_INTERRUPT.get(), 1 << 0);
        }

        // .. any safe code ..
    }
}

#[no_mangle]
unsafe fn INTERRUPT0() {
    let x: &mut Type = unsafe {
        &mut X
    };

    // .. any safe code ..
}
```

**Claim**: This program is well-defined / sound if and only if `Type`
implements the `Send` trait.

Example that shows that the bound is required: `type Type = Rc<u8>` could
result in an unsound program (data race between `main` and `INTERRUPT0`).

"Why are the compiler fences required?" Without them the compiler can reorder
`main`'s operations on `X` to outside the critical section leading to a data
race.
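
Both claims reduce to the same observation: a critical section hands
`&mut Type` to exactly one context at a time, and the data migrates between
the `main` context and the handler context, which is exactly what `Send`
licenses. A sketch of a lock type that makes the bound visible in the type
system (hypothetical; `IrqMutex` and `interrupt_free` are names invented here,
not existing APIs):

``` rust
use core::cell::UnsafeCell;

// Stand-in for either critical-section mechanism in this document
// (global mask or individual mask).
fn interrupt_free<R>(f: impl FnOnce() -> R) -> R {
    // ... mask, run `f`, unmask (with the barriers discussed above) ...
    f() // placeholder body
}

pub struct IrqMutex<T> {
    data: UnsafeCell<T>,
}

// `T: Send` is exactly the bound the claims above require: the data is
// handed back and forth between `main` and `INTERRUPT0`.
unsafe impl<T: Send> Sync for IrqMutex<T> {}

impl<T> IrqMutex<T> {
    pub const fn new(data: T) -> Self {
        IrqMutex { data: UnsafeCell::new(data) }
    }

    // NOTE: not reentrancy-proof -- calling `lock` again from inside `f`
    // would alias the `&mut T`; a real design must rule that out (e.g.
    // with a critical-section token).
    pub fn lock<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        interrupt_free(|| f(unsafe { &mut *self.data.get() }))
    }
}
```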

## Questions

- Can these programs be misoptimized by the compiler? In particular, the
compiler fences in the second program prevent memory operations on `X` from
being reordered to outside the critical section, but AFAIK they don't tell the
compiler that `X` may change outside the critical section -- could the
compiler cache the value of `X` on the stack? That would change the semantics
of the program.

**Member (Amanieu):**
Actually that is the whole point of a compiler fence: it tells the compiler that other threads may have modified shared data (i.e. globals + any locals whose address has been made available globally) and that it should reload those at the point of the fence.

**Comment:**
It also flushes writes made before the fence, which is why the `"memory"`
clobber is required on the second `asm!` (the one that re-enables interrupts).
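
To make that concrete, here is a sketch of the store sinking that the clobber
on the re-enabling `asm!` rules out (using the first program's BEAM
instructions):

``` rust
static mut X: u32 = 0;

unsafe fn with_critical_section() {
    asm!("DISABLE_INTERRUPTS" : : : "memory" : "volatile");

    X += 1; // must be flushed while interrupts are still disabled

    // without the "memory" clobber on the next line, the compiler could
    // sink the store to `X` below the re-enable, where `INTERRUPT0` can
    // preempt it mid-update
    asm!("ENABLE_INTERRUPTS" : : : "memory" : "volatile");
}
```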

**Member Author (japaric):**
@Amanieu is that documented somewhere? That's not what I understood from the
`atomic::compiler_fence` documentation; that doc only talks about preventing
the compiler from reordering memory operations across the fence; it says
nothing about forcing shared / static variables to be reloaded.

I can certainly see the behavior you mention arising from
`asm!("" : : : "memory")`, where the compiler has to assume that the assembly
block may modify any memory, but the text you have commented on is asking
about the program that exclusively uses `atomic::compiler_fence`. Or do you
mean to say that I should consider `asm!("" : : : "memory")` and
`atomic::compiler_fence` to be equivalent? IME, they do have observable
differences on optimizations.

**comex:**
If the source code loads a variable twice but the emitted assembly only loads
it once, the compiler has effectively reordered the second load up to the same
point as the first load.

`atomic::compiler_fence` emits an LLVM `fence` instruction with `syncscope`
set to `singlethread`. As mentioned in a different thread, it's also used for
C's `atomic_signal_fence`, which is made specifically for the interrupts use
case:

> Establishes memory synchronization ordering of non-atomic and relaxed atomic
> accesses, as instructed by order, between a thread and a signal handler
> executed on the same thread.

(It says "signal handler", but from the compiler's perspective, a Unix signal
handler in userland and an interrupt handler on bare metal have basically the
same semantics; they both interrupt execution at an arbitrary instruction and
move control flow to somewhere else.)

I think `asm!("" : : : "memory")` should be largely if not entirely equivalent
to `atomic::compiler_fence` with either `AcqRel` or `SeqCst` (as opposed to
`Acquire` and `Release`, which only create one-sided barriers). I'd be
interested to see what code creates differences in assembly output between the
two. Since they are different LLVM constructs, it's possible that LLVM's
optimizer could just coincidentally generate different code (in particular, it
might be being overly conservative with `asm!`), but it's also possible that
I'm forgetting something.

**Contributor:**
@japaric:

> That's not what I understood from the `atomic::compiler_fence`
> documentation; that doc only talks about preventing the compiler from
> reordering memory operations across the fence; it says nothing about forcing
> shared / static variables to be reloaded.

The RFC that added these describes their semantics as equivalent to C11's
`atomic_signal_fence`. The current docs do not make that clear, but I'd argue
that this is a documentation bug.

**Member Author (japaric):**
@comex:

> If the source code loads a variable twice but the emitted assembly only
> loads it once, the compiler has effectively reordered the second load up to
> the same point as the first load.

That clarifies things; thanks.

(Thinking of #112) Is removing a store operation considered a re-ordering?

> I'd be interested to see what code creates differences in assembly output
> between the two

`compiler_fence` prevents memory operations on stack variables from being
merged / reordered. The `"memory"` clobber seems to have no effect on stack
variables. See the code below:

``` rust
fn SysTick() {
    let mut x = 0;

    atomic::compiler_fence(Ordering::AcqRel);

    x += 1;

    unsafe {
        // prevent LLVM from optimizing away `x`
        ptr::read_volatile(&x);
    }
}
```

``` armasm
SysTick:
        sub     sp, #4
        movs    r0, #0
        str     r0, [sp] ; x = 0
        ldr     r0, [sp]
        adds    r0, #1
        str     r0, [sp] ; x += 1
        ldr     r0, [sp] ; read_volatile(&x)
        add     sp, #4
        bx      lr
```

vs

``` rust
fn SysTick() {
    let mut x = 0;

    unsafe { asm!("" : : : "memory" : "volatile") }

    x += 1;

    unsafe {
        // prevent LLVM from optimizing away `x`
        ptr::read_volatile(&x);
    }
}
```

``` armasm
SysTick:
        sub     sp, #4
        movs    r0, #1
        str     r0, [sp] ; x = 1
        ldr     r0, [sp] ; read_volatile(&x)
        add     sp, #4
        bx      lr
```

If you replace the stack variable with a static variable, then both the
compiler fence and the `"memory"` clobber prevent merging / re-ordering of
memory operations on the static variable.
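
For reference, the static-variable variant of this experiment might look as
follows (a sketch; with a `static`, both barriers keep both stores to `X`):

``` rust
use core::sync::atomic::{self, Ordering};

static mut X: u32 = 0;

fn SysTick() {
    unsafe { X = 0 };

    // either barrier now prevents merging the two stores into `X = 1`:
    atomic::compiler_fence(Ordering::AcqRel);
    // or: unsafe { asm!("" : : : "memory" : "volatile") }

    unsafe { X += 1 };
}
```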

**Member:**
> (Thinking of #112) Is removing a store operation considered a re-ordering?

Not sure what you are asking? It is not, but once you have reordered enough to
get two adjacent stores you can obviously remove the first.
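
A minimal illustration of that argument:

``` rust
static mut X: u32 = 0;

// what the source says:
unsafe fn before() {
    X = 1;
    // ... code that doesn't touch `X` ...
    X = 2;
}

// nothing in between touches `X`, so `X = 2` may be moved up next to
// `X = 1`; the first store is then dead and can be removed, leaving:
unsafe fn after() {
    // ... code that doesn't touch `X` ...
    X = 2;
}
```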

**Member:**
> `compiler_fence` prevents memory operations on stack variables from being
> merged / reordered. The `"memory"` clobber seems to have no effect on stack
> variables. See code below:

To me this looks like an arbitrary LLVM limitation and not a fundamental
difference. All clobbers and orderings aside, the compiler can in both cases
assume that `x` is unobserved by the outside world because its address is not
leaked to anything. So IMO in both cases it would be legal to optimize away
all the writes and just leave the volatile read.


- I have observed that an `asm!("")` expression with *no* clobbers prevents
operations on `static mut` variables from being merged and reordered. See the
example below. Is this intended behavior?

**Member:**
I don't think that this is intended. LLVM may apply stronger constraints than
necessary on `asm`, but that is not something that we should rely on.

**Member Author (japaric):**
> that is not something that we should rely on

Rather than rely on this I would actually like to see instructions like
`asm!("NOP" : : : : "volatile")` not preventing optimizations.

(At some point I thought that the `sideeffect` attribute that's attached to
the `call asm` in the LLVM IR was preventing the optimizations, but after
double-checking today by running `opt` on the IR after removing the
`sideeffect` attribute, I see that that doesn't help.)

**comex** (Apr 16, 2019):
That's probably related to this issue: LLVM is weird and expects the memory
clobber to be partly handled by the frontend, by attaching
`readnone`/`readonly` to the `call asm` instructions; Clang does this but
rustc doesn't, so in effect, every Rust `asm!` statement is treated as if it
had the memory clobber.

But also note that Clang (but not GCC) treats `volatile` as implying a memory
clobber, and both Clang and GCC treat any asm statement with no output
operands as implicitly `volatile`. Whether Rust `asm!` does the same is... not
fully defined, I guess; rustc does mark `asm!` without output operands as
`sideeffect`, but the current implementation doesn't have an opinion on
whether `volatile` should imply clobbering memory, since it treats everything
as clobbering memory. :p


``` rust
#[no_mangle]
static mut X: u32 = 0;

#[no_mangle]
unsafe fn INTERRUPT0() {
    X += 1;

    asm!("");

    X += 2;
}
```

Produces this machine code (sorry for the ARM assembly):

``` armasm
INTERRUPT0:
        movw    r0, #0
        movt    r0, #8192
        ldr     r1, [r0]
        adds    r1, #1
        str     r1, [r0] ; X += 1
        ldr     r1, [r0]
        adds    r1, #2
        str     r1, [r0] ; X += 2
        bx      lr
```

This is the corresponding (post-optimization) LLVM IR:

``` llvm
; Function Attrs: nounwind
define void @INTERRUPT0() unnamed_addr #1 !dbg !1268 {
start:
  %0 = load i32, i32* bitcast (<{ [4 x i8] }>* @X to i32*), align 4, !dbg !1269
  %1 = add i32 %0, 1, !dbg !1269
  store i32 %1, i32* bitcast (<{ [4 x i8] }>* @X to i32*), align 4, !dbg !1269
  tail call void asm sideeffect "", ""() #5, !dbg !1270, !srcloc !1271
  %2 = load i32, i32* bitcast (<{ [4 x i8] }>* @X to i32*), align 4, !dbg !1272
  %3 = add i32 %2, 2, !dbg !1272
  store i32 %3, i32* bitcast (<{ [4 x i8] }>* @X to i32*), align 4, !dbg !1272
  ret void, !dbg !1273
}
```

## Other comments

(Feel free to disregard this section completely; it's about better optimizations
rather than misoptimizations)

`atomic::compiler_fence` and the `"memory"` clobber are coarse-grained and can
prevent optimization of memory accesses that don't need to be synchronized.
For example, this Rust code

``` rust
use core::{ptr, sync::atomic::{self, Ordering}};

static mut X: u32 = 0;

unsafe fn main() -> ! {
    let mut y = 0;

    // this could be part of a critical section
    atomic::compiler_fence(Ordering::SeqCst);

    y += 1;

    // prevent the compiler from optimizing away `y`
    unsafe {
        ptr::read_volatile(&y);
    }

    loop {}
}
```

produces this machine code

``` armasm
main:
        sub     sp, #4
        movs    r0, #0
        str     r0, [sp] ; y = 0
        ldr     r0, [sp]
        adds    r0, #1
        str     r0, [sp] ; y += 1
        ldr     r0, [sp] ; ptr::read_volatile
        b       #-4 <main+0xe>
```

Without the compiler fence, `y = 0` and `y += 1` would have been optimized
into `y = 1`, resulting in shorter machine code:

``` armasm
main:
        sub     sp, #4
        movs    r0, #1
        str     r0, [sp] ; y = 1
        ldr     r0, [sp] ; ptr::read_volatile
        b       #-4 <main+0x8>
```

I wish we had an `atomic::compiler_fence_val(&x, ORDERING)` function that
prevented only memory operations on `x` from being reordered -- though I don't
know if LLVM supports that kind of fine-grained compiler fence -- as that
would result in better optimizations.
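
To make the wish concrete, here is a sketch of what such an API could look
like (entirely hypothetical -- no such function exists in `core`; the body is
only a volatile stand-in):

``` rust
use core::{ptr, sync::atomic::Ordering};

/// Hypothetical per-object compiler fence: order only the memory
/// operations on `*x` across this point. NOT a real `core` API.
unsafe fn compiler_fence_val<T: Copy>(x: *const T, _order: Ordering) {
    // stand-in body: a volatile read of `*x` can be neither merged nor
    // moved, but a real intrinsic would also constrain neighboring
    // non-volatile accesses to `*x` and would compile to no code at all
    let _ = ptr::read_volatile(x);
}
```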

**Contributor:**
If `x` were a pointer here, that would not prevent re-orderings on memory that
can be reached from `x`, only on `x` itself. Would that be OK?