|
| 1 | +# RFC: Add a Mutex trait |
| 2 | + |
| 3 | +# Summary |
| 4 | +[summary]: #summary |
| 5 | + |
| 6 | +This RFC proposes to add a `core-mutex` crate which adds a `Mutex` trait for a stable and generic foundation on which HALs and applications can be built on when there is a need for mutual exclusive access. |
| 7 | + |
| 8 | +The original idea to a `Mutex` trait was given here [embedded-hal#132]. |
| 9 | + |
| 10 | +[embedded-hal#132]: https://github.com/rust-embedded/embedded-hal/pull/132 |
| 11 | + |
| 12 | +# Motivation |
| 13 | +[motivation]: #motivation |
| 14 | + |
| 15 | +Today in the embedded ecosystem there is no base on which mutual exclusive access stems from which can be used as a generic. This has made it so there are now multiple implementations of mutexes in the ecosystem that are incompatible (cortex-m and risc-v mutex based on [bare-metal mutex], [RTFM mutex], [mutex.rs], ...). The consequence of this is that no HAL or application can be made independent on the choice of `Mutex` and lock within their implementations, one has to select one which is detrimental to generic implementations. |
| 16 | + |
| 17 | +This can however easily be addressed by introducing a `Mutex` trait on which previous implementations can be built upon, which each architectural crate (cortex-m, risc-v, etc) reexports/implements, together with the [`embedded-hal`] which can use said trait in HAL design. |
| 18 | + |
| 19 | +[bare-metal mutex]: https://github.com/japaric/bare-metal/blob/a2f8b222a14253141bbc7853e5699641e61e5ebf/src/lib.rs#L63 |
| 20 | +[RTFM mutex]: https://github.com/japaric/cortex-m-rtfm/blob/fafeeb27270ef24fc3852711c6032f65aa7dbcc0/src/lib.rs#L289 |
| 21 | +[mutex.rs]: https://gist.github.com/mvirkkunen/cc63d20648737022a8c5f93666093f4a |
| 22 | +[`embedded-hal`]: https://github.com/rust-embedded/embedded-hal |
| 23 | + |
| 24 | +# Detailed design |
| 25 | +[detailed-design]: #detailed-design |
| 26 | + |
| 27 | + |
| 28 | +## Trait |
| 29 | + |
| 30 | +The trait is proposed as follows: |
| 31 | + |
| 32 | +```rust |
| 33 | +/// Any object implementing this trait guarantees exclusive access to the data contained |
| 34 | +/// within the mutex for the duration of the lock. |
| 35 | +pub trait Mutex { |
| 36 | + /// Data protected by the mutex |
| 37 | + type Data; |
| 38 | + |
| 39 | + /// Creates a critical section and grants temporary access to the protected data |
| 40 | + fn lock<R>(&mut self, f: impl FnOnce(&mut Self::Data) -> R) -> R; |
| 41 | +} |
| 42 | +``` |
| 43 | + |
| 44 | +The design is such that the data protected by the mutex is only accessible within the closure provided to the `lock` method. Moreover, the closure shall run such that it is guaranteed that no-one else can access the same data at the same time, for example within a critical section. The critical section can for example be: |
| 45 | + |
| 46 | +* Disabling interrupts (e.g. `cortex-m` / `risc-v` critical sections) |
| 47 | +* Analyzing the program and set BASEPRI in Cortex-M devices (Stack Resource Policy/RTFM way) |
| 48 | +* Analyzing the program and set interrupt masks correctly |
| 49 | +* etc |
| 50 | + |
| 51 | +### Taxonomy |
| 52 | + |
| 53 | +A "MUTual EXclusion" aka `Mutex` as used in this trait describes the **principle** and not the *implementation* characteristics. More specifically: |
| 54 | + |
| 55 | +* A struct `Spinlock` could be one implementor of trait `Mutex` where the struct name gives away HOW mutual exclusion is achieved. |
| 56 | +* A struct `Sleeplock` would be another implementor with a telling name. |
| 57 | + * For unfortunate reasons, a sleeping lock is often also called just `Mutex` in existing code (most prominent example being Linux). Using that naming scheme would mean a struct `Mutex` implementing the trait `Mutex` - this is discouraged to avoid stated ambiguity. |
| 58 | + |
| 59 | +## Helpers |
| 60 | + |
| 61 | +A concern that has been raised is that taking multiple locks can lead to excessive rightward drift as: |
| 62 | + |
| 63 | +```rust |
| 64 | +a.lock(|a| { |
| 65 | + b.lock(|b| { |
| 66 | + c.lock(|c| { |
| 67 | + d.lock(|d| { |
| 68 | + e.lock(|e| { |
| 69 | + // ... |
| 70 | + }); |
| 71 | + }); |
| 72 | + }); |
| 73 | + }); |
| 74 | +}); |
| 75 | +``` |
| 76 | + |
| 77 | +It is proposed to provide a helper implementation to allow the following shorthand: |
| 78 | + |
| 79 | +```rust |
| 80 | +(a, b, c, d, e).lock(|a, b, c, d, e| { |
| 81 | + // ... |
| 82 | +}); |
| 83 | +``` |
| 84 | + |
| 85 | +## Implementation |
| 86 | + |
| 87 | +A proof-of-concept implementation of the crate with trait and helpers [can be found here](https://github.com/korken89/core-mutex). |
| 88 | + |
| 89 | +### Design decisions and compatibility |
| 90 | + |
| 91 | +#### The [Rust standard library `Mutex`] uses `&self` for the `lock` method, why choose `&mut self`? |
| 92 | + |
| 93 | +This is the **most important** part of this RFC proposal. Let's assume that the implementation was `&self`, what would this mean? This would allow the following code to compile: |
| 94 | + |
| 95 | +```rust |
| 96 | +fn deadlock(mtx: &impl core_mutex::Mutex<Data = i32>) { |
| 97 | + mtx.lock(|data| { |
| 98 | + *data += 1; |
| 99 | + |
| 100 | + mtx.lock(|data| *data += 1); |
| 101 | + }); |
| 102 | +} |
| 103 | +``` |
| 104 | + |
| 105 | +This example allows for a mutex to lock itself within a lock of itself, which is **deadlock by construction** - while if implementing it with `&mut self` the following would happen: |
| 106 | + |
| 107 | +```rust |
| 108 | +fn compile_error(mtx: &mut impl core_mutex::Mutex<Data = i32>) { |
| 109 | + mtx.lock(|data| { |
| 110 | + *data += 1; |
| 111 | + |
| 112 | + mtx.lock(|data| *data += 1); |
| 113 | + }); |
| 114 | +} |
| 115 | +``` |
| 116 | + |
| 117 | +Which produces the following error and making the mutex **deadlock-safe by construction**: |
| 118 | + |
| 119 | +``` |
| 120 | +error[E0499]: cannot borrow `mtx` as mutable more than once at a time |
| 121 | + --> src/main.rs:40:5 |
| 122 | + | |
| 123 | +40 | mtx.lock(|data| { |
| 124 | + | ^ ---- ------ first mutable borrow occurs here |
| 125 | + | | | |
| 126 | + | _____| first borrow later used by call |
| 127 | + | | |
| 128 | +41 | | *data += 1; |
| 129 | +42 | | |
| 130 | +43 | | mtx.lock(|data| *data += 1); |
| 131 | + | | --- first borrow occurs due to use of `mtx` in closure |
| 132 | +44 | | }); |
| 133 | + | |______^ second mutable borrow occurs here |
| 134 | +``` |
| 135 | + |
| 136 | +The common point raised with `&mut self` is that this definition is **not compatible** with with the current `Mutex<RefCell<T>>` way of implementing mutexes in embedded Rust today, as one needs to take a mutable reference to the mutex. This can however be circumvented with the following naive implementation of the trait: |
| 137 | + |
| 138 | +```rust |
| 139 | +struct Mutex<T> { |
| 140 | + data: T, |
| 141 | +} |
| 142 | + |
| 143 | +impl<T> Mutex<T> { |
| 144 | + pub const fn new(data: T) -> Self { |
| 145 | + Self { data } |
| 146 | + } |
| 147 | + |
| 148 | + fn access(&self, _cs: &CriticalSection) -> &T { |
| 149 | + &self.data |
| 150 | + } |
| 151 | +} |
| 152 | + |
| 153 | +impl<'a, T> core_mutex::Mutex for &'a Mutex<RefCell<T>> { |
| 154 | + type Data = T; |
| 155 | + |
| 156 | + fn lock<R>(&mut self, f: impl FnOnce(&mut Self::Data) -> R) -> R { |
| 157 | + cortex_m::interrupt::free(|cs| f(&mut *self.access(cs).borrow_mut())) |
| 158 | + } |
| 159 | +} |
| 160 | +``` |
| 161 | + |
| 162 | +Which allows for the following program to exist, hence **not breaking compatibility** with existing pattern: |
| 163 | + |
| 164 | +```rust |
| 165 | +static MUT: Mutex<RefCell<i32>> = Mutex::new(RefCell::new(0)); |
| 166 | + |
| 167 | +#[entry] |
| 168 | +fn init() -> ! { |
| 169 | + let mut r = &MUT; // Note that r is mutable |
| 170 | + |
| 171 | + // Locking!! |
| 172 | + r.lock(|data| *data += 1); |
| 173 | + |
| 174 | + loop {} |
| 175 | +} |
| 176 | + |
| 177 | +#[interrupt] |
| 178 | +fn SWI0_EGU0() { |
| 179 | + let mut r = &MUT; // Note that r is mutable |
| 180 | + |
| 181 | + // Locking!! |
| 182 | + r.lock(|data| *data += 1); |
| 183 | + |
| 184 | + // More Locking!! |
| 185 | + increment_in_mutex(&mut r); |
| 186 | +} |
| 187 | + |
| 188 | +// Look no RefCell and a lock!!! |
| 189 | +fn increment_in_mutex(m: &mut impl core_mutex::Mutex<Data = i32>) { |
| 190 | + m.lock(|data| *data += 1); |
| 191 | +} |
| 192 | +``` |
| 193 | + |
| 194 | +This implementation does however **allow for relocking the mutex within a lock again, so why bother** (double lock = panic in this case)? The use of `&mut self` locks has been evaluated extensively in the [RTFM framework], and is the entire basis for safe by construction applications. When pairing the `Mutex` trait with procedural macros and analysis it is possible to create deadlock and race condition free programs by construction - **but these are only possible** if we have a basis which is build on the `&mut self`. |
| 195 | + |
| 196 | +It is, in my opinion, an error that the standard library did not use this design together with the class of issues it eliminates. |
| 197 | + |
| 198 | +For a detailed analysis on `&mut self` critical sections see [this part of the RTFM book]. |
| 199 | + |
| 200 | +[Rust standard library `Mutex`]: https://doc.rust-lang.org/std/sync/struct.Mutex.html#method.lock |
| 201 | +[RTFM framework]: https://github.com/japaric/cortex-m-rtfm |
| 202 | +[this part of the RTFM book]: https://github.com/japaric/cortex-m-rtfm/blob/2596ea0e46bec73d090d9e51d41e6c2f481b8e15/book/en/src/internals/critical-sections.md |
| 203 | + |
| 204 | +#### Why is the result of the `lock` method not a `Result<...>`? |
| 205 | + |
| 206 | +The `Mutex` trait is meant to be available so that HALs (and other crates/apps) can have a notion about a resource and unique access to it. For example, this means we can make driver which has shared access we can protect this driver via a `lock` and we can reason about safety and soundness of said driver. |
| 207 | + |
| 208 | +And here comes the problem with a `Result` based `lock` - **what does it mean to have a lock which can fail**? A lock which can fail would be difficult to use, if not impossible, to create HALs as the reason for the lock failing is outside the scope of the HAL - only an infallible lock makes sense here. Another view is what if a lock fails - what does error handling mean at that low level rather than just allowing it to boil up through the API? Or, if you have half setup a peripheral and a `lock` fails, how do you abort? How do you reason about Worst Case Execution Times? What state is the hardware left in? Suddenly drivers have to implement mitigation and error handling routines that are **system design flaws**. |
| 209 | + |
| 210 | +The next thing is, **why should a lock be fallible**? One implementation of a spin-lock mutex was found which used a `Result` return to indicate a deadlock, but what other than a panic is correct if one would get a deadlock in a driver? |
| 211 | + |
| 212 | +All these questions point to an infallible `Mutex` trait. The example of a deadlock in a driver is to simply implement this trait and `.unwrap()` in the implementation. |
| 213 | + |
| 214 | +It boils down to this philosophy: `panic` is for program error - `Result` is for user error |
| 215 | + |
| 216 | +#### Should the trait be marked `unsafe`? |
| 217 | + |
| 218 | +The trait does not need to be marked as `unsafe` as the signature of `lock` together with the Rust type system guarantees exclusive access. If there is an implementation which handles multiple access points to the same mutable reference, `unsafe` code must be used, which puts the responsibility of upholding Rust's alias rule on the implementor. |
| 219 | + |
| 220 | +#### How does one implement generic functions that understand `lock`? |
| 221 | + |
| 222 | +The following example shows how to write generic functions over the trait: |
| 223 | + |
| 224 | +```rust |
| 225 | +// A simple resource holding an `i32` |
| 226 | +fn increment_in_mutex(mtx: &mut impl core_mutex::Mutex<Data = i32>) { |
| 227 | + mtx.lock(|data| *data += 1); |
| 228 | +} |
| 229 | +``` |
| 230 | + |
| 231 | +## FAQ |
| 232 | + |
| 233 | +> Can this be used in a lock/unlock setting such as: |
| 234 | +
|
| 235 | +```rust |
| 236 | +mtx.lock(); |
| 237 | + |
| 238 | +// Do something ... |
| 239 | + |
| 240 | +mtx.unlock(); |
| 241 | +``` |
| 242 | + |
| 243 | +No. The `Mutex` trait is specifically for scoped use, i.e. the lock is defined in such a way that the resource protected by the lock is only accessible within the critical section that is the closure in the `lock` method. |
| 244 | + |
| 245 | +This means that for example if you have a peripheral driver and you want to lock access to it while it does its work, this mutex is not designed for that. However you can implement a higher level mutex on top of this one. |
| 246 | + |
| 247 | +> Should the `Mutex` trait not be in the [`embedded-hal`]? |
| 248 | +
|
| 249 | +It is our strong recommendation to **not** place it in the [`embedded-hal`] crate. |
| 250 | +If there are breaking changes in the [`embedded-hal`] it can be so that competing implementations of the Mutex trait start to exist which could bifurcate the ecosystem based on version, plus that the Mutex trait will be tied to HAL releases. |
| 251 | + |
| 252 | +# How We Teach This |
| 253 | +[how-we-teach-this]: #how-we-teach-this |
| 254 | + |
| 255 | +There will be 2 reference implementations after the acceptance of this RFC. One direct implementation based on critical sections in `cortex-m` and `risc-v` crates (that now reexport the [bare-metal mutex]) as the mutex will now be implemented in these crates and the reexport removed. And another in the [RTFM framework] which works together with procedural macros to give correct by construction resource handling and locks. |
| 256 | + |
| 257 | +# Drawbacks |
| 258 | +[drawbacks]: #drawbacks |
| 259 | + |
| 260 | +As of writing this RFC I see no drawbacks. |
| 261 | + |
| 262 | +# Alternatives |
| 263 | +[alternatives]: #alternatives |
| 264 | + |
| 265 | +The alternatives, taking `&self` and returning a `Result<...>` are discussed in [Detailed design](#detailed-design). |
| 266 | + |
| 267 | +One can also further follow the [Rust standard library `Mutex`] and its RAII pattern, however this should not be pursued due to these causes: |
| 268 | + |
| 269 | +1. The closure is explicit where it starts and ends, no need to wait for the destruction or add extra `{ ... }` around code to get predictable unlocks. |
| 270 | +2. Makes code using locks more stable under refactoring, as small changes can drastically change the scope of a lock when using RAII. |
| 271 | +3. A RAII based mutex can be `mem::forget` or `mem::swap` and all guarantees are out, the closure based mutex cannot be circumvented. |
| 272 | + * A RAII based mutex can be circumvented easily as follows: |
| 273 | + |
| 274 | +```rust |
| 275 | +// Break all guarantees |
| 276 | +let x = lock1.lock(); |
| 277 | +let y = lock2.lock(); |
| 278 | +drop(x); |
| 279 | +drop(y); |
| 280 | +``` |
| 281 | + |
| 282 | +# Unresolved questions |
| 283 | +[unresolved]: #unresolved-questions |
| 284 | + |
| 285 | +* Should the crate be named `core-mutex` or something different? |
0 commit comments