-
Notifications
You must be signed in to change notification settings - Fork 207
Make more ringbufs zero-initialized, saving flash #2168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
For what it's worth, [if the enum has a repr
attribute, you can also explicitly set the discriminant for that variant to 0
][1]:
#[derive(Copy, Clone, PartialEq)]
#[repr(u8)]
enum Trace {
None = 0,
Whatever(u64)
}
or similar. IMO this might be worth considering since it feels slightly less like Hoping The Compiler Does The Right Thing Because We Put Them In Order, but 🤷♀️.
It might be worth adding a note about this in the ringbuf
docs, so that folks adding new ringbufs in the future don't mess it up?
Hmmmm. Doesn't the repr attribute affect niche optimization? I'm concerned it might inflate the ringbuf statics. |
Mmm, it might --- the current thing is fine, we can look into whether the explicit discriminant is worth doing separately. |
It does not, assuming you don't have enums like this: #[repr(u8)]
enum A {
None,
Some(B),
}
#[repr(u8)]
enum B {
X,
Y,
} I think here the repr enum layout might guarantee that X is 0 and Y is 1, which then means that the compiler cannot "flatten" A to have values 0/1/2. I was also going to suggest considering repr enums: it would be a little easier to look for "mistakes" like this by searching for " None," in the codebase. |
I think if I'm going to put much more thought into it, I'd rather eliminate the need for a sentinel value and default ringbufs to uninit. Having a |
f301ea0
to
1b13a51
Compare
Yup, agreed , making them uninitialized would be a nicer solution. Let's do that at some point. |
That definitely sounds like the correct way to solve the whole problem, and not even too hard to do. Out of curiosity though, would you have a temporary counter for unset values to avoid reading the uninitialised data? Or would you somehow try to force ringbuf enums to not use the 0 value, so the bss 0 initial values could be read as uninitiated dynamically? |
@aapoalas there are a couple options -- keeping a size that saturates at the size of the table would do it, for example. That'd cost one more word of memory, but it's probably worth it. There are alternatives that could let you pack this information into the existing "position" counter used to track the position in the circular buffer, but they'd save at most one word, and are potentially fragile. (Example: use |
1b13a51
to
0083ba6
Compare
Thank you for the answer, much appreciated <3 I was actually just going to edit my comment as I was looking at the internals of ringbuf and realised I hadn't understood that there's metadata on each entry as well. That kind of changes the equation a little: one alternative would then be to use an |
To simplify its implementation, ringbuf currently requires every ringbuf to be declared with an initialization element. It creates a static array containing that element. If that element is represented as zero, that goes in BSS and is zero-initialized with the rest of BSS. If that element is represented as anything other than zero, the compiler creates a full-sized "initialization image" copy of the ringbuf _in flash_ that it can then memcpy into RAM on startup. Using the default Rust enum representation, the easiest way to get a ringbuf initialized to non-zero bytes is to put its initialization sentinel value (e.g. None) as the last variant in the enum. Unfortunately, this is super common. This commit moves all the ones I could find, which should reduce flash usage in a bunch of cases. A survey of examples in the Cosmo build (note: many of these savings are not fully realized due to pow2 alignment restrictions): - Cosmo Seq: -6328 - Spartan7 Loader: -2712 - I2C: 2488 bytes - Every SPI driver: -1068 (so, -2136 total) - Hiffy: -1564 - Sprot: -1076 - Update: -1036 - Validate: -1036 - Power: -559 - Thermal: -212 - Dump agent: -12 (...?)
0083ba6
to
88872f5
Compare
Currently, Hubris' Rust toolchain is pinned to `nightly-2024-09-17`, which is from 10 months ago. This is kind of unfortunate, especially because that nightly is too old to support the Rust 2024 edition, which means we cannot update our dependencies on any crates where the latest version is built with Rust 2024. Beyond just updating the toolchain, there were some noteworthy changes: - Naked functions are now stable (yay!), but the attribute changed to `#[unsafe(naked)]`. Inline assembly in naked functions must now use `core::arch::naked_asm!` rather than normal `asm!`. As far as I can tell, the only difference between this and regular `asm!` is that it does not support `options(noreturn)`, as I believe the `naked_asm!` block kind of implies at least some of the codegen differences for `noreturn`. - New warnings on creation of temporary shared references to mutable statics showed up in `stm32h7-update-server`, where we were using `[T]::as_ptr()` on zero-sized slices in mutable statics that were used to get linker-generated addresses. `[T]::as_ptr()` takes an `&self` receiver, so this was generating a temporary shared reference to the mutable static. I changed this to use `&raw const`, which takes the address of the static without creating a shared reference. - There was a substantial regression in flash and RAM usage on the new toolchain due to [a change in the behavior of the `#[used]` attribute][1] which revealed [an underlying issue where ringbufs were not zero initialized][2]. These issues were resolved separately in #2167, #2168, #2170, and oxidecomputer/idolatry#65. In addition, there were a variety of unremarkable linting changes, including slightly better dead code detection (which detected some new dead code), and some annoying clippy nonsense. Note that this branch requires oxidecomputer/idolatry#65, which updates `idol` to generate code that doesn't emit warnings with the new toolchain, and fixes some of the flash/RAM size regression in code generated by `idol`. Fixes #2165 [1]: #2165 (comment) [2]: #2165 (comment)
Currently, Hubris' Rust toolchain is pinned to `nightly-2024-09-17`, which is from 10 months ago. This is kind of unfortunate, especially because that nightly is too old to support the Rust 2024 edition, which means we cannot update our dependencies on any crates where the latest version is built with Rust 2024. Beyond just updating the toolchain, there were some noteworthy changes: - Naked functions are now stable (yay!), but the attribute changed to `#[unsafe(naked)]`. Inline assembly in naked functions must now use `core::arch::naked_asm!` rather than normal `asm!`. As far as I can tell, the only difference between this and regular `asm!` is that it does not support `options(noreturn)`, as I believe the `naked_asm!` block kind of implies at least some of the codegen differences for `noreturn`. - New warnings on creation of temporary shared references to mutable statics showed up in `stm32h7-update-server`, where we were using `[T]::as_ptr()` on zero-sized slices in mutable statics that were used to get linker-generated addresses. `[T]::as_ptr()` takes an `&self` receiver, so this was generating a temporary shared reference to the mutable static. I changed this to use `&raw const`, which takes the address of the static without creating a shared reference. - There was a substantial regression in flash and RAM usage on the new toolchain due to [a change in the behavior of the `#[used]` attribute][1] which revealed [an underlying issue where ringbufs were not zero initialized][2]. These issues were resolved separately in oxidecomputer#2167, oxidecomputer#2168, oxidecomputer#2170, and oxidecomputer/idolatry#65. In addition, there were a variety of unremarkable linting changes, including slightly better dead code detection (which detected some new dead code), and some annoying clippy nonsense. Note that this branch requires oxidecomputer/idolatry#65, which updates `idol` to generate code that doesn't emit warnings with the new toolchain, and fixes some of the flash/RAM size regression in code generated by `idol`. Fixes oxidecomputer#2165 [1]: oxidecomputer#2165 (comment) [2]: oxidecomputer#2165 (comment)
To simplify its implementation, ringbuf currently requires every ringbuf to be declared with an initialization element. It creates a static array containing that element.
If that element is represented as zero, that goes in BSS and is zero-initialized with the rest of BSS.
If that element is represented as anything other than zero, the compiler creates a full-sized "initialization image" copy of the ringbuf in flash that it can then memcpy into RAM on startup.
Using the default Rust enum representation, the easiest way to get a ringbuf initialized to non-zero bytes is to put its initialization sentinel value (e.g. None) as the last variant in the enum. Unfortunately, this is super common.
This commit moves all the ones I could find, which should reduce flash usage in a bunch of cases.