Hack together inline-always-overrides #141055

saethlin · 2025-05-15T21:30:58Z

@cbiffle pointed out to me that sometimes when size-optimizing, you really want a #[inline(always)] on some function in a dependency or the standard library and it's a chore to add that attribute in, and it would be neat to give the compiler an override. I don't think this is too hard to hack together, so here's a hack. I don't think the implementation can be more effective than this, so this draft will be useful to collect some information on whether this can even do the thing.

saethlin · 2025-05-16T15:54:30Z

@bors try

Hack together inline-always-overrides `@cbiffle` pointed out to me that sometimes when size-optimizing, you really want a `#[inline(always)]` on some function in a dependency or the standard library and it's a chore to add that attribute in, and it would be neat to give the compiler an override. I don't think this is too hard to hack together, so here's a hack. I don't think the implementation can be more effective than this, so this draft will be useful to collect some information on whether this can even do the thing.

bors · 2025-05-16T15:55:42Z

⌛ Trying commit d2221bd with merge cd55aff...

bors · 2025-05-16T18:01:22Z

☀️ Try build successful - checks-actions
Build commit: cd55aff (cd55affffd49803ab7d91c2c503737cbc5632625)

cbiffle · 2025-05-18T11:23:57Z

Alright, I built this locally from the commit bors used above (though I have also now tested it on your original commit). I'm testing using the stage1 build, in case that's significant.

Without flags, this reproduces the poor inlining I noted on the keypad:GO firmware (good!)

Building with RUSTFLAGS="-Zinline-always-overrides" complains about needing a comma-separated list of strings (good!).

(I then switched to flags in .cargo/config.toml since RUSTFLAGS can't handle spaces well.)

Adding a comma-separated list of strings causes panics building most crates, though it's of course possible I'm holding it wrong. Things that cause panics include

the empty list
main
a,b,c
The actual function I'm targeting, which is deep breath core::iter::range::<impl core::iter::traits::iterator::Iterator for core::ops::range::Range<A>>::next

It seems the call to def_path_str here is responsible but I'm not familiar enough with the API to explain why. I've attached one of the panics. Please let me know what else I can do to help.

rustc-ice-2025-05-18T11_20_04-56902.txt

(Side note: since rustc doesn't fill in the generic parameters in the debug symbols (something I would love to fix!) I'm not sure that the name I get from nm and the name rustc uses internally will match. I could try to get a more precise name from the DWARF information, which does tend to record the generic types.)

cbiffle · 2025-05-18T11:41:50Z

Oh, incidentally, if you'd like to test the repro case I'm using yourself, the firmware is

https://github.com/cbiffle/keypad-go-firmware

Built at current tip commit (5442d69) using 1.87 or current nightly, the Range iter impl I mentioned above doesn't inline, so I'm trying to fix that. You will need the thumbv6m-none-eabi target built (I added it to my bootstrap.toml).

Steps to reproduce: edit the rust-toolchain.toml to point to your built toolchain and

$ cargo build --release
$ nm -C target/thumbv6m-none-eabi/release/keybad-fw | grep Range

saethlin · 2025-05-18T17:57:51Z

Yeah, I meant to try this out then realized after you posted your scenario that figuring out the def_path_str for the iterator method is going to be gnarly. So I think I need to also add a way to dump def_path_str values or base this on mangled symbols.

saethlin · 2025-05-18T18:00:33Z

Btw you can use https://crates.io/crates/rustup-toolchain-install-master to install a try build, just be aware you can't add components after the fact so you may need to use some -c rust-src or such when you run rustup-toolchain-install-master.

cbiffle · 2025-05-19T17:45:56Z

Btw you can use https://crates.io/crates/rustup-toolchain-install-master to install a try build, just be aware you can't add components after the fact so you may need to use some -c rust-src or such when you run rustup-toolchain-install-master.

Thanks! Now that I'm set up to build the toolchain again, I'm happy just doing that -- my current laptop takes about 3-4 minutes. It's faster than downloading the trybuild archive on my current internet connection.

saethlin · 2025-05-20T21:24:10Z

With the latest commit, I can do this:

$ RUSTFLAGS=-Zinline-always-overrides="_ZN4core4iter5range101_$LT$impl$u20$core..iter..traits..iterator..Iterator$u20$for$u20$core..ops..range..Range$LT$A$GT$$GT$4next17hd12be7ae50b99c2cE" cargo +stage1 build --release

And the symbols mentioning Range are gone.

Actually wait something is afoot...

saethlin · 2025-05-20T21:53:45Z

I'm not sure why, but setting the flag seems to make every symbol in the keybad-fw executable disappear. But between my debug printing and some local testing on a program built for my host arch, things seem to work.

The flag is now a prefix; we can't use the literal entire symbol name because the suffix on them is a hash that contains all the flags that were used for the crate.

quininer · 2025-05-23T01:27:06Z

As a symmetric feature, can we ask the compiler to never inline? this is useful for debugging.

maybe it looks like this

-Zinline-overrides="always=foo1,never=foo2"

saethlin · 2025-05-23T02:03:11Z

Maybe. If you want something to not be inlined for debugging, why doesn't lowering the optimization level suffice?

quininer · 2025-05-23T02:24:28Z

I want to keep performance as high as possible. Using -O0 for debugging is too slow in large projects.

saethlin · 2025-05-23T17:32:00Z

So do you want to turn off inlining or all optimizations for a function? We have #[optimize(none)]. If you are debugging your own code, you should use that.

The specific problem Cliff pointed out to me is that a generic function defined in core wasn't being inlined. It is a very different problem to hack in extra optimization after-the-fact than to make a function debuggable but optimize the rest of a program.

In addition, this PR is not a feature proposal. That would be done by a MCP on rust-lang/compiler-team. I'm doing a single experiment here to establish a use case. I don't want to have runaway discussion here about why various things are or aren't possible in the current compiler architecture or aren't reliable because of the precompiled sysroot. I can try to provide such an explanation in a place that's suited for discussion. PRs are not.

quininer · 2025-05-23T18:01:36Z

I just want the inline turn off so it is tracked. It's probably a std or third-party library code and mark optimize(none) is very inconvenient.

I just happened to see this. sorry for insert into this thread.

cbiffle · 2025-05-23T19:51:45Z

@quininer I actually agree that this would be useful, strategic un-inlining of a routine is something I need much less often, but I have wished I had it once or twice.

Personally, applying inline(always) on uncooperative code is a higher priority, but it would be neat to have both someday. So, I don't think your suggestion was unreasonable! But I do think it's worth focusing on inline(always) for now while we figure out whether a proper proposal is appropriate. Such a proposal should probably at least mention inline(never) as a possible extension.

cbiffle · 2025-05-23T20:45:42Z

I'm not sure why, but setting the flag seems to make every symbol in the keybad-fw executable disappear.

Building that way also makes the PHDRs disappear, resulting in an ELF file that only contains the debug symbols. This is because overriding RUSTFLAGS on the command line replaces those in the .cargo/config.toml, which are unfortunately load-bearing in rust-embedded projects since they're how we specify the linker script. Without the linker script, all symbols appear dead and are collected.

Adding the overrides to .cargo/config.toml (or actually .cargo/config in this old project) produces better results, though it does generate a bit of a game of whack-a-mole. The set of symbols that must be inlined to get the behavior I was after (namely, all range iterator step functions that may influence bounds check decisions are inlined) required overriding on the following symbol prefixes:

_ZN4core4iter5range101_$LT$impl$u20$core..iter..traits..iterator..Iterator$u20$for$u20$core..ops..range..Range$LT$A$GT$$GT$4next
_ZN89_$LT$core..ops..range..Range$LT$T$GT$$u20$as$u20$core..iter..range..RangeIteratorImpl$GT$9spec_next
_ZN4core4iter5range110_$LT$impl$u20$core..iter..traits..iterator..Iterator$u20$for$u20$core..ops..range..RangeInclusive$LT$A$GT$$GT$4next
_ZN107_$LT$core..ops..range..RangeInclusive$LT$T$GT$$u20$as$u20$core..iter..range..RangeInclusiveIteratorImpl$GT$9spec_next

...owing to how the range iterator impls are actually written. With these overrides, it does appear to work!

As an example of a case where the approach doesn't appear to work, the resulting binary contains some outlined routines for logical_shr and wrapped_shr for integers. I believe these are being included because they're inserted late in the codegen process -- they're only used indirectly by the compiler builtins that are used to implement 64-bit shifts. Inlining these routines would save about 40 bytes and many instructions, since in all cases they're (1) setting up a stack frame and (2) repeating a mask of the right-hand shift-distance argument that isn't necessary given the actual parameter values. However, no pattern I can come up with will apply an override to these routines. Perhaps that's expected (or an unavoidable consequence of the approach).

saethlin · 2025-05-24T02:59:43Z

However, no pattern I can come up with will apply an override to these routines. Perhaps that's expected (or an unavoidable consequence of the approach).

This is an unavoidable limitation of the precompiled sysroot. The override can only change inlining of instantiations that are done with the override flag provided. I'm saying instantiations not compilations, because you could get a generic or #[inline] function which is instantiated as part of compiling the sysroot that ends up in the binary not inlined, but then if you instantiate the same function again as part of compiling your own dependency graph, that second instantiation will get the override. So it can also look like the flag partly worked.

compiler-builtins is a slightly special case. All of its public functions are #[no_mangle] which means they need to be instantiated once in the defining crate, so all inlining decisions about their call graphs are frozen in when the sysroot is built.

That being said, all the symbols that compiler-builtins exports are magic intrinsics known to LLVM. So in theory some optimization can be done without looking at the implementation of the intrinsics. You'd just have to be really sure the optimization will be profitable.

cbiffle · 2025-05-24T14:09:25Z

The action on instantiations makes sense to me, thanks for explaining that.

From my perspective, the next step is to put some miles on this with different programs and learn why it's wrong / how it fails / etc. I'll poke around my projects directory and see if I've got some suitable tests.

(I'm basically ignoring UI issues right now, assuming they could be fixed if we decide the feature is good. e.g. prefix match on mangled symbol names is probably not quite right, but that's also just one line of the patch, and can probably be redone when/if required.)

I'm concerned that a lot of cases I'm interested in might run into the "precompiled sysroot" limitation, particularly around builtins. But I am hoping to prove myself wrong there!

Thanks for prototyping this!

Hack together inline-always-overrides

d2221bd

rustbot added A-attributes Area: Attributes (`#[…]`, `#![…]`) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels May 15, 2025

Use a much simpler hack: just consult the flag inside cg_llvm

0dfc4d7

Make overrides a prefix and add a debug print

d94c483

saethlin added S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. and removed A-attributes Area: Attributes (`#[…]`, `#![…]`) labels May 23, 2025

Hack together inline-always-overrides #141055

Are you sure you want to change the base?

Hack together inline-always-overrides #141055

Uh oh!

Conversation

saethlin commented May 15, 2025

Uh oh!

saethlin commented May 16, 2025

Uh oh!

bors commented May 16, 2025

Uh oh!

bors commented May 16, 2025

Uh oh!

cbiffle commented May 18, 2025

Uh oh!

cbiffle commented May 18, 2025

Uh oh!

saethlin commented May 18, 2025

Uh oh!

saethlin commented May 18, 2025

Uh oh!

cbiffle commented May 19, 2025

Uh oh!

saethlin commented May 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

saethlin commented May 20, 2025

Uh oh!

quininer commented May 23, 2025

Uh oh!

saethlin commented May 23, 2025

Uh oh!

quininer commented May 23, 2025

Uh oh!

saethlin commented May 23, 2025

Uh oh!

quininer commented May 23, 2025

Uh oh!

cbiffle commented May 23, 2025

Uh oh!

cbiffle commented May 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

saethlin commented May 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cbiffle commented May 24, 2025

Uh oh!

Uh oh!

saethlin commented May 20, 2025 •

edited

Loading

cbiffle commented May 23, 2025 •

edited

Loading

saethlin commented May 24, 2025 •

edited

Loading