Skip to content

Hack together inline-always-overrides #141055

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

saethlin
Copy link
Member

@cbiffle pointed out to me that sometimes when size-optimizing, you really want a #[inline(always)] on some function in a dependency or the standard library and it's a chore to add that attribute in, and it would be neat to give the compiler an override. I don't think this is too hard to hack together, so here's a hack. I don't think the implementation can be more effective than this, so this draft will be useful to collect some information on whether this can even do the thing.

@rustbot rustbot added A-attributes Area: Attributes (`#[…]`, `#![…]`) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels May 15, 2025
@saethlin
Copy link
Member Author

@bors try

bors added a commit to rust-lang-ci/rust that referenced this pull request May 16, 2025
Hack together inline-always-overrides

`@cbiffle` pointed out to me that sometimes when size-optimizing, you really want a `#[inline(always)]` on some function in a dependency or the standard library and it's a chore to add that attribute in, and it would be neat to give the compiler an override. I don't think this is too hard to hack together, so here's a hack. I don't think the implementation can be more effective than this, so this draft will be useful to collect some information on whether this can even do the thing.
@bors
Copy link
Collaborator

bors commented May 16, 2025

⌛ Trying commit d2221bd with merge cd55aff...

@bors
Copy link
Collaborator

bors commented May 16, 2025

☀️ Try build successful - checks-actions
Build commit: cd55aff (cd55affffd49803ab7d91c2c503737cbc5632625)

@cbiffle
Copy link
Contributor

cbiffle commented May 18, 2025

Alright, I built this locally from the commit bors used above (though I have also now tested it on your original commit). I'm testing using the stage1 build, in case that's significant.

Without flags, this reproduces the poor inlining I noted on the keypad:GO firmware (good!)

Building with RUSTFLAGS="-Zinline-always-overrides" complains about needing a comma-separated list of strings (good!).

(I then switched to flags in .cargo/config.toml since RUSTFLAGS can't handle spaces well.)

Adding a comma-separated list of strings causes panics building most crates, though it's of course possible I'm holding it wrong. Things that cause panics include

  • the empty list
  • main
  • a,b,c
  • The actual function I'm targeting, which is deep breath core::iter::range::<impl core::iter::traits::iterator::Iterator for core::ops::range::Range<A>>::next

It seems the call to def_path_str here is responsible but I'm not familiar enough with the API to explain why. I've attached one of the panics. Please let me know what else I can do to help.

rustc-ice-2025-05-18T11_20_04-56902.txt

(Side note: since rustc doesn't fill in the generic parameters in the debug symbols (something I would love to fix!) I'm not sure that the name I get from nm and the name rustc uses internally will match. I could try to get a more precise name from the DWARF information, which does tend to record the generic types.)

@cbiffle
Copy link
Contributor

cbiffle commented May 18, 2025

Oh, incidentally, if you'd like to test the repro case I'm using yourself, the firmware is

https://github.com/cbiffle/keypad-go-firmware

Built at current tip commit (5442d69) using 1.87 or current nightly, the Range iter impl I mentioned above doesn't inline, so I'm trying to fix that. You will need the thumbv6m-none-eabi target built (I added it to my bootstrap.toml).

Steps to reproduce: edit the rust-toolchain.toml to point to your built toolchain and

$ cargo build --release
$ nm -C target/thumbv6m-none-eabi/release/keybad-fw | grep Range

@saethlin
Copy link
Member Author

Yeah, I meant to try this out then realized after you posted your scenario that figuring out the def_path_str for the iterator method is going to be gnarly. So I think I need to also add a way to dump def_path_str values or base this on mangled symbols.

@saethlin
Copy link
Member Author

Btw you can use https://crates.io/crates/rustup-toolchain-install-master to install a try build, just be aware you can't add components after the fact so you may need to use some -c rust-src or such when you run rustup-toolchain-install-master.

@cbiffle
Copy link
Contributor

cbiffle commented May 19, 2025

Btw you can use https://crates.io/crates/rustup-toolchain-install-master to install a try build, just be aware you can't add components after the fact so you may need to use some -c rust-src or such when you run rustup-toolchain-install-master.

Thanks! Now that I'm set up to build the toolchain again, I'm happy just doing that -- my current laptop takes about 3-4 minutes. It's faster than downloading the trybuild archive on my current internet connection.

@saethlin
Copy link
Member Author

saethlin commented May 20, 2025

With the latest commit, I can do this:

$ RUSTFLAGS=-Zinline-always-overrides="_ZN4core4iter5range101_$LT$impl$u20$core..iter..traits..iterator..Iterator$u20$for$u20$core..ops..range..Range$LT$A$GT$$GT$4next17hd12be7ae50b99c2cE" cargo +stage1 build --release

And the symbols mentioning Range are gone.

Actually wait something is afoot...

@saethlin
Copy link
Member Author

I'm not sure why, but setting the flag seems to make every symbol in the keybad-fw executable disappear. But between my debug printing and some local testing on a program built for my host arch, things seem to work.

The flag is now a prefix; we can't use the literal entire symbol name because the suffix on them is a hash that contains all the flags that were used for the crate.

@quininer
Copy link
Contributor

As a symmetric feature, can we ask the compiler to never inline? this is useful for debugging.

maybe it looks like this

-Zinline-overrides="always=foo1,never=foo2"

@saethlin
Copy link
Member Author

Maybe. If you want something to not be inlined for debugging, why doesn't lowering the optimization level suffice?

@quininer
Copy link
Contributor

I want to keep performance as high as possible. Using -O0 for debugging is too slow in large projects.

@saethlin
Copy link
Member Author

So do you want to turn off inlining or all optimizations for a function? We have #[optimize(none)]. If you are debugging your own code, you should use that.

The specific problem Cliff pointed out to me is that a generic function defined in core wasn't being inlined. It is a very different problem to hack in extra optimization after-the-fact than to make a function debuggable but optimize the rest of a program.

In addition, this PR is not a feature proposal. That would be done by a MCP on rust-lang/compiler-team. I'm doing a single experiment here to establish a use case. I don't want to have runaway discussion here about why various things are or aren't possible in the current compiler architecture or aren't reliable because of the precompiled sysroot. I can try to provide such an explanation in a place that's suited for discussion. PRs are not.

@saethlin saethlin added S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. and removed A-attributes Area: Attributes (`#[…]`, `#![…]`) labels May 23, 2025
@quininer
Copy link
Contributor

I just want the inline turn off so it is tracked. It's probably a std or third-party library code and mark optimize(none) is very inconvenient.

I just happened to see this. sorry for insert into this thread.

@cbiffle
Copy link
Contributor

cbiffle commented May 23, 2025

@quininer I actually agree that this would be useful, strategic un-inlining of a routine is something I need much less often, but I have wished I had it once or twice.

Personally, applying inline(always) on uncooperative code is a higher priority, but it would be neat to have both someday. So, I don't think your suggestion was unreasonable! But I do think it's worth focusing on inline(always) for now while we figure out whether a proper proposal is appropriate. Such a proposal should probably at least mention inline(never) as a possible extension.

@cbiffle
Copy link
Contributor

cbiffle commented May 23, 2025

I'm not sure why, but setting the flag seems to make every symbol in the keybad-fw executable disappear.

Building that way also makes the PHDRs disappear, resulting in an ELF file that only contains the debug symbols. This is because overriding RUSTFLAGS on the command line replaces those in the .cargo/config.toml, which are unfortunately load-bearing in rust-embedded projects since they're how we specify the linker script. Without the linker script, all symbols appear dead and are collected.

Adding the overrides to .cargo/config.toml (or actually .cargo/config in this old project) produces better results, though it does generate a bit of a game of whack-a-mole. The set of symbols that must be inlined to get the behavior I was after (namely, all range iterator step functions that may influence bounds check decisions are inlined) required overriding on the following symbol prefixes:

  • _ZN4core4iter5range101_$LT$impl$u20$core..iter..traits..iterator..Iterator$u20$for$u20$core..ops..range..Range$LT$A$GT$$GT$4next
  • _ZN89_$LT$core..ops..range..Range$LT$T$GT$$u20$as$u20$core..iter..range..RangeIteratorImpl$GT$9spec_next
  • _ZN4core4iter5range110_$LT$impl$u20$core..iter..traits..iterator..Iterator$u20$for$u20$core..ops..range..RangeInclusive$LT$A$GT$$GT$4next
  • _ZN107_$LT$core..ops..range..RangeInclusive$LT$T$GT$$u20$as$u20$core..iter..range..RangeInclusiveIteratorImpl$GT$9spec_next

...owing to how the range iterator impls are actually written. With these overrides, it does appear to work!

As an example of a case where the approach doesn't appear to work, the resulting binary contains some outlined routines for logical_shr and wrapped_shr for integers. I believe these are being included because they're inserted late in the codegen process -- they're only used indirectly by the compiler builtins that are used to implement 64-bit shifts. Inlining these routines would save about 40 bytes and many instructions, since in all cases they're (1) setting up a stack frame and (2) repeating a mask of the right-hand shift-distance argument that isn't necessary given the actual parameter values. However, no pattern I can come up with will apply an override to these routines. Perhaps that's expected (or an unavoidable consequence of the approach).

@saethlin
Copy link
Member Author

saethlin commented May 24, 2025

However, no pattern I can come up with will apply an override to these routines. Perhaps that's expected (or an unavoidable consequence of the approach).

This is an unavoidable limitation of the precompiled sysroot. The override can only change inlining of instantiations that are done with the override flag provided. I'm saying instantiations not compilations, because you could get a generic or #[inline] function which is instantiated as part of compiling the sysroot that ends up in the binary not inlined, but then if you instantiate the same function again as part of compiling your own dependency graph, that second instantiation will get the override. So it can also look like the flag partly worked.

compiler-builtins is a slightly special case. All of its public functions are #[no_mangle] which means they need to be instantiated once in the defining crate, so all inlining decisions about their call graphs are frozen in when the sysroot is built.

That being said, all the symbols that compiler-builtins exports are magic intrinsics known to LLVM. So in theory some optimization can be done without looking at the implementation of the intrinsics. You'd just have to be really sure the optimization will be profitable.

@cbiffle
Copy link
Contributor

cbiffle commented May 24, 2025

The action on instantiations makes sense to me, thanks for explaining that.

From my perspective, the next step is to put some miles on this with different programs and learn why it's wrong / how it fails / etc. I'll poke around my projects directory and see if I've got some suitable tests.

(I'm basically ignoring UI issues right now, assuming they could be fixed if we decide the feature is good. e.g. prefix match on mangled symbol names is probably not quite right, but that's also just one line of the patch, and can probably be redone when/if required.)

I'm concerned that a lot of cases I'm interested in might run into the "precompiled sysroot" limitation, particularly around builtins. But I am hoping to prove myself wrong there!

Thanks for prototyping this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants