-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Add inline(always)
to a few functions that end up in opt-level=z
compiled output
#96624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
r? @m-ou-se (rust-highfive has picked a reviewer for you, use r? to override) |
This does seem to fix the issue (for these functions), so lets if it's too costly to go through with. @bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
It looks like the functions get created with external rather than internal linkage when optimizing for size. |
This is probably the responsible code:
|
I suspect that this is just a bug, and the intention here was to make this only apply to debug builds, but it ended up applying to size-optimized builds as well. Though possibly size-optimized builds should be using linkonce_odr linkage rather than internal linkage, to allow trivial linker-deduplication across codegen units and give less inlining bonus. |
The impact here might also be different if #91743 lands, enabling MIR-level inlining. |
If this issue is just a bug, no need to take this change. |
I just came back to this, and my analysis of why this happens wasn't quite right. The actual culprit is share-generics, which is enabled by default when optimizing for size: rust/compiler/rustc_session/src/config.rs Line 778 in 877877a
This can be easily verified using So apparently this is working as intended? Can't optimize it away if it's exported for use by downstream crates. Presumably this does not happen when building an executable. |
I've noticed these repeated many1 times in a projects
-Copt-level=z
output (rustc seems to manage to remove them for 2 and 3, but notz
. I haven't checkeds
). You can see this behavior in an isolated case here: https://rust.godbolt.org/z/ThMrGeb6v. Note that I believe this only happens in the asm and static outputs (and presumably stuff like IR and the like, but I don't think this would help IR, since it still hasinline(always)
functions present).Anyway, as you might note, the functions do actually get inlined. They just... end up in the compiled output anyway. Someone explained to me why this happened at some point, but I've since forgotten it. Either way, in practice this has bad behavior with tooling (it confuses anything that looks at assembly or staticlib output), although this is probably not a huge deal, since I imagine that the linker which strips dead code is able to figure this out and remove them (and presumably users without such linker functionality have larger problems when it comes to code size).
So, in practice, this probably doesn't actually matter, although it leads to some... odd behaviors in addition to the tooling weirdness. For example, on
aarch64
, the machine-outliner pass (which isn't enabled on x86_64 for whatever reason -- perhaps it only supports aarch64) even bothers to go through the trouble of deduplicating many hundreds of these functions. This sort of thing seems... goofy, but is hopefully harmless.Anyway, some of these pretty important and common functions, so this definitely needs a perf run before landing. Like, Deref/DerefMut for Vec is really important, and has a real chance of causing some... changes in terms of perf. So I'll be doing a perf run for this (after tests pass to make sure there's no major fumbles). That said, if it's too costly, I might try again without the patch adding
Deref
-- since that would still be an improvement.P.S.
x install
is still running, and so I haven't verified this actually fixes the issue -- I'm just assuming it will. That said, given that I don't really remember why these are showing up in the first place, I could be mistaken. Sorry if this is the case -- It actually seems somewhat plausible to me so I'll wait to ensure this before kicking off the perf run.Footnotes
In smaller project I had around 30 copies of
TypeId::of
, over 100 copies of<Vec<T> as AsRef<[T]>>::as_ref
, and even more forDeref
-- tragically I've deleted the file with the counts (I'll get another one shortly, after thex install
finishes and my computer returns to a usable state), but I think it was around 150 for each of the Derefs. ↩