Skip to content

Add inline(always) to a few functions that end up in opt-level=z compiled output #96624

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

thomcc
Copy link
Member

@thomcc thomcc commented May 2, 2022

I've noticed these repeated many1 times in a projects -Copt-level=z output (rustc seems to manage to remove them for 2 and 3, but not z. I haven't checked s). You can see this behavior in an isolated case here: https://rust.godbolt.org/z/ThMrGeb6v. Note that I believe this only happens in the asm and static outputs (and presumably stuff like IR and the like, but I don't think this would help IR, since it still has inline(always) functions present).

Anyway, as you might note, the functions do actually get inlined. They just... end up in the compiled output anyway. Someone explained to me why this happened at some point, but I've since forgotten it. Either way, in practice this has bad behavior with tooling (it confuses anything that looks at assembly or staticlib output), although this is probably not a huge deal, since I imagine that the linker which strips dead code is able to figure this out and remove them (and presumably users without such linker functionality have larger problems when it comes to code size).

So, in practice, this probably doesn't actually matter, although it leads to some... odd behaviors in addition to the tooling weirdness. For example, on aarch64, the machine-outliner pass (which isn't enabled on x86_64 for whatever reason -- perhaps it only supports aarch64) even bothers to go through the trouble of deduplicating many hundreds of these functions. This sort of thing seems... goofy, but is hopefully harmless.

Anyway, some of these pretty important and common functions, so this definitely needs a perf run before landing. Like, Deref/DerefMut for Vec is really important, and has a real chance of causing some... changes in terms of perf. So I'll be doing a perf run for this (after tests pass to make sure there's no major fumbles). That said, if it's too costly, I might try again without the patch adding Deref -- since that would still be an improvement.

P.S. x install is still running, and so I haven't verified this actually fixes the issue -- I'm just assuming it will. That said, given that I don't really remember why these are showing up in the first place, I could be mistaken. Sorry if this is the case -- It actually seems somewhat plausible to me so I'll wait to ensure this before kicking off the perf run.

Footnotes

  1. In smaller project I had around 30 copies of TypeId::of, over 100 copies of <Vec<T> as AsRef<[T]>>::as_ref, and even more for Deref -- tragically I've deleted the file with the counts (I'll get another one shortly, after the x install finishes and my computer returns to a usable state), but I think it was around 150 for each of the Derefs.

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label May 2, 2022
@rust-highfive
Copy link
Contributor

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with r? rust-lang/libs-api @rustbot label +T-libs-api -T-libs to request review from a libs-api team reviewer. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@rust-highfive
Copy link
Contributor

r? @m-ou-se

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 2, 2022
@thomcc
Copy link
Member Author

thomcc commented May 2, 2022

This does seem to fix the issue (for these functions), so lets if it's too costly to go through with.

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 2, 2022
@nikic
Copy link
Contributor

nikic commented May 2, 2022

It looks like the functions get created with external rather than internal linkage when optimizing for size.

@nikic
Copy link
Contributor

nikic commented May 2, 2022

This is probably the responsible code:

pub fn instantiation_mode(&self, tcx: TyCtxt<'tcx>) -> InstantiationMode {

@nikic
Copy link
Contributor

nikic commented May 2, 2022

I suspect that this is just a bug, and the intention here was to make this only apply to debug builds, but it ended up applying to size-optimized builds as well.

Though possibly size-optimized builds should be using linkonce_odr linkage rather than internal linkage, to allow trivial linker-deduplication across codegen units and give less inlining bonus.

@scottmcm
Copy link
Member

scottmcm commented May 2, 2022

The impact here might also be different if #91743 lands, enabling MIR-level inlining.

@thomcc
Copy link
Member Author

thomcc commented May 2, 2022

If this issue is just a bug, no need to take this change.

@nikic
Copy link
Contributor

nikic commented Sep 30, 2022

I just came back to this, and my analysis of why this happens wasn't quite right. The actual culprit is share-generics, which is enabled by default when optimizing for size:

OptLevel::No | OptLevel::Less | OptLevel::Size | OptLevel::SizeMin => true,

This can be easily verified using -Z share-generics=no: https://rust.godbolt.org/z/fx67dqW84

So apparently this is working as intended? Can't optimize it away if it's exported for use by downstream crates. Presumably this does not happen when building an executable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-perf Status: Waiting on a perf run to be completed. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants