Skip to content

Conversation

MichalPetryka
Copy link
Contributor

Try enabling more aggressive inlining to see TP diffs.

cc @EgorBo

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jul 1, 2025
@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jul 1, 2025
@jkotas jkotas added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jul 1, 2025
@EgorBo
Copy link
Member

EgorBo commented Jul 1, 2025

@MichalPetryka nice tp diffs! What is the size diff for the resulting binaries?

@EgorBo
Copy link
Member

EgorBo commented Jul 1, 2025

From what I see it adds ~10% to binary size for JIT (x64 and arm64) for ~3% TP improvement.

@dotnet/jit-contrib @jkotas opinions?

@jakobbotsch
Copy link
Member

It's not clear what the impact of something like this will be when PGO is enabled (which it isn't in the superpmi runs).
I would try running some local runs with PGO enabled too and seeing how it affects size and TP.

@vzarytovskii
Copy link
Member

vzarytovskii commented Jul 1, 2025

It's not clear what the impact of something like this will be when PGO is enabled (which it isn't in the superpmi runs). I would try running some local runs with PGO enabled too and seeing how it affects size and TP.

PGO would ignore /Ob3

@EgorBo
Copy link
Member

EgorBo commented Jul 1, 2025

Yeah this matches my observation that only alt jits were affected, coreclr.dll and clrjit.dll were not affected (same size).

@jkotas
Copy link
Member

jkotas commented Jul 1, 2025

Yeah this matches my observation that only alt jits were affected, coreclr.dll and clrjit.dll were not affected (same size).

I can see two justifications for this change:

  • It may make altjits throughput to have better fidelity with regular jit throughput optimized with PGO. What do we use aljit throughput numbers for? Do the numbers suggest that this change improves fidelity between regular jit throughput and altjit throughput?

  • It may improve throughput of AOT compilers. I would like to see some numbers to demonstrate that there is an observable improvement in AOT compiler wall-clock time with this change (e.g. on the TodosApi-windows used by https://github.com/MichalStrehovsky/rt-sz). The instruction count improvement does not necessarily translate to wall-clock time improvement with a change like this.

In any case, I do not think we want to make this change for checked JIT. It would make checked JIT harder to debug.

@JulieLeeMSFT
Copy link
Member

I do not think we want to make this change for checked JIT. It would make checked JIT harder to debug.

@MichalPetryka, please address this feedback.

@JulieLeeMSFT JulieLeeMSFT added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jul 28, 2025
@MichalPetryka
Copy link
Contributor Author

I do not think we want to make this change for checked JIT. It would make checked JIT harder to debug.

Wouldn't having different configuration between Checked and Release also cause issues by potentially causing configuration specific bugs?

@dotnet-policy-service dotnet-policy-service bot removed needs-author-action An issue or pull request that requires more info or actions from the author. no-recent-activity labels Aug 13, 2025
@jkotas
Copy link
Member

jkotas commented Aug 13, 2025

Wouldn't having different configuration between Checked and Release also cause issues by potentially causing configuration specific bugs?

Checked and Release compile very different code due to all ifdef DEBUG. We do introduce bugs every once a while that make checked and release observable behaviors to not match, and I believe we have outer loop tests to catch these.

I am not sure why it is relevant here.

@MichalPetryka
Copy link
Contributor Author

@jkotas @EgorBo Can this be merged then if you're ok with Release only?


void Compiler::impPushOnStack(GenTree* tree, typeInfo ti)
{
// dummy diff to trigger SPMI
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// dummy diff to trigger SPMI

@EgorBo
Copy link
Member

EgorBo commented Sep 4, 2025

@jkotas @EgorBo Can this be merged then if you're ok with Release only?

I'm just confused what it does then. It seems to be improving the alt jits only which can result in faster AOT compilation, but does it mean we just don't provide Native PGO to those? Since Native PGO effectively makes Ob3 no-op

@EgorBo
Copy link
Member

EgorBo commented Sep 4, 2025

Not opposite of taking this, leaving for @jkotas to comment
Re-using native pgo collected for clrjit may (or may not) work for alt jits - not sure if it works that way in MSVC/Clang

@jkotas
Copy link
Member

jkotas commented Sep 4, 2025

Not opposite of taking this, leaving for @jkotas to comment

This is performance related change. I would like to see some numbers for what it is improving.

Performance related changes without numbers have high probability of being performance regressions in practice.

@EgorBo EgorBo added the needs-author-action An issue or pull request that requires more info or actions from the author. label Sep 4, 2025
@EgorBo
Copy link
Member

EgorBo commented Sep 4, 2025

Not opposite of taking this, leaving for @jkotas to comment

This is performance related change. I would like to see some numbers for what it is improving.

Performance related changes without numbers have high probability of being performance regressions in practice.

Spmi TP-diff reports significant wins ("instructions retired" which in most cases is correlated with the performance) and I assume it's not because of less inlines (this mode should be more aggressive), but I agree it'd nice to measure something

@AndyAyersMS
Copy link
Member

Not opposite of taking this, leaving for @jkotas to comment

This is performance related change. I would like to see some numbers for what it is improving.
Performance related changes without numbers have high probability of being performance regressions in practice.

Spmi TP-diff reports significant wins ("instructions retired" which in most cases is correlated with the performance) and I assume it's not because of less inlines (this mode should be more aggressive), but I agree it'd nice to measure something

Don't we use non-PGO release builds for those TP-diff jits (to try and mitigate the effects of possibly mismatched PGO in the diff)? So it's not really telling us what product impact would be.

My experience is that once you enable PGO, extra tweaking like this is rarely an improvement. We should ask the MSVC team what they'd recommend.

@EgorBo
Copy link
Member

EgorBo commented Sep 5, 2025

Don't we use non-PGO release builds for those TP-diff jits

@AndyAyersMS yes, but it seems we don't apply it for altjit because when I applied this fix locally and built with Release (and didn't disable NativePGO) only atljits were affected. coreclr.dll and clrjit.dll were intact.

Copy link
Contributor

This pull request has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member needs-author-action An issue or pull request that requires more info or actions from the author. no-recent-activity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants