Skip to content

Detect cold CallTarget invalidation and reset its profile; Limit number of recompilations within a time period #11610

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

JohnTortugo
Copy link
Contributor

Closes: #11045

Please review this patch that modifies Truffle to:

  1. Reset a CallTarget's profile when the nmethod associated with it was invalidated by HotSpot because the Code Cache's method flushing heuristics deemed the nmethod to be cold.

  2. Limit the number of compilations that a CallTarget can have within a time period. This doesn't change the current behavior, instead it just makes the constraint more flexible. I added a new parameter, MaximumCompilationsWindow, that the user can set to specify a time window within which the number of compilations will be limited (currently the window is whole duration of application execution).

Tests:

  • OSX AArch64 with LabsJDK "tip" using "mx gate"
  • Linux x65 with LabsJDK "tip" using "mx gate"
  • GitHub Actions
  • We have been running an internal system with similar changes for months and we didn't notice any regression.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Jul 12, 2025
@JohnTortugo JohnTortugo changed the title Detect cold method invalidation & reprofile. Limit on number of compilations Detect cold CallTarget invalidation and reset its profile; Limit number of recompilations within a time period Jul 12, 2025
@JohnTortugo
Copy link
Contributor Author

Can someone please take a look? /cc @chumer @dougxc ?

@dougxc
Copy link
Member

dougxc commented Jul 22, 2025

Looks reasonable to me but a more detailed review should be done by @tzezula or @chumer.

boolean isValid = installedCode.isValid();
if (!isValid && installedCode != INVALID_CODE) {
if (getInvalidationReason() == ((HotSpotTruffleRuntime) runtime()).getColdMethodInvalidationReason()) {
invalidateExistingCode();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't say I am a fan of this solution. How do we know isValid() is called often enough?
Also it is surprising that isValid() resets the compilation profile and even can call listener methods (listeners could call into isValid() too!)

What did you consider as alternatives here?
Can you explain when / how often this is invoked? Can we call this maybe between compilations on compiler threads?

I don't think we should use this approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we know isValid() is called often enough?

Do you mean often enough so that we detect "soon" that the call target was invalidated? My understanding is that there is no requirement for how "soon" we detect the invalidation. We just need to eventually detect it and since the isValid method is always called before a new compilation I decided to implement things like this.

... listeners could call into isValid() too!

The existing code is invalidated before onCompilationDeoptimized is called, which prevents a recursion.

Can we call this maybe between compilations on compiler threads?

I'm not sure what's your suggestion here. Do you mean iterate on all current compiled call targets to see if any was invalidated?

Can't say I am a fan of this solution.

Please, let me know your suggestions!

Copy link
Contributor Author

@JohnTortugo JohnTortugo Jul 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal here is to reset the CT (Call Target) profile when its INSTALLED_CODE was invalidated by HotSpot because it was cold. This needs to happen before we decide to compile the CT again. Realistically speaking this kind of invalidation can only happen between executions, not during an execution.

JVMCI is not able to call a method in Truffle or GraalJIT. Therefore, it cannot downcall to notify that the CT was invalidated. Thus, the identification that the CT was invalidated is a passive event that happens in Truffle.

Approaches that we can use to detect that the CT was invalidated:

  • Check before every execution of the CT in interpreter mode; before we check if the CT should be compiled.

    • Pros:
      • Relatively simple to implement.
      • This is a direct way to achieve what we want: prevent recompilation of cold method.
    • Cons:
      • Executed every time that we want to execute the CT.
  • Use a separate thread to detect all invalidations.

    • Blockers:
      • Will require maintaining a list of all compiled non-invalidated CT.
  • When we are installing a new code.

    • Blockers:
      • This is too late. There is already a new compilation of the CT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think?

Copy link
Member

@chumer chumer Aug 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you say, ideally we would have a callback from HotSpot on invalidate/flush which allows to take action. We do have such a callback on SubstrateVM for example. Its very unfortunate we don't have that, because we could also use it for better logging.

Thinking loud, it seems the second best solution could be to install code for a call target that calls resetCompilationProfile() which then invalidates itself. This means we wouldn't need to check anything and we would just reset once on the next invocation and not on follow-up invocations. Does that seem feasible to you?

Could we schedule a call for this? It seems like we should do a bit of brain storming on a solution here. (feel free to ping me on slack)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we schedule a call for this? It seems like we should do a bit of brain storming on a solution here. (feel free to ping me on slack)

Absolutely. I'll ping you on slack.

runtime().getListener().onCompilationDeoptimized(this, frame, getInvalidationReasonDescription());
}

@TruffleBoundary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this code behind a boundary? If this code is called during PE (not sure it is) then we need to better optimize this code, aka better profiling depending on where its used. If its never called in PE we should remove the boundary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to make this a TruffleBoundary because an unit test was failing because a compilation was getting too big due to recursive inlining during PE. Since the test was triggering that I suspected that it could eventually affect an end user as well.

*/
default void onCompilationDeoptimized(OptimizedCallTarget target, Frame frame) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interface needs to evolve in a compatible way. Should be easy enough keeping the old method deprecated, each delegating to the other. See for example onCompilationStarted.

@@ -537,6 +540,7 @@ public final RootNode getRootNode() {
public final void resetCompilationProfile() {
this.callCount = 0;
this.callAndLoopCount = 0;
this.timeOfFirstCompilationInWindow = Instant.now();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Limit the number of compilations that a CallTarget can have within a time period.

tbh I can't quite follow what the rationale for this feature is. When would you use this?
The maximum number of compilation is intended for compiler bugs, and should never be hit (breakpoint toggling in the debugger for example). If we extend this to time based limit, that defeats the purpose to detect those bugs.

So I think I need a motivating example here to understand.

Also I think setting:

 private Instant timeOfFirstCompilationInWindow;

in resetCompilationProfile() seems not very intuitive. Is this really the right place to do it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh I can't quite follow what the rationale for this feature is. When would you use this?
The maximum number of compilation is intended for compiler bugs, and should never be hit (breakpoint toggling in the debugger for example). If we extend this to time based limit, that defeats the purpose to detect those bugs.

My usage scenario may be a little edgey because 1) the application that caused me to bring up this change is very large and the code cache is conservatively small, therefore it causes many method flushes; 2) the recompilation of cold methods (the other issue that this PR addresses) together with (1) was causing many recompilations; 3) new guest language code is added/modified/removed from the system frequently.

I understand that the maximum recompilation limit is just a failsafe and isn't intended to be the main way to limit recompilations and I'm not trying to make it so. What I'm looking for is to make it more flexible, so that the system can remain 'stable' until I have a chance to act on the problem. By stable I mean that there aren't too many recompilations of the call target happening (i.e., a recompilation rate) and also that the method isn't run interpreted until I restart the application.

Copy link
Member

@chumer chumer Aug 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So because the compilation queue is flushed often, you want to limit the maximum compilations per certain time frame? Wouldn't it be better to just reset the successfulCompilationsCount on a flush? Or do I miss something here?

If we never reset the successfulCompilationsCount we will also trigger deopt loop detection which is not limited by a time window.

if (engine.maximumCompilationsWindowInMinutes > 0) {
long ageInMinutes = ChronoUnit.MINUTES.between(timeOfFirstCompilationInWindow, Instant.now());
if (ageInMinutes >= engine.maximumCompilationsWindowInMinutes) {
// This compilation would have been blocked if the window hadn't overflowed,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: use multi line comments for long text.

option("engine.BackgroundCompilation", "false").//
option("engine.CompileImmediately", "true").//
option("engine.MaximumCompilations", "2").//
option("engine.MaximumCompilationsWindow", "1").build()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically we use seconds not minutes. This came to a surprise to me. Any reason this is not seconds?
We could then also spend less time on this test. 90 seconds for such a test is excessive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Truffle OptimizedCallTarget profile counters do not decay
3 participants