Detect cold CallTarget invalidation and reset its profile; Limit number of recompilations within a time period #11610

JohnTortugo · 2025-07-12T20:30:01Z

Closes: #11045

Please review this patch that modifies Truffle to:

Reset a CallTarget's profile when the nmethod associated with it was invalidated by HotSpot because the Code Cache's method flushing heuristics deemed the nmethod to be cold.
Limit the number of compilations that a CallTarget can have within a time period. This doesn't change the current behavior; it just makes the constraint more flexible. I added a new parameter, MaximumCompilationsWindow, that the user can set to specify a time window within which the number of compilations will be limited (currently the window is the whole duration of application execution).
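To illustrate the second item, here is a minimal sketch of a per-target compilation limit that is scoped to a time window. It is not the code in this patch: the class and method names are hypothetical, the clock is passed in explicitly so the behavior is deterministic, and the window is expressed in seconds purely for the example. A window of 0 is assumed to mean "no window", which matches the current behavior where the limit covers the whole run.

```java
/** Hypothetical sketch of a windowed compilation limit; not the Truffle implementation. */
final class CompilationWindowLimiter {
    private final int maxCompilations;
    private final long windowSeconds; // 0 disables the window: the limit then covers the whole run
    private long windowStart;
    private int compilationsInWindow;

    CompilationWindowLimiter(int maxCompilations, long windowSeconds, long nowSeconds) {
        this.maxCompilations = maxCompilations;
        this.windowSeconds = windowSeconds;
        this.windowStart = nowSeconds;
    }

    /** Records a compilation at nowSeconds and returns true, or returns false if the limit is hit. */
    boolean tryCompile(long nowSeconds) {
        if (windowSeconds > 0 && nowSeconds - windowStart >= windowSeconds) {
            // The window has elapsed: start a new one and forget the old count.
            windowStart = nowSeconds;
            compilationsInWindow = 0;
        }
        if (compilationsInWindow >= maxCompilations) {
            return false;
        }
        compilationsInWindow++;
        return true;
    }
}

final class WindowLimiterDemo {
    public static void main(String[] args) {
        CompilationWindowLimiter limiter = new CompilationWindowLimiter(2, 60, 0);
        System.out.println(limiter.tryCompile(0));  // first compilation in the window
        System.out.println(limiter.tryCompile(10)); // second compilation in the window
        System.out.println(limiter.tryCompile(20)); // over the limit, still inside the window
        System.out.println(limiter.tryCompile(60)); // window elapsed, counting starts again
    }
}
```

Passing the time in as a parameter is only a convenience for the sketch; it keeps the example testable without sleeping.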

Tests:

OSX AArch64 with LabsJDK "tip" using "mx gate"
Linux x64 with LabsJDK "tip" using "mx gate"
GitHub Actions
We have been running an internal system with similar changes for months and we didn't notice any regression.


JohnTortugo · 2025-07-22T17:40:30Z

Can someone please take a look? /cc @chumer @dougxc ?

dougxc · 2025-07-22T18:08:29Z

Looks reasonable to me but a more detailed review should be done by @tzezula or @chumer.

chumer · 2025-07-22T19:02:43Z

...racle.truffle.runtime/src/com/oracle/truffle/runtime/hotspot/HotSpotOptimizedCallTarget.java

+        boolean isValid = installedCode.isValid();
+        if (!isValid && installedCode != INVALID_CODE) {
+            if (getInvalidationReason() == ((HotSpotTruffleRuntime) runtime()).getColdMethodInvalidationReason()) {
+                invalidateExistingCode();


Can't say I am a fan of this solution. How do we know isValid() is called often enough?
Also it is surprising that isValid() resets the compilation profile and even can call listener methods (listeners could call into isValid() too!)

What did you consider as alternatives here?
Can you explain when / how often this is invoked? Can we call this maybe between compilations on compiler threads?

I don't think we should use this approach.

How do we know isValid() is called often enough?

Do you mean often enough so that we detect "soon" that the call target was invalidated? My understanding is that there is no requirement for how "soon" we detect the invalidation. We just need to eventually detect it and since the isValid method is always called before a new compilation I decided to implement things like this.

... listeners could call into isValid() too!

The existing code is invalidated before onCompilationDeoptimized is called, which prevents a recursion.

Can we call this maybe between compilations on compiler threads?

I'm not sure what your suggestion is here. Do you mean iterating over all currently compiled call targets to see if any were invalidated?

Can't say I am a fan of this solution.

Please, let me know your suggestions!

The goal here is to reset the CT (Call Target) profile when its INSTALLED_CODE was invalidated by HotSpot because it was cold. This needs to happen before we decide to compile the CT again. Realistically speaking this kind of invalidation can only happen between executions, not during an execution.

JVMCI is not able to call a method in Truffle or GraalJIT. Therefore, it cannot downcall to notify that the CT was invalidated. Thus, the identification that the CT was invalidated is a passive event that happens in Truffle.

Approaches that we can use to detect that the CT was invalidated:

Check before every execution of the CT in interpreter mode; before we check if the CT should be compiled.

Pros:

Relatively simple to implement.

This is a direct way to achieve what we want: prevent recompilation of a cold method.

Cons:

Executed every time that we want to execute the CT.

Use a separate thread to detect all invalidations.

Blockers:

Will require maintaining a list of all compiled, non-invalidated CTs.

When we are installing new code.

Blockers:

This is too late. There is already a new compilation of the CT.
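The first approach in the list above can be sketched as follows. This is a simplified model, not the patch itself: the class, the enum, and `hostInvalidates` (which stands in for HotSpot flushing the nmethod) are all hypothetical names, and compilation is reduced to a counter. The point it shows is that the check is passive: the invalidation is only noticed on the next interpreted call, before the compile decision is made.

```java
/** Hypothetical reasons; the real runtime obtains the reason from the installed code. */
enum SketchInvalidationReason { NONE, COLD_METHOD, DEOPTIMIZED }

/** Simplified model of a call target that detects invalidation passively on entry. */
final class SketchCallTarget {
    private final int compileThreshold;
    private boolean codeInstalled;
    private boolean codeValid;
    private SketchInvalidationReason reason = SketchInvalidationReason.NONE;
    int callCount;
    int compilations;

    SketchCallTarget(int compileThreshold) {
        this.compileThreshold = compileThreshold;
    }

    /** Stands in for the host VM invalidating the code; nothing is called back into the runtime. */
    void hostInvalidates(SketchInvalidationReason why) {
        codeValid = false;
        reason = why;
    }

    void call() {
        if (codeInstalled && !codeValid) {
            // Passive detection: we only learn about the invalidation here.
            codeInstalled = false;
            if (reason == SketchInvalidationReason.COLD_METHOD) {
                callCount = 0; // reset the profile so a cold target must become hot again
            }
        }
        if (codeInstalled) {
            return; // would run the compiled code
        }
        callCount++;
        if (callCount >= compileThreshold) {
            compilations++;
            codeInstalled = true;
            codeValid = true;
            reason = SketchInvalidationReason.NONE;
        }
    }
}

final class ColdInvalidationDemo {
    public static void main(String[] args) {
        SketchCallTarget target = new SketchCallTarget(3);
        for (int i = 0; i < 3; i++) {
            target.call();
        }
        target.hostInvalidates(SketchInvalidationReason.COLD_METHOD);
        target.call();
        System.out.println("compilations=" + target.compilations + " callCount=" + target.callCount);
    }
}
```

In this model a cold invalidation forces the target to reach the threshold again, while any other invalidation reason keeps the old call count and recompiles on the next call.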

What do you think?

As you say, ideally we would have a callback from HotSpot on invalidate/flush that allows us to take action. We do have such a callback on SubstrateVM, for example. It's very unfortunate we don't have that, because we could also use it for better logging.

Thinking out loud, it seems the second-best solution could be to install code for a call target that calls resetCompilationProfile(), which then invalidates itself. This means we wouldn't need to check anything and we would just reset once on the next invocation and not on follow-up invocations. Does that seem feasible to you?
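One possible reading of this idea, sketched outside of Truffle: the call target's entry point is a swappable slot, and a one-shot stub placed in that slot resets the profile and then removes itself. Everything here is hypothetical (the class, the `Entry` interface, `installResetStub`), and the sketch deliberately leaves open how and when such a stub would get installed in the real runtime, which is the part still under discussion.

```java
/** Hypothetical model of a one-shot stub that resets the profile and then uninstalls itself. */
final class OneShotResetTarget {
    interface Entry {
        void run(OneShotResetTarget target);
    }

    int callCount;
    int profileResets;

    // Normal interpreted entry: just profiles the call.
    private final Entry interpreter = t -> t.callCount++;

    // One-shot entry: resets the profile, swaps itself out, then continues in the interpreter.
    private final Entry resetStub = t -> {
        t.resetCompilationProfile();
        t.entry = t.interpreter;
        t.entry.run(t);
    };

    private Entry entry = interpreter;

    void resetCompilationProfile() {
        callCount = 0;
        profileResets++;
    }

    /** Hypothetical hook; where this would be triggered is an open question in the thread. */
    void installResetStub() {
        entry = resetStub;
    }

    void call() {
        entry.run(this);
    }
}

final class OneShotResetDemo {
    public static void main(String[] args) {
        OneShotResetTarget target = new OneShotResetTarget();
        target.call();
        target.call();
        target.installResetStub();
        target.call(); // resets once, then counts this call
        target.call(); // plain interpreter entry again, no further reset
        System.out.println("callCount=" + target.callCount + " resets=" + target.profileResets);
    }
}
```

The property the sketch is meant to show is the one named in the comment: no check on every invocation, and exactly one reset on the next call.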

Could we schedule a call for this? It seems like we should do a bit of brainstorming on a solution here. (Feel free to ping me on Slack.)

Could we schedule a call for this? It seems like we should do a bit of brainstorming on a solution here. (Feel free to ping me on Slack.)

Absolutely. I'll ping you on slack.

chumer · 2025-07-22T19:02:47Z

...racle.truffle.runtime/src/com/oracle/truffle/runtime/hotspot/HotSpotOptimizedCallTarget.java

+        runtime().getListener().onCompilationDeoptimized(this, frame, getInvalidationReasonDescription());
+    }
+
+    @TruffleBoundary


Why is this code behind a boundary? If this code is called during PE (not sure it is) then we need to better optimize this code, aka better profiling depending on where it's used. If it's never called in PE we should remove the boundary.

I had to make this a TruffleBoundary because a unit test was failing: a compilation was getting too big due to recursive inlining during PE. Since the test triggered that, I suspected that it could eventually affect an end user as well.

chumer · 2025-07-22T19:05:52Z

...m.oracle.truffle.runtime/src/com/oracle/truffle/runtime/OptimizedTruffleRuntimeListener.java

     */
-    default void onCompilationDeoptimized(OptimizedCallTarget target, Frame frame) {


This interface needs to evolve in a compatible way. Should be easy enough keeping the old method deprecated, each delegating to the other. See for example onCompilationStarted.
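A minimal sketch of the compatible-evolution pattern, assuming the new overload carries the invalidation reason as a string. The interface below is a stand-in with `String` placeholders for `OptimizedCallTarget` and `Frame`, not the real `OptimizedTruffleRuntimeListener`, and only the new-to-old delegation is shown: the new default method forwards to the deprecated one, so listeners written against the old signature keep receiving the event.

```java
/** Stand-in for the listener interface; parameter types are placeholders. */
interface SketchRuntimeListener {

    /** Old signature, kept so existing implementations still compile and still get called. */
    @Deprecated
    default void onCompilationDeoptimized(String target, String frame) {
    }

    /** New signature; by default it forwards to the deprecated method. */
    @SuppressWarnings("deprecation")
    default void onCompilationDeoptimized(String target, String frame, String reason) {
        onCompilationDeoptimized(target, frame);
    }
}

/** A listener written before the change: it only knows the two-argument method. */
@SuppressWarnings("deprecation")
final class LegacySketchListener implements SketchRuntimeListener {
    int events;

    @Override
    public void onCompilationDeoptimized(String target, String frame) {
        events++;
    }
}

/** A listener written after the change: it overrides the new method and sees the reason. */
final class ModernSketchListener implements SketchRuntimeListener {
    String lastReason;

    @Override
    public void onCompilationDeoptimized(String target, String frame, String reason) {
        lastReason = reason;
    }
}

final class ListenerEvolutionDemo {
    public static void main(String[] args) {
        LegacySketchListener legacy = new LegacySketchListener();
        ModernSketchListener modern = new ModernSketchListener();
        // The runtime only ever calls the new method.
        legacy.onCompilationDeoptimized("target", "frame", "cold method");
        modern.onCompilationDeoptimized("target", "frame", "cold method");
        System.out.println(legacy.events + " " + modern.lastReason);
    }
}
```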

chumer · 2025-07-22T19:11:56Z

truffle/src/com.oracle.truffle.runtime/src/com/oracle/truffle/runtime/OptimizedCallTarget.java

@@ -537,6 +540,7 @@ public final RootNode getRootNode() {
    public final void resetCompilationProfile() {
        this.callCount = 0;
        this.callAndLoopCount = 0;
+        this.timeOfFirstCompilationInWindow = Instant.now();


Limit the number of compilations that a CallTarget can have within a time period.

tbh I can't quite follow what the rationale for this feature is. When would you use this?
The maximum number of compilations is intended for compiler bugs, and should never be hit (breakpoint toggling in the debugger for example). If we extend this to a time-based limit, that defeats the purpose of detecting those bugs.

So I think I need a motivating example here to understand.

Also I think setting:

private Instant timeOfFirstCompilationInWindow;

in resetCompilationProfile() seems not very intuitive. Is this really the right place to do it?

tbh I can't quite follow what the rationale for this feature is. When would you use this?
The maximum number of compilations is intended for compiler bugs, and should never be hit (breakpoint toggling in the debugger for example). If we extend this to a time-based limit, that defeats the purpose of detecting those bugs.

My usage scenario may be a little edgy because 1) the application that caused me to bring up this change is very large and the code cache is conservatively small, so it causes many method flushes; 2) the recompilation of cold methods (the other issue that this PR addresses) together with (1) was causing many recompilations; 3) new guest language code is added/modified/removed from the system frequently.

I understand that the maximum recompilation limit is just a failsafe and isn't intended to be the main way to limit recompilations and I'm not trying to make it so. What I'm looking for is to make it more flexible, so that the system can remain 'stable' until I have a chance to act on the problem. By stable I mean that there aren't too many recompilations of the call target happening (i.e., a recompilation rate) and also that the method isn't run interpreted until I restart the application.

So because the compilation queue is flushed often, you want to limit the maximum compilations per certain time frame? Wouldn't it be better to just reset the successfulCompilationsCount on a flush? Or am I missing something here?

If we never reset the successfulCompilationsCount we will also trigger deopt loop detection which is not limited by a time window.

chumer · 2025-07-22T19:12:20Z

truffle/src/com.oracle.truffle.runtime/src/com/oracle/truffle/runtime/OptimizedCallTarget.java

+        if (engine.maximumCompilationsWindowInMinutes > 0) {
+            long ageInMinutes = ChronoUnit.MINUTES.between(timeOfFirstCompilationInWindow, Instant.now());
+            if (ageInMinutes >= engine.maximumCompilationsWindowInMinutes) {
+                // This compilation would have been blocked if the window hadn't overflowed,


Minor: use multi line comments for long text.

chumer · 2025-08-04T17:28:17Z

...src/jdk.graal.compiler.test/src/jdk/graal/compiler/truffle/test/MaximumCompilationsTest.java

+                        option("engine.BackgroundCompilation", "false").//
+                        option("engine.CompileImmediately", "true").//
+                        option("engine.MaximumCompilations", "2").//
+                        option("engine.MaximumCompilationsWindow", "1").build()) {


Typically we use seconds, not minutes. This came as a surprise to me. Any reason this is not seconds?
We could then also spend less time on this test. 90 seconds for such a test is excessive.

Detect cold method invalidation & reprofile. Limit on number of compilations

ec0c74f

oracle-contributor-agreement bot added the "OCA Verified" label (All contributors have signed the Oracle Contributor Agreement.) Jul 12, 2025

JohnTortugo changed the title ~~Detect cold method invalidation & reprofile. Limit on number of compilations~~ Detect cold CallTarget invalidation and reset its profile; Limit number of recompilations within a time period Jul 12, 2025

JohnTortugo added 2 commits July 16, 2025 10:41

Force new build.

b4d6789

Merge remote-tracking branch 'origin/master' into recompilations

138dbde

chumer requested changes Jul 22, 2025


chumer reviewed Aug 4, 2025
