
Conversation

@Keno Keno commented Apr 2, 2025

Currently the inlineability determination is in a bit of an odd spot: just after the optimizer passes, while everything is still in IRCode. It seems more sensible to move this code into the cache transformation code, which is the first place that makes an actual decision based on inlineability. If an external AbstractInterpreter does not need to convert to CodeInfo for compilation purposes, this also potentially saves that extra conversion. While we're at it, clean up some naming to deconflict it with other uses.

Comment on lines 497 to 498:

```julia
function finishopt!(interp::AbstractInterpreter, opt::OptimizationState,
                    ir::IRCode, caller::InferenceResult)
```

Suggested change:

```diff
-function finishopt!(interp::AbstractInterpreter, opt::OptimizationState,
-                    ir::IRCode, caller::InferenceResult)
+function finishopt!(::AbstractInterpreter, opt::OptimizationState, ir::IRCode)
```

Context:

```julia
@timeit "optimizer" ir = run_passes_ipo_safe(opt.src, opt)
ipo_dataflow_analysis!(interp, opt, ir, caller)
return finish(interp, opt, ir, caller)
```

Suggested change:

```diff
-finishopt!(interp, opt, ir, caller)
+finishopt!(interp, opt, ir)
```

Comment on lines 151 to 160:

```julia
if !discard_src && codegen !== nothing && uncompressed !== nothing
    if !(uncompressed isa CodeInfo)
        uncompressed = ir_to_codeinf!(uncompressed)
    end
```
I wasn't sure why this part needed to be changed. Perhaps we could use inferred_result here instead of uncompressed? That way, if we properly define transform_result_for_cache, it would also provide support for codegen_cache.

It is also now missing a check for `isa CodeInfo`, which may cause problems for other consumers.

Comment on lines 380 to 381:

```julia
if may_compress(interp)
    return ccall(:jl_compress_ir, String, (Any, Any), def, ci)
```
This transform is very expensive; why is it suddenly no longer conditional on being useful?

@Keno (author):
That case doesn't reach here anymore.

```julia
uncompressed = result.src
const_flag = is_result_constabi_eligible(result)
discard_src = caller.cache_mode === CACHE_MODE_NULL || const_flag
if !discard_src
```
@vtjnash commented Apr 2, 2025:
It looks like the implementation of everything after this point assumes, for correctness, that transform_result_for_cache returns a CodeInfo, and needs to be rewritten.

It looks like debuginfo may still be computed incorrectly in this part of the PR.

@Keno (author):
@serenity4 take a look?

Although independently it does now feel a bit weird from a dataflow perspective that the debuginfo that goes with the optimized source takes a different path.


It would make more sense to me to take di = src.debuginfo (where src::CodeInfo is extracted from result.src::Union{CodeInfo, OptimizationState}) and override it with what we can extract from inferred_result (essentially doing the same logic as for result.src).
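A rough sketch of that suggestion (purely illustrative, not the actual implementation; it assumes the names used in this comment, i.e. `result.src::Union{CodeInfo, OptimizationState}` with a `.src` field on `OptimizationState` and a `.debuginfo` field on `CodeInfo`):

```julia
# Hypothetical sketch of the suggested debuginfo flow:
# default to the source CodeInfo's debuginfo, then override it with
# whatever the cache-transformed result carries, mirroring the logic
# already applied to result.src.
src = result.src isa OptimizationState ? result.src.src : result.src
di = src.debuginfo
if inferred_result isa CodeInfo
    di = inferred_result.debuginfo
end
```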


Perhaps something like this: serenity4@318b9a3

@Keno (author):

I've rebased this PR and cherry-picked your commit.

@serenity4
Digging into this, it seems that the inlining cost is expected to be set on the source CodeInfo, at least in the case where no new CodeInfo is produced post-optimization (either because results are not cached, i.e. `isdefined(result, :ci) === false`, or because the CodeInfo would not be inlineable, a behavior introduced in this PR).

Essentially I think the failures were related to the fact that, although

> It seems more sensible to move this code into the cache transformation code, which is the first place that makes an actual decision based on inlineability

the cache transformation code path may not be taken, and in that case we still need to determine the inlineability of the source CodeInfo.

I have a working version locally; it just needs some tree-shaking to see what the minimally required changes are. @Keno feel free to let me know if you'd like to apply the fix yourself and address the review comments, or if I should take it from here (opening a new PR, I guess).

@serenity4
serenity4 commented Apr 4, 2025

Was this PR motivated by a need to customize inlineability determination for AbstractInterpreters? If so, and if my understanding outlined above is correct, we might need to expose that more explicitly via the AbstractInterpreter interface rather than through transform_result_for_cache, since that is not the only place where inlineability determination would happen.

@Keno
Keno commented Apr 8, 2025

The primary motivation was to nail down how much information needs to be in the cache if you don't use CodeInfo. It seems odd to me from a conceptual dataflow perspective that inlineability needs to be set on the original CodeInfo; it's an output of the optimizer. Were you able to determine what situation reads it from there?

@serenity4
I see; I haven't looked into which piece of code required it specifically, but I will investigate with that perspective.

@Keno
Keno commented Apr 8, 2025

Also, once we do figure that out, I think there is an independent investigation to be had as to why not inlining something leads to a miscompile.

@serenity4
serenity4 commented Apr 9, 2025

After taking a look, I see that the only situation in which we need to set the inlining cost is when `!isdefined(result, :ci) && isa(result.src, OptimizationState) && caller.cache_mode === CACHE_MODE_LOCAL`. IIUC, that's when a CodeInfo is inferred, optimized, and cached locally (but not globally). In this case we don't go through transform_result_for_cache, but inference may still use the optimization results. Should we also call transform_result_for_cache (or some variant like transform_result_for_local_cache) in this situation? (Since it's for a different cache, reusing transform_result_for_cache may make for a confusing API.)
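If we went the transform_result_for_local_cache route, the default method might simply forward to the existing hook. A minimal sketch, assuming the signature mirrors transform_result_for_cache (the exact signature here is hypothetical):

```julia
# Hypothetical sketch: a separate entry point for the local cache that
# defaults to the global-cache transform, so an external
# AbstractInterpreter can override either code path independently.
function transform_result_for_local_cache(interp::AbstractInterpreter,
                                          result::InferenceResult)
    return transform_result_for_cache(interp, result)
end
```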

@Keno
Keno commented Apr 9, 2025

Yes, I think a transform_result_for_local_cache would be appropriate.

@serenity4
serenity4 commented Apr 9, 2025

Great, here is the diff between my branch and this PR if you'd like to have a look: kf/moveinlinemodel...serenity4:julia:kf/moveinlinemodel

@Keno
Keno commented Apr 9, 2025

Excellent, it seems to be working! I'll look into whether I can figure out why we miscompile if we don't inline those.

@Keno Keno force-pushed the kf/moveinlinemodel branch from 867c6dc to f3f6a55 Compare April 10, 2025 18:49
@Keno
Keno commented Apr 10, 2025

I tried a bunch of things and found some other issues, but I can't figure out how to make it miscompile without just going back to the previous version. I think I'll leave it be at this point, although it is a bit mysterious.

@serenity4
Thanks for taking a look, at least the additional fixes are always good to have.

Keno and others added 7 commits April 11, 2025 01:45
Currently the inlineability determination is in a bit of an odd spot:
just after the optimizer passes, while everything is still in IRCode. It
seems more sensible to move this code into the cache transformation
code, which is the first place that makes an actual decision based
on inlineability. If an external AbstractInterpreter does not need
to convert to CodeInfo for compilation purposes, this also potentially
saves that extra conversion. While we're at it, clean up some naming
to deconflict it with other uses.
Jameson's were already addressed in previous commits
@Keno Keno force-pushed the kf/moveinlinemodel branch from f3f6a55 to e26a8a7 Compare April 11, 2025 01:46
@Keno Keno merged commit b422883 into master Apr 11, 2025
4 of 7 checks passed
@Keno Keno deleted the kf/moveinlinemodel branch April 11, 2025 06:22
serenity4 added a commit to serenity4/Cthulhu.jl that referenced this pull request Apr 11, 2025
serenity4 added a commit to JuliaDebug/Cthulhu.jl that referenced this pull request Apr 11, 2025
)

* 2.17: Fix `.result` -> `.optresult` change for nightly

* Bump version

* Adjust to more changes from JuliaLang/julia/pull/57979

* Retrieve `interp` from `OptimizationState`

---------

Co-authored-by: Cédric Belmant <[email protected]>
Keno added a commit that referenced this pull request Apr 11, 2025
Keno added a commit that referenced this pull request Apr 12, 2025
vtjnash added a commit that referenced this pull request Apr 21, 2025
vtjnash added a commit that referenced this pull request Apr 22, 2025
The point of #57979 was to make inference faster, but it made it instead
much slower
(83524ac#commitcomment-155658124),
so revert back to the fast behavior before it was "optimized" (and
revert the bugfixes for the original commit).
Keno added a commit that referenced this pull request Apr 23, 2025
Keno added a commit that referenced this pull request Apr 30, 2025
Reverts #58182. The API changes were intentional and
desirable. Let's figure out why nanosoldier was upset and re-apply this.

---------

Co-authored-by: Cédric Belmant <[email protected]>
serenity4 added a commit to serenity4/julia that referenced this pull request May 1, 2025
Currently the inlineability determination is in a bit of an odd spot:
just after the optimizer passes, while everything is still in IRCode. It
seems more sensible to move this code into the cache transformation
code, which is the first place that makes an actual decision based on
inlineability. If an external AbstractInterpreter does not need to
convert to CodeInfo for compilation purposes, this also potentially
saves that extra conversion. While we're at it, clean up some naming to
deconflict it with other uses.

---------

Co-authored-by: Cédric Belmant <[email protected]>
serenity4 pushed a commit to serenity4/julia that referenced this pull request May 1, 2025
serenity4 pushed a commit to serenity4/julia that referenced this pull request May 1, 2025
JuliaLang#58182)

The point of JuliaLang#57979 was to make inference faster, but it made it instead
much slower
(JuliaLang@83524ac#commitcomment-155658124),
so revert back to the fast behavior before it was "optimized" (and
revert the bugfixes for the original commit).
serenity4 added a commit to serenity4/julia that referenced this pull request May 1, 2025
JuliaLang#58203)

Reverts JuliaLang#58182. The API changes were intentional and
desirable. Let's figure out why nanosoldier was upset and re-apply this.

---------

Co-authored-by: Cédric Belmant <[email protected]>
LebedevRI pushed a commit to LebedevRI/julia that referenced this pull request May 2, 2025
JuliaLang#58182)

The point of JuliaLang#57979 was to make inference faster, but it made it instead
much slower
(JuliaLang@83524ac#commitcomment-155658124),
so revert back to the fast behavior before it was "optimized" (and
revert the bugfixes for the original commit).
charleskawczynski pushed a commit to charleskawczynski/julia that referenced this pull request May 12, 2025
JuliaLang#58203)

Reverts JuliaLang#58182. The API changes were intentional and
desirable. Let's figure out why nanosoldier was upset and re-apply this.

---------

Co-authored-by: Cédric Belmant <[email protected]>