Skip to content

Conversation

Keno
Copy link
Member

@Keno Keno commented Apr 23, 2025

Reverts #58182. The API changes were intentional and desirable. Let's figure out why nansoldier was upset and re-apply this.

@Keno
Copy link
Member Author

Keno commented Apr 23, 2025

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@vtjnash vtjnash changed the title Revert "Revert #57979 (and following #58083 #58082)" Reland #57979 (and following #58083 #58082) Apr 23, 2025
@serenity4
Copy link
Member

serenity4 commented Apr 23, 2025

@nanosoldier runbenchmarks("inference", vs=":master")

2 similar comments
@serenity4
Copy link
Member

@nanosoldier runbenchmarks("inference", vs=":master")

@Keno
Copy link
Member Author

Keno commented Apr 23, 2025

@nanosoldier runbenchmarks("inference", vs=":master")

@vtjnash
Copy link
Member

vtjnash commented Apr 23, 2025

Interesting to see that this PR has a very strong (positive) impact on system image size, removing roughly 20 MB.

@serenity4
Copy link
Member

Is that still the case after 1e6e948? We were compressing more CodeInfos than intended.

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - no performance regressions were detected. A full report can be found here.

@serenity4
Copy link
Member

Ah, that's much better.

@vtjnash
Copy link
Member

vtjnash commented Apr 23, 2025

Yes, that difference is still visible (I thought that undo should be a no-op anyways), so it might be good to understand why this PR is having such a dramatic effect to make sure it is not discarding something unintentionally.

@serenity4
Copy link
Member

serenity4 commented Apr 23, 2025

I believe the difference may come from the fact that the code instance is updated (jl_update_codeinst) with inferred_result = nothing if the source is not inlineable in the optimized case, whereas previously we still updated it with inferred_result::CodeInfo (it's only that we didn't compress it if not inlineable).

@Keno may know more about this.

EDIT: this is not correct, behavior should be the same as before and not store non-inlineable sources in the code instance; please ignore this comment.

@serenity4
Copy link
Member

It is actually caused by a different storage logic for debuginfo (thanks @Keno for the pointer).

Before these changes, even when we did not keep the optimized CodeInfo, we stored its debuginfo in the code instance. With this PR, we store the debuginfo of the optimized CodeInfo only if we are also storing the optimized CodeInfo in the code instance, otherwise we use the debuginfo of the source CodeInfo. I can reproduce that restoring the previous behavior adds back the ~20 MB in the system image.

@vtjnash Keno and I are not very familiar with the handling of DebugInfo, does that change look correct to you? What is the debuginfo of the optimized source used for exactly? I think it would be good to add some kind of test to make sure we maintain the intended behavior.

@Keno
Copy link
Member Author

Keno commented Apr 25, 2025

@vtjnash can you comment on @serenity4's question above? I'd like to get this merged to unbreak downstream

@vtjnash
Copy link
Member

vtjnash commented Apr 25, 2025

The debuginfo is supposed to correspond to the executing code, for use in handling printing stacktraces. Tests are part of this PR if you wanted to have those first: #53925

@Keno
Copy link
Member Author

Keno commented Apr 25, 2025

The dataflow here is a bit awkward since the codegen is handled separately and in fact, I believe when codegen actually gets around to updating the pointer in the codeinstance, it also sets debuginfo. I think that might imply it's fine not to set it here? But yes, it'd be good to have some tests.

@vtjnash
Copy link
Member

vtjnash commented Apr 25, 2025

Yes, I assume codegen currently might be also change this field, while codegen probably should instead abort if inference made a mistake here, that seemed slightly annoying, so it currently instead silently tries to fix it.

@Keno
Copy link
Member Author

Keno commented Apr 25, 2025

Does it make sense for inference to set this though? In particular, it seems like (as evidenced by the big size decrease) there are cases where we don't actually end up compiling the code, in which case the debuginfo is never usable by anything. Shouldn't the dataflow for the code and the debuginfo be together?

@vtjnash
Copy link
Member

vtjnash commented Apr 25, 2025

Checking the code, codegen relies upon inference having done the correct thing here already, and does not have the assumed ability to fix mistakes

@Keno
Copy link
Member Author

Keno commented Apr 25, 2025

I don't think that really answers the conceptual question though: Why are these taking different dataflow paths if?

@serenity4
Copy link
Member

As a quick experiment, using only the debuginfo from the parent linfo in inference (jl_update_codeinst), we can squeeze another ~17 MB (so roughly 35 to 40 MB total). It seems to impact type-depth printing and location of array comprehension in stacktraces (essentially stacktrace tests pass except these ones) and a quick look reveals that stacktraces are mostly identical including for those containing inlined code. Of course, it's very much possible that other stuff breaks.

I played with the tests from #53925, behavior remains identical without the added feature in that PR, so in the end I don't think it makes sense to include them.

I haven't identified any failure mode for the change to debuginfo introduced in this PR and all tests are passing, so perhaps we can merge to un-break downstream and then reevaluate whether we can have any more size gains by refactoring this debuginfo logic.

@vtjnash
Copy link
Member

vtjnash commented Apr 25, 2025

The implementation of the tests in #53925 is incomplete, so you will need to finish that first before you can claim victory here. Yes, storing debuginfo is expensive. No, you cannot just simply delete it because it is inconvenient for your PR.

The reason the dataflow diverges here is that it splits for several different consumers, including inlining, #53925 WIP, and codegen. Each gets a different copy of a different representation of this metadata.

@Keno
Copy link
Member Author

Keno commented Apr 25, 2025

The reason the dataflow diverges here is that it splits for several different consumers, including inlining, #53925 WIP, and codegen. Each gets a different copy of a different representation of this metadata.

Yes, but it's conceptually odd that #53925 WIP needs the metadata if and only if codegen happened and yet, it expects the metadata to have been stored in the place where otherwise only inlining looks at it (and thus in particular requires it to be stored that that place even if inlining decides it doesn't need it). That's the odd part.

@serenity4
Copy link
Member

you cannot just simply delete it because it is inconvenient for your PR.

To be clear, it's very straightforward to restore the previous behavior. It's not inconvenient, we're just questioning whether the previous behavior made sense before blindly re-applying it and removing what seems to be a considerable and unexpected gain.

The only change required would be to add a branch where if transform_result_for_cache returns nothing we turn the OptimizationState into a CodeInfo and extract its debuginfo just so we can use it for jl_update_codeinst, which is just 5 lines of code. If that's the only way it gets approved for merge so we can un-break downstream code, that's fine by me, but I agree with Keno that the data-flow is somewhat confusing.

@serenity4
Copy link
Member

I reverted to using the previous debuginfo handling for now, if there are no other objections feel free to merge.

@Keno Keno merged commit 3864b18 into master Apr 30, 2025
4 of 7 checks passed
@Keno Keno deleted the revert-58182-jn/revert-57979-58083-58082 branch April 30, 2025 16:35
serenity4 added a commit to serenity4/julia that referenced this pull request May 1, 2025
JuliaLang#58203)

Reverts JuliaLang#58182. The API changes were intentional and
desirable. Let's figure out why nansoldier was upset and re-apply this.

---------

Co-authored-by: Cédric Belmant <[email protected]>
charleskawczynski pushed a commit to charleskawczynski/julia that referenced this pull request May 12, 2025
JuliaLang#58203)

Reverts JuliaLang#58182. The API changes were intentional and
desirable. Let's figure out why nansoldier was upset and re-apply this.

---------

Co-authored-by: Cédric Belmant <[email protected]>
@nsajko nsajko added the re-land This relands a PR that was previously merged but was later reverted. label Jun 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
re-land This relands a PR that was previously merged but was later reverted.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants