Description
Hello! This is a combo question and proposal, where I am hoping to reduce the time it takes to build a statically compiled sysimg or executable, by avoiding compiling everything twice, as we currently do.
My understanding of the current process of static compilation (as run from, say, `PackageCompiler.create_app()`) includes at least the following steps:
- Run the provided precompile script(s) with `--trace-compile=tmpname()`, to record, as strings, which functions and types to compile when building the sysimg. This produces one or more files containing several `precompile(foo, (Bar, Baz))` statements.
- Start a new julia session with `--output-o`, import the package(s) being compiled, and then load and execute the `precompile` statements generated by the precompile scripts and provided in any `precompile_statements_file`(s).
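To make the first step concrete, here is a minimal sketch of recording precompile statements from a child process. The workload script and file names are made up for illustration (they stand in for a real `precompile_execution_file`), and this is only my approximation of the idea, not PackageCompiler's actual code:

```julia
# Sketch of step one: run a workload in a child julia process with
# --trace-compile and collect the precompile statements it emits.
dir = mktempdir()
script = joinpath(dir, "workload.jl")  # hypothetical stand-in for a precompile_execution_file
trace = joinpath(dir, "trace.jl")      # file the child process writes its statements to

# A tiny workload that forces `f` to be compiled for two argument types.
write(script, """
f(x) = x + 1
f(1)
f(2.0)
""")

# Base.julia_cmd() returns the command for the currently running julia binary.
run(`$(Base.julia_cmd()) --startup-file=no --trace-compile=$trace $script`)

# Each line of the trace is one precompile statement, stored as plain text.
statements = readlines(trace)
foreach(println, statements)
```

The second step then reads these strings back in a fresh process, which is where the second round of compilation cost comes from.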
My concern is that this means that in order to produce a statically compiled binary, we have to pay for compilation latency twice for every function compiled from the `precompile_execution_file`.
My question is whether we could (maybe optionally) combine these into a single step, running the `precompile_execution_file` with `--output-o` directly, and then also loading and executing any provided `precompile_statements_file`s in the same session.
Could you help me understand why this is currently performed in two separate steps, and what we might need to do in order to combine them into one step?
Some problems that we have from the current setup:
- It's slow: we have to pay for compiling all the code twice, and for our software this currently takes >1 hour and we aren't snooping as much as we would like. I'd like us to snoop much more, but we're afraid of making our CI build times too long.
- Since we currently write every method instance to disk as a string and then read it back in a new process, there are currently bugs that cause us to drop some functions.
- For more details, see:
- If we were to avoid this round-trip through a text file, we could get much better recall via static compilation. Currently we have around 3,000 / 15,000 precompile statements not actually working in our build (😢), and I hope avoiding the round trip could help?
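As a rough illustration of how statements can get lost in the round trip, here is a sketch that replays statement strings and counts the ones that fail to evaluate. The `replay` helper and the example statements are hypothetical, written only for this illustration; I am assuming the real replay loop tolerates failures in a broadly similar way:

```julia
# Hypothetical helper: evaluate each recorded statement string and sort it
# into "worked" or "dropped", instead of aborting on the first failure.
function replay(statements; mod=@__MODULE__)
    ok = String[]
    failed = String[]
    for s in statements
        try
            # Parse the text back into an expression and evaluate it.
            result = Core.eval(mod, Meta.parse(s))
            # precompile returns true when the signature was compiled.
            push!(result === true ? ok : failed, s)
        catch
            # e.g. the string refers to a name that cannot be resolved here.
            push!(failed, s)
        end
    end
    return ok, failed
end

g(x) = x * 2

stmts = [
    "precompile(Tuple{typeof(g), Int64})",                 # resolvable in this module
    "precompile(Tuple{typeof(not_defined_here), Int64})",  # name does not exist, so it is dropped
]

ok, failed = replay(stmts)
println("replayed ", length(ok), " of ", length(stmts), " statements")
```

Anything that cannot be printed to a string and parsed back faithfully ends up in the `failed` bucket, which is the kind of loss that staying in a single process would avoid.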
Some reasons that I can imagine that might motivate why we currently do this in two steps include:
A) In order to avoid method invalidations, perhaps we want to ensure that we aren't eval'ing new definitions (by loading new packages, for example) during execution of the process running with `--output-o`?
- My thinking is that if we were to run `--output-o` during the main process, we might load some new package halfway through the precompilation script and accidentally invalidate some of the functions we mean to statically compile after we've already emitted them, and I don't know how `--output-o` would handle that. Would that cause problems?
B) Perhaps running with `--output-o` makes julia significantly slower, to the point where it might be faster overall to run once without that flag, record the results to disk, and then run again with the flag, only performing the output? But this seems dubious to me when the compilation itself is the bottleneck in a precompilation script (which it hopefully should be, for a well-written precompilation script).
C) A precompilation script may load other packages in order to trigger all the compilations desired to be statically compiled, but we don't necessarily want to precompile the functions from those other packages.
- To solve this case, I imagine that we could perhaps update julia's `--output-o` flag to take a list of top-level module names from which to emit object code, and it could ignore methods and/or types coming from outside that list? That should replicate the current behavior.
I'm very interested to hear if there are other things I'm missing! :) Sorry if this is rehashing old discussions; I haven't been able to find anything on this when searching.
If this doesn't make sense all the time, perhaps we could support it with a flag, or something?
Anyway, thanks for your time!
Happy 2021!