Question and Proposal: Allow single-run execution of precompile file #486

Open
@NHDaly

Hello! This is a combined question and proposal. I am hoping to reduce the time it takes to build a statically compiled sysimg or executable by avoiding compiling everything twice, as we currently do.

My understanding of the current process of static compilation (as run from, say, PackageCompiler.create_app()) includes at least the following steps:

  1. Run the provided precompile script(s) with --trace-compile=tmpname(), to record, as text, which functions and argument types to compile when building the sysimg. This produces one or more files containing precompile(foo, (Bar, Baz)) statements.
  2. Start a new julia session with --output-o, import the Package(s) being compiled, and then load and execute the precompile statements generated by the precompile scripts and provided in any precompile_statements_file(s).
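
For illustration, step 1 can be reproduced by hand on a toy script. This is only a sketch: the script contents and the function foo are made up for this example, and PackageCompiler's real invocation passes additional flags that are not shown here.

```julia
# Sketch of step 1 only: run a script in a fresh julia process with
# --trace-compile and read back the recorded precompile statements.

# Hypothetical stand-in for a precompile_execution_file.
script = tempname() * ".jl"
write(script, "foo(x) = x + 1\nfoo(1)\nfoo(2.0)\n")

# File that the child process writes its precompile statements to.
trace_file = tempname()

# Base.julia_cmd() returns the command for the currently running julia binary.
run(`$(Base.julia_cmd()) --startup-file=no --trace-compile=$trace_file $script`)

# Each line is one statement describing a method specialization that was compiled.
statements = readlines(trace_file)
foreach(println, statements)
```

Step 2 would then replay lines like these in a separate session started with --output-o, which is where the second round of compilation happens.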

My concern is that this means that in order to produce a statically compiled binary, we have to pay the compilation latency twice for every function compiled from the precompile_execution_file.

My question is whether we could (maybe optionally) combine these into a single step, running the precompile_execution_file with --output-o directly, and then also loading and executing any provided precompile_statements_files in the same session.

Could you help me understand why this is currently performed in two separate steps now, and what we might need to do in order to allow us to combine these into one step?

Some problems that we have from the current setup:


Some reasons that I can imagine that might motivate why we currently do this in two steps include:
A) In order to avoid method invalidations, perhaps we want to ensure that we aren't eval'ing new definitions (by loading new packages for example) during execution of the process running with --output-o?
- My thinking is that if we ran the main process with --output-o, loading a new package halfway through the precompilation script might invalidate some of the functions we mean to statically compile after we've already emitted them, and I don't know how --output-o would handle that. Would that cause problems?
B) Perhaps running with --output-o makes julia significantly slower, to the point where it might be faster overall to run once without that flag, record the results to disk, and then run again with the flag, only performing the output? But this seems dubious to me when the compilation itself is the bottleneck in a precompilation script (which it hopefully should be, for a well-written precompilation script).
C) A precompilation script may load other packages in order to trigger all the compilations desired to be statically compiled, but we don't necessarily want to precompile the functions from those other packages.
- To solve this case, I imagine that we could perhaps update julia's --output-o flag to take a list of top-level module names from which to emit object code, and it could ignore methods and/or types coming from outside that list? That should replicate the current behavior.
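
As a rough illustration of the filtering idea in (C), something similar can be approximated at the statement level on a recorded statements file. This is a sketch, not an existing julia or PackageCompiler feature: filter_statements, MyApp and OtherPkg are hypothetical names, and the match is purely textual, so it is cruder than a real module-aware flag would be.

```julia
# Hypothetical helper: keep only the precompile statements whose text
# mentions one of the allowed top-level modules (as "ModuleName.").
function filter_statements(statements, allowed_modules)
    filter(statements) do s
        any(m -> occursin(string(m, "."), s), allowed_modules)
    end
end

# Made-up statements in the style written by --trace-compile.
statements = [
    "precompile(Tuple{typeof(MyApp.main), Array{String, 1}})",
    "precompile(Tuple{typeof(OtherPkg.helper), Int64})",
]

# Keep only statements that refer to MyApp.
kept = filter_statements(statements, ["MyApp"])
foreach(println, kept)
```

A textual match like this also keeps statements that mention an allowed module only in an argument type, and it cannot tell modules apart when one name is a suffix of another, which is part of why doing this inside julia itself seems more appealing.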

I'm very interested to hear if there are other things I'm missing! :) Sorry if this is rehashing old discussions; I haven't been able to find anything on this when searching.

If this doesn't make sense all the time, perhaps we could support it with a flag, or something?
Anyway, thanks for your time!
Happy 2021!
