Compile packages independently, link using LTO #2870

Open
3 of 6 tasks
aykevl opened this issue May 24, 2022 · 5 comments
Labels
core enhancement New feature or request

Comments

@aykevl
Member

aykevl commented May 24, 2022

After #285, I'd like to go one step further: compile packages entirely separately and do cross-package optimizations using ThinLTO (or, optionally, full LTO if desired). The main benefit is that compilation should be a lot faster, both with a cold cache (by parallelizing codegen) and after small changes to the source code (by reusing most packages). We should be able to get close to the speed of the Go toolchain: TinyGo is currently a lot slower.

How we currently compile packages is as follows:

  1. LLVM IR for packages is generated in parallel (and cached in ~/.cache/tinygo).
  2. This IR is then merged together to create one huge LLVM module with the IR of all packages.
  3. Some generic LLVM optimizations and TinyGo-specific transformation passes are applied to this combined IR.
  4. The IR is then written to a temporary location, either as bitcode (for ThinLTO) or as an object file (for non-ThinLTO builds).
  5. The linker (usually lld) is invoked to link everything together to generate an executable. In the ThinLTO case, lld creates an object file internally and caches it.

What I'd like to see:

  1. LLVM IR for packages is generated in parallel (and cached in ~/.cache/tinygo), as before. TinyGo-specific optimizations need to be done in this phase.
  2. The linker (lld) is then used to link all bitcode files together, using ThinLTO.

This means there is no phase in which all IR is combined into one big module, which avoids the serial step that currently takes up most of the compile time.
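
To make the shape of the proposed flow concrete, here is a minimal sketch in Go. It is not TinyGo's build code: the `pkg` and `cache` types, the `bitcode(...)` placeholder string, and the hashing scheme are all assumptions made up for illustration. It only shows the two properties described above: packages are processed in parallel with no merge step, and a small edit only recompiles the package that changed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// pkg is a hypothetical stand-in for a Go package: a name plus its source text.
type pkg struct {
	name   string
	source string
}

// cache maps a content hash to previously generated output.
// misses counts how often a package actually had to be compiled.
type cache struct {
	mu      sync.Mutex
	entries map[string]string
	misses  int
}

func newCache() *cache { return &cache{entries: map[string]string{}} }

// compile returns the cached output for p, or generates and stores it.
func (c *cache) compile(p pkg) string {
	sum := sha256.Sum256([]byte(p.name + "\x00" + p.source))
	key := hex.EncodeToString(sum[:])

	c.mu.Lock()
	out, ok := c.entries[key]
	c.mu.Unlock()
	if ok {
		return out
	}

	out = "bitcode(" + p.name + ")" // placeholder for real per-package codegen

	c.mu.Lock()
	c.entries[key] = out
	c.misses++
	c.mu.Unlock()
	return out
}

// buildAll compiles every package in its own goroutine and returns the
// per-package outputs in input order. There is no step that merges them;
// in the proposal, that job is left to the linker.
func buildAll(c *cache, pkgs []pkg) []string {
	outs := make([]string, len(pkgs))
	var wg sync.WaitGroup
	for i, p := range pkgs {
		wg.Add(1)
		go func(i int, p pkg) {
			defer wg.Done()
			outs[i] = c.compile(p)
		}(i, p)
	}
	wg.Wait()
	return outs
}

func main() {
	c := newCache()
	pkgs := []pkg{{"runtime", "v1"}, {"fmt", "v1"}, {"main", "v1"}}

	buildAll(c, pkgs)
	fmt.Println("cold build misses:", c.misses)

	pkgs[2].source = "v2" // simulate a small edit to one package
	buildAll(c, pkgs)
	fmt.Println("misses after editing main:", c.misses)
}
```

In this toy model the cold build compiles all three packages, and the second build only recompiles `main`.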

This is no small task. We currently rely heavily on merging all packages together to perform some (required) optimization passes. These will need to be changed in some way to work well with LTO, by modifying them or replacing them with something else:

  • We don't support ThinLTO yet for some targets (see "Use ThinLTO on Windows" #2867 and "darwin: add support for ThinLTO" #2865, for example).
  • Some targets need the AddGlobalsBitmap pass to be able to scan global variables in the GC mark phase. It should be possible to convert this to simply scanning the .data/.bss sections everywhere (see "Use ThinLTO on Windows" #2867 and "darwin: scan globals by reading MachO header" #2869, for example).
  • WebAssembly uses the MakeGCStackSlots pass. We need to make this pass run per package. In the future, the WebAssembly GC would be an alternative.
  • Reflect information is currently processed for the whole program in LowerReflect. I've been working on a replacement in Refactor reflect package #2640 but it's going to cost something. In return, the compiler itself becomes easier to understand and new reflect features are easier to add.
  • Interface method calls are lowered to direct calls in LowerInterfaces. We probably need to switch to vtable style interfaces. The optimizations that we currently do might be replaced by LLVM support for whole program devirtualization for C++.
  • Interrupt handlers are currently combined in LowerInterrupts. This is done late so that unused interrupts can be optimized away. I'm not sure how to do this efficiently in any way other than at this stage.
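
For the interface item in the list above, the difference between the two dispatch styles can be pictured in plain Go. This is a conceptual sketch only, not what LowerInterfaces emits and not a proposed TinyGo design: the `typeID`, `value`, `shapeVTable`, and `ifaceValue` names are invented for illustration. The switch-based version needs to know every concrete type in the program, which is why it depends on a whole-program view; the vtable version only needs the table that travels with the value, so each package can be compiled on its own.

```go
package main

import "fmt"

// typeID is a hypothetical compile-time identifier for a concrete type.
type typeID int

const (
	typeCircle typeID = iota
	typeSquare
)

// value is a hypothetical lowered interface value: a type tag plus a payload.
type value struct {
	id   typeID
	side float64 // radius or side length, depending on id
}

// pi is rounded to 3 here purely to keep the printed numbers exact.
func circleArea(v value) float64 { return 3 * v.side * v.side }
func squareArea(v value) float64 { return v.side * v.side }

// areaWholeProgram dispatches with a switch over all known types and makes
// direct calls. Adding a type in another package means regenerating this switch.
func areaWholeProgram(v value) float64 {
	switch v.id {
	case typeCircle:
		return circleArea(v)
	case typeSquare:
		return squareArea(v)
	}
	panic("unknown type")
}

// shapeVTable holds one function pointer per interface method.
type shapeVTable struct {
	area func(value) float64
}

// ifaceValue carries a pointer to its vtable alongside the data.
type ifaceValue struct {
	vt   *shapeVTable
	data value
}

var circleVT = shapeVTable{area: circleArea}
var squareVT = shapeVTable{area: squareArea}

// areaVTable makes an indirect call through the table and needs no
// knowledge of which concrete types exist elsewhere in the program.
func areaVTable(v ifaceValue) float64 { return v.vt.area(v.data) }

func main() {
	c := value{id: typeCircle, side: 2}
	s := value{id: typeSquare, side: 3}

	fmt.Println(areaWholeProgram(c), areaWholeProgram(s))
	fmt.Println(areaVTable(ifaceValue{&circleVT, c}), areaVTable(ifaceValue{&squareVT, s}))
}
```

The indirect call is what an optimizer would have to turn back into a direct call at link time, which is where the whole program devirtualization mentioned above would come in.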

Of course, the resulting binaries should remain small. It's hard to avoid a slight increase, but hopefully the benefits of a simpler compiler and (much) faster compile times outweigh the downsides.

deadprogram added the enhancement (New feature or request) and core labels May 24, 2022
aykevl added a commit that referenced this issue May 28, 2022
This is a step towards #2870, similar to #2867 and #2869.
deadprogram pushed a commit that referenced this issue May 29, 2022
This is a step towards #2870, similar to #2867 and #2869.
aykevl added a commit that referenced this issue May 30, 2022
Precise globals require a whole program optimization pass that is hard
to support when building packages separately. This patch removes support
for these globals by converting the last use (Linux) to use
linker-defined symbols instead.

For details, see: #2870
aykevl added a commit that referenced this issue May 30, 2022
This shrinks transform.Optimize() a little bit, working towards the goal
of #2870. I ran the smoke
tests and there is no practical downside: one test got smaller (??) and
one had a different .hex hash, but other than that there was no
difference.

This should also make TinyGo a liiitle bit faster but it's probably not
even measurable.
deadprogram pushed a commit that referenced this issue May 30, 2022
This shrinks transform.Optimize() a little bit, working towards the goal
of #2870. I ran the smoke
tests and there is no practical downside: one test got smaller (??) and
one had a different .hex hash, but other than that there was no
difference.

This should also make TinyGo a liiitle bit faster but it's probably not
even measurable.
deadprogram pushed a commit that referenced this issue Jun 1, 2022
Precise globals require a whole program optimization pass that is hard
to support when building packages separately. This patch removes support
for these globals by converting the last use (Linux) to use
linker-defined symbols instead.

For details, see: #2870
@niaow
Member

niaow commented Jun 2, 2022

Another important issue is interp, where we may need to rethink a bit.

@aykevl
Member Author

aykevl commented Jun 15, 2022

@niaow yes. We currently run interp once per package and then again for the whole program. I imagine an initial implementation of this feature would be opt-in and only run interp per package (not for the whole program), which should work in practice at the cost of some increase in binary size. We can then look into improving this.
I didn't include it in the list as it isn't a true blocker like most of the other items are.

@aykevl
Member Author

aykevl commented Jan 25, 2023

ThinLTO is now supported on all architectures/platforms! 🎉
That's one more checkbox checked.

@aykevl
Member Author

aykevl commented Feb 22, 2023

The reflect refactor is in 🎉

aykevl added a commit that referenced this issue Feb 26, 2023
We use ThinLTO for linking, but we use it in a way that doesn't give
most of its benefits: we merge all the bitcode files into a single LLVM
module and run some optimizations on it before linking. Therefore, this
works more like a traditional "full" LTO link rather than a true thin
link.

This commit adds a new experimental -lto=thin option to do a true
ThinLTO link. The main benefit is that linking will be a lot faster,
especially for large programs consisting of many packages.

At the moment, it only works for programs that don't do interface type
asserts and don't call interface methods. It also probably won't work on
WebAssembly and baremetal systems. But it's part of a larger goal
towards a truly incremental build system:
#2870
Once interface type asserts and method calls are converted to a
vtable-like implementation, most programs should just work on
linux/darwin/windows.
@aykevl
Member Author

aykevl commented Feb 26, 2023

Managed to run some test programs with a new -lto=thin flag! See: #3489

The next hurdle is refactoring interface type asserts and interface method calls, which is something that will likely be necessary for full reflect support anyway (to implement things like .Method(n)).
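
For reference, these are the two language features in question, shown in a small ordinary Go program. The `Shape` and `Rect` types and the `describe` function are made up for illustration. Going by the commit message for the experimental -lto=thin option, a program like this would not be expected to work with that option until the refactor lands.

```go
package main

import (
	"fmt"
	"strconv"
)

// Shape is a hypothetical interface with a single method.
type Shape interface {
	Area() int
}

// Rect is a concrete type that implements Shape.
type Rect struct{ W, H int }

func (r Rect) Area() int { return r.W * r.H }

// describe uses both features: a type assert to an interface type,
// followed by a method call through that interface.
func describe(v interface{}) string {
	s, ok := v.(Shape) // interface type assert
	if !ok {
		return "not a shape"
	}
	return "area=" + strconv.Itoa(s.Area()) // interface method call
}

func main() {
	fmt.Println(describe(Rect{W: 2, H: 3}))
	fmt.Println(describe(42))
}
```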
