Compile packages independently #285

Closed
aykevl opened this issue Apr 16, 2019 · 10 comments
Labels
enhancement New feature or request

Comments

@aykevl
Member

aykevl commented Apr 16, 2019

Packages should be compiled independently and then linked together. This provides several advantages in the long term:

  • Packages can be compiled in parallel so compile time can be shorter.
  • Unchanged packages can be cached, further cutting down on compile time.
  • We can optionally avoid LTO to make incremental (debug) builds much faster.

I would strive for the following architecture:

  • Change the compiler package to compile just a single package (and not dependencies).
  • Add a linker package that links several packages together and performs whole-program optimizations on them (LTO).
  • Add a driver package (or repurpose the loader package?) that orchestrates all this work (see the sketch after this list).
  • Add some subcommands (go tool compile, go tool link) that call the compiler/linker directly. This can be useful for debugging.
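
A rough, purely hypothetical sketch of how those pieces could fit together (none of these names are actual TinyGo packages or APIs, they only illustrate the intended boundaries):

// Hypothetical interfaces for the proposed split; illustrative only.
package driver

// CompiledPackage is the per-package artifact: LLVM bitcode plus the
// serialized public API, ready to be cached and later linked.
type CompiledPackage struct {
	ImportPath string
	Bitcode    []byte
	ExportData []byte
}

// CompileOne builds exactly one package, given the already-compiled
// artifacts of its direct dependencies.
func CompileOne(importPath string, deps []CompiledPackage) (CompiledPackage, error) {
	// ... the compiler package would do its work here ...
	return CompiledPackage{}, nil
}

// LinkAll merges all compiled packages into a single module and runs the
// whole-program (LTO) optimization pipeline over it.
func LinkAll(pkgs []CompiledPackage, output string) error {
	// ... the linker package would do its work here ...
	return nil
}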

I have taken a look at the Go build cache and I think I'd implement a similar system. Every package has a hash of the input (build tags, file list, hashes of files, output hashes of packages it depends on) and produces an output hash, which is a hash over all the artifacts that it builds (mostly the serialized exported types).
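
As an illustration, a minimal sketch of computing such an input hash (hypothetical names, loosely modeled on the Go build cache; not the actual TinyGo code):

// Hypothetical sketch of deriving a package's cache key, loosely modeled
// on the Go build cache. None of these names exist in TinyGo.
package cache

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
	"sort"
)

// PackageInputs lists everything that influences a package's build output.
type PackageInputs struct {
	ImportPath string
	BuildTags  []string
	GoFiles    []string          // source files, in a stable order
	DepHashes  map[string]string // import path -> output hash of that dependency
}

// InputHash returns the cache key for building one package.
func InputHash(in PackageInputs) (string, error) {
	h := sha256.New()
	fmt.Fprintln(h, in.ImportPath)
	fmt.Fprintln(h, in.BuildTags)
	for _, file := range in.GoFiles {
		f, err := os.Open(file)
		if err != nil {
			return "", err
		}
		_, err = io.Copy(h, f) // hash the file contents
		f.Close()
		if err != nil {
			return "", err
		}
	}
	// Including the output hashes of dependencies means that a change in a
	// dependency's public API invalidates this package as well.
	deps := make([]string, 0, len(in.DepHashes))
	for path, hash := range in.DepHashes {
		deps = append(deps, path+" "+hash)
	}
	sort.Strings(deps)
	for _, dep := range deps {
		fmt.Fprintln(h, dep)
	}
	return fmt.Sprintf("%x", h.Sum(nil)), nil
}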

My thinking is that a cached package would be an uncompressed zip file stored in the cache directory, with the following contents as files:

  • a list of packages it depends upon and their hashes (plain text)
  • a list of files included in the build and their mtimes / hashes (plain text)
  • a serialized form of the public API (some binary format, like this one)
  • LLVM bitcode for this package (including C/C++ files; they are all linked together into one module / compile unit)

This is most likely too big to do at once, so I'd do it in multiple steps. But I'm creating this issue to give a high-level overview of what I would want to achieve at some point.

@deadprogram deadprogram added the enhancement New feature or request label Apr 17, 2019
@aykevl
Member Author

aykevl commented Apr 23, 2019

Some statistics when building -target=circuitplay-express ./testdata/stdlib.go, one of the slowest compiles at the moment:

load:    37.230169ms
parse:   90.345719ms
types:   211.548187ms
dce:     11.11218ms
irgen:   700.464742ms
compile: 1.214556923s
interp:  1.264428106s
opt:     4.707457718s
codegen: 475.201528ms
link:    65.829285ms
total:   8.211343631s

The entire compiler runs serially at the moment, while most of this could be parallelized and/or cached.

  • the load step (which determines dependencies and to-be-compiled files) cannot be entirely parallelized, but dependencies can be loaded in parallel. However, there is not much to be gained here.
  • parse can happen fully in parallel per file. Dependencies do not have to be loaded yet, although we should probably start by parsing leaf packages.
  • typecheck can be done in parallel per package, after all the packages it depends on have been typechecked. So it can start with leaf packages and move towards the root (see the sketch after this list).
  • the dce step will probably be removed when building incrementally (it isn't very useful right now anyway except for speed)
  • irgen can be done in parallel per package, after typecheck of that package is finished
  • parts of interp can be parallelized per package, but not all of it. However, it may be possible to cache most of interp for (near-)root packages, which is a common case.
  • opt can partially be done per package, or can be done entirely per package depending on whether LTO is used (there is no way to turn off LTO at the moment)
  • codegen can be done per package if we skip LTO, or can perhaps be done efficiently using ThinLTO.
  • link is already quite fast
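
For illustration, here is a minimal, self-contained sketch of that dependency-ordered scheduling (hypothetical job type, not TinyGo code): each package runs in its own goroutine and only starts once all of its imports are done, so leaf packages go first and the root package last.

package main

import (
	"fmt"
	"sync"
)

// job is one package to compile, together with its direct imports.
type job struct {
	name    string
	imports []*job
	done    chan struct{} // closed once this package has been processed
}

func process(j *job, wg *sync.WaitGroup) {
	defer wg.Done()
	// Wait until every import has been typechecked/compiled.
	for _, dep := range j.imports {
		<-dep.done
	}
	fmt.Println("compiling", j.name) // typecheck + irgen would run here
	close(j.done)
}

func main() {
	runtimePkg := &job{name: "runtime", done: make(chan struct{})}
	machinePkg := &job{name: "machine", imports: []*job{runtimePkg}, done: make(chan struct{})}
	mainPkg := &job{name: "main", imports: []*job{runtimePkg, machinePkg}, done: make(chan struct{})}

	var wg sync.WaitGroup
	for _, j := range []*job{mainPkg, machinePkg, runtimePkg} {
		wg.Add(1)
		go process(j, &wg)
	}
	wg.Wait() // everything compiled in dependency order, in parallel where possible
}

With per-package caching added, a job whose input hash is already present in the cache could simply close its done channel immediately.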

@mewmew

mewmew commented Jan 7, 2020

First off, thanks for releasing TinyGo @aykevl. I keep learning so much from this project!

Tonight I was curious to see how TinyGo lowered Go interfaces to LLVM IR and thus wanted to compile the following Go package:

package p

type T int

func (t T) M() int {
	return int(t) + 10
}

type I interface {
	M() int
}

func F(i I) int {
	return i.M()
}

However, doing so, I ran into a type assertion panic, as detailed below.

$ tinygo build -o a.ll p.go
panic: interface conversion: ssa.Member is nil, not *ssa.Function

goroutine 1 [running]:
github.com/tinygo-org/tinygo/ir.(*Program).SimpleDCE(0xc0009f64c0)
	/home/u/goget/src/github.com/tinygo-org/tinygo/ir/passes.go:69 +0xb52

After a bit of troubleshooting it seems like this was due to TinyGo looking for and not finding the main function of the package:

From github.com/tinygo-org/tinygo/ir/passes.go:69:

main := p.mainPkg.Members["main"].(*ssa.Function)
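
For reference, a guarded version of that lookup (purely illustrative, not an actual patch) would report a readable error instead of panicking when the package has no main function:

// Purely illustrative: a guarded variant of the lookup in ir/passes.go.
member, ok := p.mainPkg.Members["main"]
if !ok {
	return errors.New("cannot find main function; is this a main package?")
}
mainFn, ok := member.(*ssa.Function)
if !ok {
	return errors.New("main is not a function")
}
_ = mainFn // proceed with mainFn as before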

Converting package p into a main package resolved the above type assertion panic, but it seems TinyGo is too clever and optimized the interface-related code away:

The following LLVM IR

  tail call fastcc void @runtime.printuint64(i64 10), !dbg !42

is all that's left from compiling:

package main

func main() {
	var t T
	println(F(t))
}

type T int

func (t T) M() int {
	return int(t) + 10
}

type I interface {
	M() int
}

func F(i I) int {
	return i.M()
}

My next attempt was to disable optimizations entirely, and to compile with tinygo build -opt 0 -o a.ll p.go. However, this resulted in the following panic:

$ tinygo build -opt 0 -o a.ll p.go
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7faf01b664f9]

runtime stack:
runtime.throw(0x87fa17, 0x2a)
	/home/u/go/src/runtime/panic.go:1112 +0x72
runtime.sigpanic()
	/home/u/go/src/runtime/signal_unix.go:661 +0x46a

goroutine 1 [syscall]:
runtime.cgocall(0x7a5160, 0xc000737420, 0xc000c7dd00)
	/home/u/go/src/runtime/cgocall.go:128 +0x5b fp=0xc0007373f0 sp=0xc0007373b8 pc=0x42040b
tinygo.org/x/go-llvm._Cfunc_LLVMVerifyModule(0x20b5260, 0xc000000001, 0xc000c7dd00, 0x0)
	_cgo_gotypes.go:9179 +0x4d fp=0xc000737420 sp=0xc0007373f0 pc=0x645d6d
tinygo.org/x/go-llvm.VerifyModule.func1(0x20b5260, 0xc000000001, 0xc000c7dd00, 0xc000737490)
	/home/u/goget/pkg/mod/tinygo.org/x/[email protected]/analysis.go:38 +0x9f fp=0xc000737450 sp=0xc000737420 pc=0x6554df
tinygo.org/x/go-llvm.VerifyModule(0x20b5260, 0x1, 0x0, 0x431517)
	/home/u/goget/pkg/mod/tinygo.org/x/[email protected]/analysis.go:38 +0x56 fp=0xc0007374a0 sp=0xc000737450 pc=0x646116
github.com/tinygo-org/tinygo/compiler.(*Compiler).Verify(...)
	/home/u/goget/src/github.com/tinygo-org/tinygo/compiler/compiler.go:2527
github.com/tinygo-org/tinygo/compiler.(*Compiler).Optimize(0xc000206ea0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/home/u/goget/src/github.com/tinygo-org/tinygo/compiler/optimizer.go:108 +0x5a7 fp=0xc0007376d0 sp=0xc0007374a0 pc=0x772637
github.com/tinygo-org/tinygo/builder.Build(0x7ffed6e88941, 0x4, 0x7ffed6e8893c, 0x4, 0xc0002044e0, 0xc000737cd0, 0x0, 0x0)
	/home/u/goget/src/github.com/tinygo-org/tinygo/builder/build.go:75 +0x2da fp=0xc000737c90 sp=0xc0007376d0 pc=0x79083a
main.Build(0x7ffed6e88941, 0x4, 0x7ffed6e8893c, 0x4, 0xc00018e0e0, 0x0, 0xe)
	/home/u/goget/src/github.com/tinygo-org/tinygo/main.go:85 +0xc5 fp=0xc000737cf8 sp=0xc000737c90 pc=0x79e355
main.main()
	/home/u/goget/src/github.com/tinygo-org/tinygo/main.go:739 +0x187c fp=0xc000737f88 sp=0xc000737cf8 pc=0x7a1dcc
runtime.main()
	/home/u/go/src/runtime/proc.go:203 +0x212 fp=0xc000737fe0 sp=0xc000737f88 pc=0x452682
runtime.goexit()
	/home/u/go/src/runtime/asm_amd64.s:1375 +0x1 fp=0xc000737fe8 sp=0xc000737fe0 pc=0x47f4b1

Note that no panic occurs on optimization level 1, tinygo build -opt 1 -o a.ll p.go.

The output at optimization level 1 still optimizes out the interface-related code, and produces:

define internal fastcc void @p.go.main() unnamed_addr section ".text.p.go.main" !dbg !104 {
entry:
  tail call fastcc void @runtime.printint64(), !dbg !105
  tail call fastcc void @runtime.printnl(), !dbg !105
  ret void, !dbg !106
}

define internal fastcc void @runtime.printint64() unnamed_addr section ".text.runtime.printint64" !dbg !41 {
entry:
  call void @llvm.dbg.value(metadata i64 10, metadata !47, metadata !DIExpression()), !dbg !48
  tail call fastcc void @runtime.printuint64(i64 10), !dbg !49
  ret void, !dbg !50
}

While the above code produces the correct output (10), looking at the produced assembly was a bit weird, as one level of function invocation had been inlined and constant-propagated. As such, @runtime.printint64 takes no arguments and always prints 10. That's probably just an artifact of how the LLVM optimizer works, but at first glance, I thought the output LLVM IR assembly of @p.go.main was incorrect, since it didn't call @F nor pass any arguments to println.

So, to summarize this night-time adventure into exploring interface handling in TinyGo, I still had a lot of fun using TinyGo and feel quite amazed to see how far the project has gotten already. I definitely welcome the intended direction of TinyGo to compile packages independently as this makes it easier to debug, interact with, and use LLVM tools to process the LLVM IR output of TinyGo, not only for main packages, but for all compiled Go packages.

One may envision using this functionality of TinyGo to compile Go packages to LLVM IR, and then invoking specific functions of the compiled LLVM IR from a main function written in C and compiled with Clang. That would be quite amazing too!

I wish you all the best and happy continued coding!

Cheers,
Robin

@aykevl
Member Author

aykevl commented Jan 7, 2020

Thank you for your interest in TinyGo internals!

Yes, as you've discovered TinyGo does some whole-program optimizations that often result in interfaces being optimized away entirely. This is very important for code size and to enable further optimizations such as inlining, const propagation, escape analysis, etc.

One may envision using this functionality of TinyGo to compile Go packages to LLVM IR, and then invoking specific functions of the compiled LLVM IR from a main function written in C and compiled with Clang. That would be quite amazing too!

While that would be quite interesting, that is difficult to do in the current design for a few reasons. First of all, most compiled code depends on the runtime package for various reasons (initialization, heap, map operations). Second, whole-program optimizations (which are in fact required in the current design of TinyGo) can obviously not be done for individual packages. However, I could imagine we'd eventually have -buildmode=c-archive to achieve a similar effect.

What this issue tries to achieve is basically to enable caching of partially compiled packages. TinyGo is rather slow at the moment, which you usually don't notice because the compiled programs are so small. But as the capabilities of TinyGo grow, so should the compile speed improve.

That's probably just an artifact of how the LLVM optimizer works, but at first glance, I thought the output LLVM IR assembly of @p.go.main was incorrect, since it didn't call @F nor sent any arguments to println.

Yes, that has definitely complicated debugging for me sometimes. I think this optimization is the result of the following two optimization passes:
https://llvm.org/docs/Passes.html#ipsccp-interprocedural-sparse-conditional-constant-propagation
https://llvm.org/docs/Passes.html#deadargelim-dead-argument-elimination
Such optimizations are important because they enable further optimizations. In theory, the program you gave above might be optimized to the following given sufficient opportunity, thanks to inlining and const propagation (perhaps with -opt=2, the default is -opt=z for code size):

putchar('1');
putchar('0');
putchar('\r');
putchar('\n');

The fact that TinyGo breaks at -opt=0 is a bug and we should fix that (with an added test, of course). I don't use it much in practice, as -opt=1 is usually sufficient (and disabling optimizations entirely produces far more code, making debugging harder as a result).

You may want to look at the -printir (and -no-debug) flags. -printir dumps the IR at a much earlier stage, before any optimizations are run, so you can get better insight into how the compiler works.

I do see that while the printint64 function takes no parameters anymore, the parameter is still there in the debug info:

  call void @llvm.dbg.value(metadata i64 10, metadata !47, metadata !DIExpression()), !dbg !48

That should allow a debugger to still print the 10 in a backtrace when breaking inside runtime.printint64. I can't say for certain, because debugging TinyGo code is still somewhat limited, especially around things like interfaces and goroutines.

I wish you all the best and happy continued coding!

Thanks a lot 😄

@mewmew

mewmew commented Jan 7, 2020

Hi @aykevl,

Thanks for the detailed reply!

While that would be quite interesting, that is difficult to do in the current design for a few reasons. First of all, most compiled code depends on the runtime package for various reasons (initialization, heap, map operations).

The way I'd envision using this functionality would be to compile individual Go packages to LLVM IR, and let this LLVM IR module be the build artefact of the compiled package; similar to how $GOPATH/pkg/github.com/foo/bar/bar.a contains object files for the compiled packages, we would now have a $GOPATH/pkg/github.com/foo/bar/bar.ll (or bar.bc).

Second, whole-program optimizations (which are in fact required in the current design of TinyGo) can obviously not be done for individual packages. However, I could imagine we'd eventually have -buildmode=c-archive to achieve a similar effect.

The final step of linking would enable whole-program optimization. No need to do it at an earlier stage. So linking main.ll, pkg/github.com/foo/bar/bar.ll and tinygo/builtin.ll, tinygo/runtime.ll (or something along those lines) with llvm-link would produce a final LLVM IR file upon which we may perform whole-program optimization.

Of course, I understand that if main.ll is not the LLVM IR produced by TinyGo from a Go main package, but rather LLVM IR produced by Clang from a C source file with a function main, then the C programmer would have to hook up and initialize the needed interaction with the TinyGo runtime. I don't think this would be impossible, just another step needed to enable interaction.

The benefit of not relying on c-archive mode for this would be that you could still produce statically compiled C programs which seamlessly interact with Go functions (of course tracking language boundaries at call sites to inform the GC of live pointers passed as arguments between the two.)

You may want to look at the -printir (and -no-debug) flags. -printir dumps the IR at a much earlier stage before any optimizations are run, so you can get a better insight in how the compiler works.

Great, I'll definitely take a look at -printir and -no-debug.

Cheers,
Robin

@aykevl
Member Author

aykevl commented Jan 13, 2021

@mewmew what you propose is indeed what I intend to get to eventually, but again, the structure of the TinyGo compiler at the moment makes this difficult. Quite a bit of refactoring has been done already to get to that point and it's nowhere near finished (although #1571 is a significant step).

(this is a late reply but I didn't want to leave this entirely unanswered)

@mewmew

mewmew commented Jan 13, 2021

@aykevl, I know progress takes time, so no rush! Also, you initiated the issue, so I felt quite certain you would return to it as time allowed.

Very happy to see the work on making the compiler more modular. Wish you the best of hacking and a great start of the new year! :)

Cheers,
Robin

@aykevl
Member Author

aykevl commented Mar 25, 2021

This has finally been implemented, in #1612! It only took me two years to get to this point 😅

There are many things that are still left to do, but right now TinyGo compiles packages in parallel and caches the result. This provides a small performance benefit. There are many more optimizations possible:

  • An easy one is doing most of the interp work per package. I already have a local change which does this, but unfortunately it triggers a bug in the interp package. The benefit in one test is around 40% compile time speedup on incremental builds, which is huge.
  • A much more ambitious goal is adding a non-LTO build mode. This could potentially make edit-compile-test cycles super fast, even for large programs. However, this requires a big change to interface and reflect support (and will be especially difficult with -scheduler=coroutines).

Closing now, as the initial goal (compiling packages independently) has been reached.

@aykevl aykevl closed this as completed Mar 25, 2021
@mewmew

mewmew commented Mar 25, 2021

This has finally been implemented, in #1612! It only took me two years to get to this point 😅

Yay! That's great. Thanks for working on this @aykevl, it will definitely be useful and also open up new use cases for TinyGo.

Wish you as always happy hacking and the best of springs!

Cheerful regards,
Robin

@deadprogram
Member

I will reopen to tag for next release, then close after it is released.

@deadprogram deadprogram reopened this Mar 25, 2021
@deadprogram deadprogram added the next-release Will be part of next release label Mar 25, 2021
@deadprogram
Member

This was released with v0.18 so now closing. Thank you!

@deadprogram deadprogram removed the next-release Will be part of next release label May 12, 2021