-
Notifications
You must be signed in to change notification settings - Fork 951
wasm: improve malloc/free heap tracking #3162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
Should we be concerned about the change from |
Perhaps. But the GC itself is also rather inefficient. We could of course switch this algorithm to an algorithm with the same complexity as the one in the GC (which is probably only marginally better). |
Fair enough. Let's leave the simple implementation for now and if it becomes an issue we can maybe make a build-tag to drop in a map-based implementation for people who care about it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two questions, but otherwise LGTM.
allocs[i] = nil | ||
return | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to complain if we can't find the pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, but it's undefined behavior to call free
on an invalid pointer so we're not required to.
But of course we could do it when gcAsserts
is true for example.
panic("free: invalid pointer") | ||
} | ||
removeAlloc(ptr) | ||
free(ptr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to confirm that with the //export free
line, calling free()
here still does the right thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it calls the function free
in gc_conservative.go. The //export
line does not affect the namespace of the Go package.
Looks like this might be an actual error? https://github.com/tinygo-org/tinygo/actions/runs/3059652993/jobs/4943124870#step:9:58 |
Yes, that looks like a real bug. I'm glad #3148 added some tests! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to reference the -opt
flag and pick between slice and map? We'll be executing a lot of logic compiled from C++ or Rust and its unclear what pattern the malloc / free would take. I understand the code size savings is important, but having the chance of pathological behavior makes it scary to run any code compiled with TinyGo doing polyglot. If the slice could be used with -opt=s
or z
, and map used with 2
, that could allow both niches to be satisfied with opts that already reflect those semantics.
var allocs = make(map[uintptr][]byte) | ||
func removeAlloc(ptr unsafe.Pointer) { | ||
// Remove the pointer so it can be garbage collected. | ||
for i, slot := range allocs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider iterating in reverse order, complete heuristic but I think newer pointers tend to be freed sooner than older pointers in real code.
I think the tension here, correct me if I'm wrong, is basically you can't override this (a part of how GC works) with build tags until tinygo adds them. chicken egg. Then, the next tension is proving the performance is worse vs having the next impl prove it isn't worse. Strings are routinely used in plugins. ex regular expression matches etc. It would help in the future balance size/perf in a less conjecturey way. Ex place a representative main.go here that exports regex functions, and benchmark using them. I think that would help balance some discussions in general. A bias towards size can exist with guards against severe performance regressions. my 2p |
It is possible, but wouldn't help much when it comes to algorithmic complexity. The heap allocator itself is also O(n) in the worst case, where n is the number of blocks (heap size divided by 16, usually). This is why I expect this PR won't have a lot of effect in practice. Generally when it comes to performance, I want to act on data. In this case, I do have data on code size (because it is easy to measure) but I do not have data on performance. We can always change the implementation if benchmarks suggest this is a bottleneck. |
I do agree with the sentiment but feel that if there are clear pathological cases then those are also similarly clear. But I didn't know allocation is similarly pathological - in that case indeed this does not need a change. Is it fair to say TinyGo's wasi target shouldn't be used in cases where performance is a concern given the allocator's limitations? |
If performance is a concern, maybe you shouldn't be using WASM... To be fair, we have the gc disabled for some of our services: |
WASM is the only way to extend Envoy for now so we're focusing on it for better or worse 😅 Unfortunately a leaking mode also isn't supported there, but maybe it should be. For a bit of context, we're trying to bring Coraza WAF to Envoy via WASM - for performance reasons, we've swapped out several libraries from Go to C(++) or Rust https://github.com/anuraaga/coraza-wasm-filter/tree/main/lib The biggest was I think this means that while native will of course perform the best, wasm does have potential for reasonable performance. While I am curious what may be causing such drastic performance difference, if TinyGo itself is designed in a way that we expect lower performance than other compilers, it wouldn't be productive to do a deep dive. So I'm just trying to understand the vision there. If small code size is the end all (as tiny would represent which makes a lot of sense!) then I would suggest that browser rather than wasi would be the target that fits better with the vision. But that's just a straw-man's proposal. Sorry if too much noise |
Performance of TinyGo compiled code varies a lot. I've seen cases where TinyGo outperformed standard Go by a wide margin (for integer heavy code that doesn't allocate memory). For other cases, TinyGo will be a lot slower. In general, you can expect C like code to perform well but code that uses the heap, goroutines, etc to perform much worse. In particular, for WebAssembly, TinyGo has to jump through hoops to get it supported. WebAssembly is just a really weird instruction set that is incredibly difficult for a Go-like language to target whereas other architectures are much simpler to support. This is getting improved slowly (with exception handling, stack switching, and a GC) but this will likely take years. Once these features are supported in WebAssembly and used by TinyGo, I expect TinyGo binaries will perform not very different from C and Rust (perhaps a bit slower but not by much). Until then, we have to keep our workarounds in place which inevitably means code will run slower. That said, the GC could certainly be improved. It works, but it wasn't optimized for speed (for example, the heap is fully conservative). I believe @dgryski is investigating how to improve this. |
The two performance improvements I see for the current garbage collector (without a large scale rewrite) are:
The first one is easy (although a bit tricky to get right so as not to affect performance on low-end embedded machines) and the second one requires a bit more work to to track the newly required information. |
Thanks for the context everyone - definitely would be nice if the GC spec goes somewhere especially for polyglot! Those GC optimizations, especially the second one seems quite compelling too. |
TinyGo is likely to benefit from the stack switching proposal (for goroutines) but not the garbage collection one (because Go supports interior pointers which the proposal doesn't allow.) |
It can probably still use the WebAssembly GC if all struct methods (etc) are made heap objects. For example, a struct like this: type Point struct {
X, Y int
} could be compiled like this: type Point struct {
X, Y *int
} This way, it is possible to have interior pointers at the cost of a large heap increase. |
@anuraaga one way you could help here is by making a good (realistic) GC benchmark. Having a good benchmark would help a lot to improve the GC. |
uintptr is not tracked by the GC, while any pointer type (including unsafe.Pointer) is tracked. Make sure to only cast pointers to uintptr when absolutely necessary. This fixes a bug found in #3162.
Using a slice requires a lot less in code size than a map. This is visible when compiling a very small "hello world" style program. Before tracking memory in malloc/free: 2873 bytes With tracking using a map: 6551 bytes With a slice instead of a map: 3532 bytes Of course, most of this code size increase won't be visible with #3142, but it's still a saving of around 3kB in this minimal example.
158cc84
to
df66143
Compare
Updated the PR. The PR was failing because the tests actually contained a bug: they passed pointers around as |
One of the gc-heavy benchmarks I've been playing with is the binarytrees program from the Benchmarks Game: https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/binarytrees-go-2.html |
This was intentional - the tests are calling malloc and the pointer should be valid until free is called even treated as an integer. If this code was c++ it would not be tracked either, hence why the allocator needs to track. I'm not sure why but I believe I had issues using |
...you are entirely correct. Yes, the pointer should remain valid even if it is treated as
That's interesting, sounds like a bug actually. I'll need to investigate this. |
Using a slice requires a lot less in code size than a map. This is visible when compiling a very small "hello world" style program.
Before tracking memory in malloc/free: 2873 bytes
With tracking using a map: 6551 bytes
With a slice instead of a map: 3532 bytes
Of course, most of this code size increase won't be visible with #3142, but it's still a saving of around 3kB in this minimal example.