wasm: improve malloc/free heap tracking #3162

aykevl · 2022-09-15T10:21:13Z

Using a slice costs far less code size than a map. This is visible when compiling a very small "hello world" style program.

Before tracking memory in malloc/free: 2873 bytes
With tracking using a map: 6551 bytes
With a slice instead of a map: 3532 bytes

Of course, most of this code size increase won't be visible with #3142, but it's still a saving of around 3kB in this minimal example.
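For readers following along, here is a minimal sketch of what slice-based tracking can look like. It is an illustration under assumptions, not necessarily the code in this PR: `trackAlloc` and `removeAlloc` are modelled on names that appear in the diff and discussion, while `ptrOf` and the slot-reuse policy are hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// allocs holds one reference per live malloc'd buffer so the GC keeps it alive.
var allocs []unsafe.Pointer

// ptrOf returns a pointer to the first byte of buf (hypothetical demo helper).
func ptrOf(buf []byte) unsafe.Pointer { return unsafe.Pointer(&buf[0]) }

// trackAlloc records ptr, reusing a cleared slot before growing the slice.
func trackAlloc(ptr unsafe.Pointer) {
	for i, slot := range allocs {
		if slot == nil {
			allocs[i] = ptr
			return
		}
	}
	allocs = append(allocs, ptr)
}

// removeAlloc clears the slot that holds ptr so the buffer can be collected.
// This is a linear scan, which is the O(n) cost discussed in this thread.
func removeAlloc(ptr unsafe.Pointer) {
	for i, slot := range allocs {
		if slot == ptr {
			allocs[i] = nil
			return
		}
	}
}

func main() {
	a, b, c := make([]byte, 16), make([]byte, 16), make([]byte, 16)
	trackAlloc(ptrOf(a))
	trackAlloc(ptrOf(b))
	removeAlloc(ptrOf(a)) // slot 0 becomes nil
	trackAlloc(ptrOf(c))  // slot 0 is reused instead of growing the slice
	fmt.Println(len(allocs), allocs[0] == ptrOf(c)) // prints "2 true"
}
```

No hashing or map runtime support is pulled in, which is plausibly where the size difference comes from.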

dgryski · 2022-09-15T14:16:06Z

Should we be concerned about the change from O(1) to O(n) time for tracking allocations?

aykevl · 2022-09-15T15:39:20Z

Perhaps. But the GC itself is also rather inefficient. We could of course switch this algorithm to an algorithm with the same complexity as the one in the GC (which is probably only marginally better).

dgryski · 2022-09-15T16:32:47Z

Fair enough. Let's leave the simple implementation for now. If it becomes an issue, we can add a build tag that drops in a map-based implementation for people who care about it.

dgryski

Two questions, but otherwise LGTM.

dgryski · 2022-09-15T16:34:41Z

src/runtime/arch_tinygowasm.go

+			allocs[i] = nil
+			return
+		}
+	}


Do we want to complain if we can't find the pointer?

We could, but it's undefined behavior to call free on an invalid pointer so we're not required to.
But of course we could do it when gcAsserts is true for example.
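A sketch of the assert-only check suggested in this thread, assuming a compile-time constant in the spirit of gcAsserts. The names `debugAsserts`, `checkedFree` and `ptrOf` are hypothetical, and `removeAlloc` is changed here to return a bool, which the diff above does not do.

```go
package main

import (
	"fmt"
	"unsafe"
)

// debugAsserts stands in for a flag such as gcAsserts (assumption: a constant,
// so the extra check can be compiled out when it is false).
const debugAsserts = true

var allocs []unsafe.Pointer

// ptrOf returns a pointer to the first byte of buf (hypothetical demo helper).
func ptrOf(buf []byte) unsafe.Pointer { return unsafe.Pointer(&buf[0]) }

// removeAlloc clears the slot for ptr and reports whether it was found.
func removeAlloc(ptr unsafe.Pointer) bool {
	for i, slot := range allocs {
		if slot == ptr {
			allocs[i] = nil
			return true
		}
	}
	return false
}

// checkedFree only complains about unknown pointers when asserts are enabled.
func checkedFree(ptr unsafe.Pointer) {
	if ptr == nil {
		return // free(NULL) is a no-op in C
	}
	if !removeAlloc(ptr) && debugAsserts {
		panic("free: invalid pointer")
	}
}

func main() {
	buf := make([]byte, 8)
	allocs = append(allocs, ptrOf(buf))
	checkedFree(ptrOf(buf)) // fine: the pointer is tracked

	defer func() { fmt.Println("recovered:", recover()) }()
	checkedFree(ptrOf(buf)) // double free: panics because debugAsserts is true
}
```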

dgryski · 2022-09-15T16:37:08Z

src/runtime/arch_tinygowasm.go

-		panic("free: invalid pointer")
-	}
+	removeAlloc(ptr)
+	free(ptr)


Just want to confirm that with the //export free line, calling free() here still does the right thing.

Yes, it calls the function free in gc_conservative.go. The //export line does not affect the namespace of the Go package.

deadprogram · 2022-09-15T17:30:28Z

Looks like this might be an actual error? https://github.com/tinygo-org/tinygo/actions/runs/3059652993/jobs/4943124870#step:9:58

aykevl · 2022-09-15T20:57:20Z

Yes, that looks like a real bug. I'm glad #3148 added some tests!

anuraaga

Is it possible to reference the -opt flag and pick between slice and map? We'll be executing a lot of logic compiled from C++ or Rust, and it's unclear what pattern the malloc / free calls would take. I understand the code size savings are important, but the chance of pathological behavior makes it scary to run any polyglot code compiled with TinyGo. If the slice could be used with -opt=s or z, and the map with 2, both niches could be satisfied with opts that already reflect those semantics.

anuraaga · 2022-09-16T00:09:24Z

src/runtime/arch_tinygowasm.go

-var allocs = make(map[uintptr][]byte)
+func removeAlloc(ptr unsafe.Pointer) {
+	// Remove the pointer so it can be garbage collected.
+	for i, slot := range allocs {


Consider iterating in reverse order. This is purely a heuristic, but I think newer pointers tend to be freed sooner than older pointers in real code.
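To make the heuristic concrete, here is a small sketch that counts how many slots each scan direction inspects. Integers stand in for pointers, and `probesForward` and `probesReverse` are hypothetical names. It assumes newer allocations sit at the end of the slice, which only holds if freed slots are not reused.

```go
package main

import "fmt"

// probesForward returns how many slots are inspected when scanning from the
// oldest slot to the newest (len(allocs) if want is absent).
func probesForward(allocs []int, want int) int {
	for i, slot := range allocs {
		if slot == want {
			return i + 1
		}
	}
	return len(allocs)
}

// probesReverse returns how many slots are inspected when scanning from the
// newest slot to the oldest (len(allocs) if want is absent).
func probesReverse(allocs []int, want int) int {
	for i := len(allocs) - 1; i >= 0; i-- {
		if allocs[i] == want {
			return len(allocs) - i
		}
	}
	return len(allocs)
}

func main() {
	allocs := make([]int, 1000)
	for i := range allocs {
		allocs[i] = i + 1 // fake "pointer" values 1..1000, oldest first
	}
	newest := allocs[len(allocs)-1]
	// Freeing the most recent allocation: forward scan vs reverse scan.
	fmt.Println(probesForward(allocs, newest), probesReverse(allocs, newest)) // prints "1000 1"
}
```

The worst case is unchanged: freeing the oldest allocation costs a full scan in reverse order.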

codefromthecrypt · 2022-09-16T01:11:51Z

I think the tension here, correct me if I'm wrong, is that you can't override this (a part of how the GC works) with build tags until TinyGo adds them: a chicken-and-egg problem. The next tension is proving that performance is worse versus having the next implementation prove that it isn't.

Strings are routinely used in plugins, e.g. regular expression matches. A benchmark would help balance size and performance in a less conjectural way in the future. For example, place a representative main.go here that exports regex functions, and benchmark using them. I think that would help balance some discussions in general. A bias towards size can exist alongside guards against severe performance regressions.

my 2p

aykevl · 2022-09-29T11:55:52Z

> Is it possible to reference the -opt flag and pick between slice and map?

It is possible, but wouldn't help much when it comes to algorithmic complexity. The heap allocator itself is also O(n) in the worst case, where n is the number of blocks (heap size divided by 16, usually). This is why I expect this PR won't have a lot of effect in practice.

Generally when it comes to performance, I want to act on data. In this case, I do have data on code size (because it is easy to measure) but I do not have data on performance. We can always change the implementation if benchmarks suggest this is a bottleneck.
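The worst-case allocator cost mentioned above can be illustrated with a toy first-fit block allocator. This is a sketch under assumptions, not TinyGo's allocator: the `heap` type and `alloc` method are hypothetical, and only the 16-byte block size comes from the comment.

```go
package main

import "fmt"

const blockSize = 16 // bytes per block, the figure quoted in the comment

// heap is a toy block map: true means the block is in use.
type heap struct{ used []bool }

func newHeap(bytes int) *heap { return &heap{used: make([]bool, bytes/blockSize)} }

// alloc claims the first run of free blocks that fits size bytes. It returns
// the starting block index and the number of blocks inspected on the way.
func (h *heap) alloc(size int) (start, inspected int, ok bool) {
	need := (size + blockSize - 1) / blockSize
	if need == 0 {
		need = 1
	}
	run := 0
	for i, inUse := range h.used {
		inspected++
		if inUse {
			run = 0
			continue
		}
		run++
		if run == need {
			start = i - need + 1
			for j := start; j <= i; j++ {
				h.used[j] = true
			}
			return start, inspected, true
		}
	}
	return 0, inspected, false
}

func main() {
	h := newHeap(1024) // 64 blocks
	for i := 0; i < 63; i++ {
		h.alloc(blockSize)
	}
	// Only the last block is free, so the scan walks the whole block map.
	_, inspected, ok := h.alloc(blockSize)
	fmt.Println(inspected, ok) // prints "64 true"
}
```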

anuraaga · 2022-09-29T14:10:53Z

> Generally when it comes to performance, I want to act on data. In this case, I do have data on code size (because it is easy to measure) but I do not have data on performance.

I do agree with the sentiment but feel that if there are clear pathological cases then those are also similarly clear.

But I didn't know allocation is similarly pathological - in that case indeed this does not need a change.

Is it fair to say TinyGo's wasi target shouldn't be used in cases where performance is a concern given the allocator's limitations?

dgryski · 2022-09-29T14:55:10Z

If performance is a concern, maybe you shouldn't be using WASM...

To be fair, we have the gc disabled for some of our services: -gc=leaking is fine for short-lived processes if their initial heap is set correctly.

anuraaga · 2022-09-29T15:03:10Z

> If performance is a concern, maybe you shouldn't be using WASM...

WASM is the only way to extend Envoy for now so we're focusing on it for better or worse 😅 Unfortunately a leaking mode also isn't supported there, but maybe it should be.

For a bit of context, we're trying to bring Coraza WAF to Envoy via WASM - for performance reasons, we've swapped out several libraries from Go to C(++) or Rust

https://github.com/anuraaga/coraza-wasm-filter/tree/main/lib

The biggest was regexp itself, with a ~5x overall performance improvement from swapping in libre2. But others like aho-corasick also provide 20%. The numbers are end-to-end, not microbenchmarks.

I think this means that while native will of course perform the best, wasm does have potential for reasonable performance. While I am curious what may be causing such a drastic performance difference, if TinyGo itself is designed in a way that we expect lower performance than other compilers, it wouldn't be productive to do a deep dive. So I'm just trying to understand the vision there.

If small code size is the be-all and end-all (as "tiny" would suggest, which makes a lot of sense!) then I would suggest that the browser, rather than wasi, is the target that fits better with the vision. But that's just a straw-man proposal. Sorry if this is too much noise.

aykevl · 2022-09-29T15:15:02Z

Performance of TinyGo-compiled code varies a lot. I've seen cases where TinyGo outperformed standard Go by a wide margin (for integer-heavy code that doesn't allocate memory). For other cases, TinyGo will be a lot slower. In general, you can expect C-like code to perform well but code that uses the heap, goroutines, etc. to perform much worse.

In particular, for WebAssembly, TinyGo has to jump through hoops to get it supported. WebAssembly is just a really weird instruction set that is incredibly difficult for a Go-like language to target whereas other architectures are much simpler to support. This is getting improved slowly (with exception handling, stack switching, and a GC) but this will likely take years. Once these features are supported in WebAssembly and used by TinyGo, I expect TinyGo binaries will perform not very different from C and Rust (perhaps a bit slower but not by much). Until then, we have to keep our workarounds in place which inevitably means code will run slower.

That said, the GC could certainly be improved. It works, but it wasn't optimized for speed (for example, the heap is fully conservative). I believe @dgryski is investigating how to improve this.

dgryski · 2022-09-29T19:05:41Z

The two performance improvements I see for the current garbage collector (without a large scale rewrite) are:

1. Increase the size of the mark work queue; an overflow requires an additional scan of the entire heap.
2. Make note of which allocations do not contain any pointers and thus don't require scanning.

The first one is easy (although a bit tricky to get right so as not to affect performance on low-end embedded machines) and the second one requires a bit more work to track the newly required information.
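Both ideas can be seen in a toy mark phase. The sketch below is a simplified model, not TinyGo's collector: objects are slice indices, and `object`, `heap`, `queueCap`, `noscan` and `demoHeap` are all hypothetical. An object that is marked while the queue is full is left unscanned, which forces a pass over the whole heap afterwards; an object flagged `noscan` is marked but never queued.

```go
package main

import "fmt"

type object struct {
	refs   []int // indices of the objects this one points to
	noscan bool  // true if the object holds no pointers (e.g. a byte buffer)
	marked bool
}

type heap struct {
	objs     []object
	queueCap int // capacity of the mark work queue
	rescans  int // full-heap passes caused by queue overflow
}

// mark marks everything reachable from roots using a bounded work queue.
func (h *heap) mark(roots []int) {
	queue := make([]int, 0, h.queueCap)
	overflow := false

	visit := func(i int) {
		o := &h.objs[i]
		if o.marked {
			return
		}
		o.marked = true
		if o.noscan {
			return // nothing inside to follow, so never queue it
		}
		if len(queue) == h.queueCap {
			overflow = true // marked but not scanned: fix up later
			return
		}
		queue = append(queue, i)
	}
	scan := func(i int) {
		for _, r := range h.objs[i].refs {
			visit(r)
		}
	}
	drain := func() {
		for len(queue) > 0 {
			i := queue[len(queue)-1]
			queue = queue[:len(queue)-1]
			scan(i)
		}
	}

	for _, r := range roots {
		visit(r)
	}
	drain()
	for overflow {
		// Some marked objects were never scanned: walk the entire heap.
		overflow = false
		h.rescans++
		for i := range h.objs {
			if h.objs[i].marked && !h.objs[i].noscan {
				scan(i)
				drain()
			}
		}
	}
}

// demoHeap builds a small object graph; object 7 is unreachable from object 0.
func demoHeap(queueCap int) *heap {
	return &heap{queueCap: queueCap, objs: []object{
		{refs: []int{1, 2, 3, 4}}, {refs: []int{5}}, {refs: []int{5}},
		{refs: []int{6}}, {}, {noscan: true}, {noscan: true}, {},
	}}
}

func main() {
	for _, c := range []int{2, 16} {
		h := demoHeap(c)
		h.mark([]int{0})
		fmt.Println(c, h.rescans, h.objs[6].marked, h.objs[7].marked)
	}
}
```

With a queue of 2 the demo graph needs one full-heap pass; with a queue of 16 it needs none.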

anuraaga · 2022-09-30T01:26:50Z

Thanks for the context, everyone. It would definitely be nice if the GC spec goes somewhere, especially for polyglot! Those GC optimizations, especially the second one, seem quite compelling too.

dgryski · 2022-09-30T01:29:56Z

TinyGo is likely to benefit from the stack switching proposal (for goroutines) but not the garbage collection one (because Go supports interior pointers which the proposal doesn't allow.)

aykevl · 2022-09-30T12:14:03Z

> but not the garbage collection one (because Go supports interior pointers which the proposal doesn't allow.)

It can probably still use the WebAssembly GC if all struct members (etc.) are made heap objects. For example, a struct like this:

type Point struct {
    X, Y int
}

could be compiled like this:

type Point struct {
    X, Y *int
}

This way, it is possible to have interior pointers at the cost of a large heap increase.
This will be even worse for things like byte slices (where it is possible to construct a pointer to each individual byte) but there may be workarounds for that too, like fat pointers. Still, it'll be rather difficult to get this working.
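The difference between the two layouts can be shown in ordinary Go. This is only an illustration of the idea: `BoxedPoint` and `newBoxedPoint` are hypothetical names, and the boxed form is written by hand here rather than produced by a compiler.

```go
package main

import "fmt"

// Point is the ordinary layout: both fields live inside one object.
type Point struct {
	X, Y int
}

// BoxedPoint mimics the rewritten layout: each field is its own heap object.
type BoxedPoint struct {
	X, Y *int
}

// newBoxedPoint allocates one object per field.
func newBoxedPoint(x, y int) BoxedPoint {
	return BoxedPoint{X: &x, Y: &y}
}

func main() {
	p := &Point{X: 1, Y: 2}
	py := &p.Y // interior pointer: points into the middle of *p
	*py = 20
	fmt.Println(p.Y) // prints "20"

	b := newBoxedPoint(1, 2)
	by := b.Y // points at a whole object, so no interior pointer is needed
	*by = 20
	fmt.Println(*b.Y) // prints "20"
}
```

The boxed form costs one extra allocation per field, which is the heap increase mentioned above.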

aykevl · 2022-09-30T12:19:12Z

@anuraaga one way you could help here is by making a good (realistic) GC benchmark. Having a good benchmark would help a lot to improve the GC.

aykevl · 2022-09-30T12:30:56Z

Updated the PR. The PR was failing because the tests actually contained a bug: they passed pointers around as uintptr. The uintptr type is an integer, not a pointer, so the GC wasn't tracking it and the memory was freed before all references were gone.
The first commit fixes the bug, the second commit is the original commit of this PR.

dgryski · 2022-09-30T16:14:15Z

One of the gc-heavy benchmarks I've been playing with is the binarytrees program from the Benchmarks Game: https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/binarytrees-go-2.html
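For context, the core allocation pattern of a binary-trees style workload looks roughly like the sketch below. It is a simplified illustration, not the linked Benchmarks Game program; `node`, `bottomUp` and `count` are hypothetical names.

```go
package main

import "fmt"

// node is a tree node with no payload; the workload is pure allocation.
type node struct {
	left, right *node
}

// bottomUp builds a complete binary tree of the given depth.
func bottomUp(depth int) *node {
	if depth <= 0 {
		return &node{}
	}
	return &node{left: bottomUp(depth - 1), right: bottomUp(depth - 1)}
}

// count walks the tree and returns the number of nodes.
func (n *node) count() int {
	if n.left == nil {
		return 1
	}
	return 1 + n.left.count() + n.right.count()
}

func main() {
	total := 0
	// Many short-lived trees: each iteration leaves garbage for the GC.
	for i := 0; i < 100; i++ {
		total += bottomUp(10).count()
	}
	fmt.Println(total) // 100 trees of 2047 nodes each: prints "204700"
}
```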

anuraaga · 2022-09-30T23:34:27Z

> The uintptr type is an integer, not a pointer, so the GC wasn't tracking it and the memory was freed before all references were gone.

This was intentional: the tests call malloc, and the pointer should stay valid until free is called, even when it is treated as an integer. If this code were C++ it would not be tracked either, which is why the allocator needs to track it.

I'm not sure why but I believe I had issues using unsafe.Pointer in trackAlloc, it had to be []byte to work.
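The point about the allocator doing the tracking can be sketched as follows, loosely modelled on the removed lines visible in the diff above (the `map[uintptr][]byte` and the "free: invalid pointer" panic). The `malloc` and `free` functions here are plain Go functions for illustration, not the exported runtime symbols.

```go
package main

import (
	"fmt"
	"unsafe"
)

// allocs keeps each buffer reachable for the GC, keyed by its address, so the
// caller may hold the address as a plain integer.
var allocs = make(map[uintptr][]byte)

// malloc allocates size bytes and records the buffer until free is called.
func malloc(size uintptr) unsafe.Pointer {
	if size == 0 {
		return nil
	}
	buf := make([]byte, size)
	ptr := unsafe.Pointer(&buf[0])
	allocs[uintptr(ptr)] = buf
	return ptr
}

// free drops the tracked reference; after this the GC may reclaim the buffer.
func free(ptr unsafe.Pointer) {
	if ptr == nil {
		return
	}
	if _, ok := allocs[uintptr(ptr)]; !ok {
		panic("free: invalid pointer")
	}
	delete(allocs, uintptr(ptr))
}

func main() {
	p := malloc(32)
	addr := uintptr(p) // the caller treats the pointer as an integer
	fmt.Println(len(allocs[addr])) // prints "32": the buffer is still tracked
	free(p)
	fmt.Println(len(allocs)) // prints "0"
}
```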

aykevl · 2022-10-01T00:04:30Z

> This was intentional - the tests are calling malloc and the pointer should be valid until free is called even treated as an integer.

...you are entirely correct. Yes, the pointer should remain valid even if it is treated as uintptr.

> I'm not sure why but I believe I had issues using unsafe.Pointer in trackAlloc, it had to be []byte to work.

That's interesting, sounds like a bug actually. I'll need to investigate this.

aykevl mentioned this pull request Sep 15, 2022

wasm,wasi: make sure buffers returned by malloc are not freed until f… #3148

Merged

dgryski approved these changes Sep 15, 2022

anuraaga reviewed Sep 16, 2022

aykevl added 2 commits September 30, 2022 14:28

tests: do not cast pointers to uintptr

a73e7ff

uintptr is not tracked by the GC, while any pointer type (including unsafe.Pointer) is tracked. Make sure to only cast pointers to uintptr when absolutely necessary. This fixes a bug found in #3162.

aykevl force-pushed the wasm-malloc-track branch from 158cc84 to df66143 Compare September 30, 2022 12:29

aykevl marked this pull request as draft October 1, 2022 00:04

etehtsea mentioned this pull request Oct 17, 2022

Http-go template fails spinframework/spin#820

Closed

This was referenced Oct 21, 2022

Allow custom wasm malloc implementation #3245

Merged

wasm: out of memory with lots of HeapIdle #3237

Open
