Porting Boehm to Emscripten #18251

juj · 2022-11-23T20:35:08Z

I have a fork of Boehm cooking at ivmai/bdwgc@master...juj:bdwgc:emscripten , where I am looking to make the latest Boehm upstream work with Emscripten. (it looks like its Emscripten support has regressed by quite a bit since Unity has last synced with upstream Boehm many many years ago)

My idea is to experiment with a couple of different strategies for Boehm + Emscripten support:

"the way it was intended to work": using Binaryen's --spill-pointers pass that @kripken restored to work this summer, to automatically spill all pointers from wasm locals to the stack, so that Boehm will be able to scan the stack,
"the way that Unity currently uses Boehm": i.e. enable only "asynchronous GC collection", where GC collection only occurs when no wasm code (with managed local functions) is on the stack, for example via a delayed "setTimeout(GC_gcollect, 0);" type of call.
"cooperative garbage collection": this is the idea from Smarter way for preventing LLVM from placing a function local variable on the hidden WebAssembly stack? #17131, experimenting with scenarios where user code is able to manually annotate all GC pointers via some kind of a DECLARE_GC_POINTER(myPtr);. While this would be brittle, it is something that might work well in automatic codegen scenarios where correctness can be verified.

Looking at current Boehm Emscripten support, someone landed a change there that switches Boehm to require -sASYNCIFY to be enabled at the same time, thinking that it would enable stack scanning. ( ivmai/bdwgc@1431bda#diff-f82cc1b932242998b29875a64e8490b12c461ba04ab72e59ef0ac5b2efa8af4a ).

While I am not intimate with the 100% details of ASYNCIFY, I was pondering that maybe that would not be correct at all? @kripken true or false: ASYNCIFY only spills locals to the stack only in those functions that can lead to calling asyncified functions, but not in all of the functions? In other words, enabling ASYNCIFY is not enough to substitute the current need for a --spill-pointers pass? I.e. simply enabling ASYNCIFY would not properly spill pointers on the stack?

Also doesn't ASYNCIFY use its own stack that is separate from the regular stack? (would emscripten_scan_stack() scan this Asyncify stack?)

How would I appropriately enable the --spill-pointers pass in Binaryen? I've so far tried just to do

diff --git a/emcc.py b/emcc.py
index f2dc5821a..28ee19f4d 100755
--- a/emcc.py
+++ b/emcc.py
@@ -594,6 +594,8 @@ def should_run_binaryen_optimizer():

 def get_binaryen_passes():
   passes = []
+  if not settings.BOOTSTRAPPING_STRUCT_INFO:
+    passes += ['--spill-pointers']
   optimizing = should_run_binaryen_optimizer()
   # safe heap must run before post-emscripten, so post-emscripten can apply the sbrk ptr
   if settings.SAFE_HEAP:

though that gets me correctness problems. Looks like printf() starts to fail with that, and most of the core0 tests in the suite as well. I am not sure if there might be some ordering dependencies with other passes that this spilling pointers pass should be run at? Or maybe something might be off with the --spill-pointers pass?

The text was updated successfully, but these errors were encountered:

juj · 2022-11-23T20:39:15Z

I am not sure if there might be some ordering dependencies with other passes that this spilling pointers pass should be run at?

(if I try to add the --spill-pointers pass as the last pass, looks like it does give the same result, so maybe not an ordering issue here)

kripken · 2022-11-28T23:20:08Z

First thing, have you seen emscripten_scan_registers? That uses Asyncify to scan state in "registers" (spilled or not), so it could be an option here.

ASYNCIFY only spills locals to the stack only in those functions that can lead to calling asyncified functions, but not in all of the functions? In other words, enabling ASYNCIFY is not enough to substitute the current need for a --spill-pointers pass? I.e. simply enabling ASYNCIFY would not properly spill pointers on the stack?

Asyncify should be enough if "asyncified functions" == "GCing functions". That is, Asyncify will save stuff on locals all the way up the current stack when you pause/resume, and so if GC functions are async functions then each GC will be able to see locals up the stack. And those are the only relevant locals so it should be enough (aside from the heap of course).

Spill-pointers can still be useful, if you don't have a clear set of GCing functions perhaps, and also spill-pointers has different overhead (Asyncify instruments control flow too, and adds code to read spilled registers, not just write them). I've never compared them but I'd hope spill-pointers could be faster.

Also doesn't ASYNCIFY use its own stack that is separate from the regular stack?

Yes, it uses data in the heap somewhere (which is internally a stack). So you need to scan that region in addition to the normal user stack. I believe emscripten_scan_registers does that.

How would I appropriately enable the --spill-pointers pass in Binaryen?

I think you can use `-sBINARYEN_EXTRA_PASSES=--spill-pointers".

But the correctness problems you saw seem worrying - that sounds like a possible bug in spill-pointers, I can't think of another explanation. It might be worth reducing a testcase, or alternatively adding --spill-pointers to the list of passes in the binaryen fuzzer and running the fuzzer for a while (that will find more easily-reducible things). (For that, you can add to this list, or even just replace the list with just the pass you want.)

juj · 2022-11-30T14:17:20Z

Asyncify should be enough if "asyncified functions" == "GCing functions

Thanks, that is what I was expecting. So basically this means that in order for ASYNCIFY to work as a solution, that means that the function GC_gcollect(); which performs the mark+sweep garbage collection should be marked asynchronous(?).

Or actually, looking at Boehm's internals, the code structure it is currently used with looks like this:

void (*gc_scan_roots_func)(void);

void GC_gcollect()
{
   // register roots:
   // ...
   gc_scan_roots_func();

   // marking:
   ...

   // sweeping:
   ...
}
...
static void scan_regs_cb(void *begin, void *end)
{
  GC_push_all_stack((ptr_t)begin, (ptr_t)end);
}

void GC_default_push_other_roots()
{
  emscripten_scan_registers(scan_regs_cb);
}
gc_scan_roots_func = &GC_default_push_other_roots;

That is, Boehm allows the user to set an extra callback to inject custom roots, and an Emscripten specific callback is registered there to call emscripten_scan_registers().

But that call is done via an indirect function pointer dispatch - does that affect ASYNCIFY's ability to infer that GC_gcollect() should be async? Or should it be manually declared as such on the command line? (or the call from GC_gcollect() -> GC_default_push_other_roots() be made a direct function call?)

Spill-pointers can still be useful, if you don't have a clear set of GCing functions perhaps, and also spill-pointers has different overhead (Asyncify instruments control flow too, and adds code to read spilled registers, not just write them). I've never compared them but I'd hope spill-pointers could be faster.

That makes sense. I am not actually interested at all in using ASYNCIFY in conjunction with GCing at Unity, but the reason I am looking into this is that someone changed upstream Boehm to hard-depend on ASYNCIFY (regressing our use case), so I want to restore our use case there, but in order to do it right, I wanted to double check that the ASYNCIFY implementation there even works like would be expected.

But the correctness problems you saw seem worrying - that sounds like a possible bug in spill-pointers, I can't think of another explanation. It might be worth reducing a testcase

I do have a reduced test case cooking, I'll try to get that down to a minimum.

In one test case I see that --spill-pointers is working ok in -O0, but fails with -O1. Do you anticipate currently that there would be any reason for -O1 to be incompatible with --spill-pointers?

kripken · 2022-11-30T18:00:45Z

So basically this means that in order for ASYNCIFY to work as a solution, that means that the function GC_gcollect(); which performs the mark+sweep garbage collection should be marked asynchronous(?).

Yes. That is automatic if the function is an import. Otherwise, add it to ASYNCIFY_ADD.

But that call is done via an indirect function pointer dispatch - does that affect ASYNCIFY's ability to infer that GC_gcollect() should be async?

If you don't flip the default behavior, all indirect calls are assumed to be relevant (we could track by the function type, but don't atm.) So that should just work, but in a big program indirect calls will mean lots of code gets instrumented, sadly.

Interesting btw that upstream Boehm GC was using emscripten_scan_registers - I didn't know that!

Do you anticipate currently that there would be any reason for -O1 to be incompatible with --spill-pointers?

No. It should be a simple and safe transform. So there must be a weird bug somewhere...

juj · 2022-12-01T14:09:44Z

The smallest test case I currently have is this:

#include <stdio.h>
#include <stdarg.h>

void xprintf(const char *restrict fmt, ...)
{
	va_list ap;
	va_start(ap, fmt);
	vfprintf(stdout, fmt, ap);
	va_end(ap);
}

int main()
{
	xprintf("%d\n", 5);
}

When built with e.g. emcc a.c -o a.js -sBINARYEN_EXTRA_PASSES=--spill-pointers -O1, it incorrectly prints out 0, but with -O0, it prints out 5.

To be able to easier compare what is different with -O0 vs -O1, I applied the following local changes to Emscripten:

diff --git a/emcc.py b/emcc.py
index 55f308f6f..4cc85da8d 100755
--- a/emcc.py
+++ b/emcc.py
@@ -1890,6 +1894,7 @@ def phase_linker_setup(options, state, newargs):
     # However, for debugability is better to have the stack come first
     # (becuase stack overflows will trap rather than corrupting data).
     settings.STACK_FIRST = True
+  settings.STACK_FIRST = False

   # Default to TEXTDECODER=2 (always use TextDecoder to decode UTF-8 strings)
   # in -Oz builds, since custom decoder for UTF-8 takes up space.
@@ -3637,7 +3642,7 @@ def phase_binaryen(target, options, wasm_target):

     wasm2js = building.wasm2js(wasm2js_template,
                                wasm_target,
-                               opt_level=settings.OPT_LEVEL,
+                               opt_level=0,
                                minify_whitespace=minify_whitespace(),
                                use_closure_compiler=options.use_closure_compiler,
                                debug_info=debug_info,

and built the test case with

emcc a.c -o a1.js -sBINARYEN_EXTRA_PASSES=--spill-pointers -O0 -g3 -sWASM=0 -sASSERTIONS
emcc a.c -o a2.js -sBINARYEN_EXTRA_PASSES=--spill-pointers -O1 -g3 -sWASM=0 -sASSERTIONS

and the relevant diff I get is

The different parts are marked in purple. It looks like the relevant issue is the difference on lines

__stack_pointer = $3 - 48 | 0;

vs

__stack_pointer = $3 - 16 | 0;

The other changes don't seem relevant.

I don't understand why there is such a large 48 byte stack frame bump, it seems that is too large compared to what is actually needed. But when -O1 replaces that code with only a 16 byte bump, that breaks the code.

juj · 2022-12-01T14:24:49Z

The behavior with the second stack frame bumping looks odd overall. In the -O0 spill-pointers case on the left, there is

$3 = __stack_pointer;
__stack_pointer = $3 - 48 | 0;
...
HEAP32[($3 + 16 | 0) >> 2] = $2;
...
__stack_pointer = $3;

but that construct does not look like a correct stack frame bump operation? The line where $2 is stored, doesn't that go on the calling function's stack space, since $3 represents the pointer before the stack bump?

(Also the value of $2 is the value of the stack pointer, not sure why it would be spilling the stack pointer itself?)

I would expect that the second stack frame bump should instead read like the first stack frame bump, i.e.:

$3 = __stack_pointer - 48 | 0;
__stack_pointer = $3 | 0;
...
HEAP32[($3 + 16 | 0) >> 2] = $2;
...
__stack_pointer = $3 + 48 | 0;

then the HEAP write would not go out of bounds?

kripken · 2022-12-01T19:13:52Z

Hmm, maybe the error is we run out of stack? Though it's strange it happens in -O1 which uses less stack than -O0. Perhaps it overwrites other data on the stack, then. We should check how the LLVM backend manages the stack pointer, I'm not sure if it needs to be bumped before or after use, in those code fragments.

Btw, we could actually do this in a better way now: the pass could create a new Memory just for spilling, and operate on that (and trap if we run out of space). That would be simple and avoid any risks of bad interactions with the normal stack. We have a multimemory lowering pass now which would let that run in all VMs, even ones without multimemory support. We've recently added that support in Asyncify and wasm-split actually so this would be pretty simple to do.

sbc100 · 2022-12-01T19:30:15Z

For details of how llvm treats the stack pointer see: https://github.com/WebAssembly/tool-conventions/blob/main/BasicCABI.md#the-linear-stack.

juj · 2022-12-01T19:56:20Z

Hmm, maybe the error is we run out of stack?

I don't think this is an out of stack error, but to me it reads that the second stack is being spilled incorrectly? The stack frame generation code pattern for spilling does not seem to be proper.

We should check how the LLVM backend manages the stack pointer, I'm not sure if it needs to be bumped before or after use, in those code fragments.

The regular stack bump without --spill-pointers looks like this:

$0 = __stack_pointer - <stack_frame_size> | 0;
__stack_pointer = $0 | 0;
...
HEAP32[($0 + x | 0) >> 2] = some_data_on_stack;
...
__stack_pointer = $0 + <stack_frame_size> | 0;

I'd expect that the --spill-pointers stack bump logic should look the same as this? (ideally these two would be merged together to create just one bump, but that's probably a separate optimization)

Also the stack bump amount that --spill-pointers does is excessively big, though there shouldn't be any harm in that.

There is also a "to-optimize-later" performance issue with spilling that it seems that the spilled data is later read, but if I understand correctly, the pointers that are spilled on the stack could be treated in a write-only "fire and forget" fashion, there is no reason to read those pointers back later? (they will be the same)

Btw, we could actually do this in a better way now: the pass could create a new Memory just for spilling, and operate on that (and trap if we run out of space)

Hmm, that does not sound that exciting. I think it would probably be best to keep this simple, since there is no real need for multimemories. (how would that extend to multiple threads?)

kripken · 2022-12-01T20:15:46Z

Yes, could be that the simplest path here is to modify the pass to emit the same type of code as the LLVM backend does. That should work. My idea with multimemories was that it would give a guarantee of not interacting badly with anything else since it would be separate (which is a nice property that we now have as an option in Asyncify and wasm-split) but it would be more work, assuming the only needed fix otherwise is how the stack pointer is bumped.

juj · 2022-12-02T14:40:33Z

Ok, found the issue - I'll submit a PR soon.

It looks like Asyncify does not work with spill pointers, not quite sure why though. Anything you think would be incompatible there? (it's not particularly important I suppose, these features are likely never used together, given that asyncify has its own mechanism it can use)

kripken · 2022-12-02T16:45:52Z

Hmm, maybe it's possible that spill-pointers uses the stack pointer local before asyncify restored it during rewinding, or something like that? I can imagine that happening if the order is asyncify and then spill-pointers. But I'd expect the order of spill-pointers and then asyncify to always work.

waywardmonkeys · 2023-02-21T15:14:43Z

@juj Did you end up submitting a PR? Will --spill-pointers work if I'm not using -s ASYNCIFY?

hoodmane · 2023-03-20T09:56:11Z

@waywardmonkeys it looks like @juj submitted WebAssembly/binaryen#5315

juj · 2023-03-20T10:01:19Z

Ops, sorry, missed your original ping @waywardmonkeys .

Yeah, I have the above PR cooking, though I found that that PR alone did not fix Boehm to work without ASYNCIFY, but something else is amiss in either the --spill-pointers pass or elsewhere. Though I was pulled to solve another issue so haven't yet had a moment to come back to debug further what the remaining issues there are.

tanishiking mentioned this issue Jan 18, 2024

Proposal: add support for WebAssembly target scala-native/scala-native#603

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Porting Boehm to Emscripten #18251

Porting Boehm to Emscripten #18251

juj commented Nov 23, 2022

juj commented Nov 23, 2022

Uh oh!

kripken commented Nov 28, 2022

Uh oh!

juj commented Nov 30, 2022 •

edited

Loading

Uh oh!

kripken commented Nov 30, 2022

Uh oh!

juj commented Dec 1, 2022 •

edited

Loading

Uh oh!

juj commented Dec 1, 2022

Uh oh!

kripken commented Dec 1, 2022

Uh oh!

sbc100 commented Dec 1, 2022

Uh oh!

juj commented Dec 1, 2022

Uh oh!

kripken commented Dec 1, 2022

Uh oh!

juj commented Dec 2, 2022 •

edited

Loading

Uh oh!

kripken commented Dec 2, 2022

Uh oh!

waywardmonkeys commented Feb 21, 2023

Uh oh!

hoodmane commented Mar 20, 2023

Uh oh!

juj commented Mar 20, 2023

Uh oh!

Porting Boehm to Emscripten #18251

Porting Boehm to Emscripten #18251

Comments

juj commented Nov 23, 2022

juj commented Nov 23, 2022

Uh oh!

kripken commented Nov 28, 2022

Uh oh!

juj commented Nov 30, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kripken commented Nov 30, 2022

Uh oh!

juj commented Dec 1, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

juj commented Dec 1, 2022

Uh oh!

kripken commented Dec 1, 2022

Uh oh!

sbc100 commented Dec 1, 2022

Uh oh!

juj commented Dec 1, 2022

Uh oh!

kripken commented Dec 1, 2022

Uh oh!

juj commented Dec 2, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kripken commented Dec 2, 2022

Uh oh!

waywardmonkeys commented Feb 21, 2023

Uh oh!

hoodmane commented Mar 20, 2023

Uh oh!

juj commented Mar 20, 2023

Uh oh!

juj commented Nov 30, 2022 •

edited

Loading

juj commented Dec 1, 2022 •

edited

Loading

juj commented Dec 2, 2022 •

edited

Loading