-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Porting Boehm to Emscripten #18251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
(if I try to add the --spill-pointers pass as the last pass, looks like it does give the same result, so maybe not an ordering issue here) |
First thing, have you seen emscripten_scan_registers? That uses Asyncify to scan state in "registers" (spilled or not), so it could be an option here.
Asyncify should be enough if "asyncified functions" == "GCing functions". That is, Asyncify will save stuff on locals all the way up the current stack when you pause/resume, and so if GC functions are async functions then each GC will be able to see locals up the stack. And those are the only relevant locals so it should be enough (aside from the heap of course). Spill-pointers can still be useful, if you don't have a clear set of GCing functions perhaps, and also spill-pointers has different overhead (Asyncify instruments control flow too, and adds code to read spilled registers, not just write them). I've never compared them but I'd hope spill-pointers could be faster.
Yes, it uses data in the heap somewhere (which is internally a stack). So you need to scan that region in addition to the normal user stack. I believe
I think you can use `-sBINARYEN_EXTRA_PASSES=--spill-pointers". But the correctness problems you saw seem worrying - that sounds like a possible bug in spill-pointers, I can't think of another explanation. It might be worth reducing a testcase, or alternatively adding |
Thanks, that is what I was expecting. So basically this means that in order for ASYNCIFY to work as a solution, that means that the function Or actually, looking at Boehm's internals, the code structure it is currently used with looks like this: void (*gc_scan_roots_func)(void);
void GC_gcollect()
{
// register roots:
// ...
gc_scan_roots_func();
// marking:
...
// sweeping:
...
}
...
static void scan_regs_cb(void *begin, void *end)
{
GC_push_all_stack((ptr_t)begin, (ptr_t)end);
}
void GC_default_push_other_roots()
{
emscripten_scan_registers(scan_regs_cb);
}
gc_scan_roots_func = &GC_default_push_other_roots; That is, Boehm allows the user to set an extra callback to inject custom roots, and an Emscripten specific callback is registered there to call But that call is done via an indirect function pointer dispatch - does that affect ASYNCIFY's ability to infer that
That makes sense. I am not actually interested at all in using ASYNCIFY in conjunction with GCing at Unity, but the reason I am looking into this is that someone changed upstream Boehm to hard-depend on ASYNCIFY (regressing our use case), so I want to restore our use case there, but in order to do it right, I wanted to double check that the ASYNCIFY implementation there even works like would be expected.
I do have a reduced test case cooking, I'll try to get that down to a minimum. In one test case I see that --spill-pointers is working ok in -O0, but fails with -O1. Do you anticipate currently that there would be any reason for -O1 to be incompatible with --spill-pointers? |
Yes. That is automatic if the function is an import. Otherwise, add it to
If you don't flip the default behavior, all indirect calls are assumed to be relevant (we could track by the function type, but don't atm.) So that should just work, but in a big program indirect calls will mean lots of code gets instrumented, sadly. Interesting btw that upstream Boehm GC was using
No. It should be a simple and safe transform. So there must be a weird bug somewhere... |
The smallest test case I currently have is this: #include <stdio.h>
#include <stdarg.h>
void xprintf(const char *restrict fmt, ...)
{
va_list ap;
va_start(ap, fmt);
vfprintf(stdout, fmt, ap);
va_end(ap);
}
int main()
{
xprintf("%d\n", 5);
} When built with e.g. To be able to easier compare what is different with -O0 vs -O1, I applied the following local changes to Emscripten: diff --git a/emcc.py b/emcc.py
index 55f308f6f..4cc85da8d 100755
--- a/emcc.py
+++ b/emcc.py
@@ -1890,6 +1894,7 @@ def phase_linker_setup(options, state, newargs):
# However, for debugability is better to have the stack come first
# (becuase stack overflows will trap rather than corrupting data).
settings.STACK_FIRST = True
+ settings.STACK_FIRST = False
# Default to TEXTDECODER=2 (always use TextDecoder to decode UTF-8 strings)
# in -Oz builds, since custom decoder for UTF-8 takes up space.
@@ -3637,7 +3642,7 @@ def phase_binaryen(target, options, wasm_target):
wasm2js = building.wasm2js(wasm2js_template,
wasm_target,
- opt_level=settings.OPT_LEVEL,
+ opt_level=0,
minify_whitespace=minify_whitespace(),
use_closure_compiler=options.use_closure_compiler,
debug_info=debug_info, and built the test case with
and the relevant diff I get is The different parts are marked in purple. It looks like the relevant issue is the difference on lines
vs
The other changes don't seem relevant. I don't understand why there is such a large 48 byte stack frame bump, it seems that is too large compared to what is actually needed. But when -O1 replaces that code with only a 16 byte bump, that breaks the code. |
The behavior with the second stack frame bumping looks odd overall. In the -O0 spill-pointers case on the left, there is $3 = __stack_pointer;
__stack_pointer = $3 - 48 | 0;
...
HEAP32[($3 + 16 | 0) >> 2] = $2;
...
__stack_pointer = $3; but that construct does not look like a correct stack frame bump operation? The line where $2 is stored, doesn't that go on the calling function's stack space, since (Also the value of $2 is the value of the stack pointer, not sure why it would be spilling the stack pointer itself?) I would expect that the second stack frame bump should instead read like the first stack frame bump, i.e.: $3 = __stack_pointer - 48 | 0;
__stack_pointer = $3 | 0;
...
HEAP32[($3 + 16 | 0) >> 2] = $2;
...
__stack_pointer = $3 + 48 | 0; then the HEAP write would not go out of bounds? |
Hmm, maybe the error is we run out of stack? Though it's strange it happens in Btw, we could actually do this in a better way now: the pass could create a new Memory just for spilling, and operate on that (and trap if we run out of space). That would be simple and avoid any risks of bad interactions with the normal stack. We have a multimemory lowering pass now which would let that run in all VMs, even ones without multimemory support. We've recently added that support in Asyncify and wasm-split actually so this would be pretty simple to do. |
For details of how llvm treats the stack pointer see: https://github.com/WebAssembly/tool-conventions/blob/main/BasicCABI.md#the-linear-stack. |
I don't think this is an out of stack error, but to me it reads that the second stack is being spilled incorrectly? The stack frame generation code pattern for spilling does not seem to be proper.
The regular stack bump without --spill-pointers looks like this: $0 = __stack_pointer - <stack_frame_size> | 0;
__stack_pointer = $0 | 0;
...
HEAP32[($0 + x | 0) >> 2] = some_data_on_stack;
...
__stack_pointer = $0 + <stack_frame_size> | 0; I'd expect that the --spill-pointers stack bump logic should look the same as this? (ideally these two would be merged together to create just one bump, but that's probably a separate optimization) Also the stack bump amount that --spill-pointers does is excessively big, though there shouldn't be any harm in that. There is also a "to-optimize-later" performance issue with spilling that it seems that the spilled data is later read, but if I understand correctly, the pointers that are spilled on the stack could be treated in a write-only "fire and forget" fashion, there is no reason to read those pointers back later? (they will be the same)
Hmm, that does not sound that exciting. I think it would probably be best to keep this simple, since there is no real need for multimemories. (how would that extend to multiple threads?) |
Yes, could be that the simplest path here is to modify the pass to emit the same type of code as the LLVM backend does. That should work. My idea with multimemories was that it would give a guarantee of not interacting badly with anything else since it would be separate (which is a nice property that we now have as an option in Asyncify and wasm-split) but it would be more work, assuming the only needed fix otherwise is how the stack pointer is bumped. |
Ok, found the issue - I'll submit a PR soon. It looks like Asyncify does not work with spill pointers, not quite sure why though. Anything you think would be incompatible there? (it's not particularly important I suppose, these features are likely never used together, given that asyncify has its own mechanism it can use) |
Hmm, maybe it's possible that spill-pointers uses the stack pointer local before asyncify restored it during rewinding, or something like that? I can imagine that happening if the order is asyncify and then spill-pointers. But I'd expect the order of spill-pointers and then asyncify to always work. |
@juj Did you end up submitting a PR? Will |
@waywardmonkeys it looks like @juj submitted WebAssembly/binaryen#5315 |
Ops, sorry, missed your original ping @waywardmonkeys . Yeah, I have the above PR cooking, though I found that that PR alone did not fix Boehm to work without ASYNCIFY, but something else is amiss in either the --spill-pointers pass or elsewhere. Though I was pulled to solve another issue so haven't yet had a moment to come back to debug further what the remaining issues there are. |
I have a fork of Boehm cooking at ivmai/bdwgc@master...juj:bdwgc:emscripten , where I am looking to make the latest Boehm upstream work with Emscripten. (it looks like its Emscripten support has regressed by quite a bit since Unity has last synced with upstream Boehm many many years ago)
My idea is to experiment with a couple of different strategies for Boehm + Emscripten support:
--spill-pointers
pass that @kripken restored to work this summer, to automatically spill all pointers from wasm locals to the stack, so that Boehm will be able to scan the stack,setTimeout(GC_gcollect, 0);
" type of call.DECLARE_GC_POINTER(myPtr);
. While this would be brittle, it is something that might work well in automatic codegen scenarios where correctness can be verified.Looking at current Boehm Emscripten support, someone landed a change there that switches Boehm to require -sASYNCIFY to be enabled at the same time, thinking that it would enable stack scanning. ( ivmai/bdwgc@1431bda#diff-f82cc1b932242998b29875a64e8490b12c461ba04ab72e59ef0ac5b2efa8af4a ).
While I am not intimate with the 100% details of ASYNCIFY, I was pondering that maybe that would not be correct at all? @kripken true or false: ASYNCIFY only spills locals to the stack only in those functions that can lead to calling asyncified functions, but not in all of the functions? In other words, enabling ASYNCIFY is not enough to substitute the current need for a
--spill-pointers
pass? I.e. simply enabling ASYNCIFY would not properly spill pointers on the stack?Also doesn't ASYNCIFY use its own stack that is separate from the regular stack? (would
emscripten_scan_stack()
scan this Asyncify stack?)How would I appropriately enable the
--spill-pointers
pass in Binaryen? I've so far tried just to dothough that gets me correctness problems. Looks like printf() starts to fail with that, and most of the core0 tests in the suite as well. I am not sure if there might be some ordering dependencies with other passes that this spilling pointers pass should be run at? Or maybe something might be off with the
--spill-pointers
pass?The text was updated successfully, but these errors were encountered: