Generate WebAssembly code from C/C++ code [wasm JIT] #7082

Closed
AlexAltea opened this issue Sep 1, 2018 · 11 comments

@AlexAltea

AlexAltea commented Sep 1, 2018

This is a question or feature request rather than an actual issue. I searched thoroughly for similar issues on both GitHub and Google beforehand, but found no relevant results.

I'm interested in generating WebAssembly from C/C++ code and executing it in a seamless manner, i.e. using just function pointers. Let me explain this with an example:

On native targets it is possible to JIT-compile code like this:

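#include <string.h>     // memcpy
#include <sys/mman.h>   // mmap, PROT_*, MAP_*

int main() {
    // x86-64 machine code (System V ABI: argument in rdi, result in rax)
    const unsigned char code[] = {
        0x48, 0x89, 0xF8,    // mov rax, rdi
        0x48, 0x01, 0xC0,    // add rax, rax
        0xC3                 // ret
    };
    void* ptr = mmap(0, 0x1000, PROT_READ | PROT_WRITE | PROT_EXEC,
        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ptr == MAP_FAILED)
        return 1;

    memcpy(ptr, code, sizeof(code));
    long (*function)(long) = (long(*)(long))(ptr);
    function(7); // == 14

    return 0;
}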

Question is: Is there something similar we could use from C/C++ code when compiling to WebAssembly?
I'm thinking of something like this:

int main() {
    const char code[] = { /* WASM bytecode of function */ };

#ifdef __EMSCRIPTEN__
    void* ptr = emscripten_map_wasm(code, sizeof(code));
    long (*function)(long) = (long(*)(long))(ptr);
    function(7); // == 14
    emscripten_unmap_wasm(ptr);
#endif // __EMSCRIPTEN__

    return 0;
}

The reason I'm asking is that the only binary translators that can currently be compiled with Emscripten are interpreters, which are known for heavy performance penalties (slowdowns ranging from 20x to 100x). With an interface to create WebAssembly functions and seamlessly execute them, we could add a WebAssembly backend to many binary translators and get near-native performance.

As a further example: This is already possible with JavaScript (i.e. eval'ing strings) and some projects use that for performance reasons, e.g. see jslm32.

I wonder whether the WebAssembly specification is ready for such a feature. To be honest I'm not that familiar with it (maybe @kripken can comment about it?).

@kripken
Member

kripken commented Sep 1, 2018

This is definitely an important feature. You can do this today, but with some overhead - in time, wasm is expected to add features to make this better.

The way to do this now is to create another wasm module at runtime, passing it the same Memory and Table as the original. Then that new module can add its functions to that Table, and then the original code can call them (indirectly, using a function pointer) and vice versa.

To put it another way, Emscripten already supports loading dynamic libraries containing wasm, and they use the same mechanism - new wasm loaded at runtime, sharing the Memory and Table, and calls between modules are done by function pointers. (Dynamically loaded libraries can also import functions from the original code directly, avoiding the cost of an indirect call.) So basically in the previous paragraph I was describing creating a tiny dynamic library at runtime in a JIT manner.

Practically speaking, to do this you need some sort of library you can run on the client that can emit a wasm file (that is, handle all the binary encoding details). One option here is Binaryen, which can also convert basic blocks + branches to wasm (which has structured control flow), so it can function as a compiler backend in a sense.
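To give a feel for what "handling the binary encoding details" involves, here is a minimal sketch in plain C that hand-encodes a one-function module. It is not Binaryen and not Emscripten API: the helper names (`Buf`, `put_uleb`, `emit_double_it_module`), the exported name `double_it`, and the choice of minimum limits 0 for the imported memory and table are all assumptions made for illustration. The emitted function mirrors the native example above (it returns `x + x`), and the module imports `env.memory` and `env.table` so that it could share them with the original module.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Small fixed-capacity byte buffer; ample for a one-function module. */
typedef struct {
    uint8_t data[128];
    size_t len;
} Buf;

static void put_bytes(Buf *b, const void *src, size_t n) {
    assert(b->len + n <= sizeof b->data);
    memcpy(b->data + b->len, src, n);
    b->len += n;
}

static void put_byte(Buf *b, uint8_t v) { put_bytes(b, &v, 1); }

/* Unsigned LEB128: the variable-length integer encoding used by wasm. */
static void put_uleb(Buf *b, uint32_t v) {
    do {
        uint8_t byte = (uint8_t)(v & 0x7F);
        v >>= 7;
        if (v != 0) byte |= 0x80;   /* continuation bit */
        put_byte(b, byte);
    } while (v != 0);
}

/* A wasm name is a length-prefixed UTF-8 string. */
static void put_name(Buf *b, const char *s) {
    size_t n = strlen(s);
    put_uleb(b, (uint32_t)n);
    put_bytes(b, s, n);
}

/* A section is: id byte, payload size (ULEB128), payload. */
static void put_section(Buf *out, uint8_t id, Buf *payload) {
    put_byte(out, id);
    put_uleb(out, (uint32_t)payload->len);
    put_bytes(out, payload->data, payload->len);
    payload->len = 0;               /* reset so the scratch buffer is reusable */
}

/* Writes the module into dst and returns its size, or 0 if cap is too small. */
size_t emit_double_it_module(uint8_t *dst, size_t cap) {
    static const uint8_t header[] = { 0x00, 'a', 's', 'm', 0x01, 0x00, 0x00, 0x00 };
    static const uint8_t sig[]    = { 0x60, 0x01, 0x7F, 0x01, 0x7F }; /* (i32) -> i32 */
    /* no locals; local.get 0; local.get 0; i32.add; end */
    static const uint8_t body[]   = { 0x00, 0x20, 0x00, 0x20, 0x00, 0x6A, 0x0B };
    Buf out = { {0}, 0 };
    Buf s   = { {0}, 0 };

    put_bytes(&out, header, sizeof header);

    put_uleb(&s, 1);                        /* type section: one signature */
    put_bytes(&s, sig, sizeof sig);
    put_section(&out, 1, &s);

    put_uleb(&s, 2);                        /* import section: two imports */
    put_name(&s, "env"); put_name(&s, "memory");
    put_byte(&s, 0x02);                     /* kind: memory */
    put_byte(&s, 0x00); put_uleb(&s, 0);    /* limits: no max, min 0 (assumed) */
    put_name(&s, "env"); put_name(&s, "table");
    put_byte(&s, 0x01);                     /* kind: table */
    put_byte(&s, 0x70);                     /* element type: funcref */
    put_byte(&s, 0x00); put_uleb(&s, 0);    /* limits: no max, min 0 (assumed) */
    put_section(&out, 2, &s);

    put_uleb(&s, 1); put_uleb(&s, 0);       /* function section: func 0 has type 0 */
    put_section(&out, 3, &s);

    put_uleb(&s, 1);                        /* export section: one export */
    put_name(&s, "double_it");
    put_byte(&s, 0x00); put_uleb(&s, 0);    /* kind: function, index 0 */
    put_section(&out, 7, &s);

    put_uleb(&s, 1);                        /* code section: one body */
    put_uleb(&s, (uint32_t)sizeof body);
    put_bytes(&s, body, sizeof body);
    put_section(&out, 10, &s);

    if (out.len > cap) return 0;
    memcpy(dst, out.data, out.len);
    return out.len;
}
```

Calling `emit_double_it_module(buf, sizeof buf)` fills `buf` with the module bytes, which could then be passed to `new WebAssembly.Module(...)` on the JavaScript side and instantiated with the shared Memory and Table as imports. For anything beyond a toy function, a library such as Binaryen is the more practical choice, since it also handles control-flow structuring and validation.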

I've actually been hoping to find time to do a real-world example of this (perhaps on PyPy), but haven't gotten around to it, yet. If you explore this area I'd be very interested to help out.

@kripken kripken changed the title from "Generate WebAssembly code from C/C++ code" to "Generate WebAssembly code from C/C++ code [wasm JIT]" on Sep 1, 2018
@AlexAltea
Author

Considering that my intended use case [1] involves dynamically generating 10,000s of functions, having each of them as a separate module is probably not a good idea. In that case I'll wait for the WASM specification to include features that improve JIT compilation. Once those are available, I'd be happy to help integrate them into Emscripten. Also, thanks for pointing out Binaryen, it's really useful. I'll experiment with it a bit.

[1] My plan was to add a WASM backend to QEMU's binary translator, in order to use Unicorn.js at (hopefully) near-native speeds, avoiding the current 1000x slowdown.

@kripken
Member

kripken commented Sep 4, 2018

I actually don't think even 10,000 modules is that bad :) Each would just contain 1 function plus an import for the table and an import for the memory, so compiling it is almost the same as just compiling the function. The main downside I can think of is the VM would likely not use multiple CPU cores (which VMs can do if many functions are in a single module).

Over the weekend I actually did a little proof of concept of this: I got the pypy.js JIT to emit wasm. It seems to work as expected (however, it was just a quick hack, see details there).

@AlexAltea
Author

AlexAltea commented Feb 5, 2019

@kripken Thanks for your proof of concept. After taking a break from this issue over the past months, I've gone back to it these past few days and attempted to create a minimal self-contained example to illustrate your suggestion above:

> The way to do this now is to create another wasm module at runtime, passing it the same Memory and Table as the original. Then that new module can add its functions to that Table, and then the original code can call them (indirectly, using a function pointer) and vice versa.

The current implementation looks like: https://gist.github.com/AlexAltea/daf4819856a3f47a58e2a1588dbb1ed5
(Requires compiling with -s ALLOW_TABLE_GROWTH=1, and beware of #8003!)

However there have been two issues so far:

  1. The AOT-module is unable to indirectly-call functions from the JIT-module added to its Table via the index returned by addWasmFunction after casting it to the appropriate function pointer type.
    The only workaround I've found (which I use in the snippet above), is passing that index to an EM_ASM snippet and executing the function via: wasmTable.get($0)(...).

  2. The JIT-module is unable to indirectly call functions from the AOT-module when the call is created via BinaryenCallIndirect with a constant function pointer as the target. Once the call_indirect instruction is reached, the following error is thrown:

    main.js:1 Uncaught RuntimeError: function signature mismatch
        at wasm-function[0]:8
    

Judging by your initial explanation, there's nothing in the snippet that strikes me as wrong. Is there anything that I might have overlooked? Thank you.

PS: Note that for the sake of simplicity, I've decided to go with BinaryenCallIndirect even though it incurs additional overhead.

@kripken
Member

kripken commented Feb 5, 2019

> The AOT-module is unable to indirectly-call functions from the JIT-module added to its Table via the index returned by addWasmFunction after casting it to the appropriate function pointer type.

Why not, what error does it hit? (same as in the second point?)

You might be hitting a bug in a VM - worth testing in multiple VMs, and latest versions, as bugs may have been fixed. If that's not it, then perhaps you do have the signature wrong? (Calling from JS is more permissive as it will add/remove params, etc.) If that's not it either, if you create a full testcase I can take a look at it.

@AlexAltea
Author

AlexAltea commented Feb 5, 2019

> Why not, what error does it hit? (same as in the second point?)

@kripken Nope, the 1st issue (AOT-to-JIT indirect calls) is that it seems to call a "different" function than it's supposed to (in earlier tests I also got a signature mismatch, iirc). I've updated the test to show this issue: https://gist.github.com/AlexAltea/daf4819856a3f47a58e2a1588dbb1ed5.
The relevant code is:

// Link module
uint32_t adder_wrapper_index = EM_ASM_INT({
    var jit_binary = new Uint8Array(wasmMemory.buffer, $0, $1);
    var jit_module = new WebAssembly.Module(jit_binary);
    var jit_instance = new WebAssembly.Instance(jit_module, {
        env: {
            memory: wasmMemory,
            table: wasmTable,
        }
    });
    console.log('WASM Table length: ' + wasmTable.length);
    var adder_wrapper = jit_instance.exports["adder_wrapper"];
    var adder_wrapper_index = addWasmFunction(adder_wrapper);
    console.log('WASM Table length: ' + wasmTable.length);
    return adder_wrapper_index;
}, result.binary, result.binaryBytes);

printf("adder_wrapper_index: %d\n", adder_wrapper_index);
uint32_t res0 = ((uint32_t(*)(uint32_t,uint32_t))adder_wrapper_index)(3, 4); // This calls the wrong function!
printf("result #0: %d\n", res0); // Should be 7 (but is 4, why?)

This JIT-module imports env.memory and env.table, and contains just:

(module
 (type $iii (func (param i32 i32) (result i32)))
 (export "adder_wrapper" (func $adder_wrapper))
 (func $adder_wrapper (; 0 ;) (; has Stack IR ;) (type $iii) (param $0 i32) (param $1 i32) (result i32)
  (call_indirect (type $iii)
   (local.get $0)
   (local.get $1)
   (i32.const 133)    ;; function pointer to `adder`, defined in the AOT-module as:
                      ;; uint32_t adder(uint32_t a, uint32_t b) { return a + b; }
  )
 )
)

The output is:

WASM Table length: 8064
WASM Table length: 8065
adder_wrapper_index: 8064
result #0: 4

This is the case both in latest revisions of Chrome and Firefox, so I doubt it's a bug, but I cannot spot any potential API misuse anywhere either.


> then perhaps you do have the signature wrong?

@kripken Regarding the 2nd issue (JIT-to-AOT indirect calls), the signature seems to be correct, the AOT function is defined as:

uint32_t adder(uint32_t a, uint32_t b) {
    return a + b;
}

and the type passed to BinaryenCallIndirect is defined as:

BinaryenType params[2] = { BinaryenTypeInt32(), BinaryenTypeInt32() };
BinaryenFunctionTypeRef iii = BinaryenAddFunctionType(m, "iii", BinaryenTypeInt32(), params, 2);

I don't understand what might be failing here. If you don't see any other reason for this, I'm afraid I'll need to attach a debugger to my browser's WASM engine...

@kripken
Member

kripken commented Feb 6, 2019

Calling a different function and calling with the wrong signature are probably the same issue.

How are you compiling that source file? I'd expect problems with asm.js function tables if it's using asm2wasm - since asm.js function tables add offsets to pointers. In that case, building with -s EMULATED_FUNCTION_POINTERS=1 should fix things (that's enabled by default for dynamic linking, so also building your file with -s MAIN_MODULE=1 would work, but would also add more overhead you may not need). Alternatively, the wasm backend should work out of the box.
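The effect of "function tables add offsets to pointers" can be sketched with a toy model in plain C. This is only an illustration under stated assumptions, not the real asm2wasm layout: the table sizes, the base and mask values, the `Slot` and `Sig` types, and every function name below are hypothetical. The idea modelled is that each signature gets its own sub-table inside one merged table, so a C-level function pointer is an index relative to its sub-table, while a wasm `call_indirect` emitted elsewhere uses absolute table indices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*fn_iii)(uint32_t, uint32_t);
enum Sig { SIG_VI, SIG_III };               /* signature tag per table slot */
typedef struct { enum Sig sig; fn_iii fn; } Slot;

/* Hypothetical layout: slots 0..3 hold "vi" functions, slots 4..7 hold
 * "iii" functions, and slot 8 was appended at runtime for a JIT function. */
enum { III_BASE = 4, III_MASK = 3, TABLE_SIZE = 9 };

static uint32_t second_arg(uint32_t a, uint32_t b) { (void)a; return b; }
static uint32_t adder(uint32_t a, uint32_t b)      { return a + b; }
static uint32_t jit_adder(uint32_t a, uint32_t b)  { return a + b; }

static const Slot table[TABLE_SIZE] = {
    [0] = { SIG_VI,  NULL },       [1] = { SIG_VI,  NULL },
    [2] = { SIG_VI,  NULL },       [3] = { SIG_VI,  NULL },
    [4] = { SIG_III, second_arg }, [5] = { SIG_III, adder },
    [6] = { SIG_III, second_arg }, [7] = { SIG_III, second_arg },
    [8] = { SIG_III, jit_adder },  /* appended, like addWasmFunction would */
};

/* Absolute lookup with a signature check, as a plain call_indirect does.
 * Returns 0 on success and -1 on a signature mismatch. */
int call_absolute(uint32_t index, uint32_t a, uint32_t b, uint32_t *out) {
    if (index >= TABLE_SIZE || table[index].sig != SIG_III) return -1;
    *out = table[index].fn(a, b);
    return 0;
}

/* Offset lookup, modelling how AOT code translated from asm.js might call
 * a C function pointer of type "iii": mask it, then add the sub-table base. */
int call_via_asmjs_pointer(uint32_t ptr, uint32_t a, uint32_t b, uint32_t *out) {
    return call_absolute(III_BASE + (ptr & III_MASK), a, b, out);
}
```

In this model `adder` has the C-level pointer value 1 (slot 5), so `call_via_asmjs_pointer(1, 3, 4, &r)` yields 7. Casting the absolute index 8 to a function pointer and calling it goes through the mask and base and lands on slot 4, which silently runs a different function. In the other direction, using the pointer value 1 as an absolute index hits a slot with another signature and fails the check. With a single table and no per-signature offsets, which is what `-s EMULATED_FUNCTION_POINTERS=1` is described as providing above, both lookups would agree.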

@AlexAltea
Author

AlexAltea commented Feb 7, 2019

@kripken I'm not using asm2wasm, but just Binaryen to generate the JIT-module on the fly (see the snippet). The AOT-module that generates it is just 80 lines of C, built with Emscripten.

I've built V8/D8 and found the root cause for the issues with JIT-to-AOT indirect calls:
https://github.com/v8/v8/blob/0999709/src/wasm/wasm-interpreter.cc#L3169-L3171

    if (entry.sig_id() != static_cast<int32_t>(expected_sig_id)) {
      return {ExternalCallResult::SIGNATURE_MISMATCH};
    }

Here, expected_sig_id is 0, which makes sense since there's only one signature defined in the JIT-module, namely: (type $iii (func (param i32 i32) (result i32))). However, entry refers to the function in the AOT-module, whose signature identifiers might be different, despite having the same signature.

This issue was reported in WebAssembly/design#452, and fixed in WebAssembly/design#682, which enforces indirect call signature checks to be structural rather than nominal. Browsers still seem to do nominal checks.
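The difference between the two kinds of check can be sketched in plain C. This is an illustration only and not how V8 or any other engine is implemented: the `FuncSig` and `Module` types and all function names are hypothetical. A nominal check compares raw per-module type indices, which carry no meaning across modules; a structural check compares the parameter and result types themselves. An engine could also get the structural behaviour by mapping equal signatures to one shared id before comparing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value types, using their wasm binary encodings as enum values. */
typedef enum { T_I32 = 0x7F, T_I64 = 0x7E, T_F32 = 0x7D, T_F64 = 0x7C } ValType;

typedef struct {
    const ValType *params;  size_t nparams;
    const ValType *results; size_t nresults;
} FuncSig;

/* Hypothetical stand-in for a module's type section. */
typedef struct { const FuncSig *types; size_t ntypes; } Module;

static const FuncSig *module_type(const Module *m, uint32_t index) {
    assert(index < m->ntypes);
    return &m->types[index];
}

/* Two signatures are structurally equal if every param and result matches. */
bool sigs_equal(const FuncSig *a, const FuncSig *b) {
    if (a->nparams != b->nparams || a->nresults != b->nresults) return false;
    for (size_t i = 0; i < a->nparams; i++)
        if (a->params[i] != b->params[i]) return false;
    for (size_t i = 0; i < a->nresults; i++)
        if (a->results[i] != b->results[i]) return false;
    return true;
}

/* Nominal check: compares raw type indices, ignoring which module each
 * index came from. Equal signatures in different modules can fail this. */
bool check_nominal(uint32_t callee_index, uint32_t expected_index) {
    return callee_index == expected_index;
}

/* Structural check: resolves each index in its own module, then compares. */
bool check_structural(const Module *callee_mod, uint32_t callee_index,
                      const Module *caller_mod, uint32_t expected_index) {
    return sigs_equal(module_type(callee_mod, callee_index),
                      module_type(caller_mod, expected_index));
}
```

For example, if the AOT-module lists `(i32, i32) -> i32` as its type 1 while the JIT-module lists the same signature as its type 0, `check_nominal(1, 0)` rejects the call even though `check_structural` accepts it.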

Note that while I'm using V8's WebAssembly interpreter (it makes debugging easier), I also replicated this behavior in Firefox and Chrome.

@kripken What's your take on this? Is this indeed a browser bug to report? Thanks a lot for your time. :-)

PS: Sorry for using the Emscripten issue tracker for this, I didn't expect the rabbit hole to go this far.

@kripken
Member

kripken commented Feb 7, 2019

Oh, I saw #include <emscripten.h> and assumed the whole app was compiled with emcc.

Yeah, those checks should be structural I believe. Perhaps the best thing is to open an issue on the wasm design repo, if you see that browsers do not actually obey this. Might be worth looking in the wasm spec test suite first to see if this is tested or not, and reporting that in the issue - if not, creating a small testcase for them would be good.

@AlexAltea
Author

@kripken Thank you, as you mentioned, the -s EMULATED_FUNCTION_POINTERS=1 flag fixed the issue. Somehow, I forgot to try it earlier since I was confused about why asm.js was interfering. Thank you so much for your help!

The minimal working version of my proof-of-concept is here:
https://gist.github.com/AlexAltea/78e8da45e20fdb1bf2afb8816de53bae

Can be compiled via:

emcc -I/path/to/binaryen/src -L/path/to/binaryen/lib -lbinaryen -s WASM=1 -s ALLOW_TABLE_GROWTH=1 -s RESERVED_FUNCTION_POINTERS=1 -s EMULATED_FUNCTION_POINTERS=1 c-test-jit.c

PS: Feel free to close this issue if you want! (I'm not sure how far proper WASM JIT features are from making it into the specification.)

@stale

stale bot commented Feb 12, 2020

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 7 days. Feel free to re-open at any time if this issue is still relevant.

@stale stale bot added the wontfix label Feb 12, 2020
@stale stale bot closed this as completed Feb 19, 2020