Build Binaryen.js to an ES module #3304

Closed

@dcodeIO dcodeIO commented Oct 30, 2020

This is an experiment to find out what is necessary to build Binaryen.js as an ES module (see #2990 (comment)), in particular to allow usage like

// ESM
import { Module, Function, ready } from "binaryen";
await ready;
...
// UMD (transpiled)
const { Module, Function, ready } = require("binaryen");
await ready;
...

In general this appears to be possible, except that Emscripten automatically enables MODULARIZE when EXPORT_ES6 is given (see emscripten-core/emscripten#12530, cc @RReverser), which is not what we want here. We do still want EXPORT_ES6 (or a new option), however, both to enable export syntax in Emscripten's acorn optimizer, which otherwise fails with an error, and to use import.meta instead of __dirname. Hence, for now, only debug builds succeed.

var __dirname = ".";
Contributor Author

We may either derive this from import.meta using a mechanism like Node's fileURLToPath, or look into enabling import.meta some other way, either through EXPORT_ES6 or through something new.
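
As a rough sketch of the first option, the directory could be derived from a module URL with Node's `fileURLToPath`. The helper name `dirnameFromModuleUrl` and the example path are hypothetical; inside a real ES module the argument would be `import.meta.url`, which is replaced here by a constructed file URL so that the sketch also runs as a plain script.

```javascript
// Sketch only: derive a __dirname equivalent from a module URL in Node.js.
const { fileURLToPath, pathToFileURL } = require("url");
const path = require("path");

// Hypothetical helper. In an ES module, pass `import.meta.url` as `moduleUrl`.
function dirnameFromModuleUrl(moduleUrl) {
  return path.dirname(fileURLToPath(moduleUrl));
}

// Stand-in for `import.meta.url` (assumed location, for illustration only).
const exampleUrl = pathToFileURL(path.resolve("lib", "binaryen.js")).href;

// Resolves to the absolute path of the "lib" directory.
const exampleDirname = dirnameFromModuleUrl(exampleUrl);
```

This only covers Node.js; in a browser there is no filesystem path to derive, so the URL itself would have to be used.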

Comment on lines +56 to +57
Types[entry[0]] = value;
Types[value] = entry[0];
Contributor Author

Unrelated: Adds a reverse lookup to the enums, as TypeScript typically does and expects when typed as enum in TS definitions.
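
To illustrate the reverse lookup, here is a minimal sketch of the pattern in the two lines above. The entry names and numeric values are made up for the example; in the PR they come from the compiled module.

```javascript
// Hypothetical name/value pairs; the real values are provided by the compiled module.
const entries = [["none", 0], ["i32", 2], ["i64", 3]];

const Types = {};
for (const entry of entries) {
  const value = entry[1];
  Types[entry[0]] = value; // forward lookup: name -> value
  Types[value] = entry[0]; // reverse lookup: value -> name
}
```

With this in place, `Types.i32` yields the number and `Types[Types.i32]` yields the string `"i32"`, which matches the object shape TypeScript emits for a numeric enum.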

wrapModule(MODULE['_BinaryenModuleCreate'](), this);
}

export { BynModule as Module };
Contributor Author

A little unfortunate, but not a problem. Module is already defined, so we can only use export ... as to expose it under the proper name.

// Default export one gets upon either
// * `import binaryen from "binaryen"` or
// * `const binaryen = require("binaryen")`.
export default {
Contributor Author

The repeated code here is somewhat typical for hybrid modules, in that it ensures that the module works ergonomically the same when required. When transpiled to CommonJS, this becomes module.exports.

Member

Most (all?) of the functions here seem to be duplications of inline export statements. Why do you need them both as inline exports and as a default object? Ideally you'd only keep the former, and transpilers can handle their conversion to CommonJS fine. It's also preferable, because named exports are tree-shakeable by other tooling, whereas with a default object you lose this benefit of ESM.

@RReverser
Member

RReverser commented Oct 30, 2020

In general appears to be possible, except that Emscripten automatically enables MODULARIZE when EXPORT_ES6 is given (see emscripten-core/emscripten#12530, cc @RReverser), which is not what we want here

One didn't work without the other anyway - before that PR it was just throwing an error rather than auto-enabling the required option.

I'm not sure I understand what you want and how it's different from MODULARIZE...

@RReverser
Member

Oh, you're doing all the exports manually? Hmm. Instead of doing pre/post on the original module, wouldn't it be easier to create a separate wrapper that will import the Emscripten-generated code and export the required shape? It feels like it would be a bit cleaner and easier to maintain...

@dcodeIO
Contributor Author

dcodeIO commented Oct 30, 2020

I'm not sure I understand what you want and how it's different from MODULARIZE...

What Binaryen.js is doing is essentially what MODULARIZE_INSTANCE was doing before it was removed, in that it is really a single instance of a library instead of a library one would instantiate multiple times. I guess there are ways to wrap a MODULARIZEd module as a singleton, but that also gives us quite a bit of code we don't necessarily need, still with a custom wrapper around everything. Code-wise, once we can instruct Emscripten to assume export in acorn, not using MODULARIZE should even be cleaner and easier to maintain in that it doesn't require a custom wrapper at all, really just export in the JS code in the post.js.

Instead of doing pre/post on the original module, wouldn't it be easier to create a separate wrapper that will import the Emscripten-generated code and export the required shape?

Aren't we getting two different files then, exporting one from importing the other?

@RReverser
Member

Aren't we getting two different files then, exporting one from importing the other?

Yeah, but in most cases it's not a problem, and if you then use something like Rollup for the CommonJS build, it will just correctly inline one into the other anyway.

@RReverser
Member

Code-wise, once we can instruct Emscripten to assume export in acorn

This particular thing should be possible independently of MODULARIZE; we have allowImportExportEverywhere in Acorn for that, and Emscripten could use it. Alternatively, and even better, I'd use sourceType: "module" in Emscripten to imply strict mode too, but that has a slight chance of breaking some user code out there :(

@dcodeIO
Contributor Author

dcodeIO commented Oct 30, 2020

I'd use sourceType: "module" in Emscripten to imply strict mode too, but that has a slight chance to break some user code out there :(

Yep, that's the option I was looking at as well 👍 I guess what we could do is to add an ESM option to Emscripten, remove EXPORT_ES6, and allow MODULARIZE & ESM, MODULARIZE and just ESM? Binaryen.js would then use ESM.

@dcodeIO
Contributor Author

dcodeIO commented Oct 30, 2020

Ah, and of course, to not break existing use cases, we can keep EXPORT_ES6 around as-is and make it behave like MODULARIZE & ESM.

@RReverser
Member

Binaryen.js would then use ESM.

I think Binaryen is quite special in this regard because it uses custom pre/post JS with exports. I don't see what the semantics would be for normal Emscripten output under an ESM option, since Emscripten doesn't export individual items.

It needs MODULARIZE because it can only export a factory where the user can provide a path to the Wasm file as well as instantiation options, and not because it tries to allow multiple instantiations (that's more of a side effect). For Node.js these concerns might not apply, since it has a real filesystem API, but whatever Emscripten generates should work in all environments.

@RReverser
Member

RReverser commented Oct 30, 2020

I really think that the best option in this case is, instead of custom pre/post, to let Emscripten generate regular EXPORT_ES6 as an internal file, and then provide a public API in a separate file that would take care of instantiating Emscripten only once and exporting function wrappers as individual exports.

This would be the path of least resistance, with a much lower chance of breaking whenever we regenerate the "internal" JS+Wasm.
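
A minimal sketch of such a wrapper might look as follows. Everything here is an assumption for illustration: `createInternalModule` stands in for the factory an EXPORT_ES6 build would export, `_add` stands in for a compiled export, and in a real ES module `ready` and `add` would be named exports rather than plain functions.

```javascript
// Stand-in for the Emscripten-generated factory (hypothetical; the real one
// loads the Wasm binary and resolves to the Emscripten Module object).
let factoryCalls = 0;
function createInternalModule(config = {}) {
  factoryCalls++;
  return Promise.resolve({
    _add: (a, b) => a + b, // pretend this is a compiled export
    config,
  });
}

// Public wrapper: instantiate at most once and reuse the same promise.
let instancePromise = null;
function ready() {
  if (instancePromise === null) {
    instancePromise = createInternalModule();
  }
  return instancePromise;
}

// Function wrapper with a JS-friendly name, delegating to the single instance.
async function add(a, b) {
  const instance = await ready();
  return instance._add(a, b);
}

add(2, 3).then((sum) => console.log(sum)); // logs 5
```

The design choice is that the public file owns the singleton, so the generated file can be rebuilt without touching the public API.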

@dcodeIO
Contributor Author

dcodeIO commented Oct 30, 2020

I think Binaryen is quite special in this regard because it uses custom pre/post JS with exports.

It only uses the custom extern-pre.js and extern-post.js because Binaryen's use case is not sufficiently covered in Emscripten anymore since the removal of MODULARIZE_INSTANCE. Binaryen itself does not require these extern-xy.js files, it really only needs the post.js to expose a more JS-y API. This PR for example essentially deletes the extern-post.js, and the extern-pre.js is only there to work around __dirname, but that'd go away as well.

@dcodeIO
Contributor Author

dcodeIO commented Nov 2, 2020

Thinking about this a little more, perhaps the mechanism MODULARIZE provides may more clearly be named INSTANCED, in that it exports a function one can use to instantiate a module multiple times in user code, but otherwise doesn't have a lot in common with what an ES or node module would do (instantiates itself once).

@RReverser
Member

RReverser commented Nov 2, 2020

in that it exports a function one can use to instantiate a module multiple times in user code

I think you might be putting too much focus on this possibility (instantiating a module multiple times), even though it's not the goal of MODULARIZE nor is it a very common usage of that mode.

Instead, MODULARIZE was created as an alternative to relying on a global predefined Module variable holding the Emscripten module settings. Relying on such a global worked okay when you included Emscripten as an old-style script, but it also led to global namespace pollution, conflicts between several potential Emscripten outputs on the same page, and so on.

MODULARIZE fixes the problem by making outputs self-contained and accepting config via a factory, instead of relying on and creating a global with a potentially conflicting name. In this regard it's quite similar to what any other module system (AMD / ESM / Node) did to fix the same problems with old-style scripts: global namespace pollution and conflicts.

The fact that it exports a factory is only a side-effect of having to accept a config from outside before instantiation, and not a design choice to encourage you to create several instances. It's okay to still instantiate it only once in a singleton and re-export only what you need.

@dcodeIO
Contributor Author

dcodeIO commented Nov 3, 2020

Yeah, MODULARIZE solves some problems, but it also introduces new ones for ESM specifically by mandating the factory function etc. For instance, the only way to emit import.meta instead of the (otherwise invalid) __dirname is currently EXPORT_ES6. That leads to problems: a bundler cannot tree-shake/DCE because, if I'm not mistaken, one must import the entire factory, and it doesn't play well with custom JS (one can't wrap the output because there's already a then-inner export statement, and can't define a custom default export when appending). What I'm trying to get at is that MODULARIZE conflicts with idiomatic ESM here and there, and perhaps we can work together to make it better, if there's interest?

@kripken
Member

kripken commented Nov 3, 2020

This is maybe a stupid question: can an ES6 module be imported ("instantiated") more than once?

If it can't, then it seems like ES6 mode should emit something like what MODULARIZE_INSTANCE used to, that is, the user never needs to think about a factory function.

@RReverser
Member

RReverser commented Nov 3, 2020

can an ES6 module be imported ("instantiated") more than once?

No, it can't.

the user never needs to think about a factory function.

I'm not sure that's true, or how that would work. The user does need a factory function at least to 1) be able to pass in params before any instantiation occurs and 2) get a Promise back.

You can alleviate (2) by exporting a promise as a variable, or by waiting until top-level await is supported in various engines, but I don't see how you would work around (1), because that's the part the user does need to think about and be able to configure.

Also, making EXPORT_ES6 export a singleton would be a breaking change for those rare users who do want to instantiate several instances of Emscripten module with different params.

@kripken
Member

kripken commented Nov 3, 2020

Interesting, thanks @RReverser

How do normal JS modules deal with those two problems (initialization and async start)?

@RReverser
Member

How do normal JS modules deal with those two problems (initialization and async start)?

Two ways: like Emscripten already does, by exposing a factory, or by exporting a separate function to init the module.

In my personal experience, the first one is more common and more intuitive, because such an API design ensures that the user doesn't attempt to access other exports before initialization, whereas the second approach (used by, for example, wasm-bindgen output) makes it too easy for the user to make such mistakes and attempt to access exported variables / functions when the underlying state is not yet initialised.

As another alternative, we could add another mode / compile-time option that would assume that the user doesn't want to configure the Module before initialization, and that all the files can be resolved without a custom locateFile and so on, but then we're still left with the second problem of async initialization, which won't go away until top-level await (https://github.com/tc39/proposal-top-level-await) is universally supported.

To summarise, I think that, in the current state of things, a factory export like the one Emscripten already uses is the least error-prone solution, as it can be used in any of the given scenarios, whether it's non-configurable init, init with a custom config, a single instance or multiple instances of the module.
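
The difference between the two API shapes discussed above can be sketched with stubs. None of these names come from Emscripten or wasm-bindgen; `createModule`, `init`, `add` and the `state` object are hypothetical and only model the control flow.

```javascript
// Shape A: factory. Exports are only reachable through the resolved instance,
// so they cannot be called before initialization has finished.
function createModule(config = {}) {
  return Promise.resolve({
    add: (a, b) => a + b,
    config,
  });
}

// Shape B: separate init function. The export exists before it is usable.
let state = null;
function init() {
  // Initialization completes asynchronously, as Wasm instantiation would.
  return Promise.resolve().then(() => {
    state = { initialized: true };
  });
}
function add(a, b) {
  if (state === null) {
    throw new Error("module not initialized; call init() first");
  }
  return a + b;
}

createModule().then((instance) => console.log(instance.add(1, 2))); // logs 3
```

With shape B, calling `add` before `init()` has resolved is a mistake the API allows; with shape A the same mistake cannot be written, since there is no `add` to call until the promise resolves.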
