Skip to content

Avoid importing Node's modules on demand #17851

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 21, 2022

Conversation

kleisauke
Copy link
Collaborator

@kleisauke kleisauke commented Sep 14, 2022

Remove requireNodeFS() in favor of always importing these Node modules.

These require()'s will always be used on Node.js, except when linking with -sSINGLE_FILE. But in that case, stdin usage, or using the -sMAIN_MODULE and/or -sNODERAWFS options would still depend on these imports.

Split from: #17849.

@sbc100
Copy link
Collaborator

sbc100 commented Sep 14, 2022

What about in the case of SINGILE_FILE=1, I think in that case the FS might never be needed.

@sbc100
Copy link
Collaborator

sbc100 commented Sep 14, 2022

We should probably be excluding the _read function completely if the FS really is never needed.

@sbc100
Copy link
Collaborator

sbc100 commented Sep 14, 2022

Can you run./test/runner other.*code_size* other.*metadce* --rebase to see if there are any code size savings here.

@kleisauke
Copy link
Collaborator Author

kleisauke commented Sep 14, 2022

Ah, SINGLE_FILE may not require those Node modules. I thought about doing this:

@@ -199,7 +199,9 @@ if (ENVIRONMENT_IS_NODE) {
     scriptDirectory = __dirname + '/';
   }
 
+#if !SINGLE_FILE
 #include "node_shell_read.js"
+#endif
 
   if (process['argv'].length > 1) {
     thisProgram = process['argv'][1].replace(/\\/g, '/');

But that will probably break when someone uses SINGLE_FILE in combination with NODERAWFS or MAIN_MODULE. Perhaps Closure Compiler is able to DCE these require()s if they are not actually referenced? If so, I could rename the PR description to:

Remove `requireNodeFS()` in favor of always importing these Node
modules. 

These `require()`'s will usually be used on Node.js, and if there
are no references to these variables (e.g. when linking with
`-sSINGLE_FILE`), Closure Compiler can still omit these variables
as part of DCE.

Commit 62b5c25 rebased the codesize tests, it looks like this change saved a few bytes.

@kleisauke kleisauke force-pushed the always-import-node-modules branch from 62b5c25 to fbcd6ea Compare September 16, 2022 15:12
@kleisauke kleisauke changed the title Avoid importing Node's modules on demand. NFC Avoid importing Node's modules on demand Sep 16, 2022
@kleisauke
Copy link
Collaborator Author

... according to https://developers.google.com/closure/compiler/docs/api-tutorial3, Closure Compiler will only do DCE when using the ADVANCED_OPTIMIZATIONS compilation level, which Emscripten already uses by default. Though, it doesn't seem to DCE the require()'s away on this branch.

$ echo -e 'int main(void){return 0;}' | emcc -xc -Oz -sSINGLE_FILE --closure=1 -
$ stat --format='%s' a.out.js
6750
$ grep -o '=require("fs")' a.out.js
=require("fs")
$ grep -o '=require("path")' a.out.js
=require("path")

(I also tried using --closure-args='--module_resolution NODE --process_common_js_modules' to no avail)

In this case, it probably doesn't hurt that these Node modules are imported without being used. Either way, I just modified the PR description and commit. PTAL.

@rchiodo
Copy link

rchiodo commented Sep 20, 2022

@kleisauke I was wondering if you wanted any help in pushing this along? I'm waiting on this PR and #17489 so I can submit a fix for #17829. I can probably fix the merge conflicts here if you're busy.

Remove `requireNodeFS()` in favor of always importing these Node
modules.

These `require()`'s will always be used on Node.js, except when
linking with `-sSINGLE_FILE`. But in that case, `stdin` usage,
or using the `-sMAIN_MODULE` and/or `-sNODERAWFS` options would
still depend on these imports.
This change was generated using:
$ ./test/runner other.*code_size* other.*metadce* --rebase
@kleisauke kleisauke force-pushed the always-import-node-modules branch from fbcd6ea to 217334b Compare September 21, 2022 08:28
@kleisauke
Copy link
Collaborator Author

@rchiodo Just rebased this PR to resolve the merge conflicts. I don't control the merging process as an external contributor.

@sbc100
Copy link
Collaborator

sbc100 commented Sep 21, 2022

I do wonder if folks are going to complain about these imports not being done unconditionally. Maybe we can follow up by somehow making them included only when needed?

@sbc100 sbc100 enabled auto-merge (squash) September 21, 2022 08:35
@curiousdannii
Copy link
Contributor

The whole thing does need to be reworked for ES modules anyway...

@kleisauke
Copy link
Collaborator Author

node_shell_read.js is only conditionally included on Node.js environments.

emscripten/src/shell.js

Lines 189 to 202 in 84ad56f

#if ENVIRONMENT_MAY_BE_NODE
if (ENVIRONMENT_IS_NODE) {
#if ENVIRONMENT
#if ASSERTIONS
if (typeof process == 'undefined' || !process.release || process.release.name !== 'node') throw new Error('not compiled for this environment (did you build to HTML and try to run it not on the web, or set ENVIRONMENT to something - like node - and run it someplace else - like on the web?)');
#endif
#endif
if (ENVIRONMENT_IS_WORKER) {
scriptDirectory = require('path').dirname(scriptDirectory) + '/';
} else {
scriptDirectory = __dirname + '/';
}
#include "node_shell_read.js"

We could exclude this for -sSINGLE_FILE usage during link time, e.g.:

#if !SINGLE_FILE && (!MAIN_MODULE || !NODERAWFS)
#include "node_shell_read.js"
#endif

But I think that's a bit fragile, and doesn't cover the use-case when someone reads the stdin when linking with -sSINGLE_FILE.

bytesRead = fs.readSync(process.stdin.fd, buf, 0, BUFSIZE, -1);

Is -sSINGLE_FILE widely used on Node.js? I suppose base64 encoding of Wasm won't provide any additional benefit, and can actually make binaries larger.

@sbc100
Copy link
Collaborator

sbc100 commented Sep 21, 2022

node_shell_read.js is only conditionally included on Node.js environments.

emscripten/src/shell.js

Lines 189 to 202 in 84ad56f

#if ENVIRONMENT_MAY_BE_NODE
if (ENVIRONMENT_IS_NODE) {
#if ENVIRONMENT
#if ASSERTIONS
if (typeof process == 'undefined' || !process.release || process.release.name !== 'node') throw new Error('not compiled for this environment (did you build to HTML and try to run it not on the web, or set ENVIRONMENT to something - like node - and run it someplace else - like on the web?)');
#endif
#endif
if (ENVIRONMENT_IS_WORKER) {
scriptDirectory = require('path').dirname(scriptDirectory) + '/';
} else {
scriptDirectory = __dirname + '/';
}
#include "node_shell_read.js"

We could exclude this for -sSINGLE_FILE usage during link time, e.g.:

#if !SINGLE_FILE && (!MAIN_MODULE || !NODERAWFS)
#include "node_shell_read.js"
#endif

But I think that's a bit fragile, and doesn't cover the use-case when someone reads the stdin when linking with -sSINGLE_FILE.

bytesRead = fs.readSync(process.stdin.fd, buf, 0, BUFSIZE, -1);

Is -sSINGLE_FILE widely used on Node.js? I suppose base64 encoding of Wasm won't provide any additional benefit, and can actually make binaries larger.

Yes I agree that #if !SINGLE_FILE && (!MAIN_MODULE || !NODERAWFS) is probably too simplistic. It would be nice to have precise dependencies between all the things that use these imports, and the imports themselves but we don't have that yet.

For now this seems reasonable, we can revisit later if needed.

@kleisauke
Copy link
Collaborator Author

Great! Failures in test-browser-chrome doesn't appear to be related to this PR, should I try to rerun it?

@sbc100
Copy link
Collaborator

sbc100 commented Sep 21, 2022

I restarted the failed browser tests. (BTW do you have permission to do that?)

@sbc100 sbc100 merged commit 929a6d5 into emscripten-core:main Sep 21, 2022
@kleisauke
Copy link
Collaborator Author

BTW do you have permission to do that?

I do see the rerun button on CircleCI, so I suppose that would work.

@kripken
Copy link
Member

kripken commented Sep 22, 2022

This may have broken Binaryen's emscripten tests, e.g. https://github.com/WebAssembly/binaryen/actions/runs/3101768285/jobs/5023446794 (WebAssembly/binaryen#5071)

ReferenceError: require is not defined

Those tests wrap the output into an .mjs file, so perhaps this PR interacts with that somehow? In an .mjs file require() is not defined I believe. But this PR should just change when require is called IIUC, so that's odd..?

@sbc100
Copy link
Collaborator

sbc100 commented Sep 22, 2022

I suggest we revert for now, and add a test for this... I assumed we had a bunch of tests for .mjs modularization, but I guess binaryen must be doing something that none of our tests are..

@kripken
Copy link
Member

kripken commented Sep 22, 2022

This PR seems to fix binaryen,

WebAssembly/binaryen#5073

However, I'm still not sure why the breakage happened. So not sure if we should or should not revert this.

@sbc100
Copy link
Collaborator

sbc100 commented Sep 22, 2022

I think the breakage happend because the use of require was moved from conditional (where it doesn't run on all cases), to top level (where it runs on load).

@kleisauke
Copy link
Collaborator Author

kleisauke commented Sep 22, 2022

Sorry for causing this breakage. I think this issue is tracked in #11792, i.e. the EXPORT_ES6 mode, which is enabled by default when writing *.mjs, is currently broken on Node.js.

Prepending:

import { createRequire } from 'module';
const require = createRequire(import.meta.url);

is probably a feasible solution for now, but eventually we need to solve this in Emscripten. FWIW, wasm-vips is currently doing this post-processing:
https://github.com/kleisauke/wasm-vips/blob/d7f49e2c8a3c03dff62925e462e11ae1c08480c8/build.sh#L457-L469

@dcodeIO
Copy link
Contributor

dcodeIO commented Sep 22, 2022

It seems to me that the problem extends to Binaryen.js in browsers. Previously require would only ever be called when ENVIRONMENT_IS_NODE (feature checked as typeof process == 'object'), but as has already been said executes unconditionally now, both in Node.js (where in ESM require must first be created) and in browsers (where require does not exist). Suggesting to revert this change and keep use of require behind a feature check. Specifically, the createRequire workaround will work in Node.js, but not in browsers.

@sbc100
Copy link
Collaborator

sbc100 commented Sep 22, 2022

I think we should revert this change given that it breaks users of EXPORT_ES6. Previously it seems that not all users of EXPORT_ES6 ran into the issue of require being missing but this change made it effect all of them.

(@dcodeIO, this code is all only run if ENVIRONMENT_IS_NODE I think).

@kleisauke
Copy link
Collaborator Author

Feel free to revert.

It seems to me that the problem extends to Binaryen.js in browsers.

node_shell_read.js is only conditionally included on Node.js environments, see: #17851 (comment).

@dcodeIO
Copy link
Contributor

dcodeIO commented Sep 22, 2022

I see. If it is behind a feature check still, then I guess the createRequire workaround should become a part of the Binaryen.js artifact itself, so it works everywhere, not only in the test suite. That would work, even though I guess that using import (here: a dynamic import(...) call) instead of require would be the right thing to do if ESM is enabled in Emscripten.

@kleisauke
Copy link
Collaborator Author

Previously it seems that not all users of EXPORT_ES6 ran into the issue of require being missing but this change made it effect all of them.

It looks like Binaryen already had a workaround for __dirname, see:
https://github.com/WebAssembly/binaryen/blob/8f385b04afd6e968fa83dfade7f7858ba0824e37/src/js/binaryen.js-extern-pre.js

So, perhaps reverting this is not the right way to resolve this. For example, applying this patch:

--- a/test/test_other.py
+++ b/test/test_other.py
@@ -234,12 +234,11 @@ class other(RunnerCore):
         os.remove(config_path)
 
   def test_emcc_output_mjs(self):
-    self.run_process([EMCC, '-o', 'hello_world.mjs', test_file('hello_world.c')])
+    create_file('extern-post.js', 'await Module();')
+    self.run_process([EMCC, '-o', 'hello_world.mjs', test_file('hello_world.c'), '--extern-post-js', 'extern-post.js'])
     output = read_file('hello_world.mjs')
     self.assertContained('export default Module;', output)
-    # TODO(sbc): Test that this is actually runnable.  We currently don't have
-    # any tests for EXPORT_ES6 but once we do this should be enabled.
-    # self.assertContained('hello, world!', self.run_js('hello_world.mjs'))
+    self.assertContained('hello, world!', self.run_js('hello_world.mjs'))
 
   @parameterized({
     '': (True, [],),

Will cause a runtime error on that __dirname thing.

@dcodeIO
Copy link
Contributor

dcodeIO commented Sep 22, 2022

Indeed, there are additional challenges to make this work in any case. The general picture I've seen is that producing ESM with Emscripten still uses some CommonJS things. where __dirname is one of them and require is similar. Can't yet explain how using require in ESM used to work before, though. Almost as if require wasn't executed before this PR, even in Node.

@sbc100
Copy link
Collaborator

sbc100 commented Sep 22, 2022

If you want to propose a fix, and enable that test that would be fine too. But this issue has been around for a while which makes me think fixing my be non-trivial. Do you want to propose a fix foward? I will this we should revert unless such as fix can be found in the next day or so.

@sbc100
Copy link
Collaborator

sbc100 commented Sep 22, 2022

Indeed, there are additional challenges to make this work in any case. The general picture I've seen is that producing ESM with Emscripten still uses some CommonJS things. where __dirname is one of them and require is similar. Can't yet explain how using require in ESM used to work before, though. Almost as if require wasn't executed before this PR, even in Node.

Thats exactly how the old code worked.. it only used require if some runtime function actually wanted to use the filesystem.

@dcodeIO
Copy link
Contributor

dcodeIO commented Sep 22, 2022

Hmm, perhaps Binaryen.js is somewhat special in using ESM + filesystem, while actually not utilizing the filesystem so this issue didn't surface until require was used unconditionally under Node. Might be that (most) existing code actually utilizing the filesystem still uses CommonJS (like wasm-opt.js for instance), or that related problems exist and can be found in issues, but so far the cause wasn't obvious.

@kripken
Copy link
Member

kripken commented Sep 22, 2022

Yes, I believe that binaryen.js actually never uses the filesystem... so when this was on-demand we never reached the error. So this PR can break code that is just pure computation, but happens to be run in node, and happens to be pasted into an .mjs file.

Perhaps one solution here is to wrap the require in a try-catch? Then the error would be ignored during startup, and only happen if the filesystem is actually used.

Ideally perhaps such pure computational code would have a "none" environment, so it doesn't try to use node APIs. But that's a larger question.

@sbc100
Copy link
Collaborator

sbc100 commented Sep 22, 2022

Yes, I believe that binaryen.js actually never uses the filesystem... so when this was on-demand we never reached the error. So this PR can break code that is just pure computation, but happens to be run in node, and happens to be pasted into an .mjs file.

Perhaps one solution here is to wrap the require in a try-catch? Then the error would be ignored during startup, and only happen if the filesystem is actually used.

Ideally perhaps such pure computational code would have a "none" environment, so it doesn't try to use node APIs. But that's a larger question.

One could also wrap in if (typeof require != 'undefined') I guess? But it seems like we could also just remove these lines completely in EXPORT_ES6 mode?

@dcodeIO
Copy link
Contributor

dcodeIO commented Sep 22, 2022

Overall it looks like the shell code should replace require(...)s with (await import(...)).defaults or so when EXPORT_ES6 (perhaps rename to ESM?) is set. Similarly, it shouldn't use __dirname (or __filename, both not present under ESM) but import.meta.url, which together would remove the necessity for the externs-pre file we use as a workaround at Binaryen.js to build to ESM. Of course, createRequire works as well, but is somewhat a workaround for where ESM cannot be fully utilized yet. Not sure if that's the case in Emscripten artifacts. Could be necessary if top-level await wouldn't work universally yet.

@kleisauke
Copy link
Collaborator Author

It looks like binaryen.js tests link with -sSINGLE_FILE when CMAKE_BUILD_TYPE is not Debug, see:
https://github.com/WebAssembly/binaryen/blob/8f385b04afd6e968fa83dfade7f7858ba0824e37/CMakeLists.txt#L306

I suppose building with -DCMAKE_BUILD_TYPE=Debug would cause the same runtime error before this PR landed?

@kripken
Copy link
Member

kripken commented Sep 22, 2022

@dcodeIO

Overall it looks like the shell code should replace require(...)s with (await import(...)).defaults or so when EXPORT_ES6 (perhaps rename to ESM?) is set.

Ah, nice, await import works in .mjs files I guess? But yeah, we'd need to wait for universal top-level await. I tried it now and even closure fails on it,

SyntaxError: Cannot use keyword 'await' outside an async function (116:10)
var fs = (await import('fs')).defaults;

@kleisauke

What is the connection to SINGLE_FILE - does it avoid or force the require somehow? (at a glance I don't see a connection in the code)

@kleisauke
Copy link
Collaborator Author

What is the connection to SINGLE_FILE - does it avoid or force the require somehow? (at a glance I don't see a connection in the code)

Previously when linking with -sSINGLE_FILE it caused requireNodeFS() never to be called and thus it didn't cause a runtime error when linking with -sEXPORT_ES6 and -sSINGLE_FILE + using that __dirname workaround. The temporary require() stub in PR WebAssembly/binaryen#5075 is probably the best way to make the Binaryen CI green for now.

Long term solution would indeed be that Emscripten emit proper import() statements and avoids using __dirname when using that setting. I opened (draft-)PR #17915 to hopefully avoid further ES6 regressions on Node.js.

@kripken
Copy link
Member

kripken commented Sep 22, 2022

Ok, I think I see, thanks @kleisauke

For fixing binaryen CI right now, I think we can pin the latest stable version (WebAssembly/binaryen#5077). But we should undo that and resolve this issue before a new emscripten release so users don't get broken.

My sense atm is that since top-level await isn't universal yet, perhaps the best options are to either

  1. Revert this (go back to the old lazy behavior) or
  2. Add a check before the two requires on typeof require === 'function' or something to that effect.

I'd be ok with either.

(Long-term, a "none" environment for pure computational code might be a good idea, and eventually we will be able to use top-level await.)

@dcodeIO
Copy link
Contributor

dcodeIO commented Sep 22, 2022

Regarding

SyntaxError: Cannot use keyword 'await' outside an async function (116:10)
var fs = (await import('fs')).defaults;

this is expected when using top-level await in a function. One could say that await is contagious, that is when one uses it in a function the whole async context spreads to the first call of such a function when it is supposed to be called synchronously, typically ending up in something like

async function someFunc() {
  return await someOtherFunc();
}
await someFunc();

Not necessarily a blocker, as long as all such code is under ones control, but easily leading to extensive refactoring. In Emscripten, I guess this leads to complications if both ESM and CommonJS are targeted, with the ESM one requiring async and await all over the place. Good news is that, if top-level await is considered a given in all environments, after the refactoring, this can be mixed with synchronous code since await somethingSynchronous() still works.

kripken added a commit to WebAssembly/binaryen that referenced this pull request Sep 22, 2022
kripken pushed a commit to WebAssembly/binaryen that referenced this pull request Sep 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants