Skip to content

Replace ENVIRONMENT_* logic with Web default + polyfills? #12203

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
kripken opened this issue Sep 13, 2020 · 12 comments
Open

Replace ENVIRONMENT_* logic with Web default + polyfills? #12203

kripken opened this issue Sep 13, 2020 · 12 comments

Comments

@kripken
Copy link
Member

kripken commented Sep 13, 2020

Right now we support various "environments": web, node, other JS shells, etc. We need to use different code paths for loading a file, for example - fetch on the web versus node's fs package, and so forth. We have a layer of indirection for this, where in shell.js the environment is detected and hooks like readBinary, readAsync are set up.

This has two current downsides. First, that while you can specify that only a single target environment is supported, the detection + indirection layer may still end up increasing code size. Second, we get requests for more environments to be supported now and then, like electron variants or deno, and they require adding an entire new "environment".

Instead, what if we made the following change:

  • The default code path uses Web APIs directly.
  • To support other environments, they would polyfill the Web environment. For example, the default code path would use fetch, and other environments would need to implement a minimal polyfill for that. So for node, fs.readFile would be used in the polyfill, etc.

Benefits:

  • In practice, the web target is the one that cares the most about code size. By having the default code path use Web APIs, code size would be minimal there.
  • Supporting more environments can be done in a pluggable way: add a file with code to polyfill Web APIs for whatever you want. We can have env/node.js, env/deno.js etc. as needed, and a user can also add a custom one of their own just by including the code with --pre-js.
@curiousdannii
Copy link
Contributor

curiousdannii commented Sep 14, 2020

Great idea. But a few thoughts.

Traditionally polyfills have aimed to completely implement the relevant standard by implementing it as a global variable or by modifying a built in object. So a polyfill for fetch would need to completely implement the full spec (because if we write to global objects other code may try to use it) . I think that would be very inoptimal. We only care about a small subset of the full specification. So rather than talking in terms of polyfills I think it would be better to think of a small Emscripten Environment API. Such an API like that would only have the things Emscripten itself uses. So it wouldn't need to handle all the options of fetch, but just the few options we need.

Can I suggest that each env file pushes a function to an array, and then at the appropriate time the shell code will run the functions in series. Each function should check if a component in the API has already been supplied, and if not it checks to see if it can supply it. The function should be an async function - this will allow us to fix #11792.

My last thought is whether the default should be for modern browsers, or whether it should be for a more minimalistic and agnostic environment that might support the core ES standard but not much else. For example for many people Math.random might be adequate randomness, and the extra code for using the Crypto API might not be worth it (probably not a good example as I don't think it actually uses too much code.) On the other hand, as you said, file size is mostly only a concern for web, not other environments.

I'm not at my dev system now, but I'll take a look later to try to put together a list of functions this environment API would need.

@sbc100
Copy link
Collaborator

sbc100 commented Sep 14, 2020

I agree we your point that we don't want a full polyfil. However, I think there are good reasons not to invent a new wrapper API but instead to subset the Web API.

This has the effect that no polyfil at all is required for our main target platform, and also helps keep code size minimal when targeting the web where we care a lot about code size.

@kripken
Copy link
Member Author

kripken commented Sep 14, 2020

I agree "polyfill" is probably the wrong word, good point @curiousdannii . Yes, we just want a a small subset of fetch etc. in this case. Just enough to implement the usage we actually make of fetch.

But I do think the API we use should be the actual fetch. That does make it a little more awkward to write the implementations for non-Web environments, as they need to work to implement a subset of fetch as opposed to a new clean API. But the code size benefits are worth it I think as @sbc100 said.

I don't think it would be that hard to implement these APIs. But a prototype is necessary to be sure.

I didn't understand

The function should be an async function - this will allow us to fix #11792.

I'd guess we need the entire module to be async - why is there a specific issue for the ENVIRONMENT logic?

@curiousdannii
Copy link
Contributor

curiousdannii commented Sep 14, 2020

However, I think there are good reasons not to invent a new wrapper API but instead to subset the Web API.

@sbc100 Not sure what you mean by this - just to keep the relevant part of the API the same? That's reasonable, especially as most recent APIs are simpler than the older ones (fetch vs XMLHttpRequest).

But I do think the API we use should be the actual fetch.

@kripken Well in this case the fetch API is the simplest an API could possibly be, you pass a URL and nothing else to a function! I was just suggesting that if any of the other browser APIs we use is more complex, we could consider abstracting it in this new system.

An example where I think making our own API would make sense is that rather than mocking performance.now, we could have one function that returns the timestamp, and set the resolution and whether or not its monotonic as properties of that function. Everything we care about is then neatly encapsulated.

I'd guess we need the entire module to be async - why is there a specific issue for the ENVIRONMENT logic?

We don't need to start making the whole of Emscripten's output async, but if we can put this environment stuff into an async addRunDependency or the like, then it can solve our Node ES modules issues. Those issues being that the import function is async.

@curiousdannii
Copy link
Contributor

curiousdannii commented Sep 15, 2020

Okay, all the things currently gated behind ENVIRONMENT checks (except for PTHREAD):

  • base64
    • base64Decode.js / base64Utils.js
      • NODE: base64 decoding using Buffer
  • crypto
    • library_fs.js
      • NODE: uses the crypto module to setup /dev/(u)random
    • library_uuid.js
      • NODE: uses crypto for making a UUID
  • date/time/performance
    • deterministic.js
      • NODE: overwrites process.hrtime?? Very weird.
    • library.js
      • BROWSER: emscripten_get_now uses performance.now
      • NODE: emscripten_get_now uses process.hrtime
      • SHELL: emscripten_get_now uses dateNow
      • and associated functions emscripten_get_now_res and emscripten_get_now_is_monotonic
  • devicePixelRatio
    • library_html5.js
      • just checks for existence
  • DOM
    • library_html5.js
      • non-BROWSER: tests for existence of document and window for handling them in events
    • shell.js
      • BROWSER: setWindowTitle function
  • Error handling
    • preamble.js
      • BROWSER: catches errors on window
  • executable name
    • library.js
      • in MINIMAL_RUNTIME and NODE it checks process.argv
  • Node FS in library_nodefs.js and library_noderawfs.js
  • global object
    • library.js
      • gates checks to global - could consider moving to globalThis
  • HTTP
    • library_fs.js
      • !WORKER: throws an error if no XMLHttpRequest
    • preamble.js
      • BROWSER/WORKER: uses fetch to get binary
      • WEBVIEW: fetches synchronously
    • shell.js
      • various methods of getting _scriptDir/scriptDirectory
    • source_map_support.js
      • WEB/WORKER get source maps with fetch
  • system
    • library.js
      • NODE: handles system calls with child_process
  • timers
    • library_browser.js
      • WORKER: emulates setImmediate using postMessage
  • TTY
    • library_tty.js
      • NODE: accesses process.stdin
      • otherwise tries window.prompt and readline
  • WASM caching
    • preamble.js
      • NODE: uses fs to cache wasm files
  • WebSocket
    • library_sockfs.js
      • NODE: use ws module, looks like a slightly different API too

@kripken
Copy link
Member Author

kripken commented Sep 15, 2020

@curiousdannii

Thanks for the writeup!

Overall I think that looks good. I think most work could be for fetch and XMLHttpRequest. Otherwise I am a little unsure about node-only things like NODE: handles system calls with child_process - without a Web API to fill in, we may just want to create a new API for those. But I think for most using a Web API would be feasible and best.

deterministic.js - NODE: overwrites process.hrtime?? Very weird.

The DETERMINISTIC flag does that, yeah - it removes all timing (Date.now, etc.) so every run of the program returns the same result. Can be useful for debugging, and probably nothing else.

We don't need to start making the whole of Emscripten's output async, but if we can put this environment stuff into an async addRunDependency or the like, then it can solve our Node ES modules issues. Those issues being that the import function is async.

I think we can do that in the approach suggested here. Specifically something like

// impl of some API
var internalImplDetail; // filled in from an async import
function impl() {
  internalImplDetail(); // assumes it was filled in
}

addRunDependency("node.js async imports");

// Just a sketch, I have no idea what this looks like in reality!
waitForNodeAsyncImport().then((imports) => {
  internalImplDetail = imports.someDetail; // fill it in now
  removeRunDependency("node.js async imports");
});

@curiousdannii
Copy link
Contributor

Would that be the contents of env/node.js? Would that be okay to just copy into the source, it wouldn't need to be wrapped? If the run dependency system will wait for the dependency to be finished I guess it would.

@kripken
Copy link
Member Author

kripken commented Sep 17, 2020

@curiousdannii yes, exactly, something like that would go in env/node.js.

@stale
Copy link

stale bot commented Sep 21, 2021

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

@stale stale bot added the wontfix label Sep 21, 2021
@curiousdannii
Copy link
Contributor

Bump

@kripken
Copy link
Member Author

kripken commented May 26, 2023

This approach may eventually be very simple to implement, as the environments keep converging:

  • Node/Deno/Bun support more and more Web APIs, so perhaps we can just default to -sENVIRONMENT=web at some point, when that runs in all places. (see Check globalThis.Deno when setting ENVIRONMENT_IS_NODE #19359, that might already be the case for some things)
  • Deno and Bun support more and more Node APIs (in particular the crucial fs API that we use a lot), so perhaps all we need aside from web is node.

In that case maybe we'd just write all code for the web, and have a setting "also support node" which would add sync fs stuff etc. for Node/Deno/Bun.

@Sec-ant
Copy link

Sec-ant commented Dec 26, 2023

Bun has evolved into 1.0, a gentle bump.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants