Consolidate explanation of modules into a new Modules.md and improve explanation #270

Merged 5 commits on Aug 3, 2015
2 changes: 1 addition & 1 deletion FAQ.md
@@ -90,7 +90,7 @@ even before there is any native support.

As explained in the [high-level goals](HighLevelGoals.md), to achieve a Minimum Viable Product, the
initial focus is on [C/C++](CAndC++.md).
-However, by [integrating with JS at the ES6 Module interface](MVP.md#modules),
+However, by [integrating with JS at the ES6 Module interface](Modules.md#integration-with-es6-modules),
web developers don't need to write C++ to take advantage of libraries that others have written;
reusing a modular C++ library can be as simple as [using a module from JS](http://jsmodules.io).

88 changes: 17 additions & 71 deletions MVP.md
@@ -11,84 +11,15 @@ even on mobile devices, which leads to roughly the same functionality as
This document explains the contents of the MVP at a high-level. There are also
separate docs with more precise descriptions of:

+* [Modules](Modules.md)
* [Polyfill to JavaScript](Polyfill.md);
* [AST semantics](AstSemantics.md);
* [Binary encoding](BinaryEncoding.md);
* [Text format](TextFormat.md);
* Implementation [in the browser](Web.md) and [outside the browser](NonWeb.md).
Review comment (Member):

I think we should keep this note for now.

Review comment (Member Author):

I'm happy to. I was just thinking that this was originally added when V1.md had a bunch of text, so now it seems a bit ad hoc and asymmetric since most files don't have it. The important thing is that the Readme.md has it in bold. Still want to keep it?


**Note**: This content is still in flux and open for discussion.

-## Modules
-
-* The primary unit of loadable, executable code is a **module**.
-* A module can declare a subset of its functions and global variables to be
-**exports**. The meaning of exports (how and when they are called) is defined
-by the host environment. For example, `_start` and `init` can be the only
-meaningful exports.
-* A module can declare a set of **imports**. An import is a tuple containing a
-module name, export name, and the type to use for the import within the
-module. The host environment controls the mapping from module name to which
-module is loaded.
-* The spec defines the semantics of loading and calling exports of a *single*
-module. The meaning of a call to an import is defined by the host environment.
-* In a minimal shell environment, imports could be limited to builtin modules
-(implemented by the shell) and/or shell scripts.
-* The [dynamic linking](FutureFeatures.md#dynamic-linking) post-MVP feature
-would extend the semantics to include multiple modules and thus allow sharing
-linear memory and pointers. Dynamic linking would be semantically distinct from
-importing, however.
-* When compiling from C++, imports would be generated for unresolved `extern`
-functions and calls to those `extern` functions would call the import.
-* Host environments can define builtin modules that are implemented natively but
-can otherwise be imported like [other modules](MVP.md#modules). As examples:
-* A WebAssembly shell might define a builtin `stdio` library with an export
-`puts`.
-* In the browser, the WebIDL support mentioned in
-[future features](FutureFeatures.md).
-* Any [ABI](https://en.wikipedia.org/wiki/Application_binary_interface) for
-statically linked libraries will be specific to your source language compiler.
-In the future, [standard ABIs may be defined](FutureFeatures.md#dynamic-linking)
-to allow for compatibility between compilers and versions of compilers.
-* **TODO**: there is more to discuss here concerning APIs.
-
-## Module structure
-
-* At the top level, a module is ELF-like: a sequence of sections which declare
-their type and byte-length.
-* Sections with unknown types would be skipped without error.
-* Standardized section types:
-* module import section;
-* globals section (constants, signatures, variables);
-* code section;
-* memory initialization section.
-
-## Code section
-
-* The code section begins with a table of functions containing the signatures
-and offsets of each function followed by the list of function bodies. This
-allows parallel and streaming decoding, validation and compilation.
-* A function body consists of a set of typed variable bindings and an AST
-closed under these bindings.
-* The [Abstract Syntax Tree](AstSemantics.md) is composed of two primary kinds
-of nodes: statements and expressions.
-* [Control flow](AstSemantics.md#control-flow-structures) is structured (no
-`goto`).
-
-## Binary format
-
-* A [binary format](BinaryEncoding.md) provides efficiency: it reduces download
-size and accelerates decoding, thus enabling even very large codebases to have
-quick startup times. Towards that goal, the binary format will be natively
-decoded by browsers.
-* The binary format has an equivalent and isomorphic
-[text format](MVP.md#text-format). Conversion from one format to the other is
-both straightforward and causes no loss of information in either direction.
-
-## Text format
-
-The [text format](TextFormat.md) provides readability to developers, and is
-isomorphic to the [binary format](BinaryEncoding.md).
-
## Linear Memory

* In the MVP, when a WebAssembly module is loaded, it creates a new linear memory which
@@ -105,6 +36,21 @@ isomorphic to the [binary format](BinaryEncoding.md).
detaches any existent `ArrayBuffer`.
* See the [AST Semantics linear memory section](AstSemantics.md#linear-memory)
for more details.

+## Binary format
+
+* A [binary format](BinaryEncoding.md) provides efficiency: it reduces download
+size and accelerates decoding, thus enabling even very large codebases to have
+quick startup times. Towards that goal, the binary format will be natively
+decoded by browsers.
+* The binary format has an equivalent and isomorphic
+[text format](MVP.md#text-format). Conversion from one format to the other is
+both straightforward and causes no loss of information in either direction.
+
+## Text format
+
+The [text format](TextFormat.md) provides readability to developers, and is
+isomorphic to the [binary format](BinaryEncoding.md).

## Security

127 changes: 127 additions & 0 deletions Modules.md
@@ -0,0 +1,127 @@
# Modules

The distributable, loadable, and executable unit of code in WebAssembly
is called a **module**. A module contains:
* a set of [imports and exports](Modules.md#imports-and-exports);
* a section defining the [initial state of linear memory](Modules.md#initial-state-of-linear-memory);
* a section containing [code](Modules.md#code-section);
* after the MVP, sections containing [debugging/symbol information](Tooling.md) or
a reference to separate files containing them; and
* possibly other sections in the future.

Sections declare their type and byte-length. Sections with unknown types are
silently ignored.
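
The rule that unknown sections are ignored can be sketched as follows. This is only an illustration: the header layout (a 1-byte type id followed by a 4-byte little-endian length), the ids in `KNOWN_SECTIONS`, and the helper names are assumptions made for the example, not an encoding defined by this document.

```python
import struct

# Hypothetical section type ids; the real encoding is not specified here.
KNOWN_SECTIONS = {1: "imports", 2: "memory", 3: "code"}


def make_section(type_id: int, payload: bytes) -> bytes:
    """Build one section using the assumed header: type id, then byte length."""
    return struct.pack("<BI", type_id, len(payload)) + payload


def read_sections(data: bytes) -> dict:
    """Return {section name: payload}, skipping sections with unknown types."""
    sections = {}
    offset = 0
    while offset < len(data):
        type_id, length = struct.unpack_from("<BI", data, offset)
        offset += 5  # size of the assumed header
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated section")
        offset += length  # the declared length lets us step over any section
        name = KNOWN_SECTIONS.get(type_id)
        if name is not None:
            sections[name] = payload
    return sections


blob = make_section(3, b"body") + make_section(99, b"future") + make_section(1, b"imp")
sections = read_sections(blob)  # the section with type id 99 is skipped without error
```

Because every section carries its own byte length, a decoder that does not recognize a type can still find the start of the next section.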

While WebAssembly modules are designed to interoperate with ES6 modules
in a Web environment (more details [below](Modules.md#integration-with-es6-modules)),
WebAssembly modules are defined independently of JavaScript and do not require
the host environment to include a JavaScript VM.

## Imports and Exports

A module defines a set of functions in its
[code section](Modules.md#code-section) and can declare and name a subset of
these functions to be **exports**. The meaning of exports (how and when they are
called) is defined by the host environment. For example, a minimal shell
environment might only probe for and call a `_start` export when given a module
to execute.

A module can declare a set of **imports**. An import is a tuple containing a
module name, the name of an exported function to import from the named module,
and the signature to use for that import within the importing module. Within a
module, the import can be [directly called](AstSemantics.md#calls) like a
function (according to the signature of the import). When the imported
module is also WebAssembly, it would be an error if the signature of the import
doesn't match the signature of the export.
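
As a rough model of the import tuple and the signature check described above, consider the sketch below. The class names, the type-name strings such as `"i32"`, and the `check_import` helper are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    params: tuple  # placeholder type names, e.g. ("i32", "i32")
    result: str


@dataclass(frozen=True)
class Import:
    module_name: str      # which module to import from
    export_name: str      # which exported function of that module
    signature: Signature  # signature used for calls inside the importer


def check_import(imp: Import, exports: dict) -> None:
    """Raise if the named export is missing or its signature differs.

    `exports` maps export names of the imported module to their signatures.
    """
    exported = exports.get(imp.export_name)
    if exported is None:
        raise LookupError(f"{imp.module_name} has no export {imp.export_name!r}")
    if exported != imp.signature:
        raise TypeError(f"signature mismatch for {imp.module_name}.{imp.export_name}")


add_sig = Signature(("i32", "i32"), "i32")
check_import(Import("math", "add", add_sig), {"add": add_sig})  # signatures agree
```

In this model a mismatch surfaces when the import is checked, before any call is made.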

The WebAssembly spec does not define how imports are interpreted:
* the host environment can interpret the module name as a file path, a URL, or
a key in a fixed set of builtin modules, or it may invoke a user-defined hook
to resolve the module name to one of these;
* the module name does not need to resolve to a WebAssembly module; it
could resolve to a builtin module (implemented by the host environment) or a
module written in another, compatible language; and
* the meaning of calling an imported function is host-defined.

The open-ended nature of module imports allows them to be used to expose
arbitrary host environment functionality to WebAssembly code, similar to a
native `syscall`. For example, a shell environment could define a builtin
`stdio` module with an export `puts`.
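
A toy host along these lines might look like the sketch below. `ShellHost`, its `resolve` method, and the optional `resolve_hook` are invented for this example; only the builtin `stdio` module with a `puts` export comes from the text above.

```python
class ShellHost:
    """Toy host that resolves imports against a fixed set of builtin modules."""

    def __init__(self, resolve_hook=None):
        self.stdout = []                  # captured output, for illustration only
        self.resolve_hook = resolve_hook  # optional user-defined name rewriting
        self.builtins = {"stdio": {"puts": self._puts}}

    def _puts(self, text: str) -> int:
        # What a call to the import does is entirely up to the host.
        self.stdout.append(text)
        return 0

    def resolve(self, module_name: str, export_name: str):
        """Map (module name, export name) to a host-provided callable."""
        if self.resolve_hook is not None:
            module_name = self.resolve_hook(module_name)
        try:
            return self.builtins[module_name][export_name]
        except KeyError:
            raise LookupError(f"cannot resolve {module_name}.{export_name}") from None


host = ShellHost()
puts = host.resolve("stdio", "puts")
puts("hello from a module")
```

A different host could resolve the same module name to a file, a URL, or a module written in another language without the importing module changing.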

In C/C++, an undefined `extern` declaration (perhaps only when given the
magic `__attribute__` or declared in a separate list of imports) could be
compiled to an import and C/C++ calls to this `extern` would then be compiled
to calls to this import. This is one way low-level C/C++ libraries could call
out of WebAssembly in order to implement portable source-level interfaces
(e.g., POSIX, OpenGL or SDL) in terms of host-specific functionality.

### Integration with ES6 modules

While ES6 defines how to parse, link and execute a module, ES6 does not
define when this parsing/linking/execution occurs. An additional extension
to the HTML spec is required to say when a script is parsed as a module instead
of normal global code. This work is [ongoing](http://TODO:link-to-loader-level-0-repo).
Currently, the following entry points for modules are being considered:
* `<script type="module">`;
* an overload to the `Worker` constructor;
* an overload to the `importScripts` Worker API;
Review comment (Member):

Are these what wasm would want?

We kind of discussed this in #84, it would be good to update that issue and close it if we have a believable solution.

Review comment:

As I mention below, using an ES module loading mechanism to load wasm is something I don't see fitting into a level 0 Loader spec, so I think it still might be prudent for wasm to offer some more explicit hooks for loading from HTML (such as those discussed in #84). That really depends on the timelines for various parts of this: loading wasm code, importing ES from wasm, importing wasm from ES.

Review comment (Member Author):

@jfbastien Ideally yes, in my mind. That would mean, e.g., it should be possible to swap out a JS module with a wasm module (or vice versa) by simply changing the contents of the file. If we agree on this, happy to comment on #84.

@ajklein If we know what we want in "level 1", then it doesn't seem great to add a separate way to load a wasm module (unless we also wanted that separate mechanism to load ES6). My expectation here is that we can do "level 0" first (purely in terms of ES6 modules), get that shipped, and follow on fast with wasm.


Additionally, an ES6 module can recursively import other modules via `import`
statements.

For WebAssembly/ES6 module integration, the idea is that all the above module
entry points could also load WebAssembly modules simply by passing the URL of a
WebAssembly module. The distinction of whether the module was WebAssembly or ES6
code could be made by namespacing or by content sniffing the first bytes of the
fetched resource (which, for WebAssembly, would be a non-ASCII&mdash;and thus
illegal as JavaScript&mdash;[magic number](https://en.wikipedia.org/wiki/Magic_number_%28programming%29)).
Review comment (Member):

That's beautiful and hacky at the same time :-)

💩wasm

:-)

Review comment:

As previously discussed offline, this needs some work on the ES Loader side to ensure that it can be specified in terms of whatever hooks end up in Loader "level 1", rather than requiring a dependency from the ES Loader on WebAssembly.

Review comment (Member Author):

IIUC, this is a spec-factoring problem, not a fundamental implementation problem, is that right? That is, if these were all the same spec, I don't think there would be an issue. Assuming that's right, then I expect we could define some host hooks where the ES6 spec says "if the first byte isn't ASCII, then an error is raised unless the host environment has a separate way to decode this file".

Review comment:

The point is that none of these are the same spec, and we want to keep it that way (so saying it's just a spec factoring problem seems overly dismissive). I'd like to make sure that ES proper need know nothing about wasm, and I'd also hope that the ES Loader wouldn't need to know about it either: whatever sits right above the ES Loader should be able to add the right hooks to the loader (conceptually, anyway).

Review comment (Member Author):

I didn't mean to be dismissive, but if we can agree on an intended behavior, then it becomes a matter of spec engineering to decide how to cut up the spec so that we don't have unintended dependencies. E.g., there is definitely going to be an HTML5 portion of the whole specified loader pipeline, so it seems like that is where we could mention both JS and wasm.

Thus, the whole module-loading pipeline (resolving the name to a URL, fetching
the URL, any other [loader hooks](http://whatwg.github.io/loader/)) would be
shared and only the final stage would fork into either the JavaScript parser or
the WebAssembly decoder.
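
The shared pipeline with a final fork could be modeled roughly as below. The magic number here is made up for the sketch (no value is fixed in this document), and `fetch`, `parse_es6`, and `decode_wasm` are stand-ins passed in by the caller rather than real APIs.

```python
# Invented for this sketch. The byte 0xFF never occurs in valid UTF-8 text,
# so no UTF-8 encoded script can begin with this prefix.
HYPOTHETICAL_MAGIC = b"\xffwasm"


def load_module(url, fetch, parse_es6, decode_wasm):
    """Run the shared stages, then fork on the first bytes of the resource."""
    source = fetch(url)  # name resolution and fetching are common to both kinds
    if source.startswith(HYPOTHETICAL_MAGIC):
        return decode_wasm(source)
    return parse_es6(source)


files = {
    "app.js": b"export default 1;",
    "lib.wasm": HYPOTHETICAL_MAGIC + b"\x00\x01",
}
kind_js = load_module("app.js", files.__getitem__, lambda s: "es6", lambda s: "wasm")
kind_wasm = load_module("lib.wasm", files.__getitem__, lambda s: "es6", lambda s: "wasm")
```

Only the last step knows about the two formats, so swapping a JavaScript module for a WebAssembly one changes the file contents and nothing else in this model.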

Any non-builtin imports from within a WebAssembly module would be treated as
if they were `import` statements of an ES6 module. If an ES6 module `import`ed
a WebAssembly module, the WebAssembly module's exports would be linked as if
they were the exports of an ES6 module. Once parsing and linking phases
were complete, a WebAssembly module would have its `_start` function called in
place of executing the ES6 module top-level script. By default, multiple
loads of the same module URL (in the same realm) reuse the same singleton
module instance. It may be worthwhile in the future to consider extensions to
allow applications to load/compile/link a module once and instantiate multiple
times (each with a separate heap and global state).
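
The default singleton behavior can be pictured with a per-realm registry keyed by URL. The `Realm` class and the `instantiate` callback are hypothetical names for this sketch; the callback stands for whatever fetches, decodes, links, and starts a module.

```python
class Realm:
    """Keeps one module instance per URL, mirroring the default described above."""

    def __init__(self, instantiate):
        self._instantiate = instantiate  # assumed: fetch + decode + link + `_start`
        self._instances = {}

    def load(self, url):
        if url not in self._instances:
            self._instances[url] = self._instantiate(url)
        return self._instances[url]


started = []


def instantiate(url):
    started.append(url)   # stands in for calling the module's `_start`
    return {"url": url}   # stands in for the module instance


realm = Realm(instantiate)
first = realm.load("lib.wasm")
second = realm.load("lib.wasm")  # reuses the instance; `_start` is not run again
```

The multiple-instantiation extension mentioned above would correspond to calling `instantiate` again on already compiled code instead of consulting the registry.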

This integration strategy should allow WebAssembly modules to be fairly
interchangeable with ES6 modules (ignoring
[GC/Web API](FutureFeatures.md#gc/dom-integration) signature restrictions of the
WebAssembly MVP) and thus it should be natural to compose a single application
from both kinds of code. This goal motivates the
[semantic design](AstSemantics.md#linear-memory) of giving each WebAssembly
module its own disjoint linear memory. Otherwise, if all modules shared a single
linear memory (all modules with the same realm? origin? window?&mdash;even the
scope of "all" is a nuanced question), a single app using multiple
independent libraries would have to hope that all the WebAssembly modules
transitively used by those libraries "played well" together (e.g., explicitly
shared `malloc` and coordinated global address ranges). Instead, the
[dynamic linking future feature](FutureFeatures.md#dynamic-linking) is intended
to allow *explicitly* sharing linear memory between multiple modules.

## Initial state of linear memory

A module will contain a section declaring the linear memory size (initial and
maximum size allowed by `sbrk`) and the initial contents of memory (analogous
to `.data`, `.rodata`, `.bss` sections in native executables).
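
One way to picture this section is as an initial size, a maximum size, and a list of data segments copied into zero-filled memory. The `LinearMemory` class, the `(offset, data)` segment shape, and the exact `sbrk` behavior below are assumptions for illustration, not a definition of the section.

```python
class LinearMemory:
    """Zero-filled memory with initial contents and a growth limit."""

    def __init__(self, initial_size: int, max_size: int, segments=()):
        if initial_size > max_size:
            raise ValueError("initial size exceeds maximum size")
        self.max_size = max_size
        self.data = bytearray(initial_size)  # zero-filled, like `.bss`
        for offset, payload in segments:     # like `.data` / `.rodata`
            end = offset + len(payload)
            if end > initial_size:
                raise ValueError("segment does not fit in initial memory")
            self.data[offset:end] = payload

    def sbrk(self, delta: int) -> int:
        """Grow by `delta` bytes and return the previous size (assumed semantics)."""
        if delta < 0:
            raise ValueError("this sketch only grows memory")
        old_size = len(self.data)
        if old_size + delta > self.max_size:
            raise MemoryError("growth would exceed the declared maximum")
        self.data.extend(bytes(delta))
        return old_size


memory = LinearMemory(16, 32, segments=[(4, b"hi")])
previous_size = memory.sbrk(8)
```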

## Code section

The WebAssembly spec defines the code section of a module in terms of an
[Abstract Syntax Tree](AstSemantics.md) (AST). Additionally, the spec defines
two concrete representations of the AST: a [binary format](BinaryEncoding.md)
which is natively decoded by the browser and a [text format](TextFormat.md)
which is intended to be read and written by humans. A WebAssembly environment
is only required to understand the binary format; the text format is defined so
that WebAssembly modules can be written by hand (and then converted to binary
with an offline tool) and so that developer tools have a well-defined text
projection of a binary WebAssembly module. This design separates the concerns
of specifying and reasoning about behavior, over-the-wire size and compilation
speed, and ergonomic syntax.

8 changes: 5 additions & 3 deletions NonWeb.md
@@ -20,7 +20,8 @@ JavaScript VM present.
The WebAssembly spec will not try to define any large portable libc-like
library. However, certain features that are core to WebAssembly semantics that
are found in native libc *would* be part of the core WebAssembly spec as either
-primitive opcodes or a special builtin module (e.g., `sbrk`, `dlopen`).
+primitive opcodes or a function exported by a
+[builtin module](Modules.md#imports-and-exports) (e.g., `sbrk`, `dlopen`).

Where there is overlap between the Web and popular non-Web environments,
shared specs could be proposed, but these would be separate from the WebAssembly
@@ -32,8 +33,9 @@ However, for most cases it is expected that, to achieve portability at the
source code level, communities would build libraries that mapped from a
source-level interface to the host environment's builtin capabilities
(either at build time or runtime). WebAssembly would provide the raw building
-blocks (feature testing, dynamic loading) to make these libraries possible.
-Two early expected examples are POSIX and SDL.
+blocks (feature testing, [builtin modules](Modules.md#imports-and-exports) and
+dynamic loading) to make these libraries possible. Two early expected examples
+are POSIX and SDL.

In general, by keeping the non-Web path such that it doesn't require
Web APIs, WebAssembly could be used as a portable binary format on many
31 changes: 8 additions & 23 deletions Web.md
@@ -9,36 +9,22 @@ the Web's security model, preserving the Web's portability, and designing in
room for evolutionary development. Many of these goals are clearly
reflected in WebAssembly's [high-level goals](HighLevelGoals.md).

-# Implementation Details
-
-We've identified interesting implementation approaches which help convince us
-that the design, especially that of the [MVP](MVP.md), are sensible:
+More concretely, the following is a list of points of contact between WebAssembly
+and the rest of the Web platform that have been considered:

+* WebAssembly's [modules](Modules.md) allow for natural [integration with
+the ES6 module system](Modules.md#integration-with-es6-modules) and allow
+synchronous calling to and from JavaScript.
* WebAssembly's security model should depend on [CORS][] and
[subresource integrity][] to enable distribution, especially through content
distribution networks and to implement
[dynamic linking](FutureFeatures.md#dynamic-linking).
-* A [module](MVP.md#modules) can be loaded in the same way as an ES6 module
-(`import` statements, `Reflect` API, `Worker` constructor, etc) and the result
-is reflected to JS as an ES6 module object.
-- Exports are the ES6 module object exports.
-- An import first passes the module name to the [module loader pipeline][] and
-resulting ES6 module (which could be implemented in JS or WebAssembly) is
-queried for the export name.
-- There is no special case for when one WebAssembly module imports another:
-they have separate [memory](MVP.md#linear-memory) and pointers cannot be passed
-between the two. Module imports encapsulate the importer and
-importee. [Dynamic linking](FutureFeatures.md#dynamic-linking) should be
-used to share memory and pointers across modules.
-- To synchronously call into JavaScript from C++, the C++ code would declare
-and call an undefined `extern` function and the target JavaScript function
-would be given the (mangled) name of the `extern` and put inside the
-imported ES6 module.
* Once [threads are supported](PostMVP.md#threads), a WebAssembly module would
-initially be distributed between workers via `postMessage()`.
+shared (including its heap) between workers via `postMessage()`.
- This also has the effect of explicitly sharing code so that engines don't
perform N fetches and compile N copies.
-- May later standardize a more direct way to create a thread from WebAssembly.
+- WebAssembly may later standardize a more direct way to create a thread that
+doesn't involve creating a new Worker.
* Once [SIMD is supported](PostMVP.md#fixed-width-simd), a Web implementation of
WebAssembly would:
- Be statically typed analogous to [SIMD.js-in-asm.js][];
@@ -47,6 +33,5 @@ that the design, especially that of the [MVP](MVP.md), are sensible:

[CORS]: https://www.w3.org/TR/cors/
[subresource integrity]: https://www.w3.org/TR/SRI/
-[module loader pipeline]: https://whatwg.github.io/loader
[SIMD.js-in-asm.js]: http://discourse.specifiction.org/t/request-for-comments-simd-js-in-asm-js