Remove more references to AST #806

Merged
merged 14 commits into from
Sep 28, 2016
8 changes: 4 additions & 4 deletions FutureFeatures.md
@@ -315,7 +315,7 @@ operators the possibility of having side effects.
Debugging techniques are also important, but they don't necessarily need to be
in the spec itself. Implementations are welcome (and encouraged) to support
non-standard execution modes, enabled only from developer tools, such as modes
with alternate rounding, or evaluation of floating point expressions at greater
with alternate rounding, or evaluation of floating point operators at greater
precision, to support [techniques for detecting numerical instability]
(https://www.cs.berkeley.edu/~wkahan/Mindless.pdf), or modes using alternate
NaN bitpattern rules, to carry diagnostic information and help developers track
@@ -370,8 +370,8 @@ general-purpose use on several of today's popular hardware architectures.
## Better feature testing support

The [MVP feature testing situation](FeatureTest.md) could be improved by
allowing unknown/unsupported AST operators to decode and validate. The runtime
semantics of these unknown operators could either be to trap or call a
allowing unknown/unsupported instructions to decode and validate. The runtime
semantics of these unknown instructions could either be to trap or call a
same-signature module-defined polyfill function. This feature could provide a
lighter-weight alternative to load-time polyfilling (approach 2 in
[FeatureTest.md](FeatureTest.md)), especially if the [specific layer](BinaryEncoding.md)
@@ -442,7 +442,7 @@ see [JavaScript's `WebAssembly.Table` API](JS.md#webassemblytable-objects)).
It would be useful to be able to do everything from within WebAssembly so, e.g.,
it was possible to write a WebAssembly dynamic loader in WebAssembly. As a
prerequisite, WebAssembly would need first-class support for
[GC references](GC.md) in expressions and locals. Given that, the following
[GC references](GC.md) on the stack and in locals. Given that, the following
could be added:
* `get_table`/`set_table`: get or set the table element at a given dynamic
index; the got/set value would have a GC reference type
6 changes: 3 additions & 3 deletions JS.md
@@ -66,7 +66,7 @@ asynchronous, background, streaming compilation.
A `WebAssembly.Module` object represents the stateless result of compiling a
WebAssembly binary-format module and contains one internal slot:
* [[Module]] : an [`Ast.module`](https://github.com/WebAssembly/spec/blob/master/ml-proto/spec/ast.ml#L208)
which is the spec definition of a validated module AST
which is the spec definition of a validated module

### `WebAssembly.Module` Constructor

@@ -82,8 +82,8 @@ If the given `bytes` argument is not a
a `TypeError` exception is thrown.

Otherwise, this function performs synchronous compilation of the `BufferSource`:
* The byte range delimited by the `BufferSource` is first logically decoded into
an AST according to [BinaryEncoding.md](BinaryEncoding.md) and then validated
* The byte range delimited by the `BufferSource` is first logically decoded
according to [BinaryEncoding.md](BinaryEncoding.md) and then validated
according to the rules in [spec/check.ml](https://github.com/WebAssembly/spec/blob/master/ml-proto/spec/check.ml#L325).
* The spec `string` values inside `Ast.module` are decoded as UTF8 as described in
[Web.md](Web.md#names).
8 changes: 4 additions & 4 deletions MVP.md
@@ -12,14 +12,14 @@ The major design components of the MVP have been broken up into separate
documents:
* The distributable, loadable and executable unit of code in WebAssembly
is called a [module](Modules.md).
* The behavior of WebAssembly code in a module is specified in terms of an
[AST](AstSemantics.md).
* The behavior of WebAssembly code in a module is specified in terms of
[instructions](AstSemantics.md) for a structured stack machine.
* The WebAssembly binary format, which is designed to be natively decoded by
WebAssembly implementations, is specified as a
[binary serialization](BinaryEncoding.md) of a module's AST.
[binary encoding](BinaryEncoding.md) of a module's structure and code.
* The WebAssembly text format, which is designed to be read and written when
using tools (e.g., assemblers, debuggers, profilers), is specified as a
[textual projection](TextFormat.md) of a module's AST.
[textual projection](TextFormat.md) of a module's structure and code.
* WebAssembly is designed to be implemented both [by web browsers](Web.md)
and [completely different execution environments](NonWeb.md).
* To ease the transition to WebAssembly while native support is still
73 changes: 39 additions & 34 deletions Rationale.md
@@ -13,10 +13,18 @@ codebases, we'll revisit the alternatives listed below, reevaluate the tradeoffs
and update the [design](AstSemantics.md) before the MVP is finalized.


## Why AST?

Why not a register- or SSA-based bytecode?
* Trees allow a smaller binary encoding: [JSZap][], [Slim Binaries][].
## Why a stack machine?

Why not an AST, or a register- or SSA-based bytecode?

* We started with an AST and generalized to a [structured stack machine](AstSemantics.md). ASTs allow a
dense encoding and efficient decoding, compilation, and interpretation.
WebAssembly's structured stack machine generalizes the ASTs allowed in previous versions, enabling
efficiency gains in interpretation and baseline compilation, as well as a straightforward
design for multi-return functions.
* The stack machine allows a smaller binary encoding than registers or SSA (see [JSZap][], [Slim Binaries][]),
and structured control flow allows simpler and more efficient verification, including decoding directly
to a compiler's internal SSA form.
* [Polyfill prototype][] shows simple and efficient translation to asm.js.
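
As an illustration of why an AST generalizes naturally to a stack machine, the following minimal sketch flattens an expression tree in post-order and evaluates the result with an operand stack. The node classes, the tuple-based instruction format, and the `add`/`mul` operator names are assumptions made for this example; they are not the WebAssembly binary format or the spec's `Ast` definitions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Const:
    value: int


@dataclass
class BinOp:
    op: str  # "add" or "mul" (hypothetical operator names for this sketch)
    left: "Expr"
    right: "Expr"


Expr = Union[Const, BinOp]


def flatten(expr):
    """Post-order traversal: operands first, then the operator."""
    if isinstance(expr, Const):
        return [("const", expr.value)]
    return flatten(expr.left) + flatten(expr.right) + [(expr.op,)]


def run(code):
    """Evaluate flattened code with an explicit operand stack."""
    stack = []
    for instr in code:
        if instr[0] == "const":
            stack.append(instr[1])
        else:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs if instr[0] == "add" else lhs * rhs)
    return stack


# 2 + (3 * 4) as a tree, then as stack code.
tree = BinOp("add", Const(2), BinOp("mul", Const(3), Const(4)))
code = flatten(tree)
```

Every tree maps to such a sequence, but not every sequence corresponds to a single tree, which is one way to read "generalization" in the bullet above.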

[JSZap]: https://research.microsoft.com/en-us/projects/jszap/
@@ -26,15 +34,12 @@ Why not a register- or SSA-based bytecode?

## Why not a fully-general stack machine?

Stack machines have all the code size advantages as expression trees represented
in post-order. However, we wish to avoid requiring an explicit expression stack at
runtime, because many implementations will want to use registers rather than an
actual stack for evaluation. Consequently, while it's possible to think about
wasm expression evaluation in terms of a conceptual stack machine, the stack
machine would be constrained such that one can always statically know the types,
definitions, and uses of all operands on the stack, so that an implementation can
connect definitions with their uses through whatever mechanism they see fit.

The WebAssembly stack machine is restricted to structured control flow and structured
use of the stack. This greatly simplifies one-pass verification, avoiding a fixpoint computation
like that of other stack machines such as the Java Virtual Machine (prior to [stack maps](https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html)).
This also simplifies compilation and manipulation of WebAssembly code by other tools.
Further generalization of the WebAssembly stack machine is planned post-MVP, such as the
addition of multiple return values from control flow constructs and function calls.
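
The one-pass property described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the spec's checker: it models a single value type (`i32`), a hypothetical tuple-based instruction format, and only `block`/`end` nesting without branches. Each block records the operand stack height at entry, so a single forward scan suffices and no fixpoint iteration is needed.

```python
def validate(code, result=None):
    """Check stack discipline in one forward pass.

    `result` is the expected result type of the whole sequence
    ("i32" or None). Instruction tuples are hypothetical.
    """
    stack = []              # operand types currently on the stack
    frames = [(0, result)]  # (stack height at block entry, result type)
    for instr in code:
        op = instr[0]
        if op == "i32.const":
            stack.append("i32")
        elif op == "i32.add":
            height, _ = frames[-1]
            # A block may not consume operands pushed outside of it.
            if len(stack) - height < 2:
                return False
            stack.pop()
            stack.pop()
            stack.append("i32")
        elif op == "block":
            frames.append((len(stack), instr[1]))
        elif op == "end":
            if len(frames) == 1:
                return False  # unmatched `end`
            height, res = frames.pop()
            expected = [] if res is None else [res]
            if stack[height:] != expected:
                return False
        else:
            return False  # unknown instruction in this sketch
    expected = [] if result is None else [result]
    return len(frames) == 1 and stack == expected


ok = validate(
    [("block", "i32"), ("i32.const",), ("i32.const",), ("i32.add",), ("end",)],
    "i32",
)
bad = validate([("i32.const",), ("block", None), ("i32.add",), ("end",)])
```

Here `ok` is accepted, while `bad` is rejected because the `i32.add` inside the block would need operands from below the block's entry height.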

## Basic Types Only

@@ -44,7 +49,7 @@ WebAssembly only represents [a few types](AstSemantics.md#Types).
language compiler to express its own types in terms of the basic machine
types. This allows WebAssembly to present itself as a virtual ISA, and lets
compilers target it as they would any other ISA.
* These types are efficiently executed by all modern CPU architectures.
* These types are directly representable on all modern CPU architectures.
* Smaller types (such as `i8` and `i16`) are usually no more efficient and in
languages like C/C++ are only semantically meaningful for memory accesses
since arithmetic gets widened to `i32` or `i64`. Avoiding them at least for MVP
@@ -177,7 +182,7 @@ See [#107](https://github.com/WebAssembly/spec/pull/107).
## Control Flow

Structured control flow provides simple and size-efficient binary encoding and
compilation. Any control floweven irreduciblecan be transformed into structured
compilation. Any control flow--even irreducible--can be transformed into structured
control flow with the
[Relooper](https://github.com/kripken/emscripten/raw/master/docs/paper.pdf)
[algorithm](http://dl.acm.org/citation.cfm?id=2048224&CFID=670868333&CFTOKEN=46181900),
@@ -280,17 +285,18 @@ segregating the table per signature to require only a bounds check could be cons
in the future. Also, if tables are small enough, an engine can internally use per-signature
tables filled with failure handlers to avoid one check.

## Expressions with Control Flow
## Control Flow Instructions with Values

Expression trees offer significant size reduction by avoiding the need for
`set_local`/`get_local` pairs in the common case of an expression with only one
immediate use. Control flow "statements" are in fact expressions with result
values, thus allowing even more opportunities to build bigger
expression trees and further reduce `set_local`/`get_local` usage (which
constitute 30-40% of total bytes in the
Control flow instructions such as `br`, `br_if`, `br_table`, `if` and `if-else` can
transfer stack values in WebAssembly. These primitives are useful building blocks for
WebAssembly producers, e.g. in compiling expression languages. They offer significant
size reduction by avoiding the need for `set_local`/`get_local` pairs in the common case
of an expression with only one immediate use. Control flow instructions can then model
expressions with result values, thus allowing even more opportunities to further reduce

[Review comment by @rossberg (Member), Sep 23, 2016] Perhaps also mention that branches with (especially multiple) arguments can model phis?

`set_local`/`get_local` usage (which constitute 30-40% of total bytes in the
[polyfill prototype](https://github.com/WebAssembly/polyfill-prototype-1)).
Additionally, these primitives are useful building blocks for
WebAssembly-generators (including the JavaScript polyfill prototype).
`br`-with-value and `if` constructs that return values can also model `phis`, which
appear in SSA representations of programs.
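
To make the size argument concrete, here is a small sketch comparing an `if` that leaves its result on the stack with an equivalent sequence that routes the result through a local. The nested-tuple instruction format and the `run_if`/`size` helpers are assumptions made for this illustration. The sketch counts instructions rather than encoded bytes, so it shows the direction of the saving and not the 30-40% figure quoted above.

```python
def run_if(code, local_vars, stack=None):
    """Interpret a tiny structured instruction list (hypothetical format)."""
    stack = [] if stack is None else stack
    for instr in code:
        op = instr[0]
        if op == "const":
            stack.append(instr[1])
        elif op == "get_local":
            stack.append(local_vars[instr[1]])
        elif op == "set_local":
            local_vars[instr[1]] = stack.pop()
        elif op == "if":
            # Pop the condition, then run one arm on the same operand stack.
            arm = instr[1] if stack.pop() != 0 else instr[2]
            run_if(arm, local_vars, stack)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack


def size(code):
    """Count instructions, including those nested inside `if` arms."""
    return sum(
        1 + (size(i[1]) + size(i[2]) if i[0] == "if" else 0) for i in code
    )


# `if` yields a value directly on the stack.
with_value = [("get_local", 0), ("if", [("const", 10)], [("const", 20)])]

# Same result, but each arm stores to local 1 and the value is reloaded.
with_locals = [
    ("get_local", 0),
    ("if",
     [("const", 10), ("set_local", 1)],
     [("const", 20), ("set_local", 1)]),
    ("get_local", 1),
]
```

Both sequences compute the same value, and the single merged result of the two arms plays the role that a phi would play in an SSA representation.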


## Limited Local Nondeterminism
@@ -324,12 +330,11 @@ and local manner. This prevents the entire program from being invalid, as would
be the case with C++ undefined behavior.

As WebAssembly gets implemented and tested with multiple languages on multiple
architectures there may be a need to revisit some of the decisions:
architectures we may revisit some of the design decisions:

* When all relevant hardware implement features the same way then there's no
need to add nondeterminism to WebAssembly when realistically there's only one
mapping from WebAssembly expression to ISA-specific operators. One such
example is floating-point: at a high-level most basic instructions follow
* When all relevant hardware implements an operation the same way, there's no
need for nondeterminism in WebAssembly semantics. One such
example is floating-point: at a high level, most operators follow
IEEE-754 semantics, so it is not necessary to specify WebAssembly's
floating-point operators differently from IEEE-754.
* When different languages have different expectations then it's unfortunate if
@@ -470,20 +475,20 @@ Yes:
[this demo](https://github.com/lukewagner/AngryBotsPacked), comparing
*just* parsing in SpiderMonkey (no validation, IR generation) to *just*
decoding in the polyfill (no asm.js code generation).
* A binary format enables optimizations that reduce the memory usage of decoded
ASTs without increasing size or reducing decode speed.
* A binary format allows many optimizations for code size and decoding speed that would
not be possible on a source form.


## Why a layered binary encoding?
* We can do better than generic compression because we are aware of the AST
* We can do better than generic compression because we are aware of the code
structure and other details:
* For example, macro compression that
[deduplicates AST trees](https://github.com/WebAssembly/design/issues/58#issuecomment-101863032)
can focus on AST nodes + their children, thus having `O(nodes)` entities
can focus on ASTs + their children, thus having `O(nodes)` entities
to worry about, compared to generic compression which in principle would
need to look at `O(bytes*bytes)` entities. Such macros would allow the
logical equivalent of `#define ADD1(x) (x+1)`, i.e., to be
parametrized. Simpler macros (`#define ADDX1 (x+1)`) can implement useful
parameterized. Simpler macros (`#define ADDX1 (x+1)`) can implement useful
features like constant pools.
* Another example is reordering of functions and some internal nodes, which
we know does not change semantics, but