From 04bb14ff8e0ad187f342e9ba9394469617bf6984 Mon Sep 17 00:00:00 2001 From: titzer Date: Thu, 22 Sep 2016 11:13:59 +0200 Subject: [PATCH 01/14] Remove reference to AST in JS.md Remove a reference to AST in JS.md. Note that the ml-proto spec still uses the name `Ast.Module` and has files named `ast.ml`, etc, so leaving those references intact for now. --- JS.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/JS.md b/JS.md index 5d01edb0..1c235f7e 100644 --- a/JS.md +++ b/JS.md @@ -66,7 +66,7 @@ asynchronous, background, streaming compilation. A `WebAssembly.Module` object represents the stateless result of compiling a WebAssembly binary-format module and contains one internal slot: * [[Module]] : an [`Ast.module`](https://github.com/WebAssembly/spec/blob/master/ml-proto/spec/ast.ml#L208) - which is the spec definition of a validated module AST + which is the spec definition of a validated module ### `WebAssembly.Module` Constructor @@ -82,8 +82,8 @@ If the given `bytes` argument is not a a `TypeError` exception is thrown. Otherwise, this function performs synchronous compilation of the `BufferSource`: -* The byte range delimited by the `BufferSource` is first logically decoded into - an AST according to [BinaryEncoding.md](BinaryEncoding.md) and then validated +* The byte range delimited by the `BufferSource` is first logically decoded + according to [BinaryEncoding.md](BinaryEncoding.md) and then validated according to the rules in [spec/check.ml](https://github.com/WebAssembly/spec/blob/master/ml-proto/spec/check.ml#L325). * The spec `string` values inside `Ast.module` are decoded as UTF8 as described in [Web.md](Web.md#names). 
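For illustration, the `WebAssembly.Module` constructor behavior described in the JS.md hunk above can be exercised with a short JavaScript sketch. It assumes an engine that exposes the `WebAssembly` global (for example a current Node.js or browser) and accepts binary version 1, which may differ from the draft this patch targets. `emptyModule` and `tryCompile` are hypothetical names introduced here, not part of JS.md.

```javascript
// Smallest valid module in current engines: the magic bytes "\0asm"
// followed by a 4-byte little-endian version (1 is assumed here).
const emptyModule = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // "\0asm"
  0x01, 0x00, 0x00, 0x00, // version 1
]);

// Hypothetical helper: attempt synchronous compilation and report the outcome
// instead of letting the exception propagate.
function tryCompile(bytes) {
  try {
    return { ok: true, module: new WebAssembly.Module(bytes) };
  } catch (err) {
    return { ok: false, error: err };
  }
}

// A BufferSource holding a well-formed module compiles synchronously.
const good = tryCompile(emptyModule);

// A value that is not a BufferSource is rejected with a TypeError.
const notBuffer = tryCompile("not a BufferSource");

// A BufferSource whose bytes do not decode as a module is rejected too.
const badBytes = tryCompile(new Uint8Array([1, 2, 3, 4]));

console.log(good.ok, notBuffer.error.name, badBytes.ok);
```

In current engines the malformed-bytes case is reported as a `WebAssembly.CompileError`; the text quoted above only specifies the `TypeError` for a non-`BufferSource` argument.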
From d2ae37a79b38d895010ef1f048c52a048c8f6276 Mon Sep 17 00:00:00 2001 From: titzer Date: Thu, 22 Sep 2016 11:16:36 +0200 Subject: [PATCH 02/14] Use "instruction" instead of "AST operator" --- FutureFeatures.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/FutureFeatures.md b/FutureFeatures.md index 0ae0a631..934facc3 100644 --- a/FutureFeatures.md +++ b/FutureFeatures.md @@ -370,8 +370,8 @@ general-purpose use on several of today's popular hardware architectures. ## Better feature testing support The [MVP feature testing situation](FeatureTest.md) could be improved by -allowing unknown/unsupported AST operators to decode and validate. The runtime -semantics of these unknown operators could either be to trap or call a +allowing unknown/unsupported instructions to decode and validate. The runtime +semantics of these unknown instructions could either be to trap or call a same-signature module-defined polyfill function. This feature could provide a lighter-weight alternative to load-time polyfilling (approach 2 in [FeatureTest.md](FeatureTest.md)), especially if the [specific layer](BinaryEncoding.md) From d518661ea1a9d371d16f7a4cdcb3f4876544e254 Mon Sep 17 00:00:00 2001 From: titzer Date: Thu, 22 Sep 2016 11:31:26 +0200 Subject: [PATCH 03/14] Update rationale for stack machine --- Rationale.md | 38 ++++++++++++++++++++------------------ 1 file changed, 20 insertions(+), 18 deletions(-) diff --git a/Rationale.md b/Rationale.md index ee29f952..7a455394 100644 --- a/Rationale.md +++ b/Rationale.md @@ -13,10 +13,17 @@ codebases, we'll revisit the alternatives listed below, reevaluate the tradeoffs and update the [design](AstSemantics.md) before the MVP is finalized. -## Why AST? +## Why a stack machine? -Why not a register- or SSA-based bytecode? -* Trees allow a smaller binary encoding: [JSZap][], [Slim Binaries][]. +Why not an AST, or a register- or SSA-based bytecode? + +* We started with an AST and generalized to a (restricted) stack machine. 
ASTs allow a + dense encoding and (with postorder) an efficient decoding, compilation, and interpretation. + The stack machine is a generalization of ASTs allowed in previous versions while allowing + efficiency gains in interpretation and baseline compilation, as well as a straightforward + design for multi-return functions. +* The stack machine allows smaller binary encoding than registers or SSA, and structured control + flow preserves the size advantages of an AST: [JSZap][], [Slim Binaries][]. * [Polyfill prototype][] shows simple and efficient translation to asm.js. [JSZap]: https://research.microsoft.com/en-us/projects/jszap/ @@ -26,15 +33,10 @@ Why not a register- or SSA-based bytecode? ## Why not a fully-general stack machine? -Stack machines have all the code size advantages as expression trees represented -in post-order. However, we wish to avoid requiring an explicit expression stack at -runtime, because many implementations will want to use registers rather than an -actual stack for evaluation. Consequently, while it's possible to think about -wasm expression evaluation in terms of a conceptual stack machine, the stack -machine would be constrained such that one can always statically know the types, -definitions, and uses of all operands on the stack, so that an implementation can -connect definitions with their uses through whatever mechanism they see fit. - +The WebAssembly stack machine is restricted to structured control flow and structured +use of the stack. This greatly simplifies one-pass verification, avoiding a fixpoint computation +like that of the Java Virtual Machine, as well as compilation and manipulating of +WebAssembly code by other tools. ## Basic Types Only @@ -44,7 +46,7 @@ WebAssembly only represents [a few types](AstSemantics.md#Types). language compiler to express its own types in terms of the basic machine types. This allows WebAssembly to present itself as a virtual ISA, and lets compilers target it as they would any other ISA. 
-* These types are efficiently executed by all modern CPU architectures. +* These types are directly representable on all modern CPU architectures. * Smaller types (such as `i8` and `i16`) are usually no more efficient and in languages like C/C++ are only semantically meaningful for memory accesses since arithmetic get widened to `i32` or `i64`. Avoiding them at least for MVP @@ -470,20 +472,20 @@ Yes: [this demo](https://github.com/lukewagner/AngryBotsPacked), comparing *just* parsing in SpiderMonkey (no validation, IR generation) to *just* decoding in the polyfill (no asm.js code generation). -* A binary format enables optimizations that reduce the memory usage of decoded - ASTs without increasing size or reducing decode speed. +* A binary format allows many optimizations for code size and decoding speed that would + not be possible on a source form. ## Why a layered binary encoding? -* We can do better than generic compression because we are aware of the AST +* We can do better than generic compression because we are aware of the code structure and other details: * For example, macro compression that [deduplicates AST trees](https://github.com/WebAssembly/design/issues/58#issuecomment-101863032) - can focus on AST nodes + their children, thus having `O(nodes)` entities + can focus on ASTs + their children, thus having `O(nodes)` entities to worry about, compared to generic compression which in principle would need to look at `O(bytes*bytes)` entities. Such macros would allow the logical equivalent of `#define ADD1(x) (x+1)`, i.e., to be - parametrized. Simpler macros (`#define ADDX1 (x+1)`) can implement useful + parameterized. Simpler macros (`#define ADDX1 (x+1)`) can implement useful features like constant pools. 
* Another example is reordering of functions and some internal nodes, which we know does not change semantics, but From b92cb8d51eeaffa4a46191d57144f7287fa02a86 Mon Sep 17 00:00:00 2001 From: titzer Date: Thu, 22 Sep 2016 11:31:55 +0200 Subject: [PATCH 04/14] Update Rationale.md --- Rationale.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Rationale.md b/Rationale.md index 7a455394..0aa5b728 100644 --- a/Rationale.md +++ b/Rationale.md @@ -18,7 +18,7 @@ and update the [design](AstSemantics.md) before the MVP is finalized. Why not an AST, or a register- or SSA-based bytecode? * We started with an AST and generalized to a (restricted) stack machine. ASTs allow a - dense encoding and (with postorder) an efficient decoding, compilation, and interpretation. + dense encoding and efficient decoding, compilation, and interpretation. The stack machine is a generalization of ASTs allowed in previous versions while allowing efficiency gains in interpretation and baseline compilation, as well as a straightforward design for multi-return functions. From 3eed3576f75f59e4bc9faf2bb734e0233e4adb1b Mon Sep 17 00:00:00 2001 From: titzer Date: Thu, 22 Sep 2016 11:37:11 +0200 Subject: [PATCH 05/14] Update discussion of expression trees --- Rationale.md | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/Rationale.md b/Rationale.md index 0aa5b728..3ce0f2ff 100644 --- a/Rationale.md +++ b/Rationale.md @@ -282,17 +282,16 @@ segregating the table per signature to require only a bounds check could be cons in the future. Also, if tables are small enough, an engine can internally use per-signature tables filled with failure handlers to avoid one check. -## Expressions with Control Flow - -Expression trees offer significant size reduction by avoiding the need for -`set_local`/`get_local` pairs in the common case of an expression with only one -immediate use. 
Control flow "statements" are in fact expressions with result -values, thus allowing even more opportunities to build bigger -expression trees and further reduce `set_local`/`get_local` usage (which -constitute 30-40% of total bytes in the +## Control Flow Instructions with Values + +Control flow instructions such as `br`, `br_if`, `br_table`, `if` and `if-else` can +transfer stack values in WebAssembly. These primitives are useful building blocks for +WebAssembly producers, e.g. in compiling expression languages. Passing values this way offers significant +size reduction by avoiding the need for `set_local`/`get_local` pairs in the common case +of an expression with only one immediate use. Control flow instructions can then model +expressions with result values, thus allowing even more opportunities to further reduce +`set_local`/`get_local` usage (which constitute 30-40% of total bytes in the [polyfill prototype](https://github.com/WebAssembly/polyfill-prototype-1)). -Additionally, these primitives are useful building blocks for -WebAssembly-generators (including the JavaScript polyfill prototype). ## Limited Local Nondeterminism From 51d502af688a59425de7314e4e938bd17916b9bc Mon Sep 17 00:00:00 2001 From: titzer Date: Thu, 22 Sep 2016 11:39:57 +0200 Subject: [PATCH 06/14] Update MVP.md --- MVP.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/MVP.md b/MVP.md index 7340c565..439c333c 100644 --- a/MVP.md +++ b/MVP.md @@ -12,14 +12,14 @@ The major design components of the MVP have been broken up into separate documents: * The distributable, loadable and executable unit of code in WebAssembly is called a [module](Modules.md). -* The behavior of WebAssembly code in a module is specified in terms of an - [AST](AstSemantics.md). +* The behavior of WebAssembly code in a module is specified in terms of + [instructions](AstSemantics.md) for a structured stack machine. 
* The WebAssembly binary format, which is designed to be natively decoded by WebAssembly implementations, is specified as a - [binary serialization](BinaryEncoding.md) of a module's AST. + [binary encoding](BinaryEncoding.md) of the module's instructions. * The WebAssembly text format, which is designed to be read and written when using tools (e.g., assemblers, debuggers, profilers), is specified as a - [textual projection](TextFormat.md) of a module's AST. + [textual projection](TextFormat.md) of a module's instructions. * WebAssembly is designed to be implemented both [by web browsers](Web.md) and [completely different execution environments](NonWeb.md). * To ease the transition to WebAssembly while native support is still From 315b6d7b5650712eb7eac6981ab6579752c82c34 Mon Sep 17 00:00:00 2001 From: titzer Date: Thu, 22 Sep 2016 11:41:40 +0200 Subject: [PATCH 07/14] Update Rationale.md --- Rationale.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Rationale.md b/Rationale.md index 3ce0f2ff..4c5933c1 100644 --- a/Rationale.md +++ b/Rationale.md @@ -17,7 +17,7 @@ and update the [design](AstSemantics.md) before the MVP is finalized. Why not an AST, or a register- or SSA-based bytecode? -* We started with an AST and generalized to a (restricted) stack machine. ASTs allow a +* We started with an AST and generalized to a [structured stack machine][AstSemantics.md]. ASTs allow a dense encoding and efficient decoding, compilation, and interpretation. The stack machine is a generalization of ASTs allowed in previous versions while allowing efficiency gains in interpretation and baseline compilation, as well as a straightforward @@ -35,8 +35,8 @@ Why not an AST, or a register- or SSA-based bytecode? The WebAssembly stack machine is restricted to structured control flow and structured use of the stack. 
This greatly simplifies one-pass verification, avoiding a fixpoint computation -like that of the Java Virtual Machine, as well as compilation and manipulating of -WebAssembly code by other tools. +like that of other stack machines such as the Java Virtual Machine. +This also simplifies compilation and manipulating of WebAssembly code by other tools. ## Basic Types Only From 464fe0b480f9060e4c96c93ef60211b09c0c2043 Mon Sep 17 00:00:00 2001 From: titzer Date: Thu, 22 Sep 2016 11:42:09 +0200 Subject: [PATCH 08/14] Update Rationale.md --- Rationale.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Rationale.md b/Rationale.md index 4c5933c1..c25bb375 100644 --- a/Rationale.md +++ b/Rationale.md @@ -36,7 +36,7 @@ Why not an AST, or a register- or SSA-based bytecode? The WebAssembly stack machine is restricted to structured control flow and structured use of the stack. This greatly simplifies one-pass verification, avoiding a fixpoint computation like that of other stack machines such as the Java Virtual Machine. -This also simplifies compilation and manipulating of WebAssembly code by other tools. +This also simplifies compilation and manipulation of WebAssembly code by other tools. ## Basic Types Only From fad3c17b4397aca34fd896b51daf3f03dd312176 Mon Sep 17 00:00:00 2001 From: titzer Date: Thu, 22 Sep 2016 11:44:32 +0200 Subject: [PATCH 09/14] Remove references to expressions --- FutureFeatures.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/FutureFeatures.md b/FutureFeatures.md index 934facc3..3f373bd8 100644 --- a/FutureFeatures.md +++ b/FutureFeatures.md @@ -315,7 +315,7 @@ operators the possibility of having side effects. Debugging techniques are also important, but they don't necessarily need to be in the spec itself. 
Implementations are welcome (and encouraged) to support non-standard execution modes, enabled only from developer tools, such as modes -with alternate rounding, or evaluation of floating point expressions at greater +with alternate rounding, or evaluation of floating point operators at greater precision, to support [techniques for detecting numerical instability] (https://www.cs.berkeley.edu/~wkahan/Mindless.pdf), or modes using alternate NaN bitpattern rules, to carry diagnostic information and help developers track @@ -442,7 +442,7 @@ see [JavaScript's `WebAssembly.Table` API](JS.md#webassemblytable-objects)). It would be useful to be able to do everything from within WebAssembly so, e.g., it was possible to write a WebAssembly dynamic loader in WebAssembly. As a prerequisite, WebAssembly would need first-class support for -[GC references](GC.md) in expressions and locals. Given that, the following +[GC references](GC.md) on the stack and in locals. Given that, the following could be added: * `get_table`/`set_table`: get or set the table element at a given dynamic index; the got/set value would have a GC reference type From 1201fa8ea5e0bc12cb0193ce159d6bbae683f946 Mon Sep 17 00:00:00 2001 From: titzer Date: Thu, 22 Sep 2016 11:48:15 +0200 Subject: [PATCH 10/14] Update Rationale.md --- Rationale.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/Rationale.md b/Rationale.md index c25bb375..f76e309f 100644 --- a/Rationale.md +++ b/Rationale.md @@ -325,12 +325,11 @@ and local manner. This prevents the entire program from being invalid, as would be the case with C++ undefined behavior. 
As WebAssembly gets implemented and tested with multiple languages on multiple -architectures there may be a need to revisit some of the decisions: +architectures we may revisit some of the design decisions: -* When all relevant hardware implement features the same way then there's no - need to add nondeterminism to WebAssembly when realistically there's only one - mapping from WebAssembly expression to ISA-specific operators. One such - example is floating-point: at a high-level most basic instructions follow +* When all relevant hardware implements an operation the same way, there's no + need for nondeterminism in WebAssembly semantics. One such + example is floating-point: at a high level, most operators follow IEEE-754 semantics, it is therefore not necessary to specify WebAssembly's floating-point operators differently from IEEE-754. * When different languages have different expectations then it's unfortunate if From 67a2253b07d4414692902ff56b7ba01174bf5c4a Mon Sep 17 00:00:00 2001 From: titzer Date: Thu, 22 Sep 2016 11:51:24 +0200 Subject: [PATCH 11/14] Update Rationale.md --- Rationale.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Rationale.md b/Rationale.md index f76e309f..eaf5876f 100644 --- a/Rationale.md +++ b/Rationale.md @@ -17,7 +17,7 @@ and update the [design](AstSemantics.md) before the MVP is finalized. Why not an AST, or a register- or SSA-based bytecode? -* We started with an AST and generalized to a [structured stack machine][AstSemantics.md]. ASTs allow a +* We started with an AST and generalized to a [structured stack machine](AstSemantics.md). ASTs allow a dense encoding and efficient decoding, compilation, and interpretation. 
The stack machine is a generalization of ASTs allowed in previous versions while allowing efficiency gains in interpretation and baseline compilation, as well as a straightforward From 6b0f4d7d2f4984ce161342c6cfb6d6b3449cb996 Mon Sep 17 00:00:00 2001 From: titzer Date: Fri, 23 Sep 2016 11:16:56 +0200 Subject: [PATCH 12/14] Address review comments --- Rationale.md | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/Rationale.md b/Rationale.md index eaf5876f..2ec2d767 100644 --- a/Rationale.md +++ b/Rationale.md @@ -19,11 +19,12 @@ Why not an AST, or a register- or SSA-based bytecode? * We started with an AST and generalized to a [structured stack machine](AstSemantics.md). ASTs allow a dense encoding and efficient decoding, compilation, and interpretation. - The stack machine is a generalization of ASTs allowed in previous versions while allowing + The structured stack machine of WebAssembly is a generalization of ASTs allowed in previous versions while allowing efficiency gains in interpretation and baseline compilation, as well as a straightforward design for multi-return functions. -* The stack machine allows smaller binary encoding than registers or SSA, and structured control - flow preserves the size advantages of an AST: [JSZap][], [Slim Binaries][]. +* The stack machine allows smaller binary encoding than registers or SSA [JSZap][], [Slim Binaries][], + and structured control flow allows simpler and more efficient verification, including decoding directly + to a compiler's internal SSA form. * [Polyfill prototype][] shows simple and efficient translation to asm.js. [JSZap]: https://research.microsoft.com/en-us/projects/jszap/ @@ -35,8 +36,10 @@ Why not an AST, or a register- or SSA-based bytecode? The WebAssembly stack machine is restricted to structured control flow and structured use of the stack. 
This greatly simplifies one-pass verification, avoiding a fixpoint computation -like that of other stack machines such as the Java Virtual Machine. +like that of other stack machines such as the Java Virtual Machine (prior to [stack maps](https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html)). This also simplifies compilation and manipulation of WebAssembly code by other tools. +Further generalization of the WebAssembly stack machine is planned post-MVP, such as the +addition of multiple return values from control flow constructs and function calls. ## Basic Types Only From 19ade4b44ffb1d41e2b8b021762b1ac86caa62e2 Mon Sep 17 00:00:00 2001 From: titzer Date: Fri, 23 Sep 2016 11:19:05 +0200 Subject: [PATCH 13/14] Address review comments --- Rationale.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/Rationale.md b/Rationale.md index 2ec2d767..fd194bd0 100644 --- a/Rationale.md +++ b/Rationale.md @@ -182,7 +182,7 @@ See [#107](https://github.com/WebAssembly/spec/pull/107). ## Control Flow Structured control flow provides simple and size-efficient binary encoding and -compilation. Any control flow—even irreducible—can be transformed into structured +compilation. Any control flow--even irreducible--can be transformed into structured control flow with the [Relooper](https://github.com/kripken/emscripten/raw/master/docs/paper.pdf) [algorithm](http://dl.acm.org/citation.cfm?id=2048224&CFID=670868333&CFTOKEN=46181900), @@ -295,6 +295,8 @@ of an expression with only one immediate use. Control flow instructions can then expressions with result values, thus allowing even more opportunities to further reduce `set_local`/`get_local` usage (which constitute 30-40% of total bytes in the [polyfill prototype](https://github.com/WebAssembly/polyfill-prototype-1)). +`br`-with-value and `if` constructs that return values can also model `phis` which +appear in SSA representations of programs. 
## Limited Local Nondeterminism From 0a3c03cff02c97e99be14cdb13a756fd47b57dd7 Mon Sep 17 00:00:00 2001 From: titzer Date: Fri, 23 Sep 2016 11:20:39 +0200 Subject: [PATCH 14/14] Address review comments --- MVP.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/MVP.md b/MVP.md index 439c333c..1d3a978c 100644 --- a/MVP.md +++ b/MVP.md @@ -16,10 +16,10 @@ documents: [instructions](AstSemantics.md) for a structured stack machine. * The WebAssembly binary format, which is designed to be natively decoded by WebAssembly implementations, is specified as a - [binary encoding](BinaryEncoding.md) of the module's instructions. + [binary encoding](BinaryEncoding.md) of a module's structure and code. * The WebAssembly text format, which is designed to be read and written when using tools (e.g., assemblers, debuggers, profilers), is specified as a - [textual projection](TextFormat.md) of a module's instructions. + [textual projection](TextFormat.md) of a module's structure and code. * WebAssembly is designed to be implemented both [by web browsers](Web.md) and [completely different execution environments](NonWeb.md). * To ease the transition to WebAssembly while native support is still