diff --git a/TextFormat.md b/TextFormat.md
index 93608b0f..4a926fe6 100644
--- a/TextFormat.md
+++ b/TextFormat.md
@@ -1,6 +1,7 @@
 # Text Format
 The purpose of this text format is to support:
+
 * View Source on a WebAssembly module, thus fitting into the Web (where every
   source can be viewed) in a natural way.
 * Presentation in browser development tools when source maps aren't present
@@ -62,11 +63,10 @@ forums, but for now we're keeping it simple.
  - Use JS-style sensibilities when there aren't reasons otherwise.
  - It's a compiler target, not a programming language, but readability still counts.
-
 ## High-level summary:
  - Curly braces for function bodies, blocks, etc., `/* */`-style and `//`-style
-   comments, and whitespace is not significant. Also, no semicolons.
+   comments, and whitespace is not significant.
    (TODO: Should `/* */`-style comments nest properly?)
  - `get_local` looks like a simple reference; `set_local` looks like an
@@ -75,47 +75,71 @@ forums, but for now we're keeping it simple.
  - Infix syntax for arithmetic, with simple overloading. Explicit grouping via
    parentheses. Concise and familiar with JS and others. (TODO: Use C/JS-style
-   operator precedence, or fix
+   operator precedence, or fix
    [an old mistake](http://www.lysator.liu.se/c/dmr-on-or.html)?)
- - Prefix syntax with comma-separated operands for all other operators. For less
-   frequent opcodes, prefer just presenting operator names, so that they're easy
-   to identify.
+ - Prefix syntax with operands in parentheses for most other operators (e.g.
+   `i32.rotl($0, 8)`). For less frequent opcodes, prefer just presenting operator
+   names, so that they're easy to identify.
  - Typescript-style `name : type` declarations.
- - Parentheses around call arguments, eg. `call $functionname(arg, arg, arg)`,
-   and `if` conditions, eg. `if ($condition) { call $then() } else { call $else() }`,
+ - Parentheses around call arguments, eg. `$functionname(arg, arg, arg)`,
+   and `if` conditions, eg.
`if ($condition) { $then() } else { $else() }`,
    because they're familiar to many people and not too intrusive.
  - Allow highly complex trees to be syntactically split up into readable parts.
  - Put labels "where they go".
+ - The text format will be compatible with the [LES](http://loyc.net/les) text
+   format. It _is not_ compatible with the current LES specification, but LES
+   is in beta and can still be tweaked to wasm's needs. Based on the wasm text
+   format, a third version of LES (LESv3) will be drafted before the end of 2016.
+   Meanwhile, the wasm text format will be syntactically constrained in such a
+   way that it will be an appropriate basis for LESv3. For the MVP, parsers of
+   the wasm text format will be able to choose whether to use a custom parser
+   dedicated to wasm or a generic LES parser.
+
+ - TODO: should semicolons should be required at the end of each expression
+   in a block? If newlines are the primary separator, then LES will cease to
+   be a superset of JSON (since JSON ignores newlines), but there are benefits
+   on the flip side (such as eliminating the need for semicolons!). In this
+   document it is assumed that a newline **does** mark the end of an
+   expression if the newline does not appear directly inside parentheses (as
+   inside parentheses, expressions are always terminated by commas or by a
+   closing parenthesis). In any case it would be useful to _allow_ semicolons,
+   so that one can write multiple expressions on a single line.
 ## Examples:
 ### Basics
 ```
-  function $fac-opt ($a:i64) : (i64) {
-    var $x:i64
+  function $@fac-opt($a:i64) : i64 {
+    $x:i64
     $x = 1
-    br_if $end ? $a s 1
+      br_if loop ? $a > 1
     }
-  $end:
+  :end
     $x
   }
 ```
 (hand-translated from [fac.wast](https://github.com/WebAssembly/spec/blob/master/ml-proto/test/fac.wast))
-The function return type has parentheses for symmetry with the parameter types,
-anticipating adding multiple return values to wasm in the future.
+The `$` sigil on function and variable names cleanly ensures that they never
+collide with wasm keywords, present or future. The `@` sign on `fac-opt` allows
+certain special characters to appear in identifiers, such as `-` which would
+otherwise be treated as a subtraction operator.
+
+The function return type can have parentheses (`: (i64)`) for symmetry with the
+parameter types, since we anticipate adding multiple return values to wasm in the
+future, but they are not required.
 The curly braces around the function body are not a `block` node; they are part
 of the function syntax, reflecting how function bodies in wasm are block-like.
@@ -123,21 +147,18 @@ of the function syntax, reflecting how function bodies in wasm are block-like.
 The last expression of the function body here acts as its return value. This works in all block-like constructs (`block`, function body, `if`, etc.)
-`>s` means *signed* greater-than. explicit unsigned or signed operators will be
-suffixed with 'u' or 's', respectively.
-
-The `$` sigil on user names cleanly ensures that they never collide with wasm
-keywords, present or future.
+`>` means *signed* greater-than. Unsigned operators will have a `|` before the last character of the operator, so `|>` is *unsigned* greater-than.
 `br_if` uses a question mark to announce the condition operand. `select` does
-also. (TODO: Is this too cute?)
+also. (TODO: Is this too cute? Also, should the order be reversed as in
+`br_if $a < 2 ? end`?)
 ### Linear memory addresses
 ```
-  function $test_redundant_load () : (i32) {
+  function $test_redundant_load() : (i32) {
     i32.load [8,+0]
-    f32.store [5,+0], -0x0p0
+    f32.store [5,+0] = -0x0p0
     i32.load [8,+0]
   }
 ```
@@ -146,7 +167,8 @@ also. (TODO: Is this too cute?)
 Addresses are printed as `[base,+offset]`. It could be shortened to `[base]` when there is no offset; I made the offset explicit above just to illustrate the syntax.
-There can also be an optional `:align=…` for non-natural alignments.
+There can also be an optional `align …` for non-natural alignments, e.g.
+`i32.load [8,+0, align 2]`.
 ### A slightly larger example:
@@ -212,9 +234,9 @@ Here's the corresponding LLVM wasm backend output + binaryen + slight tweaks:
 And here's the proposed text syntax:
 ```
-  function $Q_rsqrt ($0:f32) : (f32) {
-    var $1:f32
-    $1 = f32.reinterpret/i32 (1597463007 - ((i32.reinterpret/f32 $0) >> 1))
+  function $Q_rsqrt($0:f32) : (f32) {
+    $1:f32
+    $1 = f32.reinterpret'i32(1597463007 - (i32.reinterpret'f32($0) >> 1))
     $1 = $1 * (0x1.8p0 - $1 * ($0 = $0 * 0x1p-1) * $1)
     $1 * (0x1.8p0 - $1 * $0 * $1)
   }
@@ -246,14 +268,14 @@ Corresponding proposed text syntax:
 ```
   function $loop3 () : (i32) {
-    var $i:i32
+    $i:i32
     $i = 0
     loop $cont {
       $i = $i + 1
       if ($i == 5) {
-        br $exit, $i
+        br exit => $i
       }
-    $exit:
+    :exit
     }
   }
 ```
@@ -262,24 +284,28 @@ Note that the curly braces are part of the `if`, rather than introducing a
 block. This reflects how `if` essentially provides `block`-like capabilities in
 the wasm binary format.
+Due to syntactic requirements of LES, the colon `:` appears before the label
+name (`:exit`) rather than afterward.
+
 ### Nested blocks
-Label definitions, like the `$exit:` above, introduce additional blocks nested
-within the nearest `{`, without requiring their own `{`. This allows the deep
-nesting of `br_table` to be printed in a relatively flat manner:
+Label definitions that do not appear at the end of the enclosing block, such as
+the `:exit` above, introduce additional blocks nested within the nearest `{`,
+without requiring their own `{`. This allows the deep nesting of `br_table` to
+be printed in a relatively flat manner:
 ```
 {
-  br_table [$red, $orange, $yellow, $green], $default, $index
-  $red:
+  br_table [red, orange, yellow, green, default] : $index
+  :red
     // ...
-  $orange:
+  :orange
     // ...
-  $yellow:
+  :yellow
     // ...
-  $green:
+  :green
     // ...
-  $default:
+  :default
 }
 ```
@@ -312,38 +338,37 @@ a new block; it just provides a name for the enclosing block's label.
 ## Operators with special syntax
 As mentioned earlier, basic arithmetic operators use an infix notation, some
-operators require explicit parentheses, and some operators use `?` to introduce
-boolean conditions. The following is a table of special syntax:
-
+operators require explicit parentheses, and some operators with boolean
+conditions use `?`. The following is a table of special syntax:
 ## Control flow operators ([described here](https://github.com/WebAssembly/design/blob/master/AstSemantics.md))
-| Name | Syntax | Examples
-| ---- | ---- | ---- |
-| `block` | *label*: | `{ br $a a: }`
-| `loop` | `loop` *label* `{` … `}` | `loop $a { br $a }`
-| `if` | `if` (*expr*) `{` *expr** `}` | `if (0) { 1 }`
-| `if_else` | `if` (*expr*) `{` *expr** `} else {` *expr**`}` | `if (0) { 1 } else { 2 }`
-| `select` | `select` *expr*, *expr* ? *expr* | `select 1, 2 ? $x < $y`
-| `br` | `br` *label* | `br $a`
-| `br_if` | `br` *label* `?` *expr* | `br $a`, `br $a ? $x < $y`
-| `br_table` | `br_table [` *case-label* `,` … `] ,` *default-label* `,` *expr* | `br_table [$x, $y], $z, 0`
+| Name | Syntax | Examples
+| ---------- | -------------------------- | --------
+| `block` | :*label* | `{ br a; :a }`
+| `loop` | `loop` *label* `{` … `}` | `loop a { br a }`
+| `if` | `if (`*expr*`)` `{` *expr** `}` | `if ($x) { $f($x) }`
+| `if_else` | `if (`*expr*`)` `{` *expr** `} else {` *expr** `}` | `if (0) { 1 } else { 2 }`
+| `select` | `select` *expr* `:` *expr* `?` *expr*`)` | `select 1 : 2 ? $x < $y`
+| `br` | `br` *label* [=> $result] | `br a`, `br a => $x`
+| `br_if` | `br_if` *label* `(if` *expr*`)` [`=>` *expr*] | `br a (if $x < $y) => 0`
+| `br_table` | `br_table {` *case-label* `,` … `,` *default-label*] `} from` *expr* | `br_table [a, b, c] : $x`
 (TODO: as above, are the `?`s too cute?)
 ## Basic operators ([described here](https://github.com/WebAssembly/design/blob/master/AstSemantics.md#constants))
-| Name | Syntax | Example
-| ---- | ---- | ---- |
-| `i32.const` | … | `234`, `0xfff7`
-| `i64.const` | … | `234`, `0xfff7`
-| `f64.const` | … | `0.1p2`, `infinity`, `nan:0x789`
-| `f32.const` | … | `0.1p2`, `infinity`, `nan:0x789`
-| `get_local` | *name* | `$x + 1`
+| Name | Syntax | Example
+| ----------- | ----------- | ---- |
+| `i32.const` | see example | `234`, `0xfff7`
+| `i64.const` | see example | `234L`, `0xfff7L`
+| `f64.const` | see example | `0.1p2`, `@inf`, `@nan'0x789`
+| `f32.const` | see example | `0.1p2f`, `@inf_f`, `@nan'0x789`
+| `get_local` | *name* (including the `$`) | `$x`
 | `set_local` | *name* `=` *expr* | `$x = 1`
-| `call` | `call` *name* `(`*expr* `,` … `)` | `call $min(0, 2)`
-| `call_import` | `call_import` *name* `(`*expr* `,` … `)` | `call_import $max(0, 2)`
-| `call_indirect` | `call_indirect` *signature-name* `[` *expr* `] (`*expr* `,` … `)` | `call_indirect $foo [1] $min(0, 2)`
+| `call` | *name* `(`*expr* `,` … `)` | `$min(0, 2)`
+| `call_import` | `$` *name* `(`*expr* `,` … `)` | `$$max(0, 2)`
+| `call_indirect` | *expr* `::` *signature-name* [`[` *expr* `]`] `(`*expr* `,` … `)` | `$func::$signature(0, 2)`
 ## Memory-related operators ([described here](https://github.com/WebAssembly/design/blob/master/AstSemantics.md#linear-memory-accesses))
@@ -351,9 +376,9 @@ boolean conditions.
The following is a table of special syntax:
 | ---- | ---- | ---- |
 | *memory-immediate* | `[` *base-expression* `,` *offset* `]` | `[$base, 4]`
 | `i32.load8_s` | `i32.load8_s [` *base-expression* `, +` *offset-immediate* `]` | `i32.load8_s [$base, +4]`
-| `i32.load8_s` | `i32.load8_s [` *base-expression* `, +` *offset-immediate* `]:align=` *align* | `i32.load8_s [$base, +4]:align=2`
+| `i32.load8_s` | `i32.load8_s [` *base-expression* `, +` *offset-immediate* `, align ` *align* `]` | `i32.load8_s [$base, +4, align 2]`
 | `i32.store8` | `i32.store8 [` *base-expression* `, +` *offset-immediate* `]`, *expr* | `i32.store8 [$base, +4], $value`
-| `i32.store8` | `i32.store8 [` *base-expression* `, +` *offset-immediate* `]:align=` *align* `,` *expr* | `i32.store8 [$base, +4]:align=2, $value`
+| `i32.store8` | `i32.store8 [` *base-expression* `, +` *offset-immediate* `, align ` *align* `]` `=` *expr* | `i32.store8 [$base, +4, align 2] = $value`
 The other forms of `load` and `store` are similar.
@@ -364,50 +389,50 @@ The other forms of `load` and `store` are similar.
 | `i32.add` | … `+` …
 | `i32.sub` | … `-` …
 | `i32.mul` | … `*` …
-| `i32.div_s` | … `/s` …
-| `i32.div_u` | … `/u` …
-| `i32.rem_s` | … `%s` …
-| `i32.rem_u` | … `%u` …
+| `i32.div_s` | … `/` …
+| `i32.div_u` | … `|/` …
+| `i32.rem_s` | … `%` …
+| `i32.rem_u` | … `|%` …
 | `i32.and` | … `&` …
 | `i32.or` | … `|` …
 | `i32.xor` | … `^` …
 | `i32.shl` | … `<<` …
-| `i32.shr_u` | … `>>u` …
-| `i32.shr_s` | … `>>s` …
+| `i32.shr_s` | … `>>` …
+| `i32.shr_u` | … `>|>` …
 | `i32.eq` | … `==` …
 | `i32.ne` | … `!=` …
-| `i32.lt_s` | … `<s` …
-| `i32.le_s` | … `<=s` …
-| `i32.lt_u` | … `<u` …
-| `i32.le_u` | … `<=u` …
-| `i32.gt_s` | … `>s` …
-| `i32.ge_s` | … `>=s` …
-| `i32.gt_u` | … `>u` …
-| `i32.ge_u` | … `>=u` …
+| `i32.lt_s` | … `<` …
+| `i32.le_s` | … `<=` …
+| `i32.lt_u` | … `|<` …
+| `i32.le_u` | … `<|=` …
+| `i32.gt_s` | … `>` …
+| `i32.ge_s` | … `>=` …
+| `i32.gt_u` | … `|>` …
+| `i32.ge_u` | … `>|=` …
 | `i32.eqz` | `!` …
 | `i64.add` | … `+` …
 | `i64.sub` | … `-` …
 | `i64.mul` | … `*` …
-| `i64.div_s` | … `/s` …
-| `i64.div_u` | … `/u` …
-| `i64.rem_s` | … `%s` …
-| `i64.rem_u` | … `%u` …
+| `i64.div_s` | … `/` …
+| `i64.div_u` | … `|/` …
+| `i64.rem_s` | … `%` …
+| `i64.rem_u` | … `|%` …
 | `i64.and` | … `&` …
 | `i64.or` | … `\|` …
 | `i64.xor` | … `^` …
 | `i64.shl` | … `<<` …
-| `i64.shr_u` | … `>>u` …
-| `i64.shr_s` | … `>>s` …
+| `i64.shr_s` | … `>>` …
+| `i64.shr_u` | … `>|>` …
 | `i64.eq` | … `==` …
 | `i64.ne` | … `!=` …
-| `i64.lt_s` | … `<s` …
-| `i64.le_s` | … `<=s` …
-| `i64.lt_u` | … `<u` …
-| `i64.le_u` | … `<=u` …
-| `i64.gt_s` | … `>s` …
-| `i64.ge_s` | … `>=s` …
-| `i64.gt_u` | … `>u` …
-| `i64.ge_u` | … `>=u` …
+| `i64.lt_s` | … `<` …
+| `i64.le_s` | … `<=` …
+| `i64.lt_u` | … `|<` …
+| `i64.le_u` | … `<|=` …
+| `i64.gt_s` | … `>` …
+| `i64.ge_s` | … `>=` …
+| `i64.gt_u` | … `|>` …
+| `i64.ge_u` | … `>|=` …
 | `i64.eqz` | `!` …
 | `f32.add` | … `+` …
 | `f32.sub` | … `-` …
@@ -437,6 +462,7 @@ All other operators use their actual name in a prefix notation, such as
 ## Answers to anticipated questions
+
 Q: JS avoids sigils, and uses context-sensitive keywords to avoid trouble. Can
    wasm do this?
@@ -456,12 +482,16 @@ A: The `br_table` construct has multiple labels, and there may be a mix of
    to special-case them.
-Q: Why not permit optional semicolons?
-
-A: We don't want people arguing over which way is better. If we don't forbid
-   semicolons, the next best option would be to require semicolons. I've
-   subjectively chosen to forbid semicolons for now.
+Q: Why is, for example, the unsigned shift operator called `>|>` rather than
+   the more logical `|>>`, or even `|>>|`?
+A: None of the "unsigned" operators are built into LES. The precedence of
+   non-built-in operators is derived in a predictable way from the built-in
+   operators, so that for example `>|>` has the same precedence as `>>`,
+   whereas `|>>` has the same precedence as `>`, and `|>>|` has the same
+   precedence as `||`. Placing the vertical bar in the middle allows the
+   operator to keep the same precedence as the built-in operator.
+
 # Debug symbol integration
@@ -471,3 +501,8 @@ therefore synthesize new names. However, as part of the [tooling](Tooling.md)
 story, a lightweight, optional "debug symbol" global section may be defined
 which associates names with each indexed entity and, when present, these names
 will be used in the text format projected from a binary WebAssembly module.
+
+Since LES allows "attribute" expressions to be attached to any expression,
+these could be used someday to represent additional debug information,
+comments, or other "side-channel" information that may be stored in the
+binary format in the future.