diff --git a/BinaryEncoding.md b/BinaryEncoding.md index c0eacf0f..e8797bc7 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -57,19 +57,48 @@ A single-byte unsigned integer indicating a [value type](AstSemantics.md#types). # Definitions -### Post-order encoding +### "Post-order" syntax tree encoding + Refers to an approach for encoding syntax trees, where each node begins with an identifying binary sequence, then followed recursively by any child nodes. -* Examples - * Given a simple AST node: `i32.add(left: AstNode, right: AstNode)` - * First recursively write the left and right child nodes. - * Then write the opcode for `i32.add` (uint8) +WASM syntax trees are encoded in a variation of post-order traversal. Nodes with fixed arity will be encoded in post-order. +Nodes with variadic control-flow opcodes, such as but not limited to `block`s, `loop`s, `if`, and `if_else`, will be bracketed with start and end markers (in the `if` and `if_else` case, this applies to the branches). Other variadic opcodes such as `call` will be post-order without brackets since the arity immediate is sufficient. Thus, for fixed arity and non-control flow variadic opcodes, the binary sequence begins with child subtrees followed by the opcode. +For variadic control-flow opcodes, the binary sequence begins with an opcode-specific start marker, followed by the the child subtrees (encoded in this same variant of post-order), then the opcode, which serves as an end marker. +This encoding is immediately amenable to one-pass decoding with an explicit stack without need for shift-reduce parsing. + +####Examples: + +* Given a simple AST node: `i32.add(left: AstNode, right: AstNode)` + * First recursively write the left and right child nodes. + * Then write the opcode for `i32.add` (uint8) + +* Given a call AST node: `call(args: AstNode[], callee_index: varuint32)` + * First recursively write each argument node. + * Then write the (variable-length) integer `callee_index` (varuint32) + * Finally write the opcode of `Call` (uint8) + +* Given an if_else-expression: `if_else(expr: AstNode, thenExpr: AstNode, elseExpr: AstNode)` + * Recursively encode expr + * Write the opcode for `if` + * Recursively encode thenExpr + * Write the opcode for `else` + * Recursively encode elseExpr + * Write the opcode for `end` + +* Given an if-expression: `if(expr: AstNode, thenExpr: AstNode)` + * Recursively encode expr + * Write the opcode for `if` + * Recursively encode thenExpr + * Write the opcode for `end` + +* Given nested block AST nodes: `block(2, [Block(2, [I32.Const 2, I32.Const 3]), I32.Const 4])` + * First, write the opcode marker `block` + * Write the the arity immediate of the outer block: `2` + * Recursively write the first subnode: `block 2 i32const 3 i32const end` + * Recursively the write second subnode: `4 i32const` + * Write the opcode for `end` - * Given a call AST node: `call(args: AstNode[], callee_index: varuint32)` - * First recursively write each argument node. - * Then write the (variable-length) integer `callee_index` (varuint32) - * Finally write the opcode of `Call` (uint8) # Module structure