
Move to post-order encoding of syntax trees #611


Closed
47 changes: 38 additions & 9 deletions BinaryEncoding.md

# Definitions

### "Post-order" syntax tree encoding


WASM syntax trees are encoded in a variation of post-order traversal. Nodes with fixed arity are encoded in post-order.
Nodes with variadic control-flow opcodes, including but not limited to `block`, `loop`, `if`, and `if_else`, are bracketed with start and end markers (for `if` and `if_else`, only the branches are bracketed; the condition is encoded before the start marker). Other variadic opcodes, such as `call`, are encoded in post-order without brackets, since the arity immediate is sufficient. Thus, for fixed-arity and non-control-flow variadic opcodes, the binary sequence begins with the child subtrees, followed by the opcode.
For variadic control-flow opcodes, the binary sequence begins with an opcode-specific start marker, followed by the child subtrees (encoded in this same post-order variant), then an end marker (the `end` opcode in the examples below).
This encoding is immediately amenable to one-pass decoding with an explicit stack, without the need for shift-reduce parsing.
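The one-pass claim can be illustrated with a small decoder sketch. It is not the spec's decoder. It works on a hypothetical symbolic token stream rather than bytes: opcode names are strings and immediates are Python ints. Following the examples, an `i32.const` immediate precedes its opcode, and `block`'s arity immediate follows its start marker. Only `i32.const`, `i32.add`, and `block ... end` are handled, and the names `decode` and `BlockStart` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BlockStart:
    """Hypothetical marker pushed when a `block` start marker is read."""
    arity: int


def decode(tokens):
    """Decode a post-order token stream in one left-to-right pass."""
    stack = []    # decoded subtrees interleaved with BlockStart markers
    pending = []  # immediates waiting for the opcode that consumes them
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if isinstance(tok, int):
            # Immediates precede their opcode, e.g. `4 i32const`.
            pending.append(tok)
        elif tok == "i32.const":
            stack.append(("i32.const", pending.pop()))
        elif tok == "i32.add":
            # Fixed arity: both operands are already on the stack.
            right = stack.pop()
            left = stack.pop()
            stack.append(("i32.add", left, right))
        elif tok == "block":
            # Start marker, immediately followed by the arity immediate.
            i += 1
            stack.append(BlockStart(tokens[i]))
        elif tok == "end":
            # End marker: collect everything pushed since the matching start.
            children = []
            while not isinstance(stack[-1], BlockStart):
                children.append(stack.pop())
            start = stack.pop()
            children.reverse()
            if len(children) != start.arity:
                raise ValueError("block arity immediate does not match child count")
            stack.append(("block", children))
        else:
            raise ValueError(f"unsupported token {tok!r}")
        i += 1
    return stack
```

Each opcode only looks at the top of the stack, so the decoder never backtracks. The `end` marker closes a block by popping back to its start marker, so here the arity immediate is only used as a consistency check.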

#### Examples

* Given a simple AST node: `i32.add(left: AstNode, right: AstNode)`
  * First, recursively write the left and right child nodes.
  * Then write the opcode for `i32.add` (uint8).

* Given a call AST node: `call(args: AstNode[], callee_index: varuint32)`
  * First, recursively write each argument node.
  * Then write the variable-length integer `callee_index` (varuint32).
  * Finally, write the opcode for `call` (uint8).

* Given an if_else-expression: `if_else(expr: AstNode, thenExpr: AstNode, elseExpr: AstNode)`
  * Recursively encode `expr`
  * Write the opcode for `if`
  * Recursively encode `thenExpr`
  * Write the opcode for `else`
  * Recursively encode `elseExpr`
  * Write the opcode for `end`

I like it!

Can you also give an example for an if with no else?

* Given an if-expression: `if(expr: AstNode, thenExpr: AstNode)`
  * Recursively encode `expr`
  * Write the opcode for `if`
  * Recursively encode `thenExpr`
  * Write the opcode for `end`

* Given nested block AST nodes: `block(2, [block(2, [i32.const 2, i32.const 3]), i32.const 4])`
  * First, write the start-marker opcode `block`
  * Write the arity immediate of the outer block: `2`
  * Recursively write the first subnode: `block 2 2 i32const 3 i32const end`
  * Recursively write the second subnode: `4 i32const`
  * Write the opcode for `end`
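The examples above can be combined into a single encoder sketch. It assumes a hypothetical tuple-based AST (the node shapes in the comment are made up, not taken from the spec). It emits the same informal token notation the examples use: opcode names as strings, immediates as ints before their opcode, and `block`'s arity immediate right after its start marker. This is an illustration, not a reference encoder.

```python
# Hypothetical tuple-based AST (not from the spec):
#   ("i32.const", value)
#   ("i32.add", left, right)
#   ("call", callee_index, [arg, ...])
#   ("if", cond, then_node)
#   ("if_else", cond, then_node, else_node)
#   ("block", arity, [child, ...])
# Output: opcode names as strings, immediates as ints.


def encode(node):
    op = node[0]
    if op == "i32.const":
        # Leaf: the immediate, then the opcode (`4 i32const`).
        return [node[1], "i32.const"]
    if op == "i32.add":
        # Fixed arity: both child subtrees, then the opcode.
        _, left, right = node
        return encode(left) + encode(right) + ["i32.add"]
    if op == "call":
        # Non-control-flow variadic: arguments, callee_index, opcode; no brackets.
        _, callee_index, args = node
        out = []
        for arg in args:
            out += encode(arg)
        return out + [callee_index, "call"]
    if op in ("if", "if_else"):
        # Condition first; only the branches are bracketed by `if` ... `end`.
        out = encode(node[1]) + ["if"] + encode(node[2])
        if op == "if_else":
            out += ["else"] + encode(node[3])
        return out + ["end"]
    if op == "block":
        # Control flow: start marker and arity immediate, children, `end`.
        _, arity, children = node
        if arity != len(children):
            raise ValueError("arity immediate does not match child count")
        out = ["block", arity]
        for child in children:
            out += encode(child)
        return out + ["end"]
    raise ValueError(f"unsupported node {op!r}")


nested = ("block", 2, [("block", 2, [("i32.const", 2), ("i32.const", 3)]),
                       ("i32.const", 4)])
print(encode(nested))
# ['block', 2, 'block', 2, 2, 'i32.const', 3, 'i32.const', 'end', 4, 'i32.const', 'end']
```

Fixed-arity and `call` nodes never emit a closing token. Only the control-flow cases append `end`.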

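At the byte level, the `call` example writes a variable-length immediate between the arguments and the opcode. The sketch below assumes that `varuint32` is unsigned LEB128: 7 data bits per byte, with the high bit set on every byte except the last. Because this section does not assign opcode numbers, `CALL_OPCODE` is a made-up value. The arguments are passed in already encoded.

```python
CALL_OPCODE = 0x12  # hypothetical opcode byte; not taken from the spec


def varuint32(value: int) -> bytes:
    """Unsigned LEB128 encoding of a value in the uint32 range."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value out of uint32 range")
    out = bytearray()
    while True:
        byte = value & 0x7F  # low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_call(encoded_args, callee_index: int) -> bytes:
    """Post-order: argument subtrees, then callee_index, then the opcode."""
    return b"".join(encoded_args) + varuint32(callee_index) + bytes([CALL_OPCODE])


print(encode_call([b"\x01", b"\x02"], 300).hex())
# 0102ac0212
```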


Can you add an example of how a high-level `if` is encoded?

I propose it be encoded as follows:

```
(expr)
if
(true-exprs)
else
(false-exprs)
end
```

The `else` bytecode and clause are optional.

Contributor Author

Certainly, I'll get right on it.

# Module structure
