Move to post-order encoding of syntax trees #611
####Examples:

* Given a simple AST node: I32Add(left: AstNode, right: AstNode)
Can you use code quotes?
Thank you for opening a discussion. Here's a show-stopper I see for post-order encoding. Humble apologies in advance if I lack technical understanding of the proposal. Let an operator or function be defined that accepts one value and returns none, let it be named
Let the following be its post-order encoding:
Now how are you going to validate post-order that the following is invalid?
Pre-order has some nice properties for the wasm AST design. Each sub-expression can be validated on its own. It might cost a shift-reduce algorithm, but can you propose a post-order validation algorithm that is simpler?
Seems straightforward if you keep a type stack and push a void type in that case:
Maybe there's a nicer way, but this seems reasonable to me. |
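The type-stack idea in the comment above can be sketched roughly as follows. This is only an illustration: the operator names, the `SIGNATURES` table, and the `VOID` marker are assumptions made for the example and are not taken from the proposal or from any implementation.

```python
VOID = "void"  # hypothetical marker for "expression with no result value"

# Hypothetical operator table: name -> (operand types, result type).
SIGNATURES = {
    "i32.const": ((), "i32"),
    "i32.add": (("i32", "i32"), "i32"),
    "consume": (("i32",), VOID),  # accepts one value, returns none
}

def validate(ops):
    """Validate a post-order operator sequence; return the final type stack."""
    stack = []
    for op in ops:
        params, result = SIGNATURES[op]
        if len(stack) < len(params):
            raise ValueError(f"{op}: stack underflow")
        # Pop the operands; the last operand is on top, so reverse after popping.
        actual = tuple([stack.pop() for _ in params][::-1])
        if actual != params:
            raise ValueError(f"{op}: expected {params}, got {actual}")
        # A void result still occupies one stack entry.
        stack.append(result)
    return stack
```

Under these assumptions, a sequence such as `i32.const consume i32.const i32.add` is rejected, because `i32.add` finds a `void` entry where it expects an `i32`.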
A possible variant of that rule: In the MVP, where return value counts are limited to 0 or 1 and there are no projection/destructure operators: after seeing an expression with 0 return types, discard all values from the top of the stack that are defined in the current block (or block-like thing, etc.). Discussion: If those
The post-order encoding might look like this:
However, what if someone encoded this?
A stack machine would evaluate The rule above eliminates those cases where the two evaluation strategies would differ, because it requires that if a node has no result value, it doesn't appear between a node that does have a result value and that node's user. Looking ahead, there are more questions, but with a rule like what's described here, we wouldn't need to answer all of them right away. |
@binji Yes, thank you. So the entries on the parse stack are objects representing AST nodes, even ones with no result values, and it seems natural that there would also be only one entry for multiple values. That's one matter cleared up, thanks. @sunfishcode I don't think that block-zero-value optimization would work, as some operators consume, or might consume, zero-value AST nodes. For example
I have another query about the top level expressions of a That seems to explain why a block needs and I guess the block With all of the block's top-level expressions being consumed only at the end of the block, it might be a property of this design that the expression stack could grow as large as the block. Might the stack be larger than for a pre-order encoding, which could discard values earlier? Might it lead to producers using dead operators to consume values, or to limiting block sizes and breaking them up?
- Recursively write the first operand of the outer Call: 0
- Recursively write the second operand: CALL_START 1 2 I32CONST 3 I32CONST CALL_END
- Recursively write the third operand: 4 I32CONST
- Write the opcode CALL_END
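The steps above can be sketched as a small recursive encoder. This is an illustration under stated assumptions, not the encoding defined by the PR: it assumes that the item written right after `CALL_START` is the callee index, and that an i32 constant is written as its immediate followed by `I32CONST`, which is how the token sequences above read. The `I32Const` and `Call` classes and the `encode` function are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class I32Const:
    value: int

@dataclass
class Call:
    callee: int  # assumed: function index written right after CALL_START
    args: list   # operand nodes, in source order

def encode(node, out=None):
    """Append the token sequence for `node` to `out` and return it."""
    out = [] if out is None else out
    if isinstance(node, I32Const):
        # Immediate first, then the opcode.
        out += [node.value, "I32CONST"]
    elif isinstance(node, Call):
        out.append("CALL_START")
        out.append(node.callee)
        for arg in node.args:
            encode(arg, out)  # operands are written recursively, in order
        out.append("CALL_END")  # the closing opcode comes last
    else:
        raise TypeError(f"unknown node: {node!r}")
    return out
```

For example, `encode(Call(0, [Call(1, [I32Const(2), I32Const(3)]), I32Const(4)]))` yields the inner `CALL_START 1 2 I32CONST 3 I32CONST CALL_END` sequence nested inside the outer call's tokens.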
Could some rationale be given for the need to have a `call_start` marker? I can see the need to know the start of control flow forks, for the single-pass SSA conversion, and for blocks to know the break depths and perhaps other reasons, but no need comes to mind for calls and it seems adequate to just post-order encode them?
On 16 March 2016 at 03:30, JSStats [email protected] wrote:
expr ::= There is no ambiguity, and there is no problem syntactically detecting the Moreover, there already is a prototype decoder for V8 ( |
Is that example invalid? If the deserializer is just a stack machine then it should be equivalent to the other ordering. There are a few things that are harder to validate with post-order, though. For example, a branch needs to know the argument type of a target, which is only known from the use of the node introducing the target:
In post-order:
Bottom up validation here would have to derive a type for the block from its tail expression, or branches that target it. You could make it work, but maybe it would be easier to just explicitly declare the type in the |
On 16 March 2016 at 12:31, Andrew Scheidecker [email protected]
We are specifying a binary format, not a deserializer. The stack machine is There are a few things that are harder to validate with post-order, though.
Yes, if you do decoding and validation in a single pass, then post-order |
@AndrewScheidecker, ah, pardon me, now I see what you mean. Yes you are right, the latter example is not actually invalid. Instead, it is the encoding of a different program, namely
You were right the first time: I meant that you could define the deserializer to make both orderings equivalent. @sunfishcode's comments on evaluation order apply if the |
@rossberg-chromium Thank you. I think your answer is basically the same as @binji and I hope I understand it now. The key for me was that it is not a stack of values (not like the JVM) rather a stack of AST node references, and there is always a single result AST node even if it has zero result values. I also understand the v8 implementation a little better: it records the stack_depth at the start of a block and at the end |
The postorder prototype that I developed for V8 works exactly as Andreas Postorder is also better for interpretation, since the serialization order I plan to officially propose a switch to postorder along the lines of my V8 -B On Wed, Mar 16, 2016 at 1:08 PM, JSStats [email protected] wrote:
@titzer Thank you. One point that is not yet clear to me in the v8 implementation is if it catches an empty stack wrt the stack_depth at the start of the block. E.g. does it detect the following invalid code:
On Wed, Mar 16, 2016 at 1:30 PM, JSStats [email protected] wrote:
The solution to the "expressions crossing block boundaries" problem is to disallow popping expressions across a block boundary. I.e. every block begins with an "empty" stack. |
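A rough sketch of that rule, assuming a decoder that records the stack depth at the start of each block and refuses to pop below it. The operator names and the `ARITY` table are hypothetical, and this is not the V8 implementation.

```python
# Hypothetical operators: name -> number of operands popped. Each pushes one entry.
ARITY = {"i32.const": 0, "i32.add": 2}

def check_block_boundaries(ops):
    """Return the final stack depth, or raise ValueError on a violation."""
    stack = []   # one entry per decoded expression
    bases = [0]  # stack depth recorded at the start of each open block
    for op in ops:
        if op == "block":
            bases.append(len(stack))  # the new block sees an "empty" stack
        elif op == "end":
            if len(bases) == 1:
                raise ValueError("end without a matching block")
            base = bases.pop()
            del stack[base:]        # the block consumes its top-level expressions
            stack.append("block")   # and becomes a single expression itself
        else:
            n = ARITY[op]
            if len(stack) - bases[-1] < n:
                raise ValueError(f"{op}: not enough operands in the current block")
            del stack[len(stack) - n:]
            stack.append(op)
    if len(bases) != 1:
        raise ValueError("unterminated block")
    return len(stack)
```

With this check, `i32.const block i32.const i32.add end` is rejected, since the `i32.add` inside the block would otherwise reach an operand pushed before the block started.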
The latest v8 patch https://codereview.chromium.org/1830663002/ |
* Recursively write the first subnode: `BLOCK_START 2 I32CONST 3 I32CONST BLOCK_END`
* Recursively write the second subnode: `4 I32CONST`
* Write the opcode `BLOCK_END`
Can you add an example of how a high-level `if` is encoded?
I propose it be encoded as follows:
(expr)
if
(true-exprs)
else
(false-exprs)
end
The `else` bytecode and clause are optional.
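The layout proposed in this comment can be sketched as a token-level encoder. The function name `encode_if` and the use of plain strings as stand-ins for bytecodes are assumptions for illustration; the operands are taken to be already-encoded token sequences.

```python
def encode_if(cond, true_exprs, false_exprs=None):
    """Lay out: (expr) if (true-exprs) [else (false-exprs)] end."""
    out = list(cond)          # the condition is written first
    out.append("if")
    out += list(true_exprs)
    if false_exprs is not None:
        out.append("else")    # emitted only when an else clause is present
        out += list(false_exprs)
    out.append("end")
    return out
```

Calling `encode_if(cond, true_exprs)` without a third argument gives the form with no `else` clause.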
Certainly, I'll get right on it.
I was going to start a PR that proposes postorder in its full glory, but I think we can mold this one into the right shape, or we can land it as is and iterate. I have a separate PR that proposes bytecodes for the encoding.
@titzer I'm happy and ready to mold this to the consensus design. |
* Then write the opcode for `i32.add`, which is sufficient since `i32.add` is a binary opcode.

* Given an if-expression: `if_else(expr: AstNode, true-exprs: AstNode, false_exprs: AstNode)`
* Recursively encode expr
I like it!
Can you also give an example for an if with no else?
With #675 now merged, this is done. |
Or actually, it looks like BinaryEncoding.md still talks about a "Pre-order encoding". Reopening this. |
Is there further feedback on this one? Is there perhaps value to adding some additional examples to summarize the above discussion or perhaps just keep the current set of examples? |
I would like to work through the rationale and explanation of a post-order encoding of syntax trees that would simplify robust decoding. The goal is to avoid having to lean on recursive descent or shift-reduce parsers.
The BinaryEncoding doc's current explanation of post-order encoding is a good start. This PR rounds it out with coverage of all the exceptions to post-order, along with more examples.
It looks like this is a moot point now, with all the implementations and other documentation out there, plus the convergence on a stack machine.