-
Notifications
You must be signed in to change notification settings - Fork 695
Clarify that wasm may be viewed as either an AST or a stack machine. #686
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
This document explains the high-level design of the AST: its types, constructs, and | ||
semantics. For full details consult [the formal Specification](https://github.com/WebAssembly/spec), | ||
for file-level encoding details consult [Binary Encoding](BinaryEncoding.md), | ||
and for the human-readable text representation consult [Text Format](TextFormat.md). | ||
|
||
Each function body consists of a list of expressions. All expressions and | ||
operators are typed, with no implicit conversions or overloading rules. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may well be more efficient for the encoding to take advantage of implicit conversions and/or to specialize the operators based on the argument types with the post-order encoding. I don't see the rationale for excluding these options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is preexisting text that this PR is just moving, so I'd like to suggest raising this question in a new issue.
I think this is where we want to go, but it's probably premature to add this text. So far, we haven't fully demonstrated the stack machine interpretation, and in fact, until we decide to have explicit drops, it doesn't actually hold. |
@rossberg-chromium It's not clear that Some stack machine code does not translate simply to an expression and I think the difference is enough that a stack interpretation does not have a clear parallel in an AST without some significant restrictions on the stack code. For example:
I think we do need this
The restriction here is that values popped must have only one use in order for them to have a simple expression parallel, and values used multiple times would be let block top level expressions and not popped until the end of the block. Would we want to validate such a restriction, or choose a more flexible stack machine code - perhaps a choice needs to be made and they are not parallel interpretations, or perhaps some other encoding can capture the single use pattern efficiently without needing a stack machine. |
@rossberg-chromium I've found it useful to be able to think about wasm as a stack machine for several months now. As someone who has done would I hope can be considered relevant work on both the producer and consumer side with the help of this interpretation, I can confirm that it does hold, with or without explicit drops, multi-value, void, any, invalid, empty, or other details. |
In particular, w/o the explicit drop, you have to pretend that all void-returning operators push a special "void" value. So lgtm and this matches how I've been looking at things too. |
@lukewagner, I didn't count that as a proper stack machine interpretation (as you guys said, it's weird), but fair enough. LGTM |
All expressions and operators are typed, with no implicit conversions or overloading rules. | ||
WebAssembly code is represented in a form that can be interpreted either as an | ||
Abstract Syntax Tree (AST) where each node represents an expression, or as a | ||
typed stack machine, where each node represents an instruction. This document |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the second part of this sentence true? How does each node represent an instruction in a typed stack machine, for if_else
for example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the 0xb encoding, if_else
was split into if
, else
, and end
. In a stack-machine interpretation, these each become independent opcodes that each have a well-defined effect in terms of the value stack and the control flow stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this like forth's IF ELSE THEN?
@flagxor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please elaborate on what those "well-defined effects" are? I'd like to understand this, and it isn't obvious to me. I remember we talked about this a long time ago, and there wasn't a simple answer - has that changed? Or is my memory wrong? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, you can take a look at the implementation here: https://github.com/WebAssembly/sexpr-wasm-prototype/blob/master/src/wasm-binary-reader-interpreter.c#L1247. It converts from wasm opcodes to a very similar stack machine, but with gotos instead of structured control flow. IF
becomes BR_UNLESS
, ELSE
becomes a BR
and a branch fixup for BR_UNLESS
, and END
does a branch fixup of the BR
.
The only other slightly tricky part is determining whether the result of the expression needs to be discarded. With the current opcodes, that must be done in a prepass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks.
I see the value in this information, but I believe it's being added to the wrong place. @sunfishcode, you mentioned that in 0xb things are as they are, but that's the binary format - this page is about the abstract operations and types and so forth. Your stack machine interpretation is for the binary format, so having it in AST semantics is misleading. Specifically, when you say "form" in the first sentence, you mean "binary format", but we shouldn't talk about the binary format - nor the text format, for that matter - in the AST semantics document. |
I think the placement makes sense since the comment pertains to both the AST and semantics and for someone new to wasm who shows up at this doc (which is sortof the nexus of the whole repo), if they have any familiarity with stack machines, it could help them establish context. |
As the nexus, I agree it makes sense to mention this and refer to the binary format doc for more details. Perhaps something like "For simplicity we define and describe these semantics in terms of an AST. VMs can implement these semantics in various ways, for example as a concrete AST, a stack machine (see more details in BinaryFormat), or something else." |
@kripken The stack machine approach to semantics can be a pure abstraction the same way the AST is. It can describe how WebAssembly works, independently of how it's encoded. The postorder encoding makes the mapping to the binary format much more obvious, but the mapping was always there. For example, I wrote some code which found these concepts useful well before the switch to postorder happened. I'd prefer to avoid adding "for simplicity" because I happen to think the stack machine semantics are simple too, overall -- simpler in some areas and more complex in others.
Does this sentence: "Implementations need not build an actual AST or maintain an actual stack; they need only behave as if they did so." cover that? I'd prefer to avoid linking to BinaryFormat.md here, because the stack machine concept exists independently of any particular encoding. |
@sunfishcode I agree that a stack machine could define the AST semantics, but at present it strictly does not. There are strictly stack machine codes that do no translate to the current AST semantics. So what you are staying is just that the AST can be compiled to a stack machine, but it can also be compiled to an expressionless encoding or to machine code and these could also define the AST semantics but they do not at present either. Stack machines generally do not represent zero-value stack entries, rather they would just not have a stack entry if there were no value, and also multiple values are generally pushed to multiple stack slots. Take a look at the sexp-wasm stack machine - the wasm AST is compiled to this stack machine. You would need to demonstrate a trivial lossless translation from this stack code back to the AST for them to be equivalent? Part of the problem is that 'stack machine' is not defined. People once called a language with a call stack a stack machine, and wasm has a call stack, but does that make it useful to describe it as a stack machine? No one can argue against the description because it is effectively defined as 'a stack machine restricted to match the AST semantics', which is just a product of the one way compilation to a stack machine, but how does that help the reader without defining the necessary restrictions? How does the reader translate from their general understanding of a stack machine to the AST representation of this? I gave one challenge above using
And another using
And another:
|
@kripken I reworded the paragraph along the lines we discussed offline. Does this work? |
All expressions and operators are typed, with no implicit conversions or overloading rules. | ||
This document describes WebAssembly semantics. The description here is written | ||
in terms of an Abstract Syntax Tree (AST), however it is also possible to | ||
understand WebAssembly semantics in terms of a typed stack machine. In practice, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put the "In practice" sentence in parens, as this document is not about implementation details. I agree it's worth mentioning, just it's an aside for clarification purposes, and as such, parens seem suitable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"it is also possible to X" suggests X is the only alternative. is it?
instead, how about "however, it is also possible to understand WebAssembly semantics in other ways, for example, in terms of a stack machine".
Thanks, looks a lot nicer imo :) Still I have some comments. Sorry for all the details, but as @lukewagner said, this is a central page and every word counts. |
@JSStats Yes; there are several semantic restrictions on the stack machine. Also, stack machines can be built to handle void types, or not, tuple types, or not, dup instructions, or not, and any number of other variations or restrictions. These are interesting details for wasm, but they don't change the overall observation being made here. @kripken Added parens and removed "typed". I kept the "it is also possible to X" there though. There are obviously many other ways one could describe wasm semantics, but stack machines aren't just one example among a crowd of theoretical alternatives: they're an important concept that's contributing to the overall WebAssembly design effort. |
Thanks, lgtm. |
Merging with lgtms. |
…686) * Clarify that wasm may be viewed as either an AST or a stack machine. * Reword the introductory paragraph. * Add parens, remove "typed".
* Clarify that wasm may be viewed as either an AST or a stack machine. (#686) * Clarify that wasm may be viewed as either an AST or a stack machine. * Reword the introductory paragraph. * Add parens, remove "typed". * Make opcode 0x00 `unreachable`. (#684) Make opcode 0x00 `unreachable`, and move `nop` to a non-zero opcode. All-zeros is one of the more common patterns of corrupted data. This change makes it more likely that code that is accidentally zeroed, in whole or in part, will be noticed when executed rather than silently running through a nop slide. Obviously, this doesn't matter when an opcode table is present, but if there is a default opcode table, it would presumably use the opcodes defined here. * BinaryEncoding.md changes implied by #682 * Fix thinko in import section * Rename definition_kind to external_kind for precision * Rename resizable_definition to resizable_limits * Add opcode delimiter to init_expr * Add Elem section to ToC and move it before Data section to reflect Table going before Memory * Add missing init_expr to global variables and undo the grouped representation of globals * Note that only immutable globals can be exported * Change the other 'mutability' flag to 'varuint1' * Give 'anyfunc' its own opcode * Add note about immutable global import requirement * Remove explicit 'default' flag; make memory/table default by default * Change (get|set)_global opcodes * Add end opcode to functions * Use section codes instead of section names (rebasing onto 0xC instead of master) This PR proposes uses section codes for known sections, which is more compact and easier to check in a decoder. It allows for user-defined sections that have string names to be encoded in the same manner as before. The scheme of using negative numbers proposed here also has the advantage of allowing a single decoder to accept the old (0xB) format and the new (0xC) format for the time being. * Use LEB for br_table (#738) * Describe operand order of call_indirect (#758) * Remove arities from call/return (#748) * Limit varint sizes in Binary Encoding. (#764) * Global section (#771) global-variable was a broken anchor and the type of count was an undefined reference and inconsistent with all the rest of the sections. * Make name section a user-string section. * Update BinaryEncoding.md * Update BinaryEncoding.md * Use positive section code byte * Remove specification of name strings for unknown sections * Update BinaryEncoding.md * Remove repetition in definition of var(u)int types (#768) * Fix typo (#781) * Move the element section before the code section (#779) * Binary format identifier is out of date (#785) * Update BinaryEncoding.md to reflect the ml-proto encoding of the memory and table sections. (#800) * Add string back * Block signatures (#765) * Replace branch arities with block and if signatures. Moving arities to blocks has the nice property of giving implementations useful information up front, however some anticipated uses of this information would really want to know the types up front too. This patch proposes replacing block arities with function signature indices, which would provide full type information about a block up front. * Remove the arity operand from br_table too. * Remove mentions of "arguments". * Make string part of the payload * Remove references to post-order AST in BinaryEncoding.md (#801) * Simplify loop by removing its exit label. This removes loop's bottom label. * Move description of `return` to correct column (#804) * type correction and missing close quote (#805) * Remove more references to AST (#806) * Remove reference to AST in JS.md Remove a reference to AST in JS.md. Note that the ml-proto spec still uses the name `Ast.Module` and has files named `ast.ml`, etc, so leaving those references intact for now. * Use "instruction" instead of "AST operator" * Update rationale for stack machine * Update Rationale.md * Update discussion of expression trees * Update MVP.md * Update Rationale.md * Update Rationale.md * Remove references to expressions * Update Rationale.md * Update Rationale.md * Address review comments * Address review comments * Address review comments * Delete h
This PR just adds some additional introductory information. Any given document needs to pick one perspective, but it is useful to acknowledge other commonly-used perspectives to explain how they relate.