Skip to content

Clarify that wasm may be viewed as either an AST or a stack machine. #686

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 17, 2016

Conversation

sunfishcode
Copy link
Member

This PR just adds some additional introductory information. Any given document needs to pick one perspective, but it is useful to acknowledge other commonly-used perspectives to explain how they relate.


This document explains the high-level design of the AST: its types, constructs, and
semantics. For full details consult [the formal Specification](https://github.com/WebAssembly/spec),
for file-level encoding details consult [Binary Encoding](BinaryEncoding.md),
and for the human-readable text representation consult [Text Format](TextFormat.md).

Each function body consists of a list of expressions. All expressions and
operators are typed, with no implicit conversions or overloading rules.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may well be more efficient for the encoding to take advantage of implicit conversions and/or to specialize the operators based on the argument types with the post-order encoding. I don't see the rationale for excluding these options.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is preexisting text that this PR is just moving, so I'd like to suggest raising this question in a new issue.

@rossberg
Copy link
Member

I think this is where we want to go, but it's probably premature to add this text. So far, we haven't fully demonstrated the stack machine interpretation, and in fact, until we decide to have explicit drops, it doesn't actually hold.

@ghost
Copy link

ghost commented May 12, 2016

@rossberg-chromium It's not clear that drop is necessary to make it a stack machine? We already have block which unwinds the stack. I thought an issue was that operators could consume only one value per AST node to have a value stack rather than node stack interpretation, that node with no values could not be consumed.

Some stack machine code does not translate simply to an expression and I think the difference is enough that a stack interpretation does not have a clear parallel in an AST without some significant restrictions on the stack code. For example:

...
i32.load
dup 0
i32.add

I think we do need this dup operator as it helps avoid using local variables and seems consistent with SSA. The above could be encoded as follows which would have a simple expression parallel:

...
i32.load
dup 0
dup 1
i32.add
(let (($v1 (i32.load ...))
      ($v2 (i32.add (get_value $v1) (get_value $v1)))
      )
  )

The restriction here is that values popped must have only one use in order for them to have a simple expression parallel, and values used multiple times would be let block top level expressions and not popped until the end of the block. Would we want to validate such a restriction, or choose a more flexible stack machine code - perhaps a choice needs to be made and they are not parallel interpretations, or perhaps some other encoding can capture the single use pattern efficiently without needing a stack machine.

@sunfishcode
Copy link
Member Author

@rossberg-chromium I've found it useful to be able to think about wasm as a stack machine for several months now. As someone who has done would I hope can be considered relevant work on both the producer and consumer side with the help of this interpretation, I can confirm that it does hold, with or without explicit drops, multi-value, void, any, invalid, empty, or other details.

@lukewagner
Copy link
Member

In particular, w/o the explicit drop, you have to pretend that all void-returning operators push a special "void" value.

So lgtm and this matches how I've been looking at things too.

@rossberg
Copy link
Member

@lukewagner, I didn't count that as a proper stack machine interpretation (as you guys said, it's weird), but fair enough.

LGTM

All expressions and operators are typed, with no implicit conversions or overloading rules.
WebAssembly code is represented in a form that can be interpreted either as an
Abstract Syntax Tree (AST) where each node represents an expression, or as a
typed stack machine, where each node represents an instruction. This document
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the second part of this sentence true? How does each node represent an instruction in a typed stack machine, for if_else for example?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the 0xb encoding, if_else was split into if, else, and end. In a stack-machine interpretation, these each become independent opcodes that each have a well-defined effect in terms of the value stack and the control flow stack.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this like forth's IF ELSE THEN?
@flagxor?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please elaborate on what those "well-defined effects" are? I'd like to understand this, and it isn't obvious to me. I remember we talked about this a long time ago, and there wasn't a simple answer - has that changed? Or is my memory wrong? :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, you can take a look at the implementation here: https://github.com/WebAssembly/sexpr-wasm-prototype/blob/master/src/wasm-binary-reader-interpreter.c#L1247. It converts from wasm opcodes to a very similar stack machine, but with gotos instead of structured control flow. IF becomes BR_UNLESS, ELSE becomes a BR and a branch fixup for BR_UNLESS, and END does a branch fixup of the BR.

The only other slightly tricky part is determining whether the result of the expression needs to be discarded. With the current opcodes, that must be done in a prepass.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks.

@kripken
Copy link
Member

kripken commented May 12, 2016

I see the value in this information, but I believe it's being added to the wrong place. @sunfishcode, you mentioned that in 0xb things are as they are, but that's the binary format - this page is about the abstract operations and types and so forth. Your stack machine interpretation is for the binary format, so having it in AST semantics is misleading.

Specifically, when you say "form" in the first sentence, you mean "binary format", but we shouldn't talk about the binary format - nor the text format, for that matter - in the AST semantics document.

@lukewagner
Copy link
Member

I think the placement makes sense since the comment pertains to both the AST and semantics and for someone new to wasm who shows up at this doc (which is sortof the nexus of the whole repo), if they have any familiarity with stack machines, it could help them establish context.

@kripken
Copy link
Member

kripken commented May 12, 2016

As the nexus, I agree it makes sense to mention this and refer to the binary format doc for more details. Perhaps something like "For simplicity we define and describe these semantics in terms of an AST. VMs can implement these semantics in various ways, for example as a concrete AST, a stack machine (see more details in BinaryFormat), or something else."

@sunfishcode
Copy link
Member Author

@kripken The stack machine approach to semantics can be a pure abstraction the same way the AST is. It can describe how WebAssembly works, independently of how it's encoded. The postorder encoding makes the mapping to the binary format much more obvious, but the mapping was always there. For example, I wrote some code which found these concepts useful well before the switch to postorder happened.

I'd prefer to avoid adding "for simplicity" because I happen to think the stack machine semantics are simple too, overall -- simpler in some areas and more complex in others.

VMs can implement these semantics in various ways, for example as a concrete AST, a stack machine (see more details in BinaryFormat), or something else.

Does this sentence:

"Implementations need not build an actual AST or maintain an actual stack; they need only behave as if they did so."

cover that? I'd prefer to avoid linking to BinaryFormat.md here, because the stack machine concept exists independently of any particular encoding.

@ghost
Copy link

ghost commented May 12, 2016

@sunfishcode I agree that a stack machine could define the AST semantics, but at present it strictly does not. There are strictly stack machine codes that do no translate to the current AST semantics. So what you are staying is just that the AST can be compiled to a stack machine, but it can also be compiled to an expressionless encoding or to machine code and these could also define the AST semantics but they do not at present either.

Stack machines generally do not represent zero-value stack entries, rather they would just not have a stack entry if there were no value, and also multiple values are generally pushed to multiple stack slots. Take a look at the sexp-wasm stack machine - the wasm AST is compiled to this stack machine. You would need to demonstrate a trivial lossless translation from this stack code back to the AST for them to be equivalent?

Part of the problem is that 'stack machine' is not defined. People once called a language with a call stack a stack machine, and wasm has a call stack, but does that make it useful to describe it as a stack machine? No one can argue against the description because it is effectively defined as 'a stack machine restricted to match the AST semantics', which is just a product of the one way compilation to a stack machine, but how does that help the reader without defining the necessary restrictions? How does the reader translate from their general understanding of a stack machine to the AST representation of this?

I gave one challenge above using dup. Here's another piece of stack machine code that has no AST representation:

call_returning_two_values
i32.const 1
call_with_three_args

And another using drop:

call_returning_const_1
call_returning_const_2
drop
i32.eqz

And another:

call_returning_const_1
call_returning_no_values
i32.eqz

@sunfishcode
Copy link
Member Author

@kripken I reworded the paragraph along the lines we discussed offline. Does this work?

All expressions and operators are typed, with no implicit conversions or overloading rules.
This document describes WebAssembly semantics. The description here is written
in terms of an Abstract Syntax Tree (AST), however it is also possible to
understand WebAssembly semantics in terms of a typed stack machine. In practice,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put the "In practice" sentence in parens, as this document is not about implementation details. I agree it's worth mentioning, just it's an aside for clarification purposes, and as such, parens seem suitable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"it is also possible to X" suggests X is the only alternative. is it?

instead, how about "however, it is also possible to understand WebAssembly semantics in other ways, for example, in terms of a stack machine".

@kripken
Copy link
Member

kripken commented May 12, 2016

Thanks, looks a lot nicer imo :) Still I have some comments. Sorry for all the details, but as @lukewagner said, this is a central page and every word counts.

@sunfishcode
Copy link
Member Author

@JSStats Yes; there are several semantic restrictions on the stack machine. Also, stack machines can be built to handle void types, or not, tuple types, or not, dup instructions, or not, and any number of other variations or restrictions. These are interesting details for wasm, but they don't change the overall observation being made here.

@kripken Added parens and removed "typed". I kept the "it is also possible to X" there though. There are obviously many other ways one could describe wasm semantics, but stack machines aren't just one example among a crowd of theoretical alternatives: they're an important concept that's contributing to the overall WebAssembly design effort.

@kripken
Copy link
Member

kripken commented May 13, 2016

Thanks, lgtm.

@sunfishcode
Copy link
Member Author

Merging with lgtms.

@sunfishcode sunfishcode merged commit 754f5b9 into binary_0xc May 17, 2016
@sunfishcode sunfishcode deleted the stack-interpretation branch May 17, 2016 18:19
lukewagner pushed a commit that referenced this pull request Jul 27, 2016
…686)

* Clarify that wasm may be viewed as either an AST or a stack machine.

* Reword the introductory paragraph.

* Add parens, remove "typed".
titzer pushed a commit that referenced this pull request Sep 29, 2016
* Clarify that wasm may be viewed as either an AST or a stack machine. (#686)

* Clarify that wasm may be viewed as either an AST or a stack machine.

* Reword the introductory paragraph.

* Add parens, remove "typed".

* Make opcode 0x00 `unreachable`. (#684)

Make opcode 0x00 `unreachable`, and move `nop` to a non-zero opcode.

All-zeros is one of the more common patterns of corrupted data. This
change makes it more likely that code that is accidentally zeroed, in
whole or in part, will be noticed when executed rather than silently
running through a nop slide.

Obviously, this doesn't matter when an opcode table is present, but
if there is a default opcode table, it would presumably use the
opcodes defined here.

* BinaryEncoding.md changes implied by #682

* Fix thinko in import section

* Rename definition_kind to external_kind for precision

* Rename resizable_definition to resizable_limits

* Add  opcode delimiter to init_expr

* Add Elem section to ToC and move it before Data section to reflect Table going before Memory

* Add missing init_expr to global variables and undo the grouped representation of globals

* Note that only immutable globals can be exported

* Change the other 'mutability' flag to 'varuint1'

* Give 'anyfunc' its own opcode

* Add note about immutable global import requirement

* Remove explicit 'default' flag; make memory/table default by default

* Change (get|set)_global opcodes

* Add end opcode to functions

* Use section codes instead of section names

(rebasing onto 0xC instead of master)

This PR proposes uses section codes for known sections, which is more compact and easier to check in a decoder.
It allows for user-defined sections that have string names to be encoded in the same manner as before.
The scheme of using negative numbers proposed here also has the advantage of allowing a single decoder to accept the old (0xB) format and the new (0xC) format for the time being.

* Use LEB for br_table (#738)

* Describe operand order of call_indirect (#758)

* Remove arities from call/return (#748)

* Limit varint sizes in Binary Encoding. (#764)

* Global section (#771)

global-variable was a broken anchor and the type of count was an undefined reference and inconsistent with all the rest of the sections.

* Make name section a user-string section.

* Update BinaryEncoding.md

* Update BinaryEncoding.md

* Use positive section code byte

* Remove specification of name strings for unknown sections

* Update BinaryEncoding.md

* Remove repetition in definition of var(u)int types (#768)

* Fix typo (#781)

* Move the element section before the code section (#779)

* Binary format identifier is out of date (#785)

* Update BinaryEncoding.md to reflect the ml-proto encoding of the memory and table sections. (#800)

* Add string back

* Block signatures (#765)

* Replace branch arities with block and if signatures.

Moving arities to blocks has the nice property of giving implementations
useful information up front, however some anticipated uses of this
information would really want to know the types up front too.

This patch proposes replacing block arities with function signature indices,
which would provide full type information about a block up front.

* Remove the arity operand from br_table too.

* Remove mentions of "arguments".

* Make string part of the payload

* Remove references to post-order AST in BinaryEncoding.md (#801)

* Simplify loop by removing its exit label.

This removes loop's bottom label.

* Move description of `return` to correct column (#804)

* type correction and missing close quote (#805)

* Remove more references to AST (#806)

* Remove reference to AST in JS.md

Remove a reference to AST in JS.md. Note that the ml-proto spec still uses the name `Ast.Module` and has files named `ast.ml`, etc, so leaving those references intact for now.

* Use "instruction" instead of "AST operator"

* Update rationale for stack machine

* Update Rationale.md

* Update discussion of expression trees

* Update MVP.md

* Update Rationale.md

* Update Rationale.md

* Remove references to expressions

* Update Rationale.md

* Update Rationale.md

* Address review comments

* Address review comments

* Address review comments

* Delete h
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants