Clarify that wasm may be viewed as either an AST or a stack machine. #686

sunfishcode · 2016-05-12T04:55:36Z

This PR just adds some additional introductory information. Any given document needs to pick one perspective, but it is useful to acknowledge other commonly-used perspectives to explain how they relate.

ghost · 2016-05-12T06:17:43Z

AstSemantics.md


 This document explains the high-level design of the AST: its types, constructs, and
 semantics. For full details consult [the formal Specification](https://github.com/WebAssembly/spec),
 for file-level encoding details consult [Binary Encoding](BinaryEncoding.md),
 and for the human-readable text representation consult [Text Format](TextFormat.md).

+Each function body consists of a list of expressions. All expressions and
+operators are typed, with no implicit conversions or overloading rules.
+


It may well be more efficient for the encoding to take advantage of implicit conversions and/or to specialize the operators based on the argument types with the post-order encoding. I don't see the rationale for excluding these options.

This is preexisting text that this PR is just moving, so I'd like to suggest raising this question in a new issue.

rossberg · 2016-05-12T11:02:35Z

I think this is where we want to go, but it's probably premature to add this text. So far, we haven't fully demonstrated the stack machine interpretation, and in fact, until we decide to have explicit drops, it doesn't actually hold.

ghost · 2016-05-12T13:18:15Z

@rossberg-chromium It's not clear that drop is necessary to make it a stack machine? We already have block which unwinds the stack. I thought an issue was that operators could consume only one value per AST node to have a value stack rather than node stack interpretation, that node with no values could not be consumed.

Some stack machine code does not translate simply to an expression and I think the difference is enough that a stack interpretation does not have a clear parallel in an AST without some significant restrictions on the stack code. For example:

...
i32.load
dup 0
i32.add

I think we do need this dup operator as it helps avoid using local variables and seems consistent with SSA. The above could be encoded as follows which would have a simple expression parallel:

...
i32.load
dup 0
dup 1
i32.add

(let (($v1 (i32.load ...))
      ($v2 (i32.add (get_value $v1) (get_value $v1)))
      )
  )

The restriction here is that values popped must have only one use in order for them to have a simple expression parallel, and values used multiple times would be let block top level expressions and not popped until the end of the block. Would we want to validate such a restriction, or choose a more flexible stack machine code - perhaps a choice needs to be made and they are not parallel interpretations, or perhaps some other encoding can capture the single use pattern efficiently without needing a stack machine.

sunfishcode · 2016-05-12T13:45:58Z

@rossberg-chromium I've found it useful to be able to think about wasm as a stack machine for several months now. As someone who has done would I hope can be considered relevant work on both the producer and consumer side with the help of this interpretation, I can confirm that it does hold, with or without explicit drops, multi-value, void, any, invalid, empty, or other details.

lukewagner · 2016-05-12T15:40:09Z

In particular, w/o the explicit drop, you have to pretend that all void-returning operators push a special "void" value.

So lgtm and this matches how I've been looking at things too.

rossberg · 2016-05-12T16:26:06Z

@lukewagner, I didn't count that as a proper stack machine interpretation (as you guys said, it's weird), but fair enough.

LGTM

kripken · 2016-05-12T16:49:35Z

AstSemantics.md

-All expressions and operators are typed, with no implicit conversions or overloading rules.
+WebAssembly code is represented in a form that can be interpreted either as an
+Abstract Syntax Tree (AST) where each node represents an expression, or as a
+typed stack machine, where each node represents an instruction. This document


Is the second part of this sentence true? How does each node represent an instruction in a typed stack machine, for if_else for example?

In the 0xb encoding, if_else was split into if, else, and end. In a stack-machine interpretation, these each become independent opcodes that each have a well-defined effect in terms of the value stack and the control flow stack.

Isn't this like forth's IF ELSE THEN?
@flagxor?

Can you please elaborate on what those "well-defined effects" are? I'd like to understand this, and it isn't obvious to me. I remember we talked about this a long time ago, and there wasn't a simple answer - has that changed? Or is my memory wrong? :)

Well, you can take a look at the implementation here: https://github.com/WebAssembly/sexpr-wasm-prototype/blob/master/src/wasm-binary-reader-interpreter.c#L1247. It converts from wasm opcodes to a very similar stack machine, but with gotos instead of structured control flow. IF becomes BR_UNLESS, ELSE becomes a BR and a branch fixup for BR_UNLESS, and END does a branch fixup of the BR.

The only other slightly tricky part is determining whether the result of the expression needs to be discarded. With the current opcodes, that must be done in a prepass.

I see, thanks.

kripken · 2016-05-12T18:14:35Z

I see the value in this information, but I believe it's being added to the wrong place. @sunfishcode, you mentioned that in 0xb things are as they are, but that's the binary format - this page is about the abstract operations and types and so forth. Your stack machine interpretation is for the binary format, so having it in AST semantics is misleading.

Specifically, when you say "form" in the first sentence, you mean "binary format", but we shouldn't talk about the binary format - nor the text format, for that matter - in the AST semantics document.

lukewagner · 2016-05-12T18:47:31Z

I think the placement makes sense since the comment pertains to both the AST and semantics and for someone new to wasm who shows up at this doc (which is sortof the nexus of the whole repo), if they have any familiarity with stack machines, it could help them establish context.

kripken · 2016-05-12T19:36:07Z

As the nexus, I agree it makes sense to mention this and refer to the binary format doc for more details. Perhaps something like "For simplicity we define and describe these semantics in terms of an AST. VMs can implement these semantics in various ways, for example as a concrete AST, a stack machine (see more details in BinaryFormat), or something else."

sunfishcode · 2016-05-12T20:26:30Z

@kripken The stack machine approach to semantics can be a pure abstraction the same way the AST is. It can describe how WebAssembly works, independently of how it's encoded. The postorder encoding makes the mapping to the binary format much more obvious, but the mapping was always there. For example, I wrote some code which found these concepts useful well before the switch to postorder happened.

I'd prefer to avoid adding "for simplicity" because I happen to think the stack machine semantics are simple too, overall -- simpler in some areas and more complex in others.

VMs can implement these semantics in various ways, for example as a concrete AST, a stack machine (see more details in BinaryFormat), or something else.

Does this sentence:

"Implementations need not build an actual AST or maintain an actual stack; they need only behave as if they did so."

cover that? I'd prefer to avoid linking to BinaryFormat.md here, because the stack machine concept exists independently of any particular encoding.

ghost · 2016-05-12T21:23:29Z

@sunfishcode I agree that a stack machine could define the AST semantics, but at present it strictly does not. There are strictly stack machine codes that do no translate to the current AST semantics. So what you are staying is just that the AST can be compiled to a stack machine, but it can also be compiled to an expressionless encoding or to machine code and these could also define the AST semantics but they do not at present either.

Stack machines generally do not represent zero-value stack entries, rather they would just not have a stack entry if there were no value, and also multiple values are generally pushed to multiple stack slots. Take a look at the sexp-wasm stack machine - the wasm AST is compiled to this stack machine. You would need to demonstrate a trivial lossless translation from this stack code back to the AST for them to be equivalent?

Part of the problem is that 'stack machine' is not defined. People once called a language with a call stack a stack machine, and wasm has a call stack, but does that make it useful to describe it as a stack machine? No one can argue against the description because it is effectively defined as 'a stack machine restricted to match the AST semantics', which is just a product of the one way compilation to a stack machine, but how does that help the reader without defining the necessary restrictions? How does the reader translate from their general understanding of a stack machine to the AST representation of this?

I gave one challenge above using dup. Here's another piece of stack machine code that has no AST representation:

call_returning_two_values
i32.const 1
call_with_three_args

And another using drop:

call_returning_const_1
call_returning_const_2
drop
i32.eqz

And another:

call_returning_const_1
call_returning_no_values
i32.eqz

sunfishcode · 2016-05-12T21:51:38Z

@kripken I reworded the paragraph along the lines we discussed offline. Does this work?

kripken · 2016-05-12T22:20:47Z

AstSemantics.md

-All expressions and operators are typed, with no implicit conversions or overloading rules.
+This document describes WebAssembly semantics. The description here is written
+in terms of an Abstract Syntax Tree (AST), however it is also possible to
+understand WebAssembly semantics in terms of a typed stack machine. In practice,


I would put the "In practice" sentence in parens, as this document is not about implementation details. I agree it's worth mentioning, just it's an aside for clarification purposes, and as such, parens seem suitable.

"it is also possible to X" suggests X is the only alternative. is it?

instead, how about "however, it is also possible to understand WebAssembly semantics in other ways, for example, in terms of a stack machine".

kripken · 2016-05-12T22:23:45Z

Thanks, looks a lot nicer imo :) Still I have some comments. Sorry for all the details, but as @lukewagner said, this is a central page and every word counts.

sunfishcode · 2016-05-13T02:00:59Z

@JSStats Yes; there are several semantic restrictions on the stack machine. Also, stack machines can be built to handle void types, or not, tuple types, or not, dup instructions, or not, and any number of other variations or restrictions. These are interesting details for wasm, but they don't change the overall observation being made here.

@kripken Added parens and removed "typed". I kept the "it is also possible to X" there though. There are obviously many other ways one could describe wasm semantics, but stack machines aren't just one example among a crowd of theoretical alternatives: they're an important concept that's contributing to the overall WebAssembly design effort.

kripken · 2016-05-13T03:12:16Z

Thanks, lgtm.

sunfishcode · 2016-05-17T18:19:08Z

Merging with lgtms.

…686) * Clarify that wasm may be viewed as either an AST or a stack machine. * Reword the introductory paragraph. * Add parens, remove "typed".

* Clarify that wasm may be viewed as either an AST or a stack machine. (#686) * Clarify that wasm may be viewed as either an AST or a stack machine. * Reword the introductory paragraph. * Add parens, remove "typed". * Make opcode 0x00 `unreachable`. (#684) Make opcode 0x00 `unreachable`, and move `nop` to a non-zero opcode. All-zeros is one of the more common patterns of corrupted data. This change makes it more likely that code that is accidentally zeroed, in whole or in part, will be noticed when executed rather than silently running through a nop slide. Obviously, this doesn't matter when an opcode table is present, but if there is a default opcode table, it would presumably use the opcodes defined here. * BinaryEncoding.md changes implied by #682 * Fix thinko in import section * Rename definition_kind to external_kind for precision * Rename resizable_definition to resizable_limits * Add opcode delimiter to init_expr * Add Elem section to ToC and move it before Data section to reflect Table going before Memory * Add missing init_expr to global variables and undo the grouped representation of globals * Note that only immutable globals can be exported * Change the other 'mutability' flag to 'varuint1' * Give 'anyfunc' its own opcode * Add note about immutable global import requirement * Remove explicit 'default' flag; make memory/table default by default * Change (get|set)_global opcodes * Add end opcode to functions * Use section codes instead of section names (rebasing onto 0xC instead of master) This PR proposes uses section codes for known sections, which is more compact and easier to check in a decoder. It allows for user-defined sections that have string names to be encoded in the same manner as before. The scheme of using negative numbers proposed here also has the advantage of allowing a single decoder to accept the old (0xB) format and the new (0xC) format for the time being. * Use LEB for br_table (#738) * Describe operand order of call_indirect (#758) * Remove arities from call/return (#748) * Limit varint sizes in Binary Encoding. (#764) * Global section (#771) global-variable was a broken anchor and the type of count was an undefined reference and inconsistent with all the rest of the sections. * Make name section a user-string section. * Update BinaryEncoding.md * Update BinaryEncoding.md * Use positive section code byte * Remove specification of name strings for unknown sections * Update BinaryEncoding.md * Remove repetition in definition of var(u)int types (#768) * Fix typo (#781) * Move the element section before the code section (#779) * Binary format identifier is out of date (#785) * Update BinaryEncoding.md to reflect the ml-proto encoding of the memory and table sections. (#800) * Add string back * Block signatures (#765) * Replace branch arities with block and if signatures. Moving arities to blocks has the nice property of giving implementations useful information up front, however some anticipated uses of this information would really want to know the types up front too. This patch proposes replacing block arities with function signature indices, which would provide full type information about a block up front. * Remove the arity operand from br_table too. * Remove mentions of "arguments". * Make string part of the payload * Remove references to post-order AST in BinaryEncoding.md (#801) * Simplify loop by removing its exit label. This removes loop's bottom label. * Move description of `return` to correct column (#804) * type correction and missing close quote (#805) * Remove more references to AST (#806) * Remove reference to AST in JS.md Remove a reference to AST in JS.md. Note that the ml-proto spec still uses the name `Ast.Module` and has files named `ast.ml`, etc, so leaving those references intact for now. * Use "instruction" instead of "AST operator" * Update rationale for stack machine * Update Rationale.md * Update discussion of expression trees * Update MVP.md * Update Rationale.md * Update Rationale.md * Remove references to expressions * Update Rationale.md * Update Rationale.md * Address review comments * Address review comments * Address review comments * Delete h

Clarify that wasm may be viewed as either an AST or a stack machine.

93e5a92

ghost reviewed May 12, 2016
View reviewed changes

kripken reviewed May 12, 2016
View reviewed changes

Reword the introductory paragraph.

40b428b

kripken reviewed May 12, 2016
View reviewed changes

Add parens, remove "typed".

aa524f9

sunfishcode merged commit 754f5b9 into binary_0xc May 17, 2016

sunfishcode deleted the stack-interpretation branch May 17, 2016 18:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify that wasm may be viewed as either an AST or a stack machine. #686

Clarify that wasm may be viewed as either an AST or a stack machine. #686

sunfishcode commented May 12, 2016

ghost May 12, 2016

sunfishcode May 12, 2016

rossberg commented May 12, 2016

ghost commented May 12, 2016 •

edited by ghost

Loading

sunfishcode commented May 12, 2016

lukewagner commented May 12, 2016

rossberg commented May 12, 2016

kripken May 12, 2016

sunfishcode May 12, 2016

binji May 12, 2016

kripken May 12, 2016

binji May 12, 2016

kripken May 12, 2016

kripken commented May 12, 2016

lukewagner commented May 12, 2016

kripken commented May 12, 2016

sunfishcode commented May 12, 2016

ghost commented May 12, 2016 •

edited by ghost

Loading

sunfishcode commented May 12, 2016

kripken May 12, 2016

kripken May 12, 2016

kripken commented May 12, 2016

sunfishcode commented May 13, 2016

kripken commented May 13, 2016

sunfishcode commented May 17, 2016

Clarify that wasm may be viewed as either an AST or a stack machine. #686

Clarify that wasm may be viewed as either an AST or a stack machine. #686

Conversation

sunfishcode commented May 12, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rossberg commented May 12, 2016

ghost commented May 12, 2016 • edited by ghost Loading

sunfishcode commented May 12, 2016

lukewagner commented May 12, 2016

rossberg commented May 12, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kripken commented May 12, 2016

lukewagner commented May 12, 2016

kripken commented May 12, 2016

sunfishcode commented May 12, 2016

ghost commented May 12, 2016 • edited by ghost Loading

sunfishcode commented May 12, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kripken commented May 12, 2016

sunfishcode commented May 13, 2016

kripken commented May 13, 2016

sunfishcode commented May 17, 2016

ghost commented May 12, 2016 •

edited by ghost

Loading

ghost commented May 12, 2016 •

edited by ghost

Loading