Skip to content

Stack machine semantics #323

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 51 commits into from
Sep 9, 2016
Merged

Stack machine semantics #323

merged 51 commits into from
Sep 9, 2016

Conversation

rossberg
Copy link
Member

This implements the stack machine semantics for Wasm:

  • Extends syntax to allow raw instruction sequences. Prior expression forms remain as syntax macros. See README for syntax summary.
  • Changes AST to consist of instruction sequences. Eliminate the kernel/AST distinction and corresponding desugaring.
  • Reimplements type checking in terms of that. Skips dead code (for now).
  • Reimplements evaluation as a small-step rewriting semantics over instruction sequences plus administrative forms.
  • Adjust & extend tests (though we should add more).
  • Various clean-ups on the way.

I apologise for the monolithic PR. Breaking the move over to a stack machine semantics into meaningful smaller steps is kind of impossible.

The main downside of this change is that evaluation got much slower, because it is now implemented by term rewriting, literally modelling a small-step reduction semantics as on paper. The upside is that this makes it suitable for explicitly modelling threads later on.

@drom
Copy link

drom commented Sep 2, 2016

@rossberg-chromium with stack machine, parenthesis don't carry any function anymore. Require putting them around every instruction sounds like unnecessary decoration. New text stack format better off without parenthesis at all. Mixing two styles in the same file format confuses people.

@rossberg
Copy link
Member Author

rossberg commented Sep 2, 2016

@drom, mixing two styles in one format is exactly what is happening currently, what people have complained about, and which I'm suggesting to get rid of.

@AndrewScheidecker
Copy link
Contributor

The goal for the text format in the MVP seems to be a "linear opcode" format to mirror the binary format. This PR makes fundamental changes to the expression syntax, but still includes some form of it. Is this change an incremental step toward removing expressions, or will expressions remain in WAST?

@ghost
Copy link

ghost commented Sep 2, 2016

@AndrewScheidecker Mozilla appear to have committed to continuing to explore a structured text format, see WebAssembly/design#704 (comment)

There is a difference between not standardizing the text format for the MVP which I can understand, and ignoring the use case of a structure text format which I hope will not pass.

This PR has a format for testing purposes, so does it really need to be representative?

@kripken
Copy link
Member

kripken commented Sep 2, 2016

@AndrewScheidecker: the decision as mentioned in your link is to ship the wasm MVP with a linear list. The spec repo's s-expression language will be a superset of that, but not part of the wasm spec, and not shown in browsers. However, after the MVP experimentation might lead to further developments and possible spec additions.

@AndrewScheidecker
Copy link
Contributor

It makes sense for the ml-proto text format to include non-standard functionality for the test suite. But some subset of a future state of the ml-proto text format will be the MVP standard text format, right? Will that subset include expression trees in addition to linear operator sequences, or will expression trees be a non-standard part of the ml-proto syntax?

@eholk
Copy link

eholk commented Sep 2, 2016

I'm in favor of requiring bracketing. In the example @rossberg-chromium gave the bracketed version seems more readable to me (although some more newlines might make the non-bracketed version look better). The parenthesis make it clear which things are meant to be taken as a unit, which I think will be especially helpful for instructions that take a number of optional arguments.

@kripken
Copy link
Member

kripken commented Sep 2, 2016

@AndrewScheidecker: The MVP will only standardize the linear part. In other words, expression trees will be a non-standard thing used in the spec repo.

@AndrewScheidecker
Copy link
Contributor

Thank you, that's what I was trying to figure out.

A big part of the value of ml-proto to other implementations right now is its test suite. If everybody implements the non-standard expression tree syntax to access the test suite, it will be de facto standard. So IMO ml-proto should either remove expression trees (doesn't need to happen all at once), or the standard should include expression trees.

@sunfishcode
Copy link
Member

It seems with the recent change, code like this:

    i32.const 3
    (i32.add (i32.const 4))

is now accepted. I understand why this makes the text format simpler, but it's also surprising. I wouldn't be comfortable with a language that visually looks like it has one structure, but actually has a very different structure, spreading, which is plausible here.

@ghost
Copy link

ghost commented Sep 2, 2016

@kripken I did not see a decision to 'standardize the linear part' in the MVP rather to not have a standard in this area for the MVP. I would not object to also having a linear presentation of the code.

The core issue is surely keeping the design consistent with the use case of a structured presentation.

I don't think it matters if a linear code or some macros are used for testing, even if this becomes a de-facto standard format, this is a small matter.

@AndrewScheidecker
Copy link
Contributor

Since it sounds like the expression tree syntax will stay in ml-proto, it would be nice if there was a stronger separation between it and the stack machine syntax. For example:

  • An operator expression is always parenthesized, and expects a fixed number of subexpressions of a type defined by its context (as ml-proto has been to this point).

  • A sequence expression is a semi-colon separated sequence of subexpressions:

    ((nop); (i32.add (i32.const 3) (i32.const 4)))

    It's a little bit more flexible than the existing implicit sequences, since it allows the result of the sequence to be pushed by any subexpression within it, not just the final subexpression:

    ((nop); (i32.add (i32.const 3) (i32.const 4)); (nop))

  • A stack expression is a non-parenthesized opcode along with only its immediate operands. It may only occur as part of a sequence expression, and must only pop operands that are pushed within the same sequence. @sunfishcode's example above would be:

    (i32.const 3; i32.const 4; i32.add)

  • A sequence expression can embed a sequence anywhere as a subexpression:

    (i32.eqz (i32.const 3; i32.const 4; i32.add))

  • Function bodies, loop bodies, block bodies, and if clauses are implicit sequences:

    (loop $break $loop; (i32.eqz (get_local $num)); br_if $break; ...; br $loop)

I think a syntax like that would be a nice incremental step that exposes the additional flexibility of the stack machine, but without crossing wires in our intuition about the expression syntax: implicit stack operands don't cross parentheses.

@ghost
Copy link

ghost commented Sep 9, 2016

@AndrewScheidecker A semi-colon is not generally used like that in an s-exp, does not fit the representation. For a stronger separation a stack operator could be added to the s-exp format (not an opcode). For example:

(i32.eqz (stack (i32.const 3) (i32.const 4) (i32.add)))

It raises the problem of what to do if there are more or less values than expected by the consumer of the stack operator - probably just a syntax error.

When pick is added it will help to be able to name the stack values, and that is not pretty in the linear stack code. I'd just leave it to @rossberg-chromium for now.

@rossberg rossberg merged commit a7a5fef into binary-0xc Sep 9, 2016
@AndrewScheidecker
Copy link
Contributor

A semi-colon is not generally used like that in an s-exp, does not fit the representation.

I agree, but the format already departs from a standard S-expression syntax. Previously the departure was pretty small: e.g. align=<x>, but as of the merge of this PR the syntax now allows unparenthesized "instructions":

i32.const 3 i32.const 4 i32.add

For a stronger separation a stack operator could be added to the s-exp format (not an opcode). For example:
(i32.eqz (stack (i32.const 3) (i32.const 4) (i32.add)))

That would still not distinguish between an operator expression and an "linear operator". (i32.add) could be a malformed expression or a linear operator that consumes its operands from the results of preceding operators, and you have to look at the context to figure out how to interpret it. I don't care if it's done exactly as I showed (with parentheses), but I do think there should be a more clear distinction than whether it's missing operands.

@ghost
Copy link

ghost commented Sep 9, 2016

@AndrewScheidecker That would still not distinguish between an operator expression and an "linear operator".

Could add another operator expression to flip back, or have operators with only immediate arguments be interpreted as popping all arguments, but what's the point, it's never going to be a nice format to use except for testing.

@rossberg rossberg deleted the stack branch May 18, 2017 11:16
ngzhian added a commit to ngzhian/spec that referenced this pull request Nov 4, 2021
The names are still pre WebAssembly#297, once that's finalize I can fix this up
(after the sync WebAssembly#323).

With this change, test/core/run.py passes on all test cases in simd/.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants