Skip to content
This repository was archived by the owner on Aug 17, 2022. It is now read-only.

Change alias sugar, remove local indices, give module/instance types fresh index spaces #26

Merged
merged 7 commits into from
Feb 16, 2021

Conversation

lukewagner
Copy link
Member

@lukewagner lukewagner commented Dec 21, 2020

This PR addresses issues #19 and #20 by:

  • removing the use of local indices (into local module/instance typedefs) in alias and instantiate; instead, <name>s are used:
    • (alias $f (func $instance "export")) with inline sugar (func $instance "export")
    • (instance $i (instantiate $M "foo" (instance $foo) "bar" (func $bar)))
      • Note: I used a flat list of <name> <instance-arg> because I couldn't think of a concrete reason to introduce a more verbose syntax that explicitly grouped them (e.g., (arg <name> <instance-arg>)), but I'm open to hearing other thoughts.
  • renaming "parent" to "outer" alias
  • adding an explicit outer module to outer aliases that can be any enclosing module:
    • (alias $t (type outer $PARENT $Type)) with inline sugar (type outer $PARENT $Type)

Although it isn't explicitly called out in the current explainer, these changes remove the need to add $identifier names to module/instance types. Updating all the code examples, I found this a nice change, removing what always had felt like a superfluous name for each imported module's/instance's export.

@lukewagner
Copy link
Member Author

I added a second commit to this PR which addresses comments in #21. I had actually already accidentally started making this change in the code samples of the first commit, using outer aliases unnecessarily in module/instance imports. The second commit establishes the rules and, most significantly, changes the encoding of module/instance types in Binary.md.

@lukewagner lukewagner changed the title Change alias sugar, remove local indices from aliases/instantiate Change alias sugar, remove local indices, give module/instance types fresh index spaces Dec 21, 2020
Copy link
Collaborator

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! I've got one question on the binary encoding of aliases but otherwise everything here seems pretty reasonable to me.

@alexcrichton
Copy link
Collaborator

One slight gotcha to the new alias syntax is that we have both:

(alias (func $funcname $instancename "exportname"))
(alias (func $instancename "exportname"))

so that when parsing this you can't eagerly take the first identifier as the name of the function. You need to look forward ahead to see if there's another identifier (or (instance ...)) to figure out whether the first identifier is the name of the function rather than the instance specifier. Not really the end of the world one way or another, but wanted to be sure to note it.

@lukewagner
Copy link
Member Author

Hmm, yes, I guess that was a nice thing about the (alias $f (func $instance "export")) syntax; it kept things quite separate. At the risk of more verbosity, we could have: (alias (func $f export $instance "export"))), using export|outer as a discriminant for what comes next? I don't have a strong opinion here.

@alexcrichton
Copy link
Collaborator

I think I'd have a slight preference for (alias $f (...)) to keep things separate, although not particularly strongly. One small bonus is that it mirrors the sugar elsewhere to inject aliases.

alexcrichton added a commit to alexcrichton/wasm-tools that referenced this pull request Jan 8, 2021
This commit Implements the sugar syntax proposed in
WebAssembly/module-linking#26. The change here was to introduce a new
`ItemRef` construct into the `wast` crate which represents a
parenthesized reference to an item, either in the current module or a
different module (e.g. optional `outer` and optional export names).
Additionally an `IndexOrRef` type was added for parsing references which
are either a bare index or an explicit reference to an item.

All existing references to items were updated to use `ItemRef` where
necessary. Additionally all locations which previously took an `Index`
now store `ItemRef` and are parsed as `IndexOrRef`.
alexcrichton added a commit to alexcrichton/wasm-tools that referenced this pull request Jan 8, 2021
This continues to sync this implementation with
WebAssembly/module-linking#26 by implementing the ability for outer
aliases to have an arbitrary depth listed on them.
@lukewagner
Copy link
Member Author

Hmm, yes, on second thought, I guess that seems like the best. I'll switch back and update the original comment to match.

@lukewagner
Copy link
Member Author

(FWIW, I'll be out next week.)

alexcrichton added a commit to alexcrichton/wasm-tools that referenced this pull request Jan 11, 2021
This commit Implements the sugar syntax proposed in
WebAssembly/module-linking#26. The change here was to introduce a new
`ItemRef` construct into the `wast` crate which represents a
parenthesized reference to an item, either in the current module or a
different module (e.g. optional `outer` and optional export names).
Additionally an `IndexOrRef` type was added for parsing references which
are either a bare index or an explicit reference to an item.

All existing references to items were updated to use `ItemRef` where
necessary. Additionally all locations which previously took an `Index`
now store `ItemRef` and are parsed as `IndexOrRef`.
alexcrichton added a commit to alexcrichton/wasm-tools that referenced this pull request Jan 11, 2021
This continues to sync this implementation with
WebAssembly/module-linking#26 by implementing the ability for outer
aliases to have an arbitrary depth listed on them.
alexcrichton added a commit to bytecodealliance/wasm-tools that referenced this pull request Jan 11, 2021
* Implement new alias shorthand sugar for module linking

This commit Implements the sugar syntax proposed in
WebAssembly/module-linking#26. The change here was to introduce a new
`ItemRef` construct into the `wast` crate which represents a
parenthesized reference to an item, either in the current module or a
different module (e.g. optional `outer` and optional export names).
Additionally an `IndexOrRef` type was added for parsing references which
are either a bare index or an explicit reference to an item.

All existing references to items were updated to use `ItemRef` where
necessary. Additionally all locations which previously took an `Index`
now store `ItemRef` and are parsed as `IndexOrRef`.

* Implement relative-depth aliases

This continues to sync this implementation with
WebAssembly/module-linking#26 by implementing the ability for outer
aliases to have an arbitrary depth listed on them.

* More rename of parent => outer
alexcrichton added a commit to alexcrichton/wasm-tools that referenced this pull request Jan 11, 2021
alexcrichton added a commit to alexcrichton/wasm-tools that referenced this pull request Jan 11, 2021
Don't have an optional field name and additionally remove the
surrounding `(arg ..)` since it's not present in
WebAssembly/module-linking#26
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Jan 14, 2021
This commit updates the various tooling used by wasmtime which has new
updates to the module linking proposal. This is done primarily to sync
with WebAssembly/module-linking#26. The main change implemented here is
that wasmtime now supports creating instances from a set of values, nott
just from instantiating a module. Additionally subtyping handling of
modules with respect to imports is now properly handled by desugaring
two-level imports to imports of instances.

A number of small refactorings are included here as well, but most of
them are in accordance with the changes to `wasmparser` and the updated
binary format for module linking.
@rossberg
Copy link
Member

Thanks, looks good, up to some syntactic nits:

  • For aliases, I suggest (alias $instance "func" (func $f)) and (alias outer $t (type $t)), respectively. That would be more consistent with import syntax and its binding position. In particular, we currently have that a binding occurrence for a symbolic id is always directly preceded by a keyword denoting its namespace, and it would be nice to not break that.

  • For instantiate, I'd prefer something like (instantiate $M (import "foo" (instance $foo)) (import "bar" (func $bar))) -- the wat form does not have flattened-list-of-tuples anywhere else, and IMO they are somewhat out of place, given that the core wat format is meant to reflect the structure of the AST.

WDYT?

@lukewagner
Copy link
Member Author

Heh, for the alias syntax, I also proposed switching to exactly that at some point, but then thought maybe it looked weird and talked myself out of it. And for the instantiate syntax, I was on the fence whether the extra verbosity was justified, so it's good to have a second opinion. SGTM on both changes. @alexcrichton ?

@alexcrichton
Copy link
Collaborator

Seems reasonable to me!

@lukewagner
Copy link
Member Author

lukewagner commented Feb 10, 2021

Ultra-nit: is import the best token for (import "foo" (instance $foo))? How about arg? (It's shorter :)

@rossberg
Copy link
Member

Btw, with this syntax for aliases, we could also allow the analogous "inverted shorthand" for aliases that we already have for imports, e.g., writing (func $f (alias $m "f")).

Ultra-nit: is import the best token for (import "foo" (instance $foo))? How about arg? (It's shorter :)

Yeah, import isn't ideal. FWIW, I keep struggling with there being no good dual term to "import" in our model (supply? provide?). I would avoid "arg", though, since we do not call imports "params" and generally don't invoke the modules-are-functions notion elsewhere.

@lukewagner
Copy link
Member Author

Ah right, the "inverted shorthand" syntax makes sense.

And fair point about "arg" corresponding to a "param" vs. "import". What about "export"?

@rossberg
Copy link
Member

Hm, an export is something a module provides to the outside. But here it's the inverse direction, the outside client supplying something to a module.

The difference comes from Wasm's model requiring an intermediary to connect imports and exports, and that imports can be provided through values that aren't exports of another module. So export as opposite end of import does not seem right.

@lukewagner
Copy link
Member Author

Fair enough, import it is.

@lukewagner lukewagner merged commit 63cd6c0 into master Feb 16, 2021
@lukewagner lukewagner deleted the fix-issue-19 branch February 16, 2021 16:00
alexcrichton added a commit to alexcrichton/wasm-tools that referenced this pull request Feb 17, 2021
@alexcrichton
Copy link
Collaborator

Getting around to implement this, FWIW inline aliases are particularly tricky on modules I think. For example I think this should parse:

(module
  (module (alias outer 0 0))
)

but when parsing the inner module after seeing (alias at that point the parser needs to figure out if that's an alias item or whether it's an inline alias annotation. This is the same problem with inline import statements as well (I think I opened up an issue on this awhile back), but the difference is that import could be solved with simple lookahead whereas alias can't.

For inline imports you simply need to see (import "string" "string"?) and you know it's an inline import, but for alias annotations you could see any (alias (instance $i "x") "y") which can also be arbitrarily nested. This means that to parse an inline alias annotation on a module you'd need infinite lookahead in theory I think?

It's not really the end of the world but I think I'll probably avoid implementing inline alias annotations for modules for now because of this.

@lukewagner
Copy link
Member Author

Ooh, good point. Just to confirm, the ambiguity is between the "inverted shorthand" recently added and a normal alias definition. This seems like a complete ambiguity, not just an issue of lookahead. (For example, validity aside, is your example a module-containing-a-module-containing-an-alias or a module-containing-an-alias?)

@rossberg Should we just remove the inverted shorthand?

(On a side note, setting aside the inverted shorthand ambiguity, I think the (alias outer 0 0) in the example would fail validation b/c it would be a self reference: outer aliases can only refer to preceding modules. (module (module) (module (alias outer 0 0))) would be fine.)

@rossberg
Copy link
Member

Hm, is there really an ambiguity? Let me know if I'm overlooking something, but one form is

(module (alias outer $i $j))

while the other would be

(module
  (alias outer $i $j (func))
)

This is disambiguated at the first token after the indices. Combining it with the sugar for inline aliases shouldn't change that, I believe?

Wrt parsing, this difference is LALR(1), since neither form is a prefix of the other, and their common prefix has the exact same structure (unless I'm missing something). So no problem for parser generators. And even a recursive descent parser can just parse the common prefix and then disambiguate, which is sort of common. Would that be difficult here?

@alexcrichton
Copy link
Collaborator

The ambiguity I'm worried about is not the outer form which has a deterministically small number of tokens, but rather the alternate form of:

(module (alias $instance "x"))

where this can be nested arbitrarily deep as:

(module (alias (instance $other_instance "other_export") "x"))

and it can keep repeating.

From a LALR perspective I think it's fine but at least so far the rest of the grammar is quite easy to parse with a recursive descent parser where you typically figure out what to parse next by peeking at most 3 or so tokens ahead. This would involve, however, parsing perhaps an arbitrary number of tokens ahead.

@alexcrichton
Copy link
Collaborator

I realize though that this shorthand is always deterministically an invalid wasm module if it was actually an alias item inside a nested module since there can't be any instances defined, so a fix could be that if alias ( is seen it's deduced as inline annotation. With alias outer ... it would see if there was any token after the indices to deduce whether it's an inline annotation or an inline module.

I mostly wanted to just bring this up though b/c it's sort of a weird thing about parsing. It's not necessarily the weirdest thing though.

@lukewagner
Copy link
Member Author

Oh, sorry, I totally missed the point×2. Could it make sense to only allow an $instance identifier in the parse rule for the inverted abbreviation (so not an inline instance alias)? (Which, I expect, is what @alexcrichton has now implemented in practice?)

@alexcrichton
Copy link
Collaborator

Yeah I think that'd fix it as well, albeit it's a bit weird in the grammar but as a shorthand that's not really the end of the world

@rossberg
Copy link
Member

The ambiguity I'm worried about is not the outer form which has a deterministically small number of tokens, but rather the alternate form of:
[...]
where this can be nested arbitrarily deep as:

(module (alias (instance $other_instance "other_export") "x"))

Yes, but why does this matter? The recursion is still the same for both productions, so you can safely perform it before deciding which construct it is. In general, recursive descent has no issues with overlapping productions of the form

X ::= A B C D
X ::= A B C E

even if A, B, C are arbitrary constructs, as long as D and E can be distinguished by their initial token. Am I missing something that makes that impossible here?

@alexcrichton
Copy link
Collaborator

To be clear I'm not saying anything is impossible, just some parts are harder than before. I am also not aware of a precise definition of a recursive descent parser, so to clarify what I mean is that the parser in the wast Rust crate works by basically looking at the next token and committing to what to parse at that point. There's a few cases it needs to look maybe 2-3 tokens ahead. This is the first case in the parser where it would have to look ahead an arbitrary number of tokens.

I'm not talking about an ambiguity in parsing, I believe you're right in that there's zero ambiguity. My point is that this is making parsing more difficult in one strategy of implementing a parser relative to how difficult it was before.

@rossberg
Copy link
Member

Recursive descent roughly means that you need an LL(1) grammar -- or one that you can transform into LL(1) easily enough.

In practice, very few grammars implemented in recursive descent parsers are LL(1) as given, but they are implicitly transformed into one. Taking my earlier example,

X ::= A B C D
X ::= A B C E

this can be turned into LL(1) via

X ::= A B C X_tail
X_tail ::= D | E

which corresponds to how a recursive descent parser would handle X without needing special look-ahead: the function to parse X first parses A,B,C and then makes a decision to parse either D or E -- essentially inlining the call to the parsing function for the auxiliary X_tail.

The Wasm spec could make these transformations explicit, but they are routine, so it shouldn't be necessary to obfuscate the spec for it. As you say, a similar kind of transformation is already needed in other places -- the only difference here is that you are parsing not just terminals but also non-terminals in the prefix, but usually that shouldn't be any harder.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants