Skip to content
This repository was archived by the owner on Aug 17, 2022. It is now read-only.

Order for testing subtyping between modules #7

Closed
alexcrichton opened this issue Jun 4, 2020 · 28 comments
Closed

Order for testing subtyping between modules #7

alexcrichton opened this issue Jun 4, 2020 · 28 comments

Comments

@alexcrichton
Copy link
Collaborator

This is an issue for the design proposed in #3 (assumed to be merged in the near-ish future) where I'm curious how the subtyping relationship works between modules. Talking with @lukewagner it sounds like import strings are going to be taken into account, but this raises a question about what to do with imports that import from the same string? For example I'm curious if this module is intended to validate:

(module 
    (module $a
        (import "anything_here" (module (;$a.b;)
            (import "" (instance (;$a.1;)
                (export "bar" (func))
                (export "foo" (func))
            ))
            (import "" (instance (;$a.2;)
                (export "foo" (func))
            ))
        ))
    )

    (module $b
        (import "" (instance (;$b.1;)
            (export "bar" (func))
        ))
        (import "" (instance (;$b.2;)
            (export "foo" (func))
        ))
    )
    
    (instance (instantiate $a (module $b)))
)

A similar question would be if the imports of $b were swapped, would it still validate?

I think one possible way to test if $a is a subtype of $b would be to:

  • Build a map from import string to number of items imported from that import string of $b
  • For each import name n of $a, look up its import string in this map.
  • If n exists in $b with only one item imported, then the subtyping check recurses (switching order for covariance/contravariance?)
  • If n exists in $b, but with multiple items imported, then the first unused import (so far) of $b is tested against the import.
  • If n doesn't exist in $b it could do the same as above (check the first unused import) or return an error.

The thinking is that on collisions (multiple imports from the same name) implementations would fall back to order-based comparisons of types. Otherwise you'd do name-based comparisons. This would allow, in general, imports to get reordered or deleted and you wouldn't break code which depends on you (as it would still assert your older type). If, however, all your import names were the same string then it'd be stricter.

I'm curious though if others have thoughts about how this might work? Or how to handle the subtyping relationship between modules with respect to their imports and multiple imports from the same name?

@rossberg
Copy link
Member

rossberg commented Jun 5, 2020

Ouch, yeah, that's a bit nasty. The canonical semantics would be treating multiple imports from the same name as a form of intersection type. That implies the rule that for each import in the subtype, there must exist a suitable import in the supertype. This is independent of any order. It may relate multiple sub-imports to the same super-import (or vice versa), but that's actually what you want.

What's annoying about this, though, is that for each sub-import you may have to try multiple super ones, i.e., it can be quadratic in the number of overlapping imports. That may be okay in this case, but still.

@lukewagner
Copy link
Member

lukewagner commented Jun 5, 2020

Interesting. So as currently written in the PR, an import $A:

(import "" (module $A
  (import "foo" (func))
  (import "foo" (func))
  (import "bar" (func))
))

would accept $B:

(module $B
  (func (import "foo"))
  (func (import "foo"))
  (func (import "bar"))
)

b/c it's the same type (no subtyping rule needed). But would $C be accepted:

(module $C
  (func (import "foo"))
  (func (import "foo"))
)

which requires subtyping? I suppose it could be made to if there was a prefix-compatibility rule added to subtyping. Presumably $D would be rejected due to ambiguity:

(module $D
  (func (import "foo"))
  (func (import "bar"))
  (func (import "foo"))
))

Perhaps we should disallow duplicates in the imports of module imports since it leads to all this wackiness?

@alexcrichton
Copy link
Collaborator Author

@lukewagner your examples have export, but did you mean import? (duplicate names are already disallowed in export directives IIRC)

@lukewagner
Copy link
Member

lukewagner commented Jun 5, 2020

Oops! Somehow I forgot that we already disallowed export duplicates and tried to "simplify" my examples :P I'll edit the comment in-place.

@rossberg
Copy link
Member

rossberg commented Jun 8, 2020

@lukewagner, when interpreting same-named imports as intersection types then the order does not matter. For example,

(module $B
  (import "foo" (func))
  (import "foo" (func))
  (import "bar" (func))
)

morally would be

(module $B
  (import "foo" (func) /\ (func))
  (import "bar" (func))
)

And $D would be the same. So it would match just fine. (The intersection here can actually be simplified to just (func), but not so in other cases, such as module imports with different types.)

The subyping rules for intersection types are as follows:

(1) U /\ V <: T  iff U <: T or V <: T
(2) T <: U /\ V  iff T <: U and T <: V

(where T and U themselves cannot be intersections). Taken together, these allow U <: U /\ U, U /\ V <: U, or U /\ V <: V /\ U, for example. But you generally end up having to test multiple combinations due to the "or" for occurrences on the lhs (though imports are contravariant, so the checks would actually turn around).

Although attractive, I doubt that we can disallow duplicates in imports, for at least two reasons:

  • It would be a breaking change for modules. And we want to be able to express the type of any valid module, so we cannot disallow it for module types either.
  • The desugaring of multi-level imports depends on it, at least when mixing multiple levels (see my earlier comment).

What we could try to restrict, though, would be the subtyping rules, in order to avoid the "or". But we have to be careful to maintain crucial properties, such as reflexivity (i.e., the equivalent of U /\ V <: U /\ V must still hold).

One possibility avoiding the "or" would be to weaken the above rules to

(1A) U1 /\ ... /\ UN <: T1 /\ ... /\ TN  iff U1 <: T1 and ... and UN <: TN
(1B) U1 /\ ... /\ UN <: top

where T and U themselves can not be intersections anymore. The effect would be that matching multiple imports in a supertype becomes positional per name. That is, the only subtype of a module with multiple imports for "foo" is one with no "foo" imports at all (corresponding to top), or one with the exact same number and in a compatible order. Interleaving with imports of other name does not matter.

Concretely, that would allow (assume T <: T' and S <: S')

(module) <: (module (import "foo" T) (import "foo" S))
(module (import "foo" T') (import "foo" S')) <: (module (import "foo" T) (import "foo" S))

but not

(module (import "foo" T')) <: (module (import "foo" T) (import "foo" T))
(module (import "foo" T') (import "foo" T')) <: (module (import "foo" T))   (*)

This may be somewhat restrictive, though. Ruling out (*) makes it impossible to use a module with multi-imports in a context where one sufficiently general import is actually provided. We could remedy that by reintroducing an equivalent of rule (2):

(2') U <: T1 /\ ... /\ TN  iff U <: T1 and ... and U <: TN

This would allow (*), that is, a module with exactly one import for "foo" may still be matched by one with multiple ones. That's still fine, because there always is a uniquely determined import in the supertype to match against, so it remains linear. I kind of suspect that we'd need this to handle multi-level imports in an actual module definition sufficiently well (and we should disallow that sugar in module types!).

@lukewagner
Copy link
Member

@rossberg Let me start by trying to digest the first bit with intersection types :) When you say that $B (as I wrote above) is equivalent to:

(module $B
  (import "foo" (func) /\ (func))
  (import "bar" (func))
)

Given that function closures have identity, I think this would only work if the type of $A was similarly equivalent to:

(import "" (module $A
  (import "foo" (func) /\ (func))
  (import "bar" (func))
))

With the critical implication that instantiate ends up taking two operands, not three. Because this situation applies just as well to two-level imports, I was thinking this would be a breaking change... but maybe not? Is that what you were thinking?

@rossberg
Copy link
Member

rossberg commented Jun 9, 2020

@lukewagner, in contexts identifying imports by name (such as with the JS API's instantiate function) we already "merge" imports with the same name. So nothing changes there.

As for positional contexts, we don't currently have them anywhere in the standard (only in the C API), so I don't think it'd be a breaking change. Just like you proposed that subtyping can reorder imports, it could also merge imports, and instantiate uses whichever view the type at hand provides.

However, it is true that the "or" rule would make this "coercion" more tricky (or even ambiguous?), so perhaps another reason to avoid it.

@lukewagner
Copy link
Member

Yes, that was my question; we can't allow:

(module (import "foo" T')) <: (module (import "foo" T) (import "foo" T))

because then there would be an ambiguity when two distinct funcaddrs are supplied to instantiate the supertype.

An alternative to weakening the rules to avoid "or" is to change the abstract syntax of module types to be an unordered map of names (with duplicates in the text/binary parsed into intersection types which, btw, nicely ensures the "no nested intersection types" property). Then there is never an ambiguity because there is always exactly 1 function value. The abstract syntax of instantiate could also take names and it's just a binary/text encoding detail that the names are lifted from the identified module type. This seems somewhat simpler to think about to me.

@rossberg
Copy link
Member

An alternative to weakening the rules to avoid "or" is to change the abstract syntax of module types to be an unordered map of names

But that's incompatible with a positional interpretation of module imports in the instantiate construct.

@lukewagner
Copy link
Member

@rossberg What I was perhaps-too-briefly suggesting in my last comment was that instantiate, in the abstract syntax (and thus validation and runtime semantics), would contain an ordered list of names (the order determining the order to pop operands) and it's just a binary and textual encoding detail that the way those names are specified is via the module index immediate (essentially "factoring out" the ordered list of names from N instantiate instructions to 1 module type definition).

@rossberg
Copy link
Member

@lukewagner, I see, but how would that express the type of an existing module with multiple "foo" imports? You have to represent the intersection inherent in that somehow. Whether that's by multiple entries for "foo" or by one with multiple types is just a syntactic detail. If we want to avoid the "or" then this intersection must be ordered, no matter how it is represented.

@lukewagner
Copy link
Member

@rossberg As long as duplicates are always collapsed into a single import of an intersection type (by necessity, if types are unordered maps), then I think there should never be any ambiguity (for any import, there will be exactly one value supplied by instantiate) so the "or" seems fine; I don't see a need to avoid it.

@rossberg
Copy link
Member

@lukewagner, oh, okay, so you are just worrying about the value ambiguity, not the "or" per se. But you'd need to introduce intersection types as an explicit concept then. You could equivalently say that in a positional interpretation of imports, duplicate entries are omitted. That has the same net effect without introducing a new construct.

Btw, an unordered interpretation of imports won't work for other reasons: it is incompatible with having dependencies between imports, as with types. That is, the same reason why this proposal requires that static declarations within a module are ordered applies within module types as well.

@lukewagner
Copy link
Member

@rossberg Ah hah, that's a good reason for maintaining positional order.

Could you explain a bit more of what you mean by saying "duplicate entries are omitted"? In all module types or just for the purposes of subtyping?

@rossberg
Copy link
Member

@lukewagner, in contexts where a module type is interpreted positionally to determine argument values (e.g., an instantiate instruction), only the first occurrence of each top-level) import name would be taken into account. Of course, the value would still be checked against all the other types.

Sounds a bit more hacky perhaps, but I would have the exact same effect as introducing and rewriting to intersection types.

@lukewagner
Copy link
Member

lukewagner commented Jun 17, 2020

@rossberg Ah, ok, cool. So to check my understanding, in this module:

(module $A
  (import "B" (module $B
    (import "a" (func (param (ref (func))))
    (import "a" (func (param funcref)))
  ))
  (func $f1 (param funcref))
  (func $f2 (param (ref (func))))
  (instance $i (instantiate $B (func $f1))))  ;; this would validate,
  (instance $i (instantiate $B (func $f2))))  ;; and this would fail, yes?
)

Then, for the instantiation-time check of module import type against the actual imported module, would the subtyping rule just say that A <: B as long as each import of A is satisfied by some import in B, knowing that, if there were multiple matches, that was fine, there was only one value. Is that all right?

So then, for the above example, I think all these modules would match, yes?

(module $A)
(module $B
  (import "a" (func (param funcref))))
(module $C
  (import "a" (func (param funcref)))
  (import "a" (func (param funcref))))
(module $D
  (import "a" (func (param (ref (func))))
  (import "a" (func (param (ref (func)))))
(module $E
  (import "a" (func (param funcref)))
  (import "a" (func (param (ref (func)))))

@rossberg
Copy link
Member

Yes (well, except $A).

But to back up a little, neither of this does anything about the other problem, that "some import" implies that the subtype check can be quadratic in the number of duplicates, because of the "or" rule required. That's independent of whether the intersection is represented as multiple imports, or one import with multiple types. My other ramblings above were primarily concerned with that problem.

@lukewagner
Copy link
Member

@rossberg For $A, wouldn't it be valid b/c you can always ignore a given import?

For the quadratic case: yeah, I saw that too and thought we could address it with some "maximum number of duplicate imports" implementation limit.

Thinking back to our "reinterpret two-level imports as single-level imports" rule, instead of having the parse/decode rules do the de-duplication, we could say that each individual two-level import is directly turned into a single-level import of an instance with a single export, and then the intersection type checking does the rest. So, e.g., a module:

(module
  (import "a" "x" (func))
  (import "a" "y" (func)))

is reinterpreted as:

(module
  (import "a" (instance (export "x" (func))))
  (import "a" (instance (export "y" (func)))))

and then it is subtyping that makes this type isomorphic to:

(module
  (import "a" (instance
    (export "x" (func))
    (export "y" (func)))))

I don't know if that's actually better, but it seemed like a neat symmetry.

However, writing that made me realize a snag with the previously-mentioned scheme of recasting two-level imports as single-level imports: today I can write:

(module
  (import "console" "log" (func (param i32)))
  (import "console" "log" (func (param f64))))

but interpreting that as a single-level import of an instance:

(module
  (import "console" (instance
    (export "log" (func (param i32)))
    (export "log" (func (param f64))))))

isn't valid today because duplicate exports aren't allowed. I was thinking maybe we should allow duplicate exports, but that would seem to amount to adding wasm function overloading which seems to have some pretty ugly JS API and call_indirect interactions... Still, though, I'm not sure how else to interpret duplicate two-level imports as instance imports. WDYT?

@rossberg
Copy link
Member

@lukewagner:

For $A, wouldn't it be valid b/c you can always ignore a given import?

Well, I suppose that depends on what exactly you meant by "would match", in particular wrt direction. But I think we're on the same page.

instead of having the parse/decode rules do the de-duplication, we could say that each individual two-level import is directly turned into a single-level import of an instance with a single export, and then the intersection type checking does the rest.

Yes, that's what I was assuming we'd do. Especially since that does not require any particular rule for dealing with imports with different depths, but where one name list is a prefix of the other.

But such an expansion would of course clash badly with a maximum duplicate limit -- which I think would generally cause composition issues in other ways as well.

isn't valid today because duplicate exports aren't allowed.

Ouch, yeah, I hadn't considered that. We could hack around that and only allow duplicate exports in negative position, but that's ever more monster barring.

Sigh, I increasingly feel like we should just ban duplicate imports altogether. I wonder if it's worth doing some use counter statistics telemetry experiment to figure out whether removing duplicate imports from the language would break any web site in the wild.

@lukewagner
Copy link
Member

@rossberg Hah, I've been wondering the same. I asked @kripken and he can't think of any case where duplicate imports are intentionally generated, so there is actually a chance (although still, I'm guessing, only a 50% one). I filed bug 1647791 to investigate.

@rossberg
Copy link
Member

FWIW, one significant downside of removing duplicate imports would be that merging modules would become significantly more complicated, because you'd need to compute and insert glb's.

@alexcrichton
Copy link
Collaborator Author

Reading over this again today, is it right to say that the general impression here is that it'd be best to figure out how to disallow duplicate imports? But this is also seemingly in a climate where we also can't do that due to backwards compatibility-constraints?

One thought I had is that in type declarations within a module (e.g. a module declaring a module type) we could disallow duplicate names of imports. That should be backwards-compatible and I was curious if it could buy us enough runway to not have to worry about duplicate imports before-module-linking (it doesn't look like the investigation has progressed much in the past few months).

The idea I had is that when you instantiate a module you'd provide the same number of arguments as import statements in the declared module type, and that'd provide a clear 1:1 mapping of "here's what to call that imported value". Given those names you can then use them to satisfy the input module's own imports. This also means that you basically just wouldn't be able to import (or bundle) an MVP module using duplicate import names, but given the expected rarity of that ocurring it may not be the end of the world?

The consequence of this is that we can always have a module type for all nested modules, but the top-level module may not have an expressible type. Engines would have to deal with the top-level case specially in that situation.

Or would this perhaps be too onerous at the spec-level because it'd have to have a special case in a bunch of rules otherwise? For example we could do the desugaring of two-level imports to single-level import of an instance but the spec would have to say that for top-level modules the desugaring doesn't work when two imports have the same name but different types.

@lukewagner
Copy link
Member

Oops, right, I let this ball drop. I'll try to get the telemetry patch written and landed. In the meantime, maybe your implementation work can just operate under the assumption that we have successfully disallowed duplicates?

Thinking again of the alternatives (duplicate exports; not being able to desugar import paths into single-level instance import chains), this really seems like the best option and worth doing a gradual deprecation for (in which the spec officially disallows duplicate imports, as do engines that can without breaking existing content, with engines that must support existing content unofficially allowing duplicates at top-level for a period of time, but disallowing in all new positions).

@alexcrichton
Copy link
Collaborator Author

Makes sense to me! I'll see if I can write up a PR to this repository which tries to summarize the conversation here into the desired end state of disallowing duplicate import strings, but also supporting desugaring.

alexcrichton added a commit to alexcrichton/module-linking that referenced this issue Dec 2, 2020
This commit has a few high-level changes:

* The `Binary.md` is sync'd a bit with `Explainer.md` in terms of
  content and links.
* The textual syntax for `alias` directives was tweaked in
  `Explainer.md` to match `Binary.md`
* Actual section numbers were chosen in the binary format. The exception
  handling proposal uses 13 for an event section so the four new
  sections here are getting 14-17.
* Handling of subtyping between instances and modules is now codified in
  `Subtyping.md`. This goes into more detail about how imports are
  elaborated, how instantiation is expected to work, and how to test for
  subtypes.

This is hoped to describe the intended status quo given the discussion
on WebAssembly#7.
@lukewagner
Copy link
Member

lukewagner commented Feb 19, 2021

Good news: the telemetry probes have now been in Firefox Beta 86 for a week and the initial results show that duplicate imports are used by 0.00014% of page loads (concretely, 2.52k page loads), so I think we're good. I was thinking of adding an agenda item to the next CG meeting to get a vote on making this breaking change.

@bvibber
Copy link

bvibber commented Mar 5, 2021

It seems to me like retroactively banning multiple imports of the same named object isn't something that has to be done for all modules, only modules that specifically want to use this module linking system.

Could you allow existing AssemblyScript and other code using overloaded imports to keep running, and simply throw a validation error when trying to use such a module in a module linking context?

This would avoid backwards compatibility issues, AND provide best practices for tool authors to support using module linking, AND can provide an obvious consistent error message on failure.

@lukewagner
Copy link
Member

That's also an option we've discussed, but it's unfortunate to create a mechanism for linking wasm modules that only works for some kinds of modules -- it essentially means there are two sorts of modules: "top-level" modules (that can only be instantiated by the host) and "linkable" modules (that can be instantiated by either the host or wasm).

In any case: if the CG answers the root question ("should wasm modules always be able to virtualize wasm imports") in the affirmitive, then this proposal isn't the extent of the problem--the JS API and ESM-integration have the same issue.

lukewagner pushed a commit that referenced this issue Sep 23, 2021
This commit has a few high-level changes:

* The `Binary.md` is sync'd a bit with `Explainer.md` in terms of
  content and links.
* The textual syntax for `alias` directives was tweaked in
  `Explainer.md` to match `Binary.md`
* Actual section numbers were chosen in the binary format. The exception
  handling proposal uses 13 for an event section so the four new
  sections here are getting 14-17.
* Handling of subtyping between instances and modules is now codified in
  `Subtyping.md`. This goes into more detail about how imports are
  elaborated, how instantiation is expected to work, and how to test for
  subtypes.

This is hoped to describe the intended status quo given the discussion
on #7.
lukewagner pushed a commit that referenced this issue Sep 24, 2021
This commit has a few high-level changes:

* The `Binary.md` is sync'd a bit with `Explainer.md` in terms of
  content and links.
* The textual syntax for `alias` directives was tweaked in
  `Explainer.md` to match `Binary.md`
* Actual section numbers were chosen in the binary format. The exception
  handling proposal uses 13 for an event section so the four new
  sections here are getting 14-17.
* Handling of subtyping between instances and modules is now codified in
  `Subtyping.md`. This goes into more detail about how imports are
  elaborated, how instantiation is expected to work, and how to test for
  subtypes.

This is hoped to describe the intended status quo given the discussion
on #7.
@lukewagner
Copy link
Member

As of #35 and Module Linking being lifted to another layer, this is no longer an issue, so closing.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants