Complications with multiple upper bounds #160
They were upper bounds. But they can be problematic nonetheless.
As @rossberg mentions, they are subtype constraints, i.e. upper bounds. Given that, …
Well, sort of agree. The type …
I don't quite see where you would need to represent "both possible return types", or how it has "both signatures". A function has one signature. That is, I don't see where in the type-checking algorithm that (intermediate) type arises.
While we are on the topic, I'll point out another subtlety that tripped up my intuition. If we import …
Oops, for some reason I always flip upper and lower bounds. Post has been fixed. Thanks for the catch!
Because … Note that I could construct similar examples using structs if …
Well, I meant something specific about set theory. Types represent (potentially infinite) sets of values, so …
But again, if you have a function reference of type …
Ah. In the example, you have a …
@RossTate, I still don't quite understand where the intersection is actually used. We have this situation (with changed names):
And we are trying to type a …
So "based on its context" is where you are peeking ahead. You are essentially type-checking the following instructions with both options and seeing which one pans out. How long that takes depends on the types involved and how the instruction set is designed. For example, if your …

This peek-ahead strategy is exponential time. To see why, consider the following:
Note that I can replace …
In order for this subtyping arrangement to be possible, there must be some common subtype between …

EDIT: …
But then …
Does it need to be, in the sense you're talking about, for @titzer's purposes?
I believe so. In @titzer's presentation, the idea was that the Jawa module would import all the required types, solely specifying the required subtyping relations through multiple upper-bound constraints, and then import accessors and mutators of the types. That way the runtime could generate the type after linking, when all the fields could be determined, and provide the generated type and the relevant accessors and mutators to the Jawa module.
Ok, I see the problem now, but your example involves mutually recursive type constraints. In what I propose, type imports have to be topologically sorted and their constraints can only refer to prior types and type imports. Further, for relaxed section order, types can only be mutually recursive within a block of declarations, so this:
… wouldn't be legal.
… would, however. And this is where I think the intersection type pops up in the middle of type checking; the return value of the call_ref is the intersection of …
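(As an aside, the ordering restriction @titzer describes above is cheap to check mechanically. A minimal sketch, assuming constraints are represented simply as the sets of type indices they mention:)

```python
# refs[i] is the set of type indices mentioned by the constraints of the type
# or type import at index i (a hypothetical representation).

def constraints_well_ordered(refs):
    # Every constraint may refer only to strictly earlier indices.
    return all(r < i for i, rs in enumerate(refs) for r in rs)

print(constraints_well_ordered([set(), {0}, {0, 1}]))  # True: prior refs only
print(constraints_well_ordered([{1}, {0}]))            # False: mutual recursion
```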
I believe this is incompatible with what @rossberg has told me is necessary for his type-imports plan, but I might be misremembering, so I'll let him clarify. (Related: WebAssembly/function-references#41) But regardless, the recursion was just for concision. You can recreate the example by having a chain of …
I haven't seen plans to add intersection types to WebAssembly. They too would cause the same complexity-class issues.
@titzer there's something I've misunderstood about how Jawa works. I believed that your abstract type imports were intended to completely hide object representations, and that instances of the abstract type aren't intended to be introspected at all in the importing code, but instead always handled through imported getters/setters/method invokers. For the scenario @RossTate is posing above to be meaningful, you must need the capability to import an abstract type, with a subtyping constraint that means it is statically known to be a function, and then perform a regular Wasm …
Note that I could create the same problem with just structs if …
I suppose a wider point I'm trying to understand is that it doesn't seem to make sense to have an abstract type import with an associated subtyping constraint that's a known …
@conrad-watt is right. For Jawa, we want to hide object representations, so it is the case that subtype constraints are only needed between type variables.
It's also worth noting that we need to reintroduce a bottom …
Ah, so if I understand @conrad-watt right, the suggestion is that there is an option 3 for the current MVP: to change the Type Imports proposal to only allow imported types to be constrained by other imported types. @rossberg, would that work?
I think it would still be ok for imported types to be constrained by (untyped) …
More nuance:
… which @rossberg probably cares about. I agree with @RossTate that it seems the current setup heavily assumes that imported abstract types will never have multiple structural subtyping constraints (unless we add annotations back in). It might be that in order to preserve the above use-case, if we wanted to support @titzer, we would need some mechanism to declare a hierarchy of "…
An update: I realized I can encode graph 3-coloring in the current MVP with just …
I mean, it still seems to be an issue with a proposal to add multiple structural upper bounds. Nothing more, nothing less. I think the example of #160 (comment) was already enough to show that we don't ever want to handle such bounds if we allow instructions like …

I'm even a little iffy about abstract types with a single structural bound being introduced so early, but that's just about ok because for most of the validation the abstract type can simply be identified with the upper bound. The two ways forward laid out at the end of the OP still seem to be the options on the table: do we want to commit to never supporting multiple structural upper bounds, in order to elide some type annotations?
This issue makes me even more concerned with the current push to omit type annotations from instructions where things can (currently) be inferred from stack types. It seems to be a constantly-unforeseen consequence, like our seeming mistake (IMO) with an unannotated …
I'm afraid that I don't have a great answer here, other than agreeing with the general concern about introducing intersection types. They will always be tricky, no matter what the rest of the type system is, as they effectively require doing type checking as search. Maybe a sufficient amount of annotations can contain that, but I'm not so sure. I doubt that it would be feasible to restrict bounds to abstract types, because that will likely completely break linking and other forms of composition (i.e., some modules type-check separately, but there is no way to type the result of merging them, even though their import/exports match). Technically, you'll break type substitution, which obviously will cause all sorts of problems with the meta theory as well.
Are you assuming unannotated …
Neither …
Multiple upper bounds are not fundamentally tied to exponential search; there are other systems designed around other typing strategies that can handle multiple upper bounds without needing exponential search or frequent type annotations.
Unfortunately we need to pick an answer in the current MVP. Unless another option appears that's consistent with the current MVP: are we forever disallowing multiple lower bounds, or are we having more type annotations than any related system?
I assume you mean versions where the index is static, so we know exactly what the stack permutation is?
Yes.
That depends on other properties of the type system. If we cannot restrict the system to strictly-forward type propagation everywhere, then …
Multiple bounds would probably have to be restricted to nominal types, which makes all intersections unique. That seems fine, since they tend to only arise in the compilation of languages with nominal subtyping, AFAICT. Is that what you're alluding to?
s/lower/upper/

I don't think this has to be solved for the MVP, unless there is evidence that any possible typing extension is completely incompatible with the MVP. Not every question needs to be fully answered for the MVP, or we'd delay it indefinitely.
Does this suggest that …

EDIT: i.e. because they are inherently structural, and we expect multiple bounds to only be possible for nominal types?
Again, this is not true. In case we are interpreting these names differently, the forward-propagating principal type of …
Not annotating …
This is not what I am alluding to. I adapted this construction from a nominal type system, so a simple appeal to nominal types will not address the problem. That said, they do seem to help. Related systems that already address this problem, such as the .NET CLI, iTalX, and Ceylon, each do so in a different manner, but in all cases they seem to appeal to some nominal aspect of their type system to make the problem tractable.
Looking at other type systems that have multiple upper bounds, they either resort to explicit upcasting or to a solution that is incompatible with the MVP, so there is evidence that any possible typing extension would be incompatible with the MVP unless annotations are present on all relevant instructions (which would require changing at least …).

To add to the problem, I determined that multiple upper bounds would make subtyping in the current MVP PSPACE-Hard (harder than NP). So if we do not want to introduce the various problems incurred by exponential-time type-checking, multiple upper bounds might be fundamentally incompatible with the current MVP regardless of type annotations.
@rossberg is correct that with the current type system, unannotated …
Are you suggesting that adding annotations to correct typing ambiguities like this would be a mistake? If so, why?
I very strongly feel that this is an unhelpful way to frame the problem. We are not trying to fly close to the sun with our validation of instruction sequences. Even if there were some esoteric polynomial algorithm for resolving type stack ambiguities caused by a particular feature proposal, we would almost certainly still be having the same conversation about either dropping the feature or adding additional annotations. The actual fork we find ourselves at can be motivated completely practically, and the trade-offs are something that we can usefully discuss as a committee. The theoretical result is a fun aside, not something that should be positioned as more urgent or fundamental than this discussion.
Can you clarify?
What do you mean by "like this"? If you mean "like …"
The theoretical result implies that annotating instructions will not solve the problem; it just pushes the problem to subtyping. The stack problem is still useful to understand though, since it is more broadly applicable, whereas the subtyping problem makes more use of features specific to the current MVP.
The scenarios you set up here would be solved by adding type annotations. You must now be assuming a more general hypothetical extension for specifying/requiring multiple structural upper bounds which is disconnected from the current MVP. Can you give a more concrete example of the kind of issue you're envisaging?
I believe the issue with …
I realized that the same extension (i.e. multiple upper bounds) makes subtyping in the current MVP PSPACE-Hard, introducing a new complication for the same feature.
Agree.
Ah, I didn't get your point. So this is about subtyping between …
Yes, though the encoding can also be done with function types in place of struct types, i.e. using some form of covariance and contravariance, and it relies on type recursion. Interestingly, this means the encoding can apply to nominal systems, though it is less limiting there because nominal systems have less need for generics and (especially) type recursion.
Should I interpret this as meaning that both nominal and structural Wasm would need to implement the same restrictions to avoid this problem? What kind of restriction would a hypothetical nominal system implement, that wouldn't be palatable for a structural one?
Same restriction, but the palatability will be very different. The talk that ended up not happening this week happens to be on the topic, so when I do end up giving that talk I'll elaborate it a bit to go into this issue. But it's a lot of background to put into a GitHub post, so I figured I'd wait to elaborate things here until after there's some communal understanding of the relevant background. For today's presentation I'll focus on the stack issue rather than the subtyping issue.
Important Correction: Something felt wrong, so I dug into actual formal definitions and learned that, although a simulation relation between DFAs is a relation between states of the automata, a simulation relation between NFAs is a relation between sets of states of the automata. The generalization is necessary to show, for example, that the NFA for the regex …

The question is, then, just how hard is subtyping with multiple upper bounds? That seems to be difficult to answer. What seems clear is that the obvious extension of the standard subtyping algorithm would be exponential. The key invariant that algorithm relies on is that, once it is known that certain subtypings need to hold for the goal to hold, there is a deterministic list of subtypings that all need to hold for the goal to hold. As such, it can keep exploring until either all the dependencies are determined to hold (staying mindful to detect and accept cycles) or until it finds a subtyping that needs to hold but does not hold (making the goal false). In particular, once any contradiction is found, the algorithm is done. But with multiple bounds, there can be multiple ways to make a given goal hold, so if one way doesn't work out you need to backtrack and try a different way. And that's when cycles become particularly problematic. Suppose the goal holding relies on …

To avoid exponential blowup, the algorithm would need to track dependencies and make incremental updates as dependencies are resolved. These dependencies can themselves get complicated, as can the updates, making it hard to reason about the computational complexity of such an approach. I tried reasoning about both lazy and eager approaches to building and incrementing dependencies, and in each case I found the algorithm had quartic complexity. To give the high-level intuition: with multiple bounds, a subtyping goal can depend on a conjunction of arbitrary-sized disjunctions of subtyping goals (whereas with single bounds the disjunction is either empty or a singleton), resulting in a complexity that is square with respect to the number of subtyping goals, which itself is square with respect to the size of the types, and hence quartic with respect to the types.

My intuition at present is that the problem is not PSPACE-Hard, but quartic complexity isn't exactly great, and any algorithm seems likely to be substantially more complicated (you can't just use a memoization matrix of booleans anymore) if you don't want to risk exponential blowup. Regardless, sorry for the misinformation above!
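To make the invariant concrete, here is a toy sketch of the standard single-bound algorithm in Python (the type encoding is mine, not proposal syntax), with a comment marking exactly where multiple bounds would turn the deterministic list of subgoals into a disjunction:

```python
# Toy types (hypothetical encoding): ("func", param, result) is a function
# type with contravariant parameter and covariant result; ("var", name) is a
# type variable whose single upper bound is UPPER[name].

UPPER = {"A": ("func", ("var", "A"), ("var", "A"))}  # a recursive bound

def subtype(a, b, assumed=frozenset()):
    if a == b or (a, b) in assumed:
        return True  # reflexivity, or a revisited goal: accept coinductively
    assumed = assumed | {(a, b)}
    if a[0] == "var":
        # Single bound: exactly one way to proceed, so if this subgoal fails,
        # the goal (a, b) is definitively false. With multiple bounds this
        # line becomes any(subtype(u, b, assumed) for u in UPPERS[a[1]]):
        # a failed subgoal no longer refutes the goal, forcing backtracking,
        # which is what interacts badly with the cycle-accepting rule above.
        return subtype(UPPER[a[1]], b, assumed)
    if a[0] == b[0] == "func":
        return (subtype(b[1], a[1], assumed)       # parameters: contravariant
                and subtype(a[2], b[2], assumed))  # results: covariant
    return False

print(subtype(("var", "A"), UPPER["A"]))  # True: a variable matches its bound
```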
I should clarify that the above analysis is based on the straightforward extension to the subtyping rules as they are currently stated in the relevant proposals. This choice of rules, though, is no longer "the most general definition that is sound". I am assuming @rossberg is fine with that loss.
Closing this because we have decided that the MVP will not include type imports. This discussion would be best continued on the separate type imports proposal.
@titzer's Jawa system made use of importing types with multiple upper-bound constraints. More generally, permitting type variables to have multiple upper-bound constraints is often quite useful for low-level type systems. For example, a user-level casting operation often takes the run-time representations of two unknown types (represented as type variables) and returns a boolean that, if equal to true, guarantees that one type variable is a subtype of another, thereby imposing an additional upper bound on the former type variable. (Such an operation is often useful for, say, optimizing multiple assignments into a covariant array in reified polymorphic code: perform one cast of the type parameter to the array's unknown element type, after which all assignments are safe because the two unknowns are known to be subtypes.)

Unfortunately, having multiple upper bounds also causes a number of complications that one needs to plan ahead for if one wants to support this feature. For example, suppose $A has both $B and $C as upper bounds. Suppose $B has the upper bound [] -> [i32] and $C has the upper bound [] -> [f64]. Then what type does call_ref have when the input is a ref $A? i32 or f64? In this case, we might reason that any such function must necessarily be a [] -> ⊥, as the return types are incompatible. But what if instead the upper bounds were [] -> [(ref $B)] and [] -> [(ref $C)] respectively? In that case, the return types are compatible (a function returning ref $A would have both signatures), but there is no way to represent their combined return types. This means the type-checker would have to attempt to type-check the remainder using both possible return types.

This is one of the many subtleties that arises from designing a type system around structural subtyping. For a structural MVP, the likely viable solutions are the following (though I find neither of them to be great):

1. Commit to never supporting multiple structural upper bounds on type variables.
2. Annotate instructions such as call_ref with the type of their input (as struct.get currently does) or with the type(s) to upcast the resulting value(s) to.

Edit: I had said lower bounds when I meant upper bounds. Post is now correct.
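To make the two cases concrete, here is a toy Python rendering (a hypothetical encoding, not proposal syntax) of the candidate result types that call_ref would have to juggle:

```python
# Upper bounds from the example: $A is bounded by the variables $B and $C,
# each of which has a single function-type upper bound.
BOUNDS = {
    "$A": ["$B", "$C"],
    "$B": [("func", [], ["i32"])],
    "$C": [("func", [], ["f64"])],
}

def func_bounds(var):
    """All function-type upper bounds transitively reachable from var."""
    out = []
    for b in BOUNDS[var]:
        out += [b] if isinstance(b, tuple) else func_bounds(b)
    return out

# The candidate result types of call_ref on a (ref $A):
print([results for (_, _, results) in func_bounds("$A")])
# [['i32'], ['f64']] -- incompatible, so the result can soundly be treated as
# bottom. Swap in [] -> [(ref $B)] and [] -> [(ref $C)] and the candidates
# become compatible yet have no representable intersection in the MVP.
```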