Complications with multiple upper bounds #160
They were upper bounds. But they can be problematic nonetheless.
As @rossberg mentions, they are subtype constraints, i.e. upper bounds. Given that, …
Well, sort of agree. The type …
I don't quite see where you would need to represent "both possible return types", or how it has "both signatures". A function has one signature. That is, I don't see where in the type-checking algorithm that (intermediate) type arises.
While we are on the topic, I'll point out another subtlety that tripped up my intuition. If we import …
Oops, for some reason I always flip upper and lower bounds. Post has been fixed. Thanks for the catch!
Because … Note that I could construct similar examples using structs if …
Well, I meant something specific about set theory. Types represent (potentially infinite) sets of values, so …
But again, if you have a function reference of type …
Ah. In the example, you have a …
@RossTate, I still don't quite understand where the intersection is actually used. We have this situation (with changed names):
And we are trying to type a …
So "based on its context" is where you are peeking ahead. You are essentially type-checking the following instructions with both options and seeing which one pans out. How long that takes depends on the types involved and how the instruction set is designed. For example, if your …

This peek-ahead strategy is exponential time. To see why, consider the following:
Note that I can replace …
In order for this subtyping arrangement to be possible, there must be some common subtype between …

EDIT: …
But then …
Does it need to be, in the sense you're talking about, for @titzer's purposes?
I believe so. In @titzer's presentation, the idea was that the Jawa module would import all the required types, solely specifying the required subtyping relations through multiple upper-bound constraints, and then import accessors and mutators of the types. That way the runtime could generate the type after linking, when all the fields could be determined, and provide the generated type and the relevant accessors and mutators to the Jawa module.
Ok, I see the problem now, but your example involves mutually recursive type constraints. In what I propose, type imports have to be topologically sorted and their constraints can only refer to prior types and type imports. Further, for relaxed section order, types can only be mutually recursive within a block of declarations, so this:
… wouldn't be legal.
… would, however. And this is where I think the intersection type pops up in the middle of type checking; the return value of the call_ref is the intersection of …
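(As an aside, the ordering restriction @titzer describes above is cheap to check mechanically. A minimal sketch, assuming constraints are represented simply as the sets of type indices they mention:)

```python
# refs[i] is the set of type indices mentioned by the constraints of the type
# or type import at index i (a hypothetical representation).

def constraints_well_ordered(refs):
    # Every constraint may refer only to strictly earlier indices.
    return all(r < i for i, rs in enumerate(refs) for r in rs)

print(constraints_well_ordered([set(), {0}, {0, 1}]))  # True: prior refs only
print(constraints_well_ordered([{1}, {0}]))            # False: mutual recursion
```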
I believe this is incompatible with what @rossberg has told me is necessary for his type-imports plan, but I might be misremembering, so I'll let him clarify. (Related: WebAssembly/function-references#41) But regardless, the recursion was just for concision. You can recreate the example by having a chain of …
I haven't seen plans to add intersection types to WebAssembly. They too would cause the same complexity-class issues.
@titzer there's something I've misunderstood about how Jawa works. I believed that your abstract type imports were intended to completely hide object representations, and that instances of the abstract type aren't intended to be introspected at all in the importing code, but instead always handled through imported getters/setters/method invokers. For the scenario @RossTate is posing above to be meaningful, you must need the capability to import an abstract type, with a subtyping constraint that means it is statically known to be a function, and then perform a regular Wasm …
Note that I could create the same problem with just structs if …
I suppose a wider point I'm trying to understand is that it doesn't seem to make sense to have an abstract type import with an associated subtyping constraint that's a known …
@conrad-watt is right. For Jawa, we want to hide object representations, so it is the case that subtype constraints are only needed between type variables.
It's also worth noting that we need to reintroduce a bottom …
Ah, so if I understand @conrad-watt right, the suggestion is that there is an option 3 for the current MVP: to change the Type Imports proposal to only allow imported types to be constrained by other imported types. @rossberg, would that work?
I think it would still be ok for imported types to be constrained by (untyped) …
More nuance:
… which @rossberg probably cares about. I agree with @RossTate that it seems the current setup heavily assumes that imported abstract types will never have multiple structural subtyping constraints (unless we add annotations back in). It might be that in order to preserve the above use-case, if we wanted to support @titzer, we would need some mechanism to declare a hierarchy of "…
An update: I realized I can encode graph 3-coloring in the current MVP with just …
I mean, it still seems to be an issue with a proposal to add multiple structural upper bounds. Nothing more, nothing less. I think the example of #160 (comment) was already enough to show that we don't ever want to handle such bounds if we allow instructions like …

I'm even a little iffy about abstract types with a single structural bound being introduced so early, but that's just about ok because for most of the validation the abstract type can simply be identified with the upper bound. The two ways forward laid out at the end of the OP still seem to be the options on the table: do we want to commit to never supporting multiple structural upper bounds, in order to elide some type annotations?
This issue makes me even more concerned with the current push to omit type annotations from instructions where things can (currently) be inferred from stack types. It seems to be a constantly-unforeseen consequence, like our seeming mistake (IMO) with an unannotated …
I'm afraid that I don't have a great answer here, other than agreeing with the general concern about introducing intersection types. They will always be tricky, no matter what the rest of the type system is, as they effectively require doing type checking as search. Maybe a sufficient amount of annotations can contain that, but I'm not so sure. I doubt that it would be feasible to restrict bounds to abstract types, because that will likely completely break linking and other forms of composition (i.e., some modules type-check separately, but there is no way to type the result of merging them, even though their import/exports match). Technically, you'll break type substitution, which obviously will cause all sorts of problems with the meta theory as well.
Are you assuming unannotated …
Neither …
Multiple upper bounds are not fundamentally tied to exponential search; there are other systems designed around other typing strategies that can handle multiple upper bounds without needing exponential search or frequent type annotations.
Unfortunately we need to pick an answer in the current MVP. Unless another option appears that's consistent with the current MVP: are we forever disallowing multiple lower bounds, or are we having more type annotations than any related system?
I assume you mean versions where the index is static, so we know exactly what the stack permutation is?
Yes.
That depends on other properties of the type system. If we cannot restrict the system to strictly-forward type propagation everywhere, then …
Multiple bounds would probably have to be restricted to nominal types, which makes all intersections unique. That seems fine, since they tend to only arise in the compilation of languages with nominal subtyping, AFAICT. Is that what you're alluding to?
s/lower/upper/

I don't think this has to be solved for the MVP, unless there is evidence that any possible typing extension is completely incompatible with the MVP. Not every question needs to be fully answered for the MVP, or we'd delay it indefinitely.
Does this suggest that …

EDIT: i.e. because they are inherently structural, and we expect multiple bounds to only be possible for nominal types?
Again, this is not true. In case we are interpreting these names differently, the forward-propagating principal type of …
Not annotating …
This is not what I am alluding to. I adapted this construction from a nominal type system, so a simple appeal to nominal types will not address the problem. That said, they do seem to help. Related systems that already address this problem, such as the .NET CLI, iTalX, and Ceylon, each do so in a different manner, but in all cases they seem to appeal to some nominal aspect of their type system to make the problem tractable.
Looking at other type systems that have multiple upper bounds, they either resort to explicit upcasting or to a solution that is incompatible with the MVP, so there is evidence that any possible typing extension would be incompatible with the MVP unless annotations are present on all relevant instructions (which would require changing at least …).

To add to the problem, I determined that multiple upper bounds would make subtyping in the current MVP PSPACE-Hard (harder than NP). So if we do not want to introduce the various problems incurred by exponential-time type-checking, multiple upper bounds might be fundamentally incompatible with the current MVP regardless of type annotations.
@rossberg is correct that with the current type system, unannotated …
Are you suggesting that adding annotations to correct typing ambiguities like this would be a mistake? If so, why?
I very strongly feel that this is an unhelpful way to frame the problem. We are not trying to fly close to the sun with our validation of instruction sequences. Even if there were some esoteric polynomial algorithm for resolving type stack ambiguities caused by a particular feature proposal, we would almost certainly still be having the same conversation about either dropping the feature or adding additional annotations. The actual fork we find ourselves at can be motivated completely practically, and the trade-offs are something that we can usefully discuss as a committee. The theoretical result is a fun aside, not something that should be positioned as more urgent or fundamental than this discussion.
Can you clarify?
What do you mean by "like this"? If you mean "like …"
The theoretical result implies that annotating instructions will not solve the problem; it just pushes the problem to subtyping. The stack problem is still useful to understand though, since it is more broadly applicable, whereas the subtyping problem makes more use of features specific to the current MVP.
The scenarios you set up here would be solved by adding type annotations. You must now be assuming a more general hypothetical extension for specifying/requiring multiple structural upper bounds which is disconnected from the current MVP. Can you give a more concrete example of the kind of issue you're envisaging?
I believe the issue with …
I realized that the same extension (i.e. multiple upper bounds) makes subtyping in the current MVP PSPACE-Hard, introducing a new complication for the same feature.
Agree.
Ah, I didn't get your point. So this is about subtyping between …
Yes, though the encoding can also be done with function types in place of struct types, i.e. using some form of covariance and contravariance, and it relies on type recursion. Interestingly, this means the encoding can apply to nominal systems, though it is less limiting there because nominal systems have less need for generics and (especially) type recursion.
Should I interpret this as meaning that both nominal and structural Wasm would need to implement the same restrictions to avoid this problem? What kind of restriction would a hypothetical nominal system implement, that wouldn't be palatable for a structural one?
Same restriction, but the palatability will be very different. The talk that ended up not happening this week happens to be on the topic, so when I do end up giving that talk I'll elaborate it a bit to go into this issue. But it's a lot of background to put into a GitHub post, so I figured I'd wait to elaborate things here until after there's some communal understanding of the relevant background. For today's presentation I'll focus on the stack issue rather than the subtyping issue.
Important Correction: Something felt wrong, so I dug into actual formal definitions and learned that, although a simulation relation between DFAs is a relation between states of the automata, a simulation relation between NFAs is a relation between sets of states of the automata. The generalization is necessary to show, for example, that the NFA for the regex …

The question is, then, just how hard is subtyping with multiple upper bounds? That seems to be difficult to answer. What seems clear is that the obvious extension of the standard subtyping algorithm would be exponential. The key invariant that algorithm relies on is that, once it is known that certain subtypings need to hold for the goal to hold, there is a deterministic list of subtypings that all need to hold for the goal to hold. As such, it can keep exploring until either all the dependencies are determined to hold (staying mindful to detect and accept cycles) or until it finds a subtyping that needs to hold but does not hold (making the goal false). In particular, once any contradiction is found, the algorithm is done. But with multiple bounds, there can be multiple ways to make a given goal hold, so if one way doesn't work out you need to backtrack and try a different way. And that's when cycles become particularly problematic. Suppose the goal holding relies on …

To avoid exponential blowup, the algorithm would need to track dependencies and make incremental updates as dependencies are resolved. These dependencies can themselves get complicated, as can the updates, making it hard to reason about the computational complexity of such an approach. I tried reasoning about both lazy and eager approaches to building and incrementing dependencies, and in each case I found the algorithm had quartic complexity. To give the high-level intuition: with multiple bounds, a subtyping goal can depend on a conjunction of arbitrary-sized disjunctions of subtyping goals (whereas with single bounds the disjunction is either empty or a singleton), resulting in a complexity that is square with respect to the number of subtyping goals, which itself is square with respect to the size of the types, and hence quartic with respect to the types.

My intuition at present is that the problem is not PSPACE-Hard, but quartic complexity isn't exactly great, and any algorithm seems likely to be substantially more complicated (you can't just use a memoization matrix of booleans anymore) if you don't want to risk exponential blowup. Regardless, sorry for the misinformation above!
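To make the invariant concrete, here is a toy sketch of the standard single-bound algorithm in Python (the type encoding is mine, not proposal syntax), with a comment marking exactly where multiple bounds would turn the deterministic list of subgoals into a disjunction:

```python
# Toy types (hypothetical encoding): ("func", param, result) is a function
# type with contravariant parameter and covariant result; ("var", name) is a
# type variable whose single upper bound is UPPER[name].

UPPER = {"A": ("func", ("var", "A"), ("var", "A"))}  # a recursive bound

def subtype(a, b, assumed=frozenset()):
    if a == b or (a, b) in assumed:
        return True  # reflexivity, or a revisited goal: accept coinductively
    assumed = assumed | {(a, b)}
    if a[0] == "var":
        # Single bound: exactly one way to proceed, so if this subgoal fails,
        # the goal (a, b) is definitively false. With multiple bounds this
        # line becomes any(subtype(u, b, assumed) for u in UPPERS[a[1]]):
        # a failed subgoal no longer refutes the goal, forcing backtracking,
        # which is what interacts badly with the cycle-accepting rule above.
        return subtype(UPPER[a[1]], b, assumed)
    if a[0] == b[0] == "func":
        return (subtype(b[1], a[1], assumed)       # parameters: contravariant
                and subtype(a[2], b[2], assumed))  # results: covariant
    return False

print(subtype(("var", "A"), UPPER["A"]))  # True: a variable matches its bound
```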
I should clarify that the above analysis is based on the straightforward extension to the subtyping rules as they are currently stated in the relevant proposals. This choice of rules, though, is no longer "the most general definition that is sound". I am assuming @rossberg is fine with that loss.
Closing this because we have decided that the MVP will not include type imports. This discussion would be best continued on the separate type imports proposal.
@titzer's Jawa system made use of importing types with multiple upper-bound constraints. More generally, permitting type variables to have multiple upper-bound constraints is often quite useful for low-level type systems. For example, a user-level casting operation often takes the run-time representations of two unknown types (represented as type variables) and returns a boolean that, if equal to true, guarantees that one type variable is a subtype of another, thereby imposing an additional upper bound on the former type variable. (Such an operation is often useful for, say, optimizing multiple assignments into a covariant array in reified polymorphic code: perform one cast of the type parameter to the array's unknown element type, after which all assignments are safe because the two unknowns are known to be subtypes.)

Unfortunately, having multiple upper bounds also causes a number of complications that one needs to plan ahead for if one wants to support this feature. For example, suppose $A has both $B and $C as upper bounds. Suppose $B has the upper bound [] -> [i32] and $C has the upper bound [] -> [f64]. Then what type does call_ref have when the input is a ref $A? i32 or f64? In this case, we might reason that any such function must necessarily be a [] -> ⊥, as the return types are incompatible. But what if instead the upper bounds were [] -> [(ref $B)] and [] -> [(ref $C)] respectively? In that case, the return types are compatible (a function returning ref $A would have both signatures), but there is no way to represent their combined return types. This means the type-checker would have to attempt to type-check the remainder using both possible return types.

This is one of the many subtleties that arises from designing a type system around structural subtyping. For a structural MVP, the likely viable solutions are the following (though I find neither of them to be great):

1. Commit to never supporting multiple structural upper bounds on type variables.
2. Annotate instructions such as call_ref with the type of their input (as struct.get currently does) or with the type(s) to upcast the resulting value(s) to.

Edit: I had said lower bounds when I meant upper bounds. Post is now correct.
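To make the two cases concrete, here is a toy Python rendering (a hypothetical encoding, not proposal syntax) of the candidate result types that call_ref would have to juggle:

```python
# Upper bounds from the example: $A is bounded by the variables $B and $C,
# each of which has a single function-type upper bound.
BOUNDS = {
    "$A": ["$B", "$C"],
    "$B": [("func", [], ["i32"])],
    "$C": [("func", [], ["f64"])],
}

def func_bounds(var):
    """All function-type upper bounds transitively reachable from var."""
    out = []
    for b in BOUNDS[var]:
        out += [b] if isinstance(b, tuple) else func_bounds(b)
    return out

# The candidate result types of call_ref on a (ref $A):
print([results for (_, _, results) in func_bounds("$A")])
# [['i32'], ['f64']] -- incompatible, so the result can soundly be treated as
# bottom. Swap in [] -> [(ref $B)] and [] -> [(ref $C)] and the candidates
# become compatible yet have no representable intersection in the MVP.
```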