
Inefficiency with (Unbounded) Parametric Polymorphism #117

Closed
RossTate opened this issue Aug 14, 2020 · 3 comments

Comments

@RossTate
Contributor

RossTate commented Aug 14, 2020

Terminology note: Parametric polymorphism is the use of instantiable type parameters, whereas subtype polymorphism is the use of subtyping.

In #114, the following extensions are described as part of a sketch for supporting parametric polymorphism (where I've removed bounds):

  • Allow type parameters on function types:

    • functype ::= (func <typeparam>* <valtype>* <valtype>*)
    • typeparam ::= (typeparam $x)
  • Add a way to reference a type parameter as a heap type:

    • heaptype ::= ... | (typeparam $x)
  • Generalise all call instructions (and func.bind) with a way to supply type arguments:

    • e.g., call $f <heaptype>*
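To make the sketch concrete, the extended grammar could be modeled as an abstract syntax tree along the following lines. This is an illustrative OCaml sketch; all constructor and field names are invented here, not part of #114:

```ocaml
(* Hypothetical AST for the extension sketched above; names are illustrative. *)
type typeparam = string                      (* the $x in (typeparam $x) *)

type heaptype =
  | Type_idx of int                          (* reference to a defined type *)
  | Typeparam_ht of typeparam                (* (typeparam $x) as a heap type *)

type valtype =
  | I32                                      (* numeric types, abridged *)
  | Ref of heaptype

(* functype ::= (func <typeparam>* <valtype>* <valtype>*) *)
type functype = {
  typeparams : typeparam list;
  params     : valtype list;
  results    : valtype list;
}

(* Call instructions (and func.bind) additionally carry heap-type arguments,
   e.g. `call $f <heaptype>*`. *)
type instr =
  | Call of string * heaptype list

(* Example: (func (typeparam $x) (param (ref (typeparam $x))) (result i32)),
   and a call supplying a concrete defined type for $x. *)
let _poly : functype =
  { typeparams = ["$x"]; params = [Ref (Typeparam_ht "$x")]; results = [I32] }

let _call : instr = Call ("$f", [Type_idx 0])
```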

This extension makes subtyping at least quadratic time and no longer amortizable with memoization. The ability to instantiate type parameters with type arguments means new types are constructed during type checking and consequently have no entries in a memoization table.


This inefficiency in the presence of parametric polymorphism seems fundamental to the use of structural types in the current MVP.
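As a rough illustration of the memoization point (an OCaml sketch with invented names; real engine representations will differ): a module's defined types are indexed, so a subtype cache can be keyed by index pairs, but the types constructed by substituting type arguments carry no index and therefore never hit that cache.

```ocaml
type heaptype =
  | Defined of int                (* index into the module's type section *)
  | Typeparam of string           (* (typeparam $x) *)
  | Struct of heaptype list       (* structural fields *)

(* Cache keyed by (defined index, defined index): constant-time lookups. *)
let cache : (int * int, bool) Hashtbl.t = Hashtbl.create 64

(* Substitution performed when checking `call $f <heaptype>*`:
   every node it returns is freshly constructed and carries no index. *)
let rec subst env t =
  match t with
  | Typeparam x -> (match List.assoc_opt x env with Some u -> u | None -> t)
  | Struct fields -> Struct (List.map (subst env) fields)
  | Defined _ -> t

(* Subtyping can memoize only on (Defined i, Defined j) keys; anything
   produced by [subst] falls through to a full structural traversal every
   time.  (Cycle handling for recursive types is omitted.) *)
let rec subtype lookup s t =
  match s, t with
  | Defined i, Defined j ->
      (match Hashtbl.find_opt cache (i, j) with
       | Some b -> b
       | None ->
           let b = subtype lookup (lookup i) (lookup j) in
           Hashtbl.add cache (i, j) b;
           b)
  | Struct ss, Struct ts ->
      List.length ss = List.length ts && List.for_all2 (subtype lookup) ss ts
  | _ -> s = t

(* Example: the instantiated type misses the cache and is traversed fully. *)
let _ =
  let lookup = function 0 -> Struct [Defined 1] | _ -> Struct [] in
  let inst = subst [("x", Defined 1)] (Struct [Typeparam "x"]) in
  subtype lookup inst (lookup 0)
```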

@rossberg
Member

rossberg commented Aug 17, 2020

> The ability to instantiate type parameters with type arguments means new types are constructed during type checking and consequently have no entries in a memoization table.

I would assume that, by its nature, a memoization table is always constructed incrementally while checking. That seems unavoidable with polymorphism. What alternative do you have in mind?

> This inefficiency in the presence of parametric polymorphism seems fundamental to the use of structural types in the current MVP.

Once you have type parameters, a structural component to subtyping is inherent and unavoidable. You have to traverse the type parameters structurally, and they can be arbitrarily nested. Checking whether Pair<Pair<Pair<A,B>, Pair<C,D>>, E> <: Pair<Pair<Pair<A,B>, Pair<C,D>>, E> nominally is all the same as checking (((A,B), (C,D)), E) <: (((A,B), (C,D)), E) structurally. So a Wasm engine will want to memoize these relations in either case.
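To illustrate that point (a toy sketch, not engine code): even when Pair is nominal, checking one instantiation against another recurses through the type arguments, which is exactly the traversal a structural check of the corresponding tuples performs; whether the arguments are compared for equality or for subtyping does not change the shape of the walk.

```ocaml
type ty =
  | Prim of string          (* A, B, C, D, E, ... *)
  | Pair of ty * ty         (* stands in for a nominal generic Pair<_, _> *)

let rec subtype s t =
  match s, t with
  | Prim a, Prim b -> a = b
  | Pair (a1, b1), Pair (a2, b2) -> subtype a1 a2 && subtype b1 b2
  | _ -> false

(* Pair<Pair<Pair<A,B>, Pair<C,D>>, E> against itself *)
let _ =
  let p a b = Pair (a, b) and v x = Prim x in
  let t = p (p (p (v "A") (v "B")) (p (v "C") (v "D"))) (v "E") in
  assert (subtype t t)
```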

I suppose what you are getting at is that there is an expansion involved if Pair is an alias instead of a nominal definition. And that this expansion involves (at least conceptually) substitution. That is correct, but a separate issue. More importantly, this is already the case for link-time type checking, even with nominal definitions. So a Wasm engine has to implement (or emulate) structural substitution and incremental memoization no matter what.

AFAICS, the extension does not change the worst-case complexity of type checking. You could expand all type instantiations in a linear pre-pass before checking. But of course, type polymorphism allows expressing much bigger types.

In practice, I would probably not perform actual substitution, but rather run the subtyping check relative to a local substitution environment.
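A possible shape for that approach (an illustrative sketch of the idea, not the reference interpreter; type arguments are assumed closed for simplicity): each side of the check carries an environment mapping type parameters to the supplied heap-type arguments, and a (typeparam $x) is resolved only when it is actually compared, so no substituted copy of a type is ever built.

```ocaml
type heaptype =
  | Defined of int                 (* index into the module's type section *)
  | Typeparam of string            (* (typeparam $x) *)
  | Struct of heaptype list

type env = (string * heaptype) list    (* type parameter |-> type argument *)

(* Subtyping relative to local substitution environments; memoization and
   cycle handling are omitted to keep the sketch small, and type arguments
   are assumed to be closed. *)
let rec subtype lookup (env_s : env) s (env_t : env) t =
  match s, t with
  | Typeparam x, _ when List.mem_assoc x env_s ->
      subtype lookup [] (List.assoc x env_s) env_t t
  | _, Typeparam y when List.mem_assoc y env_t ->
      subtype lookup env_s s [] (List.assoc y env_t)
  | Typeparam x, Typeparam y -> x = y            (* both unbound *)
  | Defined i, Defined j when i = j -> true
  | Defined i, _ -> subtype lookup env_s (lookup i) env_t t
  | _, Defined j -> subtype lookup env_s s env_t (lookup j)
  | Struct ss, Struct ts ->
      List.length ss = List.length ts
      && List.for_all2 (fun a b -> subtype lookup env_s a env_t b) ss ts
  | _ -> false

(* Example: checking an instantiated parameter without building a copy. *)
let _ =
  let lookup _ = Struct [] in
  subtype lookup
    [("x", Defined 3)] (Struct [Typeparam "x"])
    [] (Struct [Defined 3])
```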

@RossTate
Contributor Author

> AFAICS, the extension does not change the worst-case complexity of type checking. You could expand all type instantiations in a linear pre-pass before checking.

This linear pre-pass is only useful for memoization if you recognize redundancies across instantiations. That requires hash-consing, which is linear with respect to the size of the type. In other words, memoization itself becomes linear time rather than constant time. Furthermore, you have to use a hashmap rather than an array to implement memoization, which causes its own slowdowns.
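A sketch of why (illustrative only, leaning on OCaml's structural hashing and equality rather than any real engine's representation): interning a freshly substituted type means hashing and structurally comparing the whole type tree, so the "memo lookup" itself costs time proportional to the type's size, and the cache has to be a hash map keyed by interned ids rather than an array indexed by the module's type indices.

```ocaml
type heaptype =
  | Typeparam of string
  | Struct of heaptype list

(* Hash-consing table: structurally distinct type |-> unique id.  Hashing the
   key and the structural equality checks against existing entries traverse
   the type, so interning a freshly substituted type is linear in its size.
   (A real implementation would intern nodes bottom-up, but the total work
   per fresh type is still proportional to its size.) *)
let intern_table : (heaptype, int) Hashtbl.t = Hashtbl.create 64
let next_id = ref 0

let intern (t : heaptype) : int =
  match Hashtbl.find_opt intern_table t with
  | Some id -> id
  | None ->
      let id = !next_id in
      incr next_id;
      Hashtbl.add intern_table t id;
      id

(* Only once both sides have been interned can subtyping be memoized on
   (id, id) keys, and even then the memo table is a hash map, not an array
   indexed by the module's type indices. *)
let subtype_cache : (int * int, bool) Hashtbl.t = Hashtbl.create 64

let _ = intern (Struct [Typeparam "x"; Struct []])
```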

There's also the important issue that grammatically identical types are not necessarily identical in the presence of bounded parametric polymorphism (even with only predicative polymorphism, which is decidable). The same type variable can have different bounds in different environments. This makes defining/implementing the hash-consing function more complicated/expensive, and it reduces the number of redundancies.

> So a Wasm engine has to implement (or emulate) structural substitution and incremental memoization no matter what.

Actually, memoization is only useful if there are lots of redundancies, there is an easy way to identify them, and the operation being memoized is slow. But well-designed subtyping systems are linear time except for corner cases and are implementable without any auxiliary data structures (besides the type expression itself), whereas memoization here is itself already linear time, requires maintaining a hashmap, and is sensitive to loss in redundancies.

This makes a difference in practice. I've even seen removing memoization speed up type-checking by over an order of magnitude. That, of course, relied on non-memoized subtyping being efficient, which isn't the case with this proposal.
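For contrast, the kind of memoization-free check being alluded to might look like the following sketch (illustrative, not any particular engine): with declared supertypes, s <: t is a walk up s's declared supertype chain, linear in the chain's length and needing no side tables beyond the type definitions themselves.

```ocaml
(* Illustrative nominal subtyping: each defined type records its declared
   supertype, and s <: t is a simple walk up s's chain with no auxiliary
   tables or memoization. *)
type def = { id : int; supertype : def option }

let rec subtype (s : def) (t : def) : bool =
  s.id = t.id
  || (match s.supertype with
      | Some s' -> subtype s' t
      | None -> false)

(* Hypothetical example: a type one declared step below another. *)
let any = { id = 0; supertype = None }
let arraylist = { id = 6; supertype = Some any }
let _ = subtype arraylist any    (* true after one step up the chain *)
```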

> Checking whether Pair<Pair<Pair<A,B>, Pair<C,D>>, E> <: Pair<Pair<Pair<A,B>, Pair<C,D>>, E> nominally is all the same as checking (((A,B), (C,D)), E) <: (((A,B), (C,D)), E) structurally.
> ...
> I suppose what you are getting at is that there is an expansion involved if Pair is an alias instead of a nominal definition.

Your second sentence captures why the comparison in your first sentence misrepresents the issues. Remember that the nominal type for java.lang.ArrayList expands into $t6 in the following (already minimized, yet still incomplete) structural type definitions:

$t1 = ref (func (param $t32) (result i32))
$t2 = ref (func (param $t6 i32) (result))
$t3 = ref (struct opt$t3 $t41 funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref $t10 $t11 $t35 $t8 $t1 $t35 $t35 $t39 $t35 $t20 $t13)
$t4 = ref (func (param $t12 $t28) (result i32))
$t5 = ref (func (param $t12) (result))
$t6 = ref (struct i32 $t23 (mut i32) (mut $t24) (mut i32))
$t7 = optref (struct i32 $t30 (mut i32))
$t8 = ref (func (param $t32) (result $t42))
$t9 = ref (func (param $t12 $t28) (result $t28))
$t10 = ref (func (param $t32) (result $t28))
$t11 = ref (func (param $t32 $t28) (result i32))
$t12 = ref (struct i32 $t14)
$t13 = ref (func (param $t32 i64 i32) (result))
$t14 = ref (struct $t3 $t41 funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref $t10 $t11 $t35 $t8 $t1 $t35 $t35 $t39 $t35 $t20 $t13 $t4 $t4 $t5 $t4 $t4 $t29 $t16 $t4 $t4 $t4 $t31 $t17 $t9 $t16 $t4 $t16 $t16)
$t15 = ref (func (param $t7 $t28) (result i32))
$t16 = ref (func (param $t12) (result $t28))
$t17 = ref (func (param $t12 $t24) (result $t24))
$t18 = ref (func (param $t6) (result))
$t19 = ref (func (param $t7 i32 i32) (result $t28))
$t20 = ref (func (param $t32 i64) (result))
$t21 = ref (func (param $t7 i32 $t28) (result $t28))
$t22 = ref (func (param $t7) (result $t28))
$t23 = ref (struct $t30 $t41 funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref $t10 $t11 $t35 $t8 $t1 $t35 $t35 $t39 $t35 $t20 $t13 $t4 $t4 $t5 $t4 $t4 $t29 $t16 $t4 $t4 $t4 $t31 $t17 $t9 $t16 $t4 $t16 $t16 $t41 $t37 $t15 $t15 $t22 $t37 $t37 $t26 $t21 $t19 $t15 $t38 $t25 $t38 $t36 $t2 $t18)
$t24 = optref (struct i32 $t3 $t41 $t27)
$t25 = ref (func (param $t7) (result i32))
$t26 = ref (func (param $t7 i32 i32) (result))
$t27 = array (mut $t28)
$t28 = optref (struct i32 $t3)
$t29 = ref (func (param $t12) (result i32))
$t30 = ref (struct $t30 $t41 funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref funcref $t10 $t11 $t35 $t8 $t1 $t35 $t35 $t39 $t35 $t20 $t13 $t4 $t4 $t5 $t4 $t4 $t29 $t16 $t4 $t4 $t4 $t31 $t17 $t9 $t16 $t4 $t16 $t16 $t41 $t37 $t15 $t15 $t22 $t37 $t37 $t26 $t21 $t19 $t15 $t38 $t25 $t38 $t36)
$t31 = ref (func (param $t12) (result $t24))
$t32 = ref (struct i32 $t3)
$t33 = optref (struct opt$t3 $t41 $t10 $t11 $t35 $t8 $t1 $t35 $t35 $t39 $t35 $t20 $t13)
$t34 = array (packed i16 i16)
$t35 = ref (func (param $t32) (result))
$t36 = ref (func (param $t7) (result $t28))
$t37 = ref (func (param $t7 i32) (result $t28))
$t38 = ref (func (param $t7 $t28) (result))
$t39 = ref (func (param $t32) (result $t40))
$t40 = optref (struct i32 $t3 i32 i32 $t34)
$t41 = ref (func (param $t7 i32 $t28) (result))

The nominal type Pair can have a similarly large structural type because it details all of Pair's internals, such as the structure of its v-table (whose type might depend on Pair's type parameters). The difference between being linear with respect to nominal types and (at least) quadratic with respect to structural types is significant not just because of the asymptotic gap but also because structural types tend to be much larger than their nominal counterparts. Memoization for structural types is essentially a bandage that tries to dynamically recover the (sub)type relationships already provided explicitly by the nominal types.

@tlively
Member

tlively commented Nov 1, 2022

The type system for the MVP is now settled, but of course we will have to make sure that any post-MVP features we work on have appropriate algorithmic complexity. PRs adding general notes about known algorithmic issues to the post-MVP doc would be welcome, but I'll close this issue since it's not actionable for this particular proposal.

@tlively tlively closed this as completed Nov 1, 2022