Supporting Basic Generics before Advanced Generics #156
Description
In yesterday's presentation, @rossberg recapped some of the advanced challenges to managing generics. While these advanced challenges are important, they are common to both structural and nominal types, and so I would like to first take a step back and call attention to some more basic challenges with generics that nominal types have already addressed but that structural types cannot overcome.
Consider the example of C# arrays that were prominently featured in the presentation. For 20 years, no one has been able to design a (decidable) structural type system that can express even just the low-level safety invariants of C# arrays. I warned of these issues when I first tried to get involved with WebAssembly, and more recently posted #138 to prompt broader group discussion of them, and there still seems to be no solution in sight even for easier cases like Java arrays. This means that C#, Java, Kotlin, and many other languages will have to compile their arrays (of reference types), regardless of the specific reference type, to just array anyref
in the current structural Post-MVP and will always cast accessed values. Meanwhile, there is a 15-year-old paper showing how to express these arrays with nominal types, a 12-year-old paper actually putting nominal types into practice in an existing compiler for C#, and a 10-year-old paper showing how to make it practical to generate these types (plus an accompanying tech report providing the necessary theory for generalizing the approach to other classes of languages).
As for typed functional languages, consider the example of polymorphic recursion that was also prominently featured in the presentation. As pointed out in #119 (comment), the common application of polymorphic recursion discussed in the literature is handling expansive-recursive types, with the function collect
for flattening a datatype 'a T = EMPTY | NODE of 'a * (('a T) T)
into a list being the prominent used-in-industry ML example discussed. This type is inexpressible in the Post-MVP and in any decidable structural type system because expensive-recursive structural subtyping is known to be undecidable.1 The lack of expressiveness means that the NODE
wasm-structure will have to be a pair of 'a
and T
-of-anyref
or the like, losing the invariant that this latter data structure is comprised of 'a
s. And, since ML types are erased, there is no way to dynamically cast these values to 'a
, which means that collect
cannot guarantee the resulting list
is comprised of only 'a
s. If 'a
were to denote string
in some context, then this means that there is a string list
that is not guaranteed (according to wasm's type system) to contain only string
values, which consequently means wasm functions taking string list
cannot demand the list only contains strings. So any compilation of ML programs with expansive-recursive types into the current Post-MVP will have to ignore type arguments on list
s (and in fact all generic types) and just insert casts everywhere, just like C#/Java/Kotlin arrays have to. Meanwhile, nominal types are able to express this feature because they allow one to define an expansive structure without also making subtyping expansive (just like how the type 'a T
in ML denotes an expansive structure but is not a subtype of anything).
So, in debating between structural and nominal types, before advancing into discussing the likes of how a nominal Post-MVP will represent the RTT for string[][][]
(which the above three papers can all do) or a ((String T) T) T
(which the techniques in the above tech report can do), we should first acknowledge that a (decidable) structural Post-MVP cannot even express the structural types of string[]
or String T
to begin with.
1Right now we can decide subtyping between equirecursive types by viewing them as finite automata, with (unparameterized) fixpoint type variables serving as the back edges. Unfortunately, when the variables can be parameterized, that corresponds to generating a new set of automaton states rather than being able to cyclically reuse existing automaton states, meaning the automaton is no longer finite. One can also encode a Turing machine as an infinite automaton, incorporating the state of the string directly into the states of the automaton, and it turns out expansive-recursive types can express this encoding. See, for example, this encoding of Turing machines using expansive recursion in Java generics—an encoding can easily be translated to use structural types instead. This is not problematic for Java at the low level because Java erases its generics, but C# reifies generics and consequently disallows expansive inheritance to prevent this problem (due to this paper). And it is known that expansive nominal inheritance is not used in practice, even when permitted, though expansive structures are.