Supporting Basic Generics before Advanced Generics #156
To be clear, this is about expressing covariant (source) arrays in terms of invariant (target) arrays? To play devil's advocate, what if WebAssembly were to be extended with a language-level covariant array that checks stores (e.g. by requiring an RTT argument)?
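For concreteness, the "covariant array that checks stores" behaviour being discussed is exactly what Java arrays do today. A minimal Java illustration (not a proposal for Wasm-level syntax):

```java
// Java arrays are covariant: String[] is a subtype of Object[].
// Soundness is preserved by a dynamic check on every store, which
// throws ArrayStoreException when the stored element's runtime type
// does not match the array's runtime element type (its RTT, in Wasm terms).
public class CovariantArrayDemo {
    public static boolean storeCheckThrows() {
        Object[] objects = new String[1]; // legal, because arrays are covariant
        try {
            objects[0] = Integer.valueOf(42); // statically fine, fails the store check
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(storeCheckThrows()); // prints "true"
    }
}
```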
Sigh. We really need to get away from all these false dichotomies.
I have made these observations before. The second one I actually wanted to talk about on Tuesday, but I didn't get to it. And let me point out once more that any "nominal" semantics that we introduce must be "structuralised" at least at module boundaries, otherwise we would not be able to define linking. Once you accept that fact, it becomes obvious that nominal is only going to be a quasi-notion and that much of this discussion is a red herring. For example, you can recast nominal recursive types as iso-recursive types in a structural manner. (OTOH, iso-recursive types are not enough for all languages either.)
I believe everyone is on the same page about this. Type imports must fully describe the structure of the type that is being imported no matter what system we use so that the compilation phase can determine field offsets. Whether equi-recursive, iso-recursive, or nominal, we all agree that WebAssembly's types should serve only to describe the low-level structure of data so that engines can emit safe and efficient code. Nominal types are being proposed not to get around this or to serve some other purpose, but merely to improve the algorithmic complexity of type canonicalization and various post-MVP extensions. In particular, nominal types are not meant to embed source-level type systems into WebAssembly, which we all agree is a non-goal and also generally impossible.
The problem with arrays arises more generally when the type system needs to simultaneously reason about subtype polymorphism and parametric polymorphism. For example, Kotlin arrays are invariant (putting aside dynamic concessions Kotlin makes on limited non-native platforms), but Kotlin also allows use-site (technically mixed-site) variance, which means one can have a (read-only) covariant view of an invariant array type. Expressing the low-level structure of these types requires bounded quantification. Unfortunately, because structural types identify structure and types, this means they need impredicative bounded quantification to express these examples, meaning that they quantify over structural types themselves. All the known subtyping rules for impredicative bounded quantification that can express the above subtypings are undecidable by some reduction to the result in this 1993 paper. Nominal types, on the other hand, make a distinction between structures and types (the latter is a layer of abstraction above the former), which means they can express these examples with predicative bounded quantification, quantifying over the layer of abstraction above structures rather than structures themselves. The tech report I mentioned above provides a framework for developing predicative quantifications that can decidably express a wide variety of languages.
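The Kotlin use-site variance mentioned above has a close Java analogue in wildcard types, which may help illustrate what a covariant read-only view of an invariant container looks like at the source level. A minimal Java sketch:

```java
import java.util.ArrayList;
import java.util.List;

// Java's List<T> is invariant, but a use-site wildcard gives a covariant,
// read-only view of it, analogous to Kotlin's Array<out T>. Describing
// the subtyping of such views at a low level is what calls for bounded
// quantification.
public class UseSiteVarianceDemo {
    // Accepts List<Integer>, List<Double>, etc. through one covariant view.
    public static double sum(List<? extends Number> readOnlyView) {
        double total = 0;
        for (Number n : readOnlyView) {
            total += n.doubleValue();
        }
        // readOnlyView.add(1) would not compile: the view is read-only.
        return total;
    }

    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<>();
        ints.add(1);
        ints.add(2);
        System.out.println(sum(ints)); // prints "3.0"
    }
}
```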
I feel it's worth noting explicitly, as I discussed with @RossTate over email, that Kotlin's JVM backend implements (invariant) Kotlin arrays in terms of (covariant) JVM arrays. In this sense, the JVM backend does not succeed in "expressing the low-level structure" of the Kotlin array type, but nevertheless achieves performance which is far better than the …

I agree it's preferable to allow as much expressiveness in Wasm as possible, and it may well be possible to make Wasm's type system expressive enough to handle this case fully faithfully, but the ability to "express the low-level structure" of a source-language type isn't a hard precondition for acceptable performance (and will not be possible in all cases).
Unfortunately Wasm-level linking is not enough to express the source information necessary to do proper late binding of source constructs (e.g. inheriting from a class across a module boundary). It is not possible for a source language like Java that wants to provide proper separate compilation to rely on Wasm type imports (even with a full structural description) for this. That problem is entirely orthogonal to the structural/nominal discussion.
I don't understand why covariant read-only array views with dynamic RTT-based casts to writable array views don't work.
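For illustration, here is one way the mechanism in this question could look when encoded in user space. This is a hypothetical Java sketch (all names invented), standing in for Wasm-level view types and RTT casts:

```java
// Arrays are passed around under a covariant read-only view, and a
// dynamic RTT-like check recovers the writable view when needed.
public class ArrayViewDemo {
    // The covariant, read-only view: only reads are possible through it.
    public interface ReadView {
        Object get(int i);
        int length();
    }

    // A concrete writable array; its class plays the role of the RTT.
    public static final class StringArray implements ReadView {
        private final String[] data;
        public StringArray(int n) { data = new String[n]; }
        public Object get(int i) { return data[i]; }
        public int length() { return data.length; }
        public void set(int i, String s) { data[i] = s; } // writable view only
    }

    public static String firstAfterWrite(ReadView view) {
        // Dynamic cast from the covariant view back to the writable view,
        // analogous to an RTT-based cast in Wasm.
        if (view instanceof StringArray) {
            ((StringArray) view).set(0, "hello");
        }
        return (String) view.get(0);
    }

    public static void main(String[] args) {
        System.out.println(firstAfterWrite(new StringArray(1))); // prints "hello"
    }
}
```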
My understanding is that this (or something like it) would specifically solve the performance concerns raised for Java and C# arrays, and Kotlin arrays up to the approach they use for the JVM backend (which introduces some minor gotchas). If I'm understanding @RossTate's overarching proposal correctly, he knows of and has worked on a way of modelling covariant arrays with store checks, Kotlin arrays, C#'s reified generics, the …

Clearly all else being equal it's preferable to have one generic [sic] mechanism for making all of these different source language features efficient rather than plug one specific (but very important) set of cases with a comparatively bespoke feature. I believe @rossberg would also agree. @RossTate's position is that a unifying approach based on bounded quantification will only be possible under a nominal type system, because a structural type system with sufficient expressivity for the examples above would be undecidable.

EDIT: it is worth noting that a Wasm-level covariant array would have the advantage that it could potentially be an MVP or soon-after-MVP feature.
Classes and interfaces with (sound) generic methods also require bounded quantification to describe their structure, and you need to employ subtyping checks to ensure a class correctly implements its interfaces or extends its superclass. So again this is undecidable with structural types but decidable with nominal types. It seems that the decidability limitations of structural types mean that no polymorphic language would be able to target a structural GC type system without erasing all polymorphism in their structures and just casting everywhere instead, like they would do if they were to target the JVM.
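As a concrete instance of the bounded quantification being described, a sound generic method in Java quantifies over a type variable with a bound; checking that an override or implementation of such a method is correct requires reasoning about that bound:

```java
// A generic method with a bounded type parameter: T ranges only over
// types that satisfy the bound T extends Comparable<T>. Verifying that a
// subclass or implementing class provides this method correctly means
// checking subtyping under that bound, i.e. bounded quantification.
public class BoundedDemo {
    public static <T extends Comparable<T>> T max(T a, T b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(max(3, 7)); // prints "7"
    }
}
```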
I feel this is what is essential to get some more agreement on, as a lot of design decisions hinge on how we view this. There's a lot of discussion that includes "but we must do it this way for languages to be able to do separate compilation" or "retain modularity". But it seems that, while languages can choose to compile what they consider a "program" consisting of many language level modules to many Wasm modules, you can't often escape the need for the language to have additional information (outside Wasm) on how those modules fit together, and thus really must consider that group of modules as one unit, where representational choices of imports/exports internally to that "program" may change incompatibly when source level constructs change. This informs our decisions of why GC types need uniform representation and/or subtyping etc, and to what extent implementations can specialize instead. @rossberg's argument of the existence of things like recursive polymorphic types entailing that we must therefore have these features may well change if we agree that modularity of a source language doesn't necessarily map to the same kind of modularity properties at the Wasm level. |
Lumen using the GC proposal is still iffy (we have a GC in Rust that works and are probably more likely to want stack walking to rewrite roots), but if we are to use it, we have no plans to compile each Erlang module to a separate Wasm module. That's actually counter to the reasons for Lumen: we want whole program optimization and so want a single file at the end. If we wanted multiple files we could have more closely matched how the BEAM VM works with …
Yes, because it is a prominent use case. If you look at the landscape that exists already, then people do not actually want to build a new implementation of C# -- that would require duplicating a huge ecosystem effort. Instead, they want to implement .NET and reap the benefits of that. See Blazor. I suspect that similar considerations will hold for significant parts of the Java ecosystem. Non-modular approaches are a non-starter for that, because modularity is deeply engrained in these systems.
It's a bit more nuanced. In previous discussions, various people very much assumed and expected to be able to directly implement language-specific type tagging via Wasm-level runtime types. @RossTate in particular argued strongly for this being crucial in the case of OCaml's value representation. So I'm a bit fuzzy on what I am supposed to deduce from him making practically the opposite argument for a C#/.NET value representation now.
One point I was trying to make in my presentation (but didn't quite get to) was that the algorithmic complexity argument is a red herring. As I showed, there has to be an implementation of dynamic structural canonicalisation somewhere. Whether that's in user code or the engine does not change its complexity. The only leeway with it really is the semantics of recursive types, which affects the complexity of canonicalisation. But that question is entirely independent from the whole nominal distraction of an argument.
Not with what's in the MVP, no. But how do OO engines implement dynamic linking? By additional indirections, i.e., with offset tables. So what we should enable is expressing these techniques in Wasm, e.g., by introducing "member references" as a post-MVP feature. I have not heard an argument why this should not be possible.
I do, but obviously the price has to be adequate. If it amounts to putting complex and OO-specific custom typing machinery into Wasm then it's not clear if that's worth paying. Bounded existential quantification as a general mechanism is desirable, but already orders of magnitude more complicated than anything that the CG has been willing to put up with in the past. I think this will change, but not as fast and radically as some people hope. It would certainly turn Wasm into a very different language.
Apologies if I missed it in your presentation, but could you elaborate on why this is?
OO engines do not use additional indirections, offset tables, etc to implement late binding. Late binding requires source information and types and no amount of indirection is a substitute for logic that applies name matching and overloading rules. E.g. declarative linking rules will never be enough to express even Java's semantics here. It requires a runtime system and the preservation of source level types and names. |
@titzer, what would you say the takeaway from that is for the current discussion? It seems to me that the complexities of late binding mean that it's not a problem we should be trying to solve in this MVP proposal, so we can mostly ignore it for now with the understanding that some future proposal will introduce a new staged or layered compilation/instantiation scheme to address it. |
Yes, additional source-level information for proper late binding is absolutely required for even moderately complicated languages such as Java. The good news is that it is possible to encode this information in the import system. I will shortly propose the necessary extension to make this possible, a moderate upgrade to the import system beyond simple type imports.
I'll put it this way: it's not just staging compilation, we need to upgrade the import mechanism in a specific way to achieve late binding. In general, code lowered to Wasm will never be able to cooperate across a module boundary while preserving source language invariants. (Implication: such code cannot/should not be lowered. You need a runtime system to do some lowering dynamically/at link time. There's not enough indirection in the entire universe to achieve that statically :-).) Without building a runtime system that does this, it's all murky. I built one for Java. I'll show what it does on Nov 17th and try to have a document up a few days before. Even with the above, the following issues remain, because even dynamic lowering has to lower to something.
The issues 1 + 2 need to be resolved somehow, because they are at odds. In my view, this basically boils down to the question: do modules need to import/export RTT values in order to cooperate, or can they implicitly access the "same" RTT values through the engine's canonicalization thereof? With my encoding of Java, RTTs become implementation details that are completely hidden behind an abstraction--i.e. it doesn't matter except as a contract between the language runtime and the engine; the user program/modules never see them. The choices we make have performance implications but do not affect expressiveness. For 3.) we either need to introduce covariant array views with castable writable views, or we need to introduce existential types along the lines of what Ross has done in his research papers, which he claims can also eliminate casts for virtual dispatch sequences. His claim is that this cannot be done for structural types that we have now. Personally I feel like he's been arguing to hold that door open for a long time but we haven't seen a proposal that actually walks through that door and adds existentials in a post-MVP world. So I don't know if it actually works or not. In what I will propose, for 3.) this becomes an implementation detail hidden behind an abstraction and never visible to user programs. It matters only to the implementation of the runtime system which lowers the source language to Wasm behind the scenes.
It's worth noting that this encoding approach was being exercised in MSR even before Ross's more recent work. For example (as Ross linked in the OP) it was used in Bartok, the compiler behind a bunch of their verified OS projects (Verve, Singularity/Midori). IIUC, Ross's iTalX work subsequently modified Bartok to use his particular system, which required fewer type annotations. I found the original LIL_c paper (also MSR) to have the most easily understandable foundational examples for how the encoding works. @RossTate would you have the bandwidth to lay a bounded quantification extension on top of #119?
Also, we currently have to downcast the …

With reference to @rossberg...
We have a practical problem that some important OO features will likely have less-than-stellar performance in the MVP. This prompts the obvious question:
I would argue that without a clear answer to this, we're more likely to end up adding OO-specific machinery to Wasm, because we will be led by medium-term performance pressures towards ad-hoc features such as Wasm-level covariant arrays.
I am very interested in what this looks like.
I am actually less concerned about this than I used to be. If achieving performance parity means adding OO-specific machinery, but the resulting system is actually easier to use and understand than, e.g. a system of existential types, then the resulting loss of generality might not be as bad a tradeoff. After all, as clunky as it can sometimes be, the JVM itself has had pretty considerable success as a compilation target for many, many languages, and that's without supporting first-class functions. I think we are already in a position to do considerably better just by virtue of the coming function references proposal. If existential types are hard to use and that puts off potential producers, we'll be worse off overall. I'd like to see things played out in sufficient detail for us to make an informed decision. Which is why, in general, I am reluctant to make a major switch to nominal types without that actual benefit yet materializing in some tangible way.
It follows from my examples and observations. See the slide "Observation 8" and following in particular, which state this. In short, generic functions can construct dynamic types from parameters, in unbounded numbers, and in multiple independent places. All must result in the same runtime type to maintain the observable semantics of (implicit or explicit) casts. Type expressions involving generics are inherently structural in nature.
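A Java analogue of this observation, using reflection as a stand-in for reified generics: two independent sites that construct "array of T" from a runtime parameter must observably produce the same runtime type, which is exactly the canonicalisation requirement being described:

```java
import java.lang.reflect.Array;

// Two independent call sites construct a runtime type ("array of elem")
// dynamically from a class parameter. For casts between their results to
// behave correctly, the runtime must canonicalise the two constructions
// to the very same runtime type, which the JVM does.
public class DynamicTypeDemo {
    public static Object makeArray(Class<?> elem, int n) {
        return Array.newInstance(elem, n);
    }

    public static boolean sameRuntimeType() {
        Object a = makeArray(String.class, 1); // one construction site
        Object b = makeArray(String.class, 2); // an independent one
        return a.getClass() == b.getClass();   // canonicalised by the JVM
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeType()); // prints "true"
    }
}
```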
With lazy loading, they at least have to use a stub to trigger loading and then jit compilation, which is a form of indirection. But that requires throwing jitting into the mix, which I don't think every such language would want to get into, especially not on Wasm. Indirections through indices or accessor functions are the jit-less alternative. But even with jitting, if you want to express that in Wasm user land, you'll somehow need to abstract field access through an accessor function import in the original module. That may be handled magically by the VM, but it's still conceptually an indirection, AFAICS. Isn't that true of your approach as well?
Right, but of course no sufficiently interesting language will have stellar performance in the MVP. FWIW, it is gonna be much worse for dynamic or functional languages, which mostly depend on encoding a uniform representation, and need way more casts.
There are many open questions post-MVP, but we need to take one step at a time if we want to make progress at all. I do think that some form of existential quantification should eventually make it into the language and nothing in the MVP prevents that, but extrapolating from other, way simpler proposals, that's realistically gonna take years of work and discussion. It's more constructive to focus on step 1 at this point than using all bandwidth for hypothesising about step N+1.
Generally, no, an OO engine won't use an extra indirection in the object model for late bound field accesses. In Java the weirdness generally only comes when there is a link or validation error and the tricky bit is mostly to throw the error at the right time. IIRC Java needs to load all class definitions of types mentioned in a method in order to validate its code, which might happen for a whole class or just a method at a time upon its first execution (I don't remember how loose the spec is here). A link error might get thrown later though, at execution time, but getting the timing of exactly when this error is thrown is generally done by marking entries in metadata used by the interpreter as invalid, so the JVM interpreter will throw when executing them. In either case the JVM generally does not introduce an indirection in the object model, but instead such bad field access or call sites are either tripped over by the interpreter or JITed to throw a runtime error. So adding imported member references to Wasm would be no help in doing Java late binding properly. |
It's true that we can likely build any post-MVP strategy on top of any MVP - but I find it hard to see how we can add bounded (existential) quantification in a way that doesn't deprecate the constructs of the current MVP. I've been trying to find existing examples of equi-recursive encodings with bounded quantification for OO languages, but they all seem to fall into one of the following categories:
I feel we're heading for a situation where we start with equi-recursive structural types, add covariant arrays, then later add a disjoint feature-set of objects with nominal (or at least iso-recursive) subtyping + BQ. This wouldn't be a disaster, but I do want to understand if this is what @rossberg is expecting too.

A more tangential question re #156 (comment), which might be relevant to @tlively's question:
There's some kind of mismatch between this claim (with its related conclusion "dynamic RTT canonicalisation is necessary") and the LIL_c/Bartok/iTalX approaches that I don't fully understand. Perhaps @RossTate can elaborate? Email would be fine to avoid clutter. My understanding is that Observation 8 is stated in C#. The iTalX paper states that Bartok "fully instantiates generics before code generation". Does it fail for this example? The same paper also claims to have an alternative encoding, which if I'm reading correctly for Observation 8 would involve progressively nesting/packing an existential type, with the downcast corresponding to (implicitly) unpacking all of these existential wrappers. I think this could be characterised as similar in spirit to a producer-encoded naive structural check without canonicalisation - which is acceptable because …
Ah, but I am not suggesting to use structural types for that. OTOH, I also don't think that nominal types would deprecate structural types. They both serve different use cases. But it is a valid question what semantics we should adopt for type recursion.
That mismatch is resolved when you don't expect piggybacking: then the engine can use a simpler mechanism. But the price is duplicating the same (or similar, because more high-level) information in user code, and doing two layers of casting. Unfortunately, I've heard opposite opinions even in this thread about what people expect or find acceptable here.
It is very loose, it can even defer verification of each individual instruction until this very instruction is executed, which I assume is supposed to enable a byte code interpreter with no upfront verification. OTOH, a JVM is also allowed to do it eagerly for the whole dependency graph. I don't think any production JVM implementation uses either of these extremes, though. But yes, you are right, this generally allows loading a class whose field is accessed before compiling the code. However, in a Wasm context you could only make use of this if you also created the Wasm module representing a class only after knowing all relevant dependencies -- i.e., go all in on 2-layer jitting. Is that what you have in mind?
Sorry for the dumb question, but what does "piggybacking" mean in this context?
We've often said in the past that Wasm's core language should be small and generic, because it's expected that there will be a small number of implementers who need to handle the whole thing, and a large number of producers who will each target some subset, implementing the machinery that's relevant to them at the Wasm-level. I would argue that requiring (alternatively, "enabling") producers to handle more of the type generation/casting logic themselves is analogous to requiring them to ship an implementation of …
Ah, it's the term I used in my presentation for expressing language-level casts directly with Wasm-level casts.
Well, as I pointed out in my presentation, it is a fact of life that the Wasm runtime type system can't possibly subsume every system of language-level runtime types, in the same way that isn't possible for static types. So in general, you have to implement at least parts of the casting mechanism in user space. I get the impression that this observation isn't widely appreciated yet. That makes it more a question of where to draw the line, and whether it's okay for the MVP to rely on user space entirely.
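A minimal sketch of what "implementing the casting mechanism in user space" can look like, with all names hypothetical: objects carry an explicit tag encoding the language-level type, and a cast checks that tag itself rather than relying on the host runtime's type test (the two layers of casting mentioned elsewhere in this thread):

```java
// User-space casting sketch (hypothetical names): the engine only sees
// one low-level structure; the language-level type is recorded in an
// explicit tag field that the compiled program checks itself.
public class UserSpaceCastDemo {
    // Tags for two source-level types sharing one low-level layout.
    public static final int TAG_POINT = 0;
    public static final int TAG_PAIR = 1;

    public static final class Boxed {
        public final int tag;  // the user-space "runtime type"
        public final int a, b; // shared low-level structure
        public Boxed(int tag, int a, int b) {
            this.tag = tag; this.a = a; this.b = b;
        }
    }

    // A language-level cast implemented entirely in user code.
    public static Boxed castToPoint(Boxed v) {
        if (v.tag != TAG_POINT) throw new ClassCastException("not a Point");
        return v;
    }

    public static boolean castSucceeds() {
        Boxed p = new Boxed(TAG_POINT, 1, 2);
        return castToPoint(p).a == 1;
    }

    public static void main(String[] args) {
        System.out.println(castSucceeds()); // prints "true"
    }
}
```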
The argument (if you are referring to mine) was that type canonicalisation is necessary somewhere to implement languages like C#, no matter what type system we pick for the engine. Hence the argument against doing canonicalisation in the engine is in large parts a red herring.
I find the discussion above to be hitting a lot of great topics. At the same time, I worry that it is hitting too many topics. So I am asking the following question in the hope that we might focus the discussion: @rossberg, do you have concerns with nominal GC types that are unrelated to linking? That is, if programs were only shipped as whole wasm modules whose only imports and exports were for interacting with the host, what concerns regarding nominal types persist? (Not trying to downplay the importance of linking; just trying to understand the domain of the concerns.)
The "canonicalisation" required to implement C# seems to be much simpler than the canonicalisation currently required in engines. As I said above, in the example you gave the types to be matched form a well-founded tree structure and it's not overly onerous to check them at each site (potentially with produce-time optimisations in many common cases) as opposed to eagerly canonicalising at runtime. An argument against doing canonicalisation at the engine-level is to allow each individual producer to implement exactly the canonicalisation that they need, which for all the languages we've discussed so far is much simpler than the full equi-recursive algorithm that the engine would have to maintain to implement …
That's why I said "in large parts", and noted earlier that the semantics of type recursion is the key aspect that matters and that the discussion should probably focus on, and that is independent from nominal/structural. ;) Either can be inductive, in which case canonicalisation becomes an equivalent problem.
Other concerns involve the simplicity and scope of the MVP (more constructs, a larger type system, e.g. because nominal typing will require introducing yet another typing relation, with structural semantics still being needed e.g. for describing imports or for checking the validity of the nominal relation in the first place), the consistency of the type language (function types are already structural, and intentionally so), the coherence of the validation-time vs link-time typing semantics, and maintaining meta properties of modules (such as the ability to merge them easily, which can be maintained if nominal types are done correctly, but some earlier suggestions would break that property). Probably some other points I'm forgetting right now. But of course, Wasm has modules for a reason, modularity and separate validation/compilation is a core feature, and a type system that would only be usable with whole-program compilation would be a non-starter. Closed-world whole-program approaches simplify various problems in compilation, not just typing, but such considerations tend to be of limited use for solving the actual problem.
Regarding the simplicity and scope of the MVP, it would be a shame if we shipped a slightly simpler MVP but then found that it couldn't be extended or made more efficient in various ways without shipping an entirely disjoint system. It would also be good to continue focusing the discussion on more objective and concretely valuable properties like expressiveness and performance rather than more subjective or aesthetic properties. For example, the abstract benefit of having the same system for function and non-function types could never outweigh any concrete benefits to expressiveness or performance we might find we could get from a mixed system. And regarding modules, I agree that separate compilation is an important problem to solve, but all existing toolchains I know of (including prototypes targeting this GC MVP) either do whole-program compilation or do static linking or require some extension to WebAssembly's module system such as Interface Types in order to do separate compilation. It seems reasonable to expect toolchains to continue to do whole-program compilation or static linking until we introduce some mostly-orthogonal extra runtime layer like @titzer's upcoming late bindings proposal, so it's unclear whether we need to consider separate compilation in our GC design. Without a shared understanding of what such an extra layer would look like, it's impossible to argue effectively about what properties the GC proposal must have to support separate compilation. All this is to say that I believe expressiveness and performance are the key issues at stake here.
Just to riff off @tlively's comment
We've always punted on language-interoperability as being a matter for the toolchain to make a convention for to the best of its ability. Similarly, we already expect that it's possible for two compilation units in the same language to be separately compiled to Wasm, and for the resulting two (sets of) modules to be unable to be composed together unless they were both compiled using some toolchain convention that deliberately facilitates this composition. Consider that even different optimisation levels of the same toolchain may pack/order struct fields differently. The "compositionality" property that @rossberg wants for the GC proposal is already wholly reliant on toolchain conventions for the above reasons (e.g. the Wasm modules must agree on object layout up to the depth they care about, even if they don't explicitly import each others' types). My impression is that any other flavour of proposal (let's say nominal) that requires a central coordinating module for types can have the same compositionality properties (wrt the source compilation units) so long as a toolchain convention handles where type definitions live, how they get plugged into different modules, and (if necessary) how to merge/pick a winner in the case of duplicate definitions. To paraphrase @tlively, I also think that questions about engine canonicalisation and type recursion should be answered in the wider context of what system gives us the best expressiveness and performance. The key battleground right now is arrays and reified generics, and for this question the nominal vs structural debate is front and center because of the proposed post-MVP bounded quantification encoding.
I'm not sure whether that would be sufficient (@sbc100 would know), but if we have first-class instantiation, it seems reasonable to have first-class compilation as well, so I'm not sure how different those actually are for the sake of this discussion.
I don't understand the practical distinction you're drawing between underapproximating and overapproximating here. In either case, external linking machinery is required, and that external machinery may as well dynamically generate Wasm modules. For sufficiently general and language-agnostic use cases, that machinery can be standardized and layered on top of WebAssembly like Interface Types. But nontrivial language-specific linkage schemes will always require some sort of language-specific linking machinery, and that's ok. Trying to avoid that should be a non-goal.
I would agree with this in general, but in this case there is a potential performance trade off involved.
This is a good point. The module linking proposal makes WebAssembly imports/exports more expressive and demonstrates that to be sufficient for some use cases, including shared-nothing linking when paired with Interface Types. However, it is not a panacea. In particular, it is insufficient for …
@aardappel, unlike native, Wasm cannot "erase most/all type information". That is the very problem we are up against! Given that, the next best thing is to do what we can to make these types permissive enough to avoid unnecessary over-constraining.
The qualitative difference would not so much be in the engine but in the language's runtime: as mentioned, it would have to ship with a compiler backend and do its own jitting phase. Even if client-side compilation is adequate for some languages, that can't seriously be our only answer to enable client-side linking in respective cases.
I think the point that people are making is not that client-side linking should be done with jitting, but rather that client-side linking is very diverse to the point that it should be able to incorporate jitting when necessary. That gives systems the flexibility to choose how much compilation they do ahead of time or just in time, a tradeoff that they can explore and determine what works best for their needs. For some languages, there won't be much flexibility due to the compilation/performance model of the language. Julia, for example, is designed entirely around the assumption that compilation will happen after linking, which it uses to gain multiple orders of magnitude of performance improvement. It even developed a new semantic concept specifically to support this compilation/performance model. But that's an extreme example. Another less-extreme example is Java. We know that Java benefits from closed-world optimizations after linking. But many parts of the program can be compiled ahead of time even without such optimization. And it might be that the payoffs of such optimizations aren't worth the slower load times, in which case they could move to nearly entirely ahead-of-time compilation. But even without closed-world optimizations and speculative optimizations, Java would need at least a small amount of jitting to have reasonable performance with client-side linking. In particular, classes in one Java module can extend classes in another Java module, and so the memory layouts of these classes cannot be determined until after linking has occurred. Once that's determined, Java would generate a very small wasm module that just specifies this structure, generates a corresponding rtt, and exports the accessors and mutators (or field references) to the relevant ahead-of-time compiled classes just waiting to know what these offsets are. 
Note that this doesn't require shipping an entire compiler—just a linker that does a small amount of wasm jitting and a lot of name resolution. So even though a language compiling its modules to WebAssembly modules does alleviate the need to ship an entire compiler, that does not alleviate the need to ship a linker. This linker can be very lightweight—resolve names, coordinate low-level constructs, and maybe jit some tiny modules—but it likely needs to be a program in and of itself in order to accommodate the diversity and expressiveness of linking systems out there. At present, such a program is only expressible in the embedder, but ideally we would find a way to make it expressible in wasm itself, as I and many others above have suggested. Of course, such a linker program could easily take responsibility for canonicalization (and according to the language's type grammar rather than wasm's type grammar), making this benefit of structural types unnecessary. |
No disagreement here. Nobody said that languages shouldn't be able to do jitting. But other languages should likewise be able to not do it! That would be a proper choice. An overconstraining type system prevents that.
The "jitting some tiny modules" part is where we enter completely different territory. You are assuming that you will be able to do that in all environments, which is not the case, let alone in a portable way. We need a proper answer to this problem without a dependency on first-class compilation in Wasm. First-class compilation is nowhere on the roadmap right now. Sorry, I really don't understand how this can even be a controversial question. Suddenly depending on dynamic compilation, only to be able to continue using Wasm's basic linking functionality in the presence of GC, would be a clear sign of a composability failure of the language. |
Then their linkers don't have to use jitting.
Structural versus nominal low-level heap types have no effect on whether jitting is necessary. For example, if your module needs a high-arity tuple, it can import a high-arity tuple type and provide another module that exports a high-arity tuple type. The linker would use this second module only if that high-arity tuple type had not already been defined by another module. I think this is essentially the "provide a default" approach that @jakobkummerow suggested in #148.
I am not assuming anything. I am pointing out what Java would need. As just mentioned, this need holds regardless of whether we use structural or nominal types. As a side note, in this case the need for dynamic compilation is very restricted, e.g. no need to jit functions (if we have field references).
Again, this is misconstrued. The suggestion has been that dynamic linking should be expressed through a custom program. Though that would make dynamic compilation during linking possible should it be expressible in wasm, this suggestion introduces no additional dependency on dynamic compilation, and all languages that could be linked without dynamic compilation using structural types can still be linked without dynamic compilation using nominal types. |
@RossTate, for dynamic linking of non-nominal types to work around a nominal-only regime, a language-specific custom linker would have to synthesise type definitions in a first-class manner (based on various brittle conventions). This cannot currently be expressed in Wasm, nor with any feature that we have ever discussed. The only way I could possibly see that becoming expressible in some future would be via synthesising and compiling appropriate modules dynamically from within Wasm in a first-class manner. There is no proposal in sight for this. What I'm asking is that we address basic use cases and requirements without references to vague future features that are way outside the current Wasm roadmap and resemble a jack-hammer-against-a-nut scenario. |
I think I just explained why this isn't true in my comment you just responded to. I'll repeat the explanation here:
To clarify, this approach generalizes to arbitrary structures.
The OP is raising concerns about how structural types will address basic use cases and requirements from multiple languages, and pointing out that nominal types are already known to be able to address these basic use cases and requirements. The objection you are raising to that suggestion is based on an apparent incompatibility of nominal types with a vague future feature for a standardized declarative dynamic-linking proposal that you seem to be referencing and about which many people are expressing reservations.

I raised the concerns in the OP years ago, and I would like us to work on addressing them rather than continue to overlook them just because the known solution is incompatible with a phantom standardized declarative dynamic-linking system that, after years, still does not exist and that, by the various accounts many people have given above, would likely be insufficiently flexible for most surface languages to use (including those that would not require jitting). |
How would that approach scale to generics, once we have them?
What are you talking about? Client-side linking is supported by Wasm today. You do not generally need any language-specific magic linker for that right now, just a host that resolves import/export names. A suitable set of out-of-band conventions is all it takes on the producer side. |
The mini-module would export a parameterized type.
As people have pointed out, this is insufficiently expressive for many systems. It is also prone to accidental collisions. But for systems where this is sufficient, these import/export names are just as applicable to types and so can similarly be used to coordinate nominal types via the approach @jakobkummerow suggested in #148. I believe you objected to that suggestion based on the issue of accidental collisions, from which I understood you thought this name-based approach to linking was haphazard. |
After talking with @RossTate (out-of-band) in more detail on this argument in particular, I'd like to see a concrete idea of how he expects this to look in Wasm. This is the central claim that puts shifting to nominal types on the table, and we wouldn't be having any of these other discussions without it. As @titzer noted, we've not yet been shown how the nominal post-MVP would accomplish these goals beyond paper citations.

Looking at the cited papers, covariant arrays encoded using invariant ones seem relatively straightforward (given basic bounded quantification). But we also have the option of introducing language-level covariant arrays - so the value-add of the nominal direction has to be in increased generality. I'm personally most interested in eliminating casts of …

I'm currently finding it hard to see how the papers' approach to castless vtables would be translated to Wasm. Currently my interpretation is that a hypothetical nominal MVP would need to be extended with a distinguished post-MVP …
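For readers less familiar with the array problem referenced here: source-level covariant arrays are only safe because every store is checked at run time, which is exactly the check a language-level covariant wasm array (e.g. one carrying an RTT, as suggested at the top of the thread) would have to perform. A minimal Java demonstration of that runtime check (the class and method names are invented for illustration):

```java
public class CovariantArrayDemo {
    // Returns true iff the unsafe store was rejected at run time.
    public static boolean storeRejected() {
        // Covariance: String[] is a subtype of Object[], so this assignment
        // type-checks statically.
        Object[] objs = new String[] { "hello" };
        try {
            objs[0] = Integer.valueOf(42); // statically fine, dynamically unsafe
            return false;
        } catch (ArrayStoreException e) {
            return true;                   // the runtime store check fired
        }
    }
}
```

The cost of this per-store check is the price a language-level covariant array pays for avoiding per-load casts.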
The reason such a proposal has not been put together is that people suspected @rossberg's concerns were not related to such expressiveness, and so such a proposal would not help address his concerns and advance the discussion. Furthermore, I personally would like to develop such a proposal as a group so that everyone is in a position to both criticize and contribute to it.

To summarize the pros and cons of structural types versus nominal types: structural types are familiar and potentially helpful for simple linking schemes, but (according to the current state of the art) insufficiently expressive for many languages (including polymorphic functional languages) without frequent casts; nominal types can express many languages (including polymorphic functional languages) without frequent casts, but are unfamiliar and require more infrastructure for linking (infrastructure that many languages would need anyways). Unfamiliarity and linking infrastructure can be overcome, but we cannot overcome theorems about undecidability (or tooling-relevant concerns like uninferability). In other words, the shortcomings of nominal types can be overcome through education, design, and engineering, whereas the shortcomings of structural types cannot be overcome (without major theoretical advances to the state of the art). |
@RossTate, sigh, you are again constructing the false dichotomy between structural and nominal types. Neither replaces the other. As I pointed out many times, structural types occur in almost all typed languages, in some form or another. All suggestions I have heard so far for dealing with them in a modular fashion are vague ideas of complicated infrastructure hacks around custom linkers, reflection, name-mangling, registries, collections of micro modules, or even dynamic compilation. All that to address a very basic typing problem. None of this has ever been worked out in any detail, let alone shown to be feasible or scalable in practice.

So in addition to @conrad-watt's request, I would ask proponents of a nominal-only semantics to actually back up their claims that such an approach is feasible by designing a concrete infrastructure, prototyping it, plus providing at least a plausible sketch of how it would interact with the interface types proposal and with possible multi-language apps that it is intended to support, with the module linking proposal, and with possible extensions to Wasm-level polymorphism. Furthermore, I would ask them to analyse what Wasm language extensions would be necessary to be able to self-host this infrastructure in Wasm, in order to make such a language implementation portable across eco-systems. If all this can be demonstrated to be simpler than just providing boring structural types in Wasm proper, then let's talk again.
Integers are insufficiently expressive for many systems. That's why we have floats. But unless you're JavaScript and love to impose horrendous complexity on implementations to work around their lack, you also provide integers. Because they work better and more effectively and avoid a whole range of problems and complexities in a majority of use cases. IOW, Wasm modules are flexible and directly applicable to many use cases. Just because a full-blown JVM implementation needs to build something way more complicated and their own ecosystem, it doesn't mean that everybody should have to build something way more complicated and their own ecosystem.
That suggestion still uses structural typing, so I don't know what you're getting at. (And yes, it would create a global name space and ambient global VM state problem that is the opposite of modular.) |
Cool. Those are all easy to do, though I need you to clarify something. In your dynamic-linking semantics, what happens if a module imports a name that multiple modules export? My understanding is that common practice in other linking systems is for the name to be resolved to whichever module was first according to the order modules were loaded, but I wanted to confirm that's what happens in yours. |
Nothing special, why would it? There is no global name space for exports. That would be the antithesis to a module system. Wasm imports denote a module name explicitly. In general, think of it as a URI -- it is up to the host, linker, or glue code to interpret and resolve this to another Wasm binary (or host module). How module resolution works should be orthogonal to the typing problem (if it's not then that's a big red flag right there). |
Suppose a module named "A" imports type X from "Foo" "Bar" and a function of type …. Now the semantics of "always resolve to the first module loaded with the given name" has the nice property that it will ensure consistent resolution of modules, so I'll run with that semantics and consider separate compilation for, say, Haskell.

A Haskell source module will compile to a wasm module called "MyHaskell" that imports from the "HaskellCore" module. It needs to do this at least to import the exception event to use for Haskell exceptions. It can also import various primitive types and operations from "HaskellCore". But let's suppose it uses some large tuple type, e.g. a 10-tuple, that isn't provided by "HaskellCore". Then "MyHaskell" can import that (parameterized) type and relevant operations from a "Haskell10Tuple" module. When one ships "MyHaskell" to the relevant host, one also provides with it a "Haskell10Tuple" module defining that (parameterized) type and relevant operations. If the host has already loaded "Haskell10Tuple", then this provided module will be ignored and "MyHaskell" will get linked to the same preexisting "Haskell10Tuple" module that every other Haskell module using 10-tuples has linked with, but if it hasn't been loaded yet then an implementation of "Haskell10Tuple" is available to use.

Using this simple infrastructure, we can use low-level nominal types even for surface-level structural languages. So nominal types do not require more linking infrastructure than what already exists. I totally understand that you find this ugly, but what's important is that it works. The same cannot be said for low-level structural type systems trying to express Haskell without casts, for the reasons provided in the OP.
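The "first module loaded wins" convention described above can be sketched as a tiny host-side registry. This is a hypothetical illustration (the `ModuleRegistry` name and its API are invented): a real host would hold instantiated wasm modules rather than plain objects, but the resolution logic is the same.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical host-side registry implementing "always resolve to the first
// module loaded with the given name". A shipped fallback (e.g. the
// "Haskell10Tuple" module bundled alongside "MyHaskell") is simply ignored
// if a module of that name was already loaded.
class ModuleRegistry {
    private final Map<String, Object> modules = new HashMap<>();

    // Loads `module` under `name` unless that name is already taken; returns
    // the module that every importer of `name` will actually see.
    Object provide(String name, Object module) {
        modules.putIfAbsent(name, module);
        return modules.get(name);
    }

    Object resolve(String name) {
        return modules.get(name);
    }
}
```

With this convention, all Haskell modules that call `provide("Haskell10Tuple", shippedCopy)` end up sharing whichever copy was registered first, which is what makes the nominal 10-tuple type consistent across separately compiled modules.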
If one were to ask a Haskell implementer "Would you rather have to cast every tuple access, cast the value returned from every closure with monomorphic return type, and have every closure implementation cast its environment and its monomorphic arguments, or would you rather either use arrays to implement high-arity tuples and high-arity closures or generate mini-modules for high-arity tuples and high-arity closures?", I imagine they'd pick the latter. |
Modules aren't associated with names in pure Wasm. As @rossberg said, it's up to the host to decide on a convention for what the import/export namespaces mean. On the Web, the "module" namespace for imports is defined on a per-instantiation basis by the provided import object (see here).

It's perfectly fine to have a convention where modules are associated with names, instantiating the module puts the resulting exports in some associated global namespace, and subsequent instantiations of modules with the same name do something to resolve the name clash (e.g. only keep the first or last set of exports under that namespace). I think this is what you're suggesting. If it helps you to work through the examples @rossberg is asking for, you can assume that convention, but it's worth noting that this is something that is managed "outside" pure Wasm, so (e.g.) on the Web you would be implementing this convention in JavaScript as "top-level logic" code.

FWIW I don't think that we'll get much insight from working through the "import infrastructure" scenarios until we get to post-MVP examples that the current MVP wouldn't really be efficient for either (such as weird setups in source languages with reified generics). If this is the kind of gotcha @rossberg is expecting, it would be good to get a little more detail about where he expects things to go wrong. We already have to deal with conventions for managing standard library definitions like … |
@RossTate, as @conrad-watt said, modules themselves are nameless. Import names resolve to modules. If you interpret the module name in an import as a URL (or some subset thereof), as is e.g. common in a Web context (and the right approach in general), then there naturally can only be one module at each logical location (though URLs can also be relative, in which case their meaning depends on the location of the import). All that is essentially like binding and scoping, and completely orthogonal to typing.

@conrad-watt, you seem to be implying that this is a concern with performance or only a problem post-(GC)-MVP. I don't follow. The challenge is to maintain the existing modularity properties of Wasm under the GC extension. In practice, a language should not have to build completely new and complicated linking infrastructure just because it wants to compile to GC types instead of linear memory. Out-of-band conventions are fine, but this would be something else entirely. |
Sorry, I wasn't clear. My current belief is that a host-managed linking convention with around the complexity of what @RossTate outlined just above would be sufficient for most examples that are relevant to the MVP. I wouldn't see a reliance on such a linking convention as being in conflict with Wasm's modularity properties. If you have examples which would require a significantly more complex linking convention (and wouldn't with the structural MVP), that would potentially be a productive area of discussion. |
@rossberg, I believe an implicit assumption @RossTate, @conrad-watt, and I are making here is that we will ship one or the other in the GC MVP and that whichever one we ship will inform the initial post-MVP extensions. Another implicit assumption is that we should take an evidence-based approach to figuring out what kind of type system would be most useful to ship in the MVP. Do you agree with these positions?
@rossberg, as I mentioned, I don't know of any language that does not already require its own linking infrastructure, so I don't know of any language that would require new infrastructure. I expect that languages would re-use what infrastructure they already have for linking functions and data to link types as well. For example, the problem of breaking the symmetry and picking a winner among multiple providers of a definition is already solved for functions and data in Emscripten's static and dynamic linking, so extending those schemes to do the same for types would be trivial.
+1 to this. @rossberg, if you could share examples of this, it would move the conversation forward tremendously. This would help convince us of your argument that nominal types would be more complex to use than structural types in practice. So far I have not been convinced of that. |
AFAIAC, that’s not necessarily the case, although most of the arguments for nominal types clearly have to do with potential future optimisations.
In my experience it’s more complicated than that. Every design is a trade-off between being useful, being simple, being general, and being consistent with the rest of the language. You can come up with lots of ideas that are “useful” but are totally ad-hoc or incoherent with the rest. I’ve seen that happen often enough, e.g. on TC39, to be very cautious about optimising for narrow notions of “usefulness” and local maxima while ignoring the big picture. I’m sure everybody knows that, but it’s still easy to forget.

> The challenge is to maintain the existing modularity properties of Wasm under the GC extension. In practice, a language should not have to build completely new and complicated linking infrastructure just because it wants to compile to GC types instead of linear memory
I can’t help the feeling that there is a tendency to overlook much of the intention and potential of Wasm's module system and the breadth of its use cases. It is not just a random mechanism for loading some monolithic piece of code onto a web page.

One of the more exciting use cases for Wasm are environments that build a whole OS around it. In such a system, Wasm modules, or some extension thereof, have to represent both exes and dlls. Such a system might add some meta information to plain Wasm modules, e.g., like some of the tool conventions, but such information obviously has to be generic. Specific languages may rely on additional conventions about their modules, but the OS and the actual loading and linking mechanisms have to be independent of those. Just as for dlls in a real OS, for example. Overconstraining the Wasm type system such that languages cannot map their code to modules without putting language-specific mechanisms into the OS would be a non-starter in such a setting. This is not just some vague idea. There are concrete systems being built right now. I happen to be working on one of them.

Similar assumptions about the general utility of Wasm modules show up in other places, e.g., the module linking proposal and its use cases, or to some extent, interface types and even WASI. The module linking primitives, for example, do not assume a custom linker being injected. With interface types, you should be able to do cross-language linking; that cannot work if each of the languages involved has to insist on its own custom linker.

AFAICS, none of the various ad-hoc suggestions for dealing with the compilation of structural types are adequate solutions to all this. They are not expressible in Wasm, they do not compose, and they do not work with a language-agnostic linking mechanism. For the Dfinity platform I can say that the need for such kludges would pretty much break our use case of Wasm as a universal binary code format. |
These are strawman arguments that again try to paint nominal types as requiring a language-specific linking apparatus, even though we demonstrated above that nominal types work even with a limited linking apparatus. We asked for a concrete counterexample, but you have not provided one; the OS example seems to be addressed by the solution we already gave. Today is a major US holiday. It would be polite to defer further discussion until next week. |
I'm sorry, but you have not "demonstrated" any such thing yet, let alone one that is applicable to the concrete scenario I just described. |
Thanks for waiting until after the holiday! My response was brief given the circumstances, so now I can be more precise.

Above I described how a URI-based ecosystem could still work with just nominal types. For example, a Haskell library or executable could reference the "HaskellCore" dll to get all the nominal types for all primitive types, and similarly a "HaskellStdLib" dll to get all the nominal types for common library types. "HaskellStdLib" would also reference the "HaskellCore" dll, so they would all be using the same nominal types to describe primitive types. If the Haskell library or executable needs large tuples, then it could reference (and ship with) the "HaskellLargeTuples" dll that all libraries/executables needing such tuples would reference, thereby ensuring consistent use of the nominal types for large tuples across multiple Haskell libraries/executables.

Others expressed a sentiment that they felt this should be sufficient and requested a concrete counterexample if you believe it is insufficient, as it would help progress the conversation. The OS example you provided, though, seems to be another URI-based ecosystem, and so would seem to be addressed by the strategy above. |
We have settled on the type system we will use for the MVP and as pointed out above, we can consider useful ways to extend the type system or even add alternative type systems post-MVP. Until then, I'll close this issue. |
In yesterday's presentation, @rossberg recapped some of the advanced challenges to managing generics. While these advanced challenges are important, they are common to both structural and nominal types, and so I would like to first take a step back and call attention to some more basic challenges with generics that nominal types have already addressed but that structural types cannot overcome.
Consider the example of C# arrays that were prominently featured in the presentation. For 20 years, no one has been able to design a (decidable) structural type system that can express even just the low-level safety invariants of C# arrays. I warned of these issues when I first tried to get involved with WebAssembly, and more recently posted #138 to prompt broader group discussion of them, and there still seems to be no solution in sight even for easier cases like Java arrays. This means that C#, Java, Kotlin, and many other languages will have to compile their arrays (of reference types), regardless of the specific reference type, to just `array anyref` in the current structural Post-MVP and will always cast accessed values. Meanwhile, there is a 15-year-old paper showing how to express these arrays with nominal types, a 12-year-old paper actually putting nominal types into practice in an existing compiler for C#, and a 10-year-old paper showing how to make it practical to generate these types (plus an accompanying tech report providing the necessary theory for generalizing the approach to other classes of languages).

As for typed functional languages, consider the example of polymorphic recursion that was also prominently featured in the presentation. As pointed out in #119 (comment), the common application of polymorphic recursion discussed in the literature is handling expansive-recursive types, with the function `collect` for flattening a `datatype 'a T = EMPTY | NODE of 'a * (('a T) T)` into a list being the prominent used-in-industry ML example discussed. This type is inexpressible in the Post-MVP and in any decidable structural type system because expansive-recursive structural subtyping is known to be undecidable.¹ The lack of expressiveness means that the `NODE` wasm-structure will have to be a pair of `'a` and `T`-of-`anyref` or the like, losing the invariant that this latter data structure is comprised of `'a`s. And, since ML types are erased, there is no way to dynamically cast these values to `'a`, which means that `collect` cannot guarantee the resulting `list` is comprised of only `'a`s. If `'a` were to denote `string` in some context, then this means that there is a `string list` that is not guaranteed (according to wasm's type system) to contain only `string` values, which consequently means wasm functions taking `string list` cannot demand the list only contains strings. So any compilation of ML programs with expansive-recursive types into the current Post-MVP will have to ignore type arguments on `list`s (and in fact all generic types) and just insert casts everywhere, just like C#/Java/Kotlin arrays have to. Meanwhile, nominal types are able to express this feature because they allow one to define an expansive structure without also making subtyping expansive (just like how the type `'a T` in ML denotes an expansive structure but is not a subtype of anything).

So, in debating between structural and nominal types, before advancing into discussing the likes of how a nominal Post-MVP will represent the RTT for `string[][][]` (which the above three papers can all do) or a `((String T) T) T` (which the techniques in the above tech report can do), we should first acknowledge that a (decidable) structural Post-MVP cannot even express the structural types of `string[]` or `String T` to begin with.

¹ Right now we can decide subtyping between equirecursive types by viewing them as finite automata, with (unparameterized) fixpoint type variables serving as the back edges. Unfortunately, when the variables can be parameterized, that corresponds to generating a new set of automaton states rather than being able to cyclically reuse existing automaton states, meaning the automaton is no longer finite. One can also encode a Turing machine as an infinite automaton, incorporating the state of the string directly into the states of the automaton, and it turns out expansive-recursive types can express this encoding. See, for example, this encoding of Turing machines using expansive recursion in Java generics—an encoding that can easily be translated to use structural types instead. This is not problematic for Java at the low level because Java erases its generics, but C# reifies generics and consequently disallows expansive inheritance to prevent this problem (due to this paper). And it is known that expansive nominal inheritance is not used in practice, even when permitted, though expansive structures are.