Hierarchies #245

RossTate · 2021-09-27T16:24:52Z

The following motivates and describes a change to the type section. (Instructions are unchanged.)

Consolidating Too Much Information

Many (most?) well-known type systems run into problems when they have too much information. To illustrate, I'll pick on Kotlin because in Kotlin I can most directly recreate the examples I've discussed on this elsewhere, but be aware that the overarching issue can be demonstrated in most major languages.

Here's a Kotlin program that type-checks. Notice the functions asRed, asGreen, and asBlue. These are simply the identify functions, so they seem like they should be no-ops. You should be able to just inline them, right? Well, unfortunately inlining is not type-preserving in Kotlin. These functions are serving an important purpose: they're telling the type-checker how to type-check the code by downgrading the information about the variable c's type. By inlining them, you've removed those downgrades, and if you though inlining should be type-preserving then you've made type-checking NP-Hard per this presentation.

The issue is that, in order to support common patterns, Kotlin deduces information about a variable's type within contexts where that immutable variable has been cast, and here that means that c is simultaneously a Red, a Green, and a Blue. The grammar for field accesses in Kotlin only specifies the field name (analogous to field offset in the MVP), leaving the type of the field to be synthesized from the type of the receiver. But here the receiver has an intersection type with each component specifying different types for the field, so the type-checker has too much information.

To address this and similar cases, Kotlin throws out subsumptive subtyping (i.e. this core wasm rule) and raises an ambiguity type-checking error asking the programmer to resolve the error by downgrading types. Many languages take this approach, and if you've ever encountered an ambiguity error than you're using a language without subsumptive subtyping. This is reasonable in surface languages where programmers are interactively involved and can patch the code accordingly. But it doesn't work so well for low-level languages where programmers cannot so easily patch compiled code, and where program transformations/optimizations (such as inlining) are readily employed but are not type-preserving without subsumption.

Now, if you look at the Kotlin example, you'll notice that Red, Green, and Blue are all classes, and none of these inherit the others. Kotlin permits only single-class inheritance, which means unrelated classes can have no common instances. Because the type-checker has complete knowledge of the class-inheritance hierarchy, it could realize that c cannot exist and can be given the "bottom" type Nothing. This amounts to consolidating information into something that is unambiguous for the type-checker. The "hierarchies" I describe here provide the means to always consolidate information and thereby avoid both ambiguity and exponential type-checking. (This doesn't work for Kotlin interfaces, but thankfully interfaces are a surface-level construct that gets lowered to distinct low-level constructs, and so the lowering process itself must resolve the ambiguity above the WebAssembly level.)

Design

A hierarchy is defined as follows in the types section:

deftype ::= ... | <hierarchy>
hierarchy ::= hierarchy 1 <dependent> ; (more dependents in future)
dependent ::= fields <fieldtype>* | ... (more cases in future)

Notice that this is essentially a struct definition, but with room for more extensions, and with different import/export semantics (discussed shortly).

Besides adding this, the only other change to the (nominal) MVP would be to require struct_subtype and array_subtype to subtype only hierarchy/struct_subtype/array_subtype types (and extensible imported types, per below). This ensures all nominal types belong to some hierarchy. (Typically there will be only one hierarchy per module.)

Imports and Exports

For simplicity, let's suppose that no field hiding is done by the exporting module. In that case, one exports hierarchies as follows:

exportdesc ::= ... | hierarchy <typeidx> | type <exportidx> <typeidx>

For hierarchy <typeidex>, the index must refer to a hierarchy definition.
For type <exportidx>, the index must refer to either an earlier hierarchy export or an earlier type export, in which case the <typeidx> must be a subtype of the type specified by that export.
The use of export indices specifies a subtyping relation between exported types.

One imports hierarchies and types as follows:

importdesc ::= ... | hierarchy extensible? 1 <dependent> | struct_subtype <importidx> extensible? <fieldtype>* | array_subtype <importidx> <fieldtype>

The field types in the _subtype imports are required to be subtypes of their counterparts in the <importidx>.
The use of import indices specifies a subtyping relation between imported types.

When matching individual exports with individual imports, one checks the following properties:

Hierarchies must match with hierarchies, and the <dependent> of the export must be a subtype of (or, if the import is extensible, equivalent to) the import's <dependent>
A struct_subtype import must match with a type export referencing a struct_subtype type, and the <fieldtype>* of the export must be a subtype of (or, if the import is extensible, equivalent to) the import's <fieldtype>*
An array_subtype import must match with a type export referencing an array_subtype type, and the <fieldtype> of the export must be equivalent to the import's <fieldtype>
For _subtype imports, the imported type referenced <importidx> must be a subtype in the exported hierarchy

Furthermore, one checks the following properties across imports and exports:

For any two imports of types in the same import hierarchy, their corresponding exported types must be subtypes within the export hierarchy if and only if the imports are subtypes within the import hierarchy.

The "and only if" in this last step is critical as it means the importing module has complete knowledge of the subtyping relationships between the imported nominal types within the same hierarchy. This means we can employ the consolidation technique mentioned above: if ever an instance belongs to multiple unrelated nominal types, we can resolve the ambiguity by simply giving it the bottom type. And for module composition, it is important that types be grouped into hierarchies.

Multiple Dependents

The design above leaves room for defining hierarchies with multiple dependents.
In the (current) nominal MVP, there's a common pattern where one generates two nominal hierarchies: one for instances and one for v-tables.
Furthermore, the hierarchy for instances constantly redeclares all the field types for the primary purpose of refining the v-table field to use its counterpart's nominal type in the hierarchy for v-tables.

What's really happening here is that there's one nominal hierarchy with two pieces of information that depend on it: the field types of instances and the field types of v-tables.
With hierarchies, we can encode this pattern more directly.
At the same time, we can use this to add support for a common feature: the v-table's contents are usually placed directly in the object descriptor rather than as a separate field.

First, let's generalize hierarchies a bit:

hierarchy ::= hierarchy <dependent>*

Then let's require that one defines a new nominal type, the definition provides a refinement/extension of each of the dependents on the hierarchy, e.g. specifying new instance fields and new v-table fields.

Second, we could add a new instruction desc.get n : [(ref $t)] -> [tn]. This loads the n'th field from the object descriptor (such as the n'th method in the v-table). The type of this field is given by the second dependent specified by the type $t (whereas instance fields are given by the first dependent). Or we could add a new heap type desc <typeidx> for "object descriptors", add an instruction ref.get_descriptor : [(ref $t)] -> [(ref (desc $t))] for getting an object's descriptor (i.e. the thing that the head of the object points to), and extend get instructions for accessing the fields of these descriptors.

Anyways, there's a lot of flexibility here, and we don't have to do any of this for the MVP if we don't want, so I won't go into more detail unless requested. The only thing that's pressing for the MVP is adding some sort of "hierarchy" structure.

The text was updated successfully, but these errors were encountered:

tlively · 2021-09-27T23:05:07Z

I see that Kotlin has too much type information, but as you write, any ambiguity there has to be resolved before arriving at Wasm. How does the Kotlin example extend to WebAssembly? Can you explain what problem hierarchies solve?

RossTate · 2021-09-28T01:33:32Z

but as you write, any ambiguity there has to be resolved before arriving at Wasm

That's for interfaces, not for classes. An interface-method dispatch compiles to either a lookup of an entry for a specific interface in an i-table or for a call tag of a specific method of a specific interface. In either case, the choice of interface is baked into the compiled code, and therefor the ambiguity must be resolved during compilation. However, for classes the field access or method dispatch resolves to just an offset within either the instance's structure or the v-table's structure, and in this example that offset is the same regardless of which subclass of Color is used. If we were to use field accessors, then the ambiguity would have to be resolved during compilation in order to know which field accessor to use, and that's how other bytecodes have addressed this problem.

How does the Kotlin example extend to WebAssembly?

WebAssembly casts are too lossy for Kotlin to use directly. A Kotlin compiler will either have to add consolidation reasoning or will have to have multiple copies of the WebAssembly local for c, each with a different type.
In Remove redundant typeidx on gc instructions #241 (comment), I discussed how more direct support for casting would work using type variables. In this Kotlin example, the type variable representing the exact type of c gets an additional upper bound with each cast. The WebAssembly type-checker then needs to type-check the struct.get for the instance/v-table field corresponding to color. Without some way to consolidate types, this makes type-checking NP-Hard.

Can you explain what problem hierarchies solve?

They provide a means to synthesize type-checking information from nominal information while keeping type-checking efficient (as in, not NP-Hard).

RossTate · 2021-09-29T14:10:41Z

I just realized that hierarchies enable an implementation optimization that I've been wondering how to support.

Right now, in all variants of the MVP, in order to perform a cast I have to do this sequence:

Load the rtt table from the instance.
Load the rtt of the type being cast to from a compile-time-determined index within that table (this has to be done even for the rtt-less variant of the MVP, since you still have to determine the magic number being used to represent the nominal type of this instance)
Check if the object has exactly that rtt
If not, load the rtt array from the object
Load the rtt from an index within that array (in some variants of the MVP, that index is determined at compile time)
Check to see if the two rtts are equal

With hierarchies, we can (typically) avoid the first two chained loads of this sequence. Here's how:

During compilation, when a module declares a new hierarchy, associate a fresh counter with that new hierarchy (e.g. starting at 0). When a module declares a new type within a hierarchy, use the current value of that counter as the magic number for that new type, and increment the counter by 1. This associates a magic number with every nominal time at compile time.

At instantiation time, associate a fresh "hierarchy identifier" with every new hierarchy declared by the instantiated module. This process ensures that every nominal type can be represented at run time by a instantiation-time-determined "hierarchy identifier" and a compile-time-determined magic number.

During compilation, for each type create an array of the magic numbers of its supertypes (at the appropriate indices). During instantiation, when creating the object descriptors for each type, place a reference to that array within the object descriptor, and place the hierarchy identifier within the object descriptor. (Right now, the array of supertypes has to be generated at instantiation time rather than at compile time.) (Alternatively, one could place the hierarchy identifier in the object itself, in which case the entire object descriptor can be determined at compile time.)

When compiling a cast, in the typical case the static type of the object being cast will be known to be within the same hierarchy as the nominal type being cast to. As a consequence, the cast can be implemented by just checking if the compile-time-determined magic number for the type being cast to is at the appropriate index within the object's array of supertypes. This removes the need to (chained) load anything from the instance in most casts. (I also wouldn't be surprised if it increases the chance that the array of supertypes is in the cache, especially for multi-threaded programs, since it is stored in the compiled binary rather than the instance's heap.)

tlively · 2022-02-23T00:33:28Z

Since this doesn't look like it will make it into the MVP, I've added the Post-MVP label to it. I'll close this for now so we can focus open issues on the MVP, but we can use that tag to find this idea and others like it once the MVP has been finished.

This was referenced Oct 5, 2021

Define iso-recursive type semantics #243

Merged

Understanding Overhead #249

Closed

tlively added the Post-MVP Ideas for Post-MVP extensions label Feb 23, 2022

tlively closed this as completed Feb 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hierarchies #245

Hierarchies #245

RossTate commented Sep 27, 2021

tlively commented Sep 27, 2021

RossTate commented Sep 28, 2021

RossTate commented Sep 29, 2021

tlively commented Feb 23, 2022

Hierarchies #245

Hierarchies #245

Comments

RossTate commented Sep 27, 2021

Consolidating Too Much Information

Design

Imports and Exports

Multiple Dependents

tlively commented Sep 27, 2021

RossTate commented Sep 28, 2021

RossTate commented Sep 29, 2021

tlively commented Feb 23, 2022