-
Notifications
You must be signed in to change notification settings - Fork 74
Hierarchies #245
Comments
I see that Kotlin has too much type information, but as you write, any ambiguity there has to be resolved before arriving at Wasm. How does the Kotlin example extend to WebAssembly? Can you explain what problem hierarchies solve? |
That's for interfaces, not for classes. An interface-method dispatch compiles to either a lookup of an entry for a specific interface in an i-table or for a call tag of a specific method of a specific interface. In either case, the choice of interface is baked into the compiled code, and therefor the ambiguity must be resolved during compilation. However, for classes the field access or method dispatch resolves to just an offset within either the instance's structure or the v-table's structure, and in this example that offset is the same regardless of which subclass of
They provide a means to synthesize type-checking information from nominal information while keeping type-checking efficient (as in, not NP-Hard). |
I just realized that hierarchies enable an implementation optimization that I've been wondering how to support. Right now, in all variants of the MVP, in order to perform a cast I have to do this sequence:
With hierarchies, we can (typically) avoid the first two chained loads of this sequence. Here's how: During compilation, when a module declares a new hierarchy, associate a fresh counter with that new hierarchy (e.g. starting at 0). When a module declares a new type within a hierarchy, use the current value of that counter as the magic number for that new type, and increment the counter by 1. This associates a magic number with every nominal time at compile time. At instantiation time, associate a fresh "hierarchy identifier" with every new hierarchy declared by the instantiated module. This process ensures that every nominal type can be represented at run time by a instantiation-time-determined "hierarchy identifier" and a compile-time-determined magic number. During compilation, for each type create an array of the magic numbers of its supertypes (at the appropriate indices). During instantiation, when creating the object descriptors for each type, place a reference to that array within the object descriptor, and place the hierarchy identifier within the object descriptor. (Right now, the array of supertypes has to be generated at instantiation time rather than at compile time.) (Alternatively, one could place the hierarchy identifier in the object itself, in which case the entire object descriptor can be determined at compile time.) When compiling a cast, in the typical case the static type of the object being cast will be known to be within the same hierarchy as the nominal type being cast to. As a consequence, the cast can be implemented by just checking if the compile-time-determined magic number for the type being cast to is at the appropriate index within the object's array of supertypes. This removes the need to (chained) load anything from the instance in most casts. (I also wouldn't be surprised if it increases the chance that the array of supertypes is in the cache, especially for multi-threaded programs, since it is stored in the compiled binary rather than the instance's heap.) |
Since this doesn't look like it will make it into the MVP, I've added the Post-MVP label to it. I'll close this for now so we can focus open issues on the MVP, but we can use that tag to find this idea and others like it once the MVP has been finished. |
The following motivates and describes a change to the type section. (Instructions are unchanged.)
Consolidating Too Much Information
Many (most?) well-known type systems run into problems when they have too much information. To illustrate, I'll pick on Kotlin because in Kotlin I can most directly recreate the examples I've discussed on this elsewhere, but be aware that the overarching issue can be demonstrated in most major languages.
Here's a Kotlin program that type-checks. Notice the functions
asRed
,asGreen
, andasBlue
. These are simply the identify functions, so they seem like they should be no-ops. You should be able to just inline them, right? Well, unfortunately inlining is not type-preserving in Kotlin. These functions are serving an important purpose: they're telling the type-checker how to type-check the code by downgrading the information about the variablec
's type. By inlining them, you've removed those downgrades, and if you though inlining should be type-preserving then you've made type-checking NP-Hard per this presentation.The issue is that, in order to support common patterns, Kotlin deduces information about a variable's type within contexts where that immutable variable has been cast, and here that means that
c
is simultaneously aRed
, aGreen
, and aBlue
. The grammar for field accesses in Kotlin only specifies the field name (analogous to field offset in the MVP), leaving the type of the field to be synthesized from the type of the receiver. But here the receiver has an intersection type with each component specifying different types for the field, so the type-checker has too much information.To address this and similar cases, Kotlin throws out subsumptive subtyping (i.e. this core wasm rule) and raises an ambiguity type-checking error asking the programmer to resolve the error by downgrading types. Many languages take this approach, and if you've ever encountered an ambiguity error than you're using a language without subsumptive subtyping. This is reasonable in surface languages where programmers are interactively involved and can patch the code accordingly. But it doesn't work so well for low-level languages where programmers cannot so easily patch compiled code, and where program transformations/optimizations (such as inlining) are readily employed but are not type-preserving without subsumption.
Now, if you look at the Kotlin example, you'll notice that
Red
,Green
, andBlue
are all classes, and none of these inherit the others. Kotlin permits only single-class inheritance, which means unrelated classes can have no common instances. Because the type-checker has complete knowledge of the class-inheritance hierarchy, it could realize thatc
cannot exist and can be given the "bottom" typeNothing
. This amounts to consolidating information into something that is unambiguous for the type-checker. The "hierarchies" I describe here provide the means to always consolidate information and thereby avoid both ambiguity and exponential type-checking. (This doesn't work for Kotlin interfaces, but thankfully interfaces are a surface-level construct that gets lowered to distinct low-level constructs, and so the lowering process itself must resolve the ambiguity above the WebAssembly level.)Design
A hierarchy is defined as follows in the types section:
Notice that this is essentially a
struct
definition, but with room for more extensions, and with different import/export semantics (discussed shortly).Besides adding this, the only other change to the (nominal) MVP would be to require
struct_subtype
andarray_subtype
to subtype onlyhierarchy
/struct_subtype
/array_subtype
types (and extensible imported types, per below). This ensures all nominal types belong to some hierarchy. (Typically there will be only one hierarchy per module.)Imports and Exports
For simplicity, let's suppose that no field hiding is done by the exporting module. In that case, one exports hierarchies as follows:
For
hierarchy <typeidex>
, the index must refer to ahierarchy
definition.For
type <exportidx>
, the index must refer to either an earlierhierarchy
export or an earliertype
export, in which case the<typeidx>
must be a subtype of the type specified by that export.The use of export indices specifies a subtyping relation between exported types.
One imports hierarchies and types as follows:
The field types in the
_subtype
imports are required to be subtypes of their counterparts in the<importidx>
.The use of import indices specifies a subtyping relation between imported types.
When matching individual exports with individual imports, one checks the following properties:
<dependent>
of the export must be a subtype of (or, if the import isextensible
, equivalent to) the import's<dependent>
struct_subtype
import must match with atype
export referencing astruct_subtype
type, and the<fieldtype>*
of the export must be a subtype of (or, if the import isextensible
, equivalent to) the import's<fieldtype>*
array_subtype
import must match with atype
export referencing anarray_subtype
type, and the<fieldtype>
of the export must be equivalent to the import's<fieldtype>
_subtype
imports, the imported type referenced<importidx>
must be a subtype in the exported hierarchyFurthermore, one checks the following properties across imports and exports:
The "and only if" in this last step is critical as it means the importing module has complete knowledge of the subtyping relationships between the imported nominal types within the same hierarchy. This means we can employ the consolidation technique mentioned above: if ever an instance belongs to multiple unrelated nominal types, we can resolve the ambiguity by simply giving it the bottom type. And for module composition, it is important that types be grouped into hierarchies.
Multiple Dependents
The design above leaves room for defining hierarchies with multiple dependents.
In the (current) nominal MVP, there's a common pattern where one generates two nominal hierarchies: one for instances and one for v-tables.
Furthermore, the hierarchy for instances constantly redeclares all the field types for the primary purpose of refining the v-table field to use its counterpart's nominal type in the hierarchy for v-tables.
What's really happening here is that there's one nominal hierarchy with two pieces of information that depend on it: the field types of instances and the field types of v-tables.
With hierarchies, we can encode this pattern more directly.
At the same time, we can use this to add support for a common feature: the v-table's contents are usually placed directly in the object descriptor rather than as a separate field.
First, let's generalize hierarchies a bit:
Then let's require that one defines a new nominal type, the definition provides a refinement/extension of each of the dependents on the hierarchy, e.g. specifying new instance fields and new v-table fields.
Second, we could add a new instruction
desc.get n : [(ref $t)] -> [tn]
. This loads the n'th field from the object descriptor (such as the n'th method in the v-table). The type of this field is given by the second dependent specified by the type$t
(whereas instance fields are given by the first dependent). Or we could add a new heap typedesc <typeidx>
for "object descriptors", add an instructionref.get_descriptor : [(ref $t)] -> [(ref (desc $t))]
for getting an object's descriptor (i.e. the thing that the head of the object points to), and extendget
instructions for accessing the fields of these descriptors.Anyways, there's a lot of flexibility here, and we don't have to do any of this for the MVP if we don't want, so I won't go into more detail unless requested. The only thing that's pressing for the MVP is adding some sort of "hierarchy" structure.
The text was updated successfully, but these errors were encountered: