Problems for deferred loading

In https://github.com/WebAssembly/gc/issues/220#issuecomment-852799769, @rossberg claimed that Java/C#'s linking was unsound due to their use of nominal types. Here I'll illustrate that the "unsoundness" is actually due to their use of and support for deferred loading. By deferred loading, I mean deferring loading/validating/compiling parts of a program to some time after the program has begun executing, say in order to achieve faster response time or to reduce resource consumption. There is a connection to mutual recursion in that support for deferred loading makes support for (sound) mutual recursion very easy, but I follow up on that specific point in https://github.com/WebAssembly/gc/issues/220#issuecomment-856335295. There is also a connection to nominal vs. structural typing, or more specifically type canonicalization, in that deferred loading does not work well with things like `rtt.canon`, which I'll illustrate here.

To elaborate on deferred loading, consider the following (Java) class:
```
package my.code;
import someone.else.Bar;

public class Foo { private Bar b; }
```
The question is, does class `someone.else.Bar` need to be loaded (e.g. fetched), compiled, and linked before we can compile `my.code.Foo`? In systems that support deferred loading, the answer is generally "no", at least supposing `someone.else.Bar` is at least a reference type (I won't go into the more complicated things with C#'s `struct` types).

*This is perfectly sound.* All code using `Foo` is guaranteed to respect the invariant that only `Bar` instances (and `null`) are ever assigned to the field `b` *even without knowing what it means to be a `Bar` instance*. We could add methods like `Bar getb() { return b; }` and `void setb(Bar b) { this.b = b; }` and still compile these methods before loading `Bar`. If this were Go, we could set up complex data structures with interior `*Bar` pointers to fields of `Foo` instances, without loading `Bar`, and be assured that&mdash;*once `Bar` gets loaded and defined*&mdash;those `*Bar` pointers will respect that definition.

The key for supporting deferred loading (of types) at the low level is to separate type *declaration* from type *definition*. That way a linker/loader can just "declare" the type `my.code.Bar`, without defining it, then pass that declared type as a type import for `my.code.Foo`, which can in turn use that imported type in its definition of `my.code.Foo` and its relevant code/methods. Or, more precisely, in its relevant code/methods that treat it abstractly. But eventually something will need to treat the type more concretely, by which point the linker/loader will need to actually load the code intended for `my.code.Bar`, compile it, and instantiate it in order to get a definition for the type `my.code.Bar`. When and how that happens is a matter of policy; the Java Language Specification indicates this must be done on demand, e.g. the exact time a field of `my.code.Bar` first gets access, but there are many other policies, e.g. load in parallel and link once available. In my opinion, a good design for WebAssembly would not standardize a policy but instead provide the means for programs to implement their own policy.

For example, one could compile `my.code.Foo` to a WebAssembly module that imports not just the type for `someone.else.Bar`, but also the type for `my.code.Foo` and the (linear) (not first-class) capability/obligation to define the type for `my.code.Foo`&mdash;this way the module could itself be loaded after some other module abstractly using the type `my.code.Foo` has already started executing. Then, if someone wanted to use this module in eager-loading setting, they could simplify specify a module that declares the types for both `someone.else.Bar` and `my.code.Foo` and passes the types and the definition-capabilities to the relevant modules&mdash;all of this would resolve and validate statically, so there's no loss of guarantees for the eager case. But if someone wanted to use this module in a deferred-loading setting, their loader system (written in, say, the JS API) could declare the two types, pass them and the appropriate definition-capability to the `my.code.Foo` module, and wait until later to load the module for `someone.else.Bar` and pass it the appropriate types and definition-capability. Of course, when that time comes, something could go wrong: the code for `someone.else.Bar` may turn out to be missing or have some error. In this case, the person would specify (again, in the JS API or the like) what to do&mdash;for Java, the appropriate thing to do would be to throw a `NoClassDefFoundError` at whatever point in the code prompted `someone.else.Bar` to be loaded. This is the "unsoundness" that @rossberg alluded to. But none of this makes *WebAssembly* unsound&mdash;it's just to-be-expected dynamic "errors" in someone's custom-written deferred-loading code.

Now, for this to work well, it is very important that type imports be treated abstractly. That way code behaves correctly regardless of whether the type is still undefined or of how it gets defined. This is where `rtt.canon` (and the like) becomes problematic&mdash;it looks past the type import and digs up the type definition in order to determine which `rtt` to return. What happens if, in the module for `my.code.Foo`, one asks for the `rtt.canon` of the imported type for `someone.else.Bar` that is still undefined? Or if one asks for the `rtt.canon` of the type definition of `my.code.Foo`, which is defined in terms of the still-undefined `someone.else.Bar`? We could make `rtt.canon` trap in these cases, but that still means that programs wanting to use deferred loading (of types) cannot utilize `rtt.canon`. On the other hand, instructions like `rtt.fresh` do respect type abstraction and can be used even when part of the relevant type is a type variable (unknowingly) representing a (presently) undefined typed.

I haven't gone into more advanced use cases, such as letting the module for `my.code.Foo` defer the compilation of a method like `double getxofb() { return b.x; }`&mdash;which accesses a field of `Bar`&mdash;until after (the relevant parts of) `someone.else.Bar` has been loaded. I believe staged modules, where imports/definitions/exports have stages associated with them so that the whole module does not have to compiled/instantiated all at one, would be a good fit for this (as, again, they are sound and do not force any particular loading policy), but I suggest deferring discussion of these more advanced use cases to a later time, understanding that they would be well served by the concepts above.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Problems for deferred loading #224

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Problems for deferred loading #224

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions