Skip to content
This repository was archived by the owner on Apr 25, 2025. It is now read-only.
This repository was archived by the owner on Apr 25, 2025. It is now read-only.

Problems for deferred loading #224

Closed
@RossTate

Description

@RossTate

In #220 (comment), @rossberg claimed that Java/C#'s linking was unsound due to their use of nominal types. Here I'll illustrate that the "unsoundness" is actually due to their use of and support for deferred loading. By deferred loading, I mean deferring loading/validating/compiling parts of a program to some time after the program has begun executing, say in order to achieve faster response time or to reduce resource consumption. There is a connection to mutual recursion in that support for deferred loading makes support for (sound) mutual recursion very easy, but I follow up on that specific point in #220 (comment). There is also a connection to nominal vs. structural typing, or more specifically type canonicalization, in that deferred loading does not work well with things like rtt.canon, which I'll illustrate here.

To elaborate on deferred loading, consider the following (Java) class:

package my.code;
import someone.else.Bar;

public class Foo { private Bar b; }

The question is, does class someone.else.Bar need to be loaded (e.g. fetched), compiled, and linked before we can compile my.code.Foo? In systems that support deferred loading, the answer is generally "no", at least supposing someone.else.Bar is at least a reference type (I won't go into the more complicated things with C#'s struct types).

This is perfectly sound. All code using Foo is guaranteed to respect the invariant that only Bar instances (and null) are ever assigned to the field b even without knowing what it means to be a Bar instance. We could add methods like Bar getb() { return b; } and void setb(Bar b) { this.b = b; } and still compile these methods before loading Bar. If this were Go, we could set up complex data structures with interior *Bar pointers to fields of Foo instances, without loading Bar, and be assured that—once Bar gets loaded and defined—those *Bar pointers will respect that definition.

The key for supporting deferred loading (of types) at the low level is to separate type declaration from type definition. That way a linker/loader can just "declare" the type my.code.Bar, without defining it, then pass that declared type as a type import for my.code.Foo, which can in turn use that imported type in its definition of my.code.Foo and its relevant code/methods. Or, more precisely, in its relevant code/methods that treat it abstractly. But eventually something will need to treat the type more concretely, by which point the linker/loader will need to actually load the code intended for my.code.Bar, compile it, and instantiate it in order to get a definition for the type my.code.Bar. When and how that happens is a matter of policy; the Java Language Specification indicates this must be done on demand, e.g. the exact time a field of my.code.Bar first gets access, but there are many other policies, e.g. load in parallel and link once available. In my opinion, a good design for WebAssembly would not standardize a policy but instead provide the means for programs to implement their own policy.

For example, one could compile my.code.Foo to a WebAssembly module that imports not just the type for someone.else.Bar, but also the type for my.code.Foo and the (linear) (not first-class) capability/obligation to define the type for my.code.Foo—this way the module could itself be loaded after some other module abstractly using the type my.code.Foo has already started executing. Then, if someone wanted to use this module in eager-loading setting, they could simplify specify a module that declares the types for both someone.else.Bar and my.code.Foo and passes the types and the definition-capabilities to the relevant modules—all of this would resolve and validate statically, so there's no loss of guarantees for the eager case. But if someone wanted to use this module in a deferred-loading setting, their loader system (written in, say, the JS API) could declare the two types, pass them and the appropriate definition-capability to the my.code.Foo module, and wait until later to load the module for someone.else.Bar and pass it the appropriate types and definition-capability. Of course, when that time comes, something could go wrong: the code for someone.else.Bar may turn out to be missing or have some error. In this case, the person would specify (again, in the JS API or the like) what to do—for Java, the appropriate thing to do would be to throw a NoClassDefFoundError at whatever point in the code prompted someone.else.Bar to be loaded. This is the "unsoundness" that @rossberg alluded to. But none of this makes WebAssembly unsound—it's just to-be-expected dynamic "errors" in someone's custom-written deferred-loading code.

Now, for this to work well, it is very important that type imports be treated abstractly. That way code behaves correctly regardless of whether the type is still undefined or of how it gets defined. This is where rtt.canon (and the like) becomes problematic—it looks past the type import and digs up the type definition in order to determine which rtt to return. What happens if, in the module for my.code.Foo, one asks for the rtt.canon of the imported type for someone.else.Bar that is still undefined? Or if one asks for the rtt.canon of the type definition of my.code.Foo, which is defined in terms of the still-undefined someone.else.Bar? We could make rtt.canon trap in these cases, but that still means that programs wanting to use deferred loading (of types) cannot utilize rtt.canon. On the other hand, instructions like rtt.fresh do respect type abstraction and can be used even when part of the relevant type is a type variable (unknowingly) representing a (presently) undefined typed.

I haven't gone into more advanced use cases, such as letting the module for my.code.Foo defer the compilation of a method like double getxofb() { return b.x; }—which accesses a field of Bar—until after (the relevant parts of) someone.else.Bar has been loaded. I believe staged modules, where imports/definitions/exports have stages associated with them so that the whole module does not have to compiled/instantiated all at one, would be a good fit for this (as, again, they are sound and do not force any particular loading policy), but I suggest deferring discussion of these more advanced use cases to a later time, understanding that they would be well served by the concepts above.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions