This repository was archived by the owner on Apr 25, 2025. It is now read-only.

Problems for deferred loading #224

Closed
RossTate opened this issue Jun 7, 2021 · 23 comments

@RossTate
Contributor

RossTate commented Jun 7, 2021

In #220 (comment), @rossberg claimed that Java/C#'s linking was unsound due to their use of nominal types. Here I'll illustrate that the "unsoundness" is actually due to their use of and support for deferred loading. By deferred loading, I mean deferring loading/validating/compiling parts of a program to some time after the program has begun executing, say in order to achieve faster response time or to reduce resource consumption. There is a connection to mutual recursion in that support for deferred loading makes support for (sound) mutual recursion very easy, but I follow up on that specific point in #220 (comment). There is also a connection to nominal vs. structural typing, or more specifically type canonicalization, in that deferred loading does not work well with things like rtt.canon, which I'll illustrate here.

To elaborate on deferred loading, consider the following (Java) class:

package my.code;
import someone.else.Bar;

public class Foo { private Bar b; }

The question is, does class someone.else.Bar need to be loaded (e.g. fetched), compiled, and linked before we can compile my.code.Foo? In systems that support deferred loading, the answer is generally "no", provided someone.else.Bar is a reference type (I won't go into the more complicated issues with C#'s struct types).

This is perfectly sound. All code using Foo is guaranteed to respect the invariant that only Bar instances (and null) are ever assigned to the field b even without knowing what it means to be a Bar instance. We could add methods like Bar getb() { return b; } and void setb(Bar b) { this.b = b; } and still compile these methods before loading Bar. If this were Go, we could set up complex data structures with interior *Bar pointers to fields of Foo instances, without loading Bar, and be assured that—once Bar gets loaded and defined—those *Bar pointers will respect that definition.

The key for supporting deferred loading (of types) at the low level is to separate type declaration from type definition. That way a linker/loader can just "declare" the type someone.else.Bar, without defining it, then pass that declared type as a type import for my.code.Foo, which can in turn use that imported type in its definition of my.code.Foo and its relevant code/methods. Or, more precisely, in its relevant code/methods that treat it abstractly. But eventually something will need to treat the type more concretely, by which point the linker/loader will need to actually load the code intended for someone.else.Bar, compile it, and instantiate it in order to get a definition for the type someone.else.Bar. When and how that happens is a matter of policy; the Java Language Specification indicates this must be done on demand, e.g. at the exact time a field of someone.else.Bar is first accessed, but there are many other policies, e.g. load in parallel and link once available. In my opinion, a good design for WebAssembly would not standardize a policy but instead provide the means for programs to implement their own policy.
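A minimal Java model of this split may help. Everything here is hypothetical: `TypeDecl` is an illustrative class, not a WebAssembly or JVM API, and a string stands in for a real type definition. The declaration exists immediately; the definition is produced by a loader callback the first time something needs it concretely, which is one possible (on-demand) policy.

```java
import java.util.function.Supplier;

// Hypothetical sketch: declaring a type separately from defining it, with an
// on-demand policy that runs the loader only on first concrete use.
class DeferredTypeDemo {
    public static void main(String[] args) {
        TypeDecl bar = new TypeDecl("someone.else.Bar", () -> {
            System.out.println("loading someone.else.Bar");
            return "struct { x: f64 }";
        });
        // Foo's definition refers to Bar only through its declaration (its name).
        TypeDecl foo = new TypeDecl("my.code.Foo",
                () -> "struct { b: ref null " + bar.name + " }");
        System.out.println(foo.definition());                 // defines Foo only
        System.out.println("Bar defined yet? " + bar.isDefined()); // false
        System.out.println(bar.definition());                 // first concrete use loads Bar
    }
}

final class TypeDecl {
    final String name;
    private final Supplier<String> loader; // stands in for fetch + compile + instantiate
    private String definition;             // null until the type is defined

    TypeDecl(String name, Supplier<String> loader) {
        this.name = name;
        this.loader = loader;
    }

    boolean isDefined() { return definition != null; }

    // Concrete use: the on-demand policy defines the type the first time it is needed.
    String definition() {
        if (definition == null) definition = loader.get();
        return definition;
    }
}
```

Because Foo's definition only mentions Bar's name, defining Foo does not force Bar to load; replacing the supplier with an eager or parallel fetch would change the policy without touching code that only uses the declarations.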

For example, one could compile my.code.Foo to a WebAssembly module that imports not just the type for someone.else.Bar, but also the type for my.code.Foo and the (linear) (not first-class) capability/obligation to define the type for my.code.Foo; this way the module could itself be loaded after some other module that abstractly uses the type my.code.Foo has already started executing. Then, if someone wanted to use this module in an eager-loading setting, they could simply specify a module that declares the types for both someone.else.Bar and my.code.Foo and passes the types and the definition capabilities to the relevant modules; all of this would resolve and validate statically, so there's no loss of guarantees in the eager case. But if someone wanted to use this module in a deferred-loading setting, their loader system (written in, say, the JS API) could declare the two types, pass them and the appropriate definition capability to the my.code.Foo module, and wait until later to load the module for someone.else.Bar and pass it the appropriate types and definition capability. Of course, when that time comes, something could go wrong: the code for someone.else.Bar may turn out to be missing or have some error. In this case, the person would specify (again, in the JS API or the like) what to do; for Java, the appropriate thing to do would be to throw a NoClassDefFoundError at whatever point in the code prompted someone.else.Bar to be loaded. This is the "unsoundness" that @rossberg alluded to. But none of this makes WebAssembly unsound; these are just to-be-expected dynamic "errors" in someone's custom-written deferred-loading code.
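As a rough sketch of such a custom loader, assuming hypothetical names (`DeferredLoader`, `declare`, `resolve`) and modeling fetchable code as strings in a map, the missing-code case can surface as a `NoClassDefFoundError` at the point of first use rather than as a failure of the whole program:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a custom deferred-loading policy: type names are
// declared up front, but their code is only looked up on first use.
class DeferredLoaderDemo {
    public static void main(String[] args) {
        Map<String, String> available = new HashMap<>();
        available.put("my.code.Foo", "struct { b: ref null someone.else.Bar }");
        // someone.else.Bar's code is missing from this deployment.
        DeferredLoader loader = new DeferredLoader(available);
        loader.declare("my.code.Foo");
        loader.declare("someone.else.Bar");
        System.out.println(loader.resolve("my.code.Foo")); // Foo uses Bar only abstractly
        try {
            loader.resolve("someone.else.Bar");             // first concrete use of Bar
        } catch (NoClassDefFoundError e) {
            System.out.println("NoClassDefFoundError: " + e.getMessage());
        }
    }
}

final class DeferredLoader {
    private final Map<String, String> available;          // code this deployment can fetch
    private final Set<String> declared = new HashSet<>();  // names handed out as type imports
    private final Map<String, String> defined = new HashMap<>();

    DeferredLoader(Map<String, String> available) { this.available = available; }

    void declare(String name) { declared.add(name); }

    // Loads and defines a declared type on demand; reports missing code Java-style.
    String resolve(String name) {
        if (!declared.contains(name)) throw new IllegalArgumentException("undeclared: " + name);
        String def = defined.get(name);
        if (def != null) return def;
        def = available.get(name);
        if (def == null) throw new NoClassDefFoundError(name);
        defined.put(name, def);
        return def;
    }
}
```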

Now, for this to work well, it is very important that type imports be treated abstractly. That way code behaves correctly regardless of whether the type is still undefined or of how it gets defined. This is where rtt.canon (and the like) becomes problematic: it looks past the type import and digs up the type definition in order to determine which rtt to return. What happens if, in the module for my.code.Foo, one asks for the rtt.canon of the imported type for someone.else.Bar while it is still undefined? Or if one asks for the rtt.canon of the type definition of my.code.Foo, which is defined in terms of the still-undefined someone.else.Bar? We could make rtt.canon trap in these cases, but that still means that programs wanting to use deferred loading (of types) cannot utilize rtt.canon. On the other hand, instructions like rtt.fresh do respect type abstraction and can be used even when part of the relevant type is a type variable (unknowingly) representing a (presently) undefined type.
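The contrast can be modeled in a few lines of Java. This is a hypothetical sketch, not engine behavior: `Decl`, `Rtt`, and `Rtts` are illustrative, and the definition string stands in for structural type identity (a real rtt.canon for Foo would also need Bar's definition, since Foo's structure refers to it).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model contrasting rtt.canon and rtt.fresh on a declared-but-undefined type.
class RttDemo {
    public static void main(String[] args) {
        Decl bar = new Decl("someone.else.Bar");  // declared, not yet defined
        Rtt fresh = Rtts.fresh(bar);              // never inspects the definition
        System.out.println("fresh rtt for " + fresh.type.name + " #" + fresh.id);
        try {
            Rtts.canon(bar);                      // must look at the definition
        } catch (IllegalStateException e) {
            System.out.println("rtt.canon failed: " + e.getMessage());
        }
    }
}

final class Decl {
    final String name;
    String definition; // null while undefined
    Decl(String name) { this.name = name; }
}

final class Rtt {
    final Decl type;
    final int id;
    Rtt(Decl type, int id) { this.type = type; this.id = id; }
}

final class Rtts {
    private static int nextId = 0;
    // Canonical rtts are shared by all structurally identical definitions.
    private static final Map<String, Rtt> canonical = new HashMap<>();

    // rtt.fresh: a new identity every time; works for abstract/undefined types.
    static Rtt fresh(Decl t) { return new Rtt(t, nextId++); }

    // rtt.canon: keyed by the definition, so it cannot proceed without one.
    static Rtt canon(Decl t) {
        if (t.definition == null)
            throw new IllegalStateException(t.name + " is not defined yet");
        return canonical.computeIfAbsent(t.definition, d -> new Rtt(t, nextId++));
    }
}
```

Here `canon` traps on the undefined type, which is the option mentioned above; `fresh` only allocates a new identity, so it is indifferent to whether the type has been defined.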

I haven't gone into more advanced use cases, such as letting the module for my.code.Foo defer the compilation of a method like double getxofb() { return b.x; }, which accesses a field of Bar, until after (the relevant parts of) someone.else.Bar have been loaded. I believe staged modules, where imports/definitions/exports have stages associated with them so that the whole module does not have to be compiled/instantiated all at once, would be a good fit for this (as, again, they are sound and do not force any particular loading policy), but I suggest deferring discussion of these more advanced use cases to a later time, understanding that they would be well served by the concepts above.

@rossberg
Member

rossberg commented Jun 8, 2021

In #220 (comment), @rossberg claimed that Java/C#'s linking was unsound due to their use of nominal types.

I did not claim that. I said that recursive linking with nominal types in Java/C# works because they accept unsafe linking. Deferred loading indeed is the key here: the recursive linking semantics relies on the ability to defer loading (and thus checking), plus, a sort of global name space for class names. In Wasm we have neither.

Moreover, the "unsoundness" (I didn't actually use that word, Java semantics is sound due to dynamic checks, it just isn't statically type-safe) I'm referring to is not the possibility of a NoClassDefFoundError, but the possibility of NoSuchMethodError and NoSuchFieldError exceptions, both of which are directly due to the lax linking semantics. Concretely, when you have classes A and B, and B uses a field or method from A, then you have no guarantee that this succeeds at runtime. Because you have no guarantee that the A you actually load has any resemblance to the A you compiled against -- that is not checked! B will successfully load A, access and call into A many times, and may still fail later during the same run when it hits an actual mismatching access.

Standard example:

// A.java
class A {
  public void lock() {}
  public void release() {}
};
// B.java
import java.lang.System;
class B {
  public static void main(String[] args) {
    A a = new A();
    System.out.println("1");
    a.lock();
    System.out.println("2");
    a.release();
    System.out.println("3");
  }
};

Compile A, compile B, run B. All fine. Now notice that you prefer the method name unlock over release for symmetry. Change A, recompile A, but forget to recompile B. Here is what you'll see when you run B again:

1
2
Exception in thread "main" java.lang.NoSuchMethodError: 'void A.release()'
	at B.main(B.java:8)

It's folklore knowledge that, as far as the class system is concerned, Java essentially is a dynamically-typed language. Hence we cannot expect its compilation model to map onto Wasm modules without casts, at least not without restricting the semantics.
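A hedged Java analogy for the lazy resolution described above, emulated with reflection rather than reproducing the JVM's actual linkage machinery: each call looks up the method by name only when that call runs, so the stale `release` reference goes unnoticed until it is reached. `LazyLinkDemo` and `call` are illustrative names.

```java
import java.lang.reflect.Method;

// Hypothetical illustration of per-call-site member resolution: a renamed
// method is only noticed when the call that uses the old name is reached.
class LazyLinkDemo {
    // A after renaming release() to unlock()
    public static class A {
        public void lock() {}
        public void unlock() {}
    }

    // Resolves the member by name at run time, then invokes it.
    static void call(Object target, String name) throws ReflectiveOperationException {
        Method m = target.getClass().getMethod(name);
        m.invoke(target);
    }

    public static void main(String[] args) throws Exception {
        A a = new A();
        System.out.println("1");
        call(a, "lock");
        System.out.println("2");
        try {
            call(a, "release"); // stale name from the old version of A
            System.out.println("3");
        } catch (NoSuchMethodException e) {
            System.out.println("NoSuchMethodException: " + e.getMessage());
        }
    }
}
```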

@RossTate
Contributor Author

RossTate commented Jun 8, 2021

I did not claim that. ... "unsoundness" (I didn't actually use that word, Java semantics is sound due to dynamic checks, it just isn't statically type-safe)

Sorry, I often put quotes around formal terms I'm aware I'm using very handwavily, which in this case was a poor choice because it comes across as a direct quotation from the referenced comment. I chose that term because it's concise and because I recalled you using it in meetings to describe Java linking. But you're right that you did not use it in that specific comment.

I said that recursive linking with nominal types in Java/C# works because they accept unsafe linking.

I will respond to this in the appropriate thread shortly (where we are otherwise waiting for you to respond to our requests to improve your example). The short of it is that, as pointed out above, the "unsoundness" issue has nothing to do with nominal types and is entirely due to support for deferred loading. Java could just as easily load everything eagerly and trap if there are any mismatches (while still using a nominal type system) rather than letting the program run as long as it can until a mismatch is necessarily problematic (e.g. a NoSuchMethodError or a NoSuchFieldError).

To keep this discussion focused, please refrain from discussing mutual recursion here (as I did above).


This post is about (lack of) support for deferred loading (specifically of type definitions). I illustrated how we could enable an application to implement its own deferred loading policy (in, say, the JS API) by separating type declaration and type definition. But I also illustrated how this would be problematic for rtt.canon. Those are the topics of this thread. Do you have anything to add on topic (rather than on mutual recursion, which is the topic of #220)?

@rossberg
Member

rossberg commented Jun 8, 2021

The short of it is that, as pointed out above, the "unsoundness" issue has nothing to do with nominal types and is entirely due to support for deferred loading. Java could just as easily load everything eagerly and trap if there are any mismatches (while still using a nominal type system) rather than letting the program run as long as it can until a mismatch is necessarily problematic (e.g. a NoSuchMethodError or a NoSuchFieldError).

No it can't, not without fundamental, presumably incompatible, changes to its type, class, loader, and compilation system. Its existing semantics of recursive linking deeply depends on deferred checks.

To keep this discussion focused, please refrain from discussing mutual recursion here (as I did above).

Hm, you create an issue in reply to a quote I made about recursive linking which you now say is unrelated and tell me to not reply to your misinterpretation of that quote? What is the point you want to make anyway?

@RossTate
Contributor Author

RossTate commented Jun 8, 2021

Quoting from WebAssembly/meetings#795:

  • Keep conversations focused
    Threads have different purposes such as brainstorming, bikeshedding,
    consensus seeking, etc. Create new threads for new topics or new purposes
    of discussion.

I recognized that, although there is a relationship between deferred loading and mutual recursion, it would be worthwhile to discuss deferred loading separate from mutual recursion (and then reference that in the mutual recursion thread). I gave an introduction here to give context for the discussion, such as the connection to mutual recursion while also making it clear that elaborations on that connection would be conducted in the appropriate thread, and then made a proactive effort to make sure that everything past that introductory context is not on mutual recursion.

What is the point you want to make anyway?

Quoting from above:

This post is about (lack of) support for deferred loading (specifically of type definitions). I illustrated how we could enable an application to implement its own deferred loading policy (in, say, the JS API) by separating type declaration and type definition. But I also illustrated how this would be problematic for rtt.canon. Those are the topics of this thread.

Do you have anything to add on topic? (Your comment above is on mutual recursion, and so I'll refrain from correcting it in order to keep this discussion focused on the topic of deferred loading.)

@sjrd

sjrd commented Jun 8, 2021

No it can't, not without fundamental, presumably incompatible, changes to its type, class, loader, and compilation system. Its existing semantics of recursive linking deeply depends on deferred checks.

Perhaps, if you want to preserve the liberty of dynamically loading classes through a class loader. But if you are willing to give up class loaders, one can definitely link a Java classpath statically and eagerly while preserving its type system and class system. That's what all existing compilers from JVM languages to JS do. And they can statically report linking errors.
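A minimal sketch of what such eager, whole-program checking amounts to, assuming a simplified model where each class is a set of member signatures and each class's outgoing references are listed explicitly (`EagerLinkDemo` and its data are illustrative, not any existing compiler's API). Applied to the A/B example from earlier in the thread after the rename, it reports the stale `release()` reference before anything executes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of eager, whole-program link checking: every member
// reference is checked against the available classes before anything runs.
class EagerLinkDemo {
    record Ref(String owner, String member) {}

    static List<String> link(Map<String, Set<String>> classes, Map<String, List<Ref>> uses) {
        List<String> errors = new ArrayList<>();
        for (var entry : uses.entrySet()) {
            for (Ref r : entry.getValue()) {
                Set<String> members = classes.get(r.owner());
                if (members == null)
                    errors.add(entry.getKey() + ": missing class " + r.owner());
                else if (!members.contains(r.member()))
                    errors.add(entry.getKey() + ": missing member " + r.owner() + "." + r.member());
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // A after renaming release() to unlock(); B was not recompiled.
        Map<String, Set<String>> classes = Map.of(
                "A", Set.of("<init>()", "lock()", "unlock()"),
                "B", Set.of("main(String[])"));
        Map<String, List<Ref>> uses = Map.of(
                "B", List.of(new Ref("A", "<init>()"), new Ref("A", "lock()"), new Ref("A", "release()")));
        link(classes, uses).forEach(System.out::println); // B: missing member A.release()
    }
}
```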

@rossberg
Member

rossberg commented Jun 9, 2021

@RossTate:

Do you have anything to add on topic?

Well, I'm at a loss what exactly the topic is. Clearly, a type-checking linking mechanism cannot support deferral without the use of casts. You don't even have a concrete type at hand before linking its defining module, so what would an RTT even be referring to? Once you actually need to access a class, you'll need to load the module, which also gives you access to the concrete type and its RTTs if needed and allows you to perform a suitable downcast.

But you seem to be pondering over a rather different setting, and again I can't help the impression that it's based on various hypotheticals and assumptions that have little to do with how Wasm works today.

@rossberg
Member

rossberg commented Jun 9, 2021

@sjrd, I agree, if you only care about whole-program linking at a single point of control, then you can work around the limitation. However, Wasm's module system is supposed to support layering and incremental linking (see e.g. the module-linking proposal), and then the approach of inverting dependencies already falls apart.

@RossTate
Contributor Author

RossTate commented Jun 9, 2021

Clearly, a type-checking linking mechanism cannot support deferral without the use of casts. You don't even have a concrete type at hand before linking its defining module, so what would an RTT even be referring to?

Quoting from the OP's paragraph on rtts:

On the other hand, instructions like rtt.fresh do respect type abstraction and can be used even when part of the relevant type is a type variable (unknowingly) representing a (presently) undefined type.

To clarify, rtt.fresh works even if the type you're creating a fresh rtt for is itself a type variable (unknowingly) representing a (presently) undefined type.


Well, I'm at a loss what exactly the topic is.

Quoting from above (again):

This post is about (lack of) support for deferred loading (specifically of type definitions). I illustrated how we could enable an application to implement its own deferred loading policy (in, say, the JS API) by separating type declaration and type definition. But I also illustrated how this would be problematic for rtt.canon. Those are the topics of this thread.

If it helps to offer more direction, one thing I would appreciate you contributing on the above topics is confirmation or denial that, whereas rtt.fresh can work with still-to-be-defined types, rtt.canon would not work well. Or, similarly, it would be useful for you to illustrate how you would support the deferred-loading example in the OP (ideally extended with the methods getb and setb) so that we could get insight into your plan for deferred loading.

@tlively
Member

tlively commented Jun 10, 2021

@RossTate, I think it would help to make this discussion more concrete if you could share an example WebAssembly program demonstrating the split you have in mind between type declarations and definitions and how that split enables safe deferred loading.

@RossTate
Contributor Author

Sure thing. How advanced of an example do you think would be useful? Or do you have an example Java-ish class and deferred-loading policy whose compilation you would find illuminating? (Depending on what functionality it has, I might have to provide a sketch of staging.)

P.S. I'm off tomorrow, so I probably won't be able to provide a compilation until Friday.

@tlively
Member

tlively commented Jun 10, 2021

Let's start simple and take it from there :) The class Foo you used in the OP along with the getb() and setb() methods and a separate module defining Bar to be loaded eagerly or lazily might be good. No rush, of course!

@RossTate
Contributor Author

Got it. That I could write up real quick, so here's the "surface" code for Foo:

package my.code;
import someone.else.Bar;
public class Foo {
    private Bar b;
    public Foo() { b = null; }
    public Bar getb() { return this.b; }
    public void setb(Bar b) { this.b = b; }
}

Here's the WebAssembly module for Foo (apologies for syntactic errors):

(module $Foo
    (type $Bar (import "Bar"))
    (global (import "__Bar_rtt") (rtt $Bar))
    (define-type $Foo (import "Foo")
        (struct
            (field $vtable (ref $FooVTable))
            (field $b (mut (ref null $Bar)))
        )
    )
    (global $__Foo_rtt (import "__Foo_rtt") (rtt $Foo))
    (type $FooVTable
        (struct
            (field $getb (func (param $this (ref $Foo)) (result (ref null $Bar))))
            (field $setb (func (param $this (ref $Foo)) (param $b (ref null $Bar))))
        )
    )
    (func $Foo.getb (param $this (ref $Foo)) (result (ref null $Bar))
        (struct.get $b (local.get $this))
    )
    (func $Foo.setb (param $this (ref $Foo)) (param $b (ref null $Bar))
        (struct.set $b (local.get $this) (local.get $b))
    )
    (global $__Foo_vtable_rtt (rtt $FooVTable) (rtt.fresh $FooVTable))
    (global $__Foo_vtable (ref $FooVTable) (struct.new_with_rtt (global.get $__Foo_vtable_rtt) (ref.func $Foo.getb) (ref.func $Foo.setb)))
    (func (export "__new_Foo") (result (ref $Foo))
        (struct.new_with_rtt (global.get $__Foo_rtt) (global.get $__Foo_vtable) (ref.null $Bar))
    )
    ... ;; exports of the relevant field references for the methods $getb and $setb
)

The definition of Bar is not particularly important; what matters is that its module would roughly have the following shape:

(module $Bar
    (define-type $Bar (import "Bar") ...)
    (global (import "__Bar_rtt") (rtt $Bar))
    ...
)

If one were to link these statically, the glue module would look like the following:

(module $__glue
    (declare-type $Foo)
    (global $__Foo_rtt (rtt $Foo) (rtt.fresh $Foo))
    (declare-type $Bar)
    (global $__Bar_rtt (rtt $Bar) (rtt.fresh $Bar))
    (instance $Foo (import "Bar" (type $Bar)) (import "__Bar_rtt" (global.get $__Bar_rtt)) (import "Foo" (type $Foo)) (import "__Foo_rtt" (global.get $__Foo_rtt)))
    (instance $Bar (import "Bar" (type $Bar)) (import "__Bar_rtt" (global.get $__Bar_rtt)))
    ...
)

Important here is that declare-type and define-type are complementary. To validate $__glue, the type-checker ensures that each declare-type is given to (at most) one define-type (import).
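The complementarity check can be sketched as a simple validation pass. This is a hypothetical model: `DefineEdge` records that a declared type is supplied to a define-type import of some instance, and plain type imports (such as `$Bar` passed to `$Foo`) are not edges because they do not define anything.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the static glue check: every declared type may be
// passed to at most one define-type import across all instances.
class GlueCheckDemo {
    // A declared type supplied to a define-type import of the named instance.
    record DefineEdge(String instance, String declaredType) {}

    static void check(List<String> declared, List<DefineEdge> edges) {
        Map<String, String> definer = new HashMap<>();
        for (DefineEdge e : edges) {
            if (!declared.contains(e.declaredType()))
                throw new IllegalArgumentException("undeclared type " + e.declaredType());
            String prev = definer.putIfAbsent(e.declaredType(), e.instance());
            if (prev != null)
                throw new IllegalArgumentException(e.declaredType() + " defined by both "
                        + prev + " and " + e.instance());
        }
    }

    public static void main(String[] args) {
        List<String> declared = List.of("$Foo", "$Bar");
        // Mirrors $__glue: each instance defines its own type.
        check(declared, List.of(new DefineEdge("$Foo", "$Foo"), new DefineEdge("$Bar", "$Bar")));
        System.out.println("glue ok");
        try {
            check(declared, List.of(new DefineEdge("$Foo", "$Foo"), new DefineEdge("$Bar", "$Foo")));
        } catch (IllegalArgumentException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```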

For deferred loading, the JS API would have some WebAssembly.declareType() function that declares (but does not define) a type. The JS wrapper of that type is guarded by a boolean that makes sure that it is (dynamically) given to at most one definable-type import. Similarly, there would be some WebAssembly.rttFresh(Type) function that generates a fresh rtt. A deferring loader for this example would call WebAssembly.declareType() twice to get jsBar and jsFoo, pass each of those to WebAssembly.rttFresh, and then pass jsBar and jsFoo and the generated rtts as imports to instantiate $Foo (without having loaded $Bar). When $Bar eventually gets loaded, the loader would then pass jsBar and its previously generated rtt as imports to instantiate $Bar.

I'm aware I'm sweeping a few things under the rug, but what remains should be all boilerplate or uninteresting for the discussion, and the example is already long as is.
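The dynamic variant of this loader can be modeled in Java as well. `declareType` and `rttFresh` below only mirror the proposed `WebAssembly.declareType()` and `WebAssembly.rttFresh(Type)`, which are not real APIs; module instantiation is reduced to checking the rtt and consuming the one-shot definition guard.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java model of the deferring loader; all names are illustrative.
class DeferringLoaderDemo {
    static final class DeclaredType {
        final String debugName;
        private boolean definitionTaken = false; // the one-shot guard

        DeclaredType(String debugName) { this.debugName = debugName; }

        // Called when this type is supplied to a define-type import.
        void claimDefinition() {
            if (definitionTaken) throw new IllegalStateException(debugName + " already defined");
            definitionTaken = true;
        }
    }

    record Rtt(DeclaredType type, int id) {}

    private static int nextRtt = 0;
    static final List<String> log = new ArrayList<>();

    static DeclaredType declareType(String name) { return new DeclaredType(name); }
    static Rtt rttFresh(DeclaredType t) { return new Rtt(t, nextRtt++); }

    // Stand-in for instantiating a module that defines `defined` and may import
    // other declared types abstractly.
    static void instantiate(String module, DeclaredType defined, Rtt rtt, DeclaredType... abstractImports) {
        if (rtt.type() != defined) throw new IllegalArgumentException("rtt does not match " + defined.debugName);
        defined.claimDefinition();
        StringBuilder sb = new StringBuilder("instantiated " + module);
        for (DeclaredType t : abstractImports) sb.append(" (imports ").append(t.debugName).append(" abstractly)");
        log.add(sb.toString());
    }

    public static void main(String[] args) {
        DeclaredType jsBar = declareType("Bar");
        DeclaredType jsFoo = declareType("Foo");
        Rtt barRtt = rttFresh(jsBar);
        Rtt fooRtt = rttFresh(jsFoo);
        instantiate("$Foo", jsFoo, fooRtt, jsBar); // Bar declared but not yet loaded
        instantiate("$Bar", jsBar, barRtt);        // later, when Bar is first needed
        log.forEach(System.out::println);
        try {
            instantiate("$Bar (again)", jsBar, barRtt); // the guard rejects a second definition
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```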

@tlively
Member

tlively commented Jun 18, 2021

I personally haven't had the bandwidth to read and understand this sample code yet, and I'm guessing that others are in the same boat. Perhaps a walkthrough of this example and discussion of its implications would be a good candidate for an upcoming meeting.

@RossTate
Contributor Author

RossTate commented Jun 26, 2021

Looks like I have enough time to make slides for this. I'll plan on just doing enough to give context on deferred loading, then focus on specifically deferred loading of types, giving this example using declare-type and define-type. Shouldn't take long (10 min?).

@titzer
Contributor

titzer commented Jun 28, 2021

I wrestled with all these issues and more implementing Jawa, a complete prototype JVM on top of Wasm.

The lesson I took away from that is that RTTs should not be exposed across module boundaries, because expecting lowered code to interact across module boundaries always falls short. You cannot faithfully implement Java's linking model this way, and you can't trust lowered code because it can cheat. I wrote a whole report about this a year ago and shared it and gave a presentation on it.

Jawa moved the deferred linking of Java types up into the Jawa runtime system. Ross hit on the obvious solution for breaking the recursion between types which is to separate the declaration from the definition. In Jawa, all Java types are referenced and defined by importing them from the jawa module and using their JVM name, including an import form that was powerful enough to define the structure of new classes. The Jawa runtime system had additional capabilities to define new Wasm types via a special API to the Wasm engine. In short, it requires JITing new types.

I think continued reasoning about how to make lowered, separately-compiled Java modules interoperate just with type imports is futile. Without building a runtime that actually does this, it is more reasoning in a vacuum.

Again, I think RTTs are an implementation detail that will eventually be hidden by the proper ability to abstract types across module boundaries. That said, I don't see how rtt.canon is any worse for modularity than rtt.fresh, unless you expect that rtt.fresh can be applied to incomplete types, which is a whole can of worms, because now type definitions are really stateful. But in either case, it's a runtime operation that is akin to just allocating an instance of that type, anyway. I don't see where Java code would need to use an RTT at runtime before an actual instance of that type has ever been allocated.

@RossTate
Contributor Author

Glad to get your thoughts, @titzer! I view this discussion as a first step towards support for things like Jawa. I think the issue you bring up with RTTs is valid, and you made me realize that I was sloppy in my discussion of them above, so I should clarify.

I feel like I've mentioned this before, though I can't remember where. RTTs in the current MVP are serving two distinct purposes: layout description (for the garbage collector) and casting (for the dynamic type system). These two purposes can be separated, and doing so is useful for the example above because the layout description cannot be established before the type is defined but the casting structures can be.

Such a separation could look as follows:

  • caster.root : [] -> [(caster anyref)] and caster.extend t : [(caster t)] -> [(caster t)], where caster is covariant with respect to t
  • rtt.new t : [(caster t)] -> [(rtt t)] where the definition of t must be available (and rtt is invariant with respect to t)
  • struct.new : [(rtt t) ... fields of t ...] -> [(ref t)] where the definition of t must be available

rtt.new needs the definition of t to be available because the rtt it returns needs to encode the layout description of the object so that the garbage collector can know how to walk it. struct.new needs the definition of t to be available so that it knows what fields need to be provided (punting separate compilation of subclasses for now).

But caster.extend t (the analog of rtt.sub_fresh) does not need to know anything about t. It just makes a magic number associated with t and does not even need t to be defined (nor does it need to do anything when t eventually is defined). The type system above ensures that references whose rtt contains that magic number must have (at least) type t, making the following instruction sound:

  • ref.cast : [(caster t) anyref] -> [(ref t)]

So, in the example above, rather than importing rtts, the modules should be importing casters (which the glue module would create via (caster.extend $Foo/$Bar (caster.root)) rather than rtt.fresh). In particular, even if a type that a module depends on has not been defined, the module can still compile code that needs to downcast to that type (with a particular caster). And while each module would allocate the rtt for the class it is defining (which, in a good design, would also contain the v-table inside the rtt) using the relevant caster, they would not export that rtt, instead just exporting the constructor function.
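A toy Java model of the proposed split, under the assumption that a caster can be represented as a chain of unique magic numbers and an rtt as a caster plus a layout. None of these names are real WebAssembly instructions or APIs; the point is that `extend` and `canCast` never consult a definition, while creating an rtt does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the caster/rtt split; all names are illustrative.
class CasterDemo {
    record Caster(List<Integer> path) {
        private static int nextMagic = 0;

        static Caster root() { return new Caster(List.of()); } // analog of caster.root

        // Analog of caster.extend t: appends a new magic number, needs nothing about t.
        Caster extend() {
            List<Integer> p = new ArrayList<>(path);
            p.add(nextMagic++);
            return new Caster(List.copyOf(p));
        }
    }

    record Rtt(Caster caster, String layout) {
        // Analog of rtt.new t: requires t's definition (its layout) to be available.
        static Rtt create(Caster c, String layout) {
            if (layout == null) throw new IllegalStateException("type not defined yet");
            return new Rtt(c, layout);
        }
    }

    record Obj(Rtt rtt) {}

    // Analog of ref.cast: succeeds iff the object's rtt was built from the
    // target caster or from an extension of it.
    static boolean canCast(Caster target, Obj o) {
        List<Integer> have = o.rtt().caster().path();
        List<Integer> want = target.path();
        return have.size() >= want.size() && have.subList(0, want.size()).equals(want);
    }

    public static void main(String[] args) {
        Caster barCaster = Caster.root().extend(); // Bar need not be defined
        Caster fooCaster = Caster.root().extend();
        Obj foo = new Obj(Rtt.create(fooCaster, "struct { vtable, b }"));
        System.out.println(canCast(fooCaster, foo)); // true
        System.out.println(canCast(barCaster, foo)); // false, though Bar is still undefined
    }
}
```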

So, while I do see a need for importing/exporting casters, I agree that I don't see a real need for importing/exporting rtts (or at least not until we get to the point of both being able to store v-tables in rtts and being able to separately compile subclasses).

@rossberg
Member

RTTs in the current MVP are serving two distinct purposes: layout description (for the garbage collector) and casting (for the dynamic type system).

This seems to be the source of some recurring misunderstanding.

The primary purpose of RTTs is neither to serve the GC (that's an implementation detail) nor is it to implement source-level casting (what I called "piggy-backing" on the Wasm casting mechanism).

  1. Types describe data layout. They are tracked in the type system so that access to data does not involve runtime checks.

  2. But Wasm's type system isn't always powerful enough (and never can be), so runtime checks are still necessary sometimes. That is expressed through a separate mechanism: casts. Their only purpose is to rediscover layout information where the type system was not able to track it and thereby maintain the check-free access mechanism.

Everything else, especially implementing possible source-level casts, is not what motivates the cast mechanism, at least not for the MVP. If piggy-backing works, you are lucky, but in most cases it doesn't, neither with nor without the addition of something like rtt.fresh (because that's way too limited for anything but Java). We'll need at least Wasm-level generics to make that possible for a relevant set of languages (yes, short of that you could use a hybrid approach, but it may not be worth it).

@RossTate
Contributor Author

If that is the case, then what I said about rtt.fresh (and rtt.sub_fresh) working with partially undefined types, and thereby being compatible with deferred loading (unlike rtt.canon and rtt.sub), was correct. (Also, I made no comment about surface-level casting, though the fact that rtt.fresh and rtt.sub_fresh support "piggybacking" is an added perk.)

Could you speak to rtt.canon versus deferred loading of types?

@tlively
Member

tlively commented Jun 30, 2021

My understanding is that the primary reason to defer the loading of type definitions is to avoid duplicating the description of their structures in the multiple modules that make up a program. Personally, I don't think this should be a priority for the GC MVP. As long as we have the ability to split functions out of GC modules and more generally have the ability to dynamically link modules that share types, I will be happy even if that means that type descriptions have to be duplicated between modules.

I would be happy to investigate de-duplicating type section contents between modules after the MVP, possibly as part of a more general staged import feature. For now, I am satisfied that it should be possible to make improvements in this area in the future.

@rossberg
Member

An instruction like rtt.fresh is only useful for piggy-backing. It's irrelevant for dealing with layout, because, well, that only cares about layout.

I can see what you are getting at with your declare/define construct. But as pointed out during your presentation yesterday, for anything but the degenerate case of the example you give (which does not access b or Bar at all, so conveniently does not require any layout knowledge), this is far from sufficient. The real problems only start there. To actually make deferred linking work for real examples you will need at least staged compilation. In fact, staging is not enough, since you want to run code before all dependencies are available. For that to be efficient, it requires full-blown jitting and more, or much richer engine/host/runtime support -- @titzer keeps pointing that out. Waaaay outside the scope of the MVP.

Also, as somebody who's actually written module papers with exactly a mechanism like the one you sketch, I am not convinced this would even be the right piece in the puzzle. I see multiple problems with it, both technical and practical. But I won't go into them, since I want to resist diverging yet more time and energy into fruitless hypothetical discussions. I want to focus on shipping an MVP, that's already difficult enough.

@RossTate
Contributor Author

RossTate commented Jul 1, 2021

My understanding is that the primary reason to defer the loading of type definitions is to avoid duplicating the description of their structures in the multiple modules that make up a program.

Hmm, sorry, I must have miscommunicated. The primary reason to defer loading of type definitions is so that the bootstrapping module that kicks off your program does not have to contain the entire contents of all type definitions reachable from your main program.

Personally, I don't think this should be a priority for the GC MVP.

That's a fine take for the MVP. But even if we choose not to prioritize it for the MVP, it is useful to know which of the viable approaches for the MVP are forwards compatible with this feature. What I'm pointing out is that rtt.canon is not compatible (in the sense that no one using deferred loading of types would be able to use it) whereas rtt.fresh is compatible. I feel like we should be able to establish consensus on that technical property even if not on what action to take from such an observation. I'll try to focus the discussion to that end.

An instruction like rtt.fresh is only useful for piggy-backing.

Casting is used to dynamically check that some object satisfies an invariant. For example, consider the following code:

static Bar getBarFromArray(Bar[] bars, int i) { return bars[i]; }

There is no surface-level cast in this code. And this static method could be inside Foo, which has no need to know Bar's layout, but does need to ensure that, if it claims to return a Bar, then it actually does return a Bar so that other code that both calls this method and does know Bar's layout can be assured the returned value has the appropriate layout.

The issue is that, since we do not support generics, the Bar[] lowers to an array of anyref (or the like), and as such Foo's compiled implementation of getBarFromArray needs to cast the value returned by bars[i] to Bar in order to circumvent the limitations of wasm's type system. An rtt generated by rtt.fresh can do this (provided Bar instances are allocated using that same rtt). That is, Foo can use an rtt generated by rtt.fresh to dynamically ensure an anyref satisfies the invariant associated with Bar's type without knowing what that invariant ensures, and this code can be compiled before Bar has been loaded.

This example may seem silly because there won't be any Bar instances that can be successfully cast until Bar is loaded. But the same issue comes up when compiling code like

static Foo getFooFromArray(Foo[] foos, int i) { return foos[i]; }

Here we run into the same issue where we have to do a low-level cast due to the lossy lowering of arrays. But in this case, it's perfectly possible for there to be Foo instances that can successfully be cast before Bar has been loaded.

This is possible to support with rtt.fresh, which can successfully generate an rtt for an abstract/(partially-)undefined type. However, it is not possible to support with rtt.canon, which cannot generate an rtt until the type is completely defined.

Can we come to consensus on that last statement?
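The lowering can be sketched in Java, modeling anyref as `Object`, a lowered `Foo[]` as `Object[]`, and an rtt as a bare identity object (all names hypothetical; subtyping between rtts is omitted, so the cast is an exact match). The cast in `getFooFromArray` compares identities only, so nothing about Bar's layout is needed to compile or run it.

```java
// Hypothetical sketch: Foo[] lowers to an array of anyref (Object[]), so
// reading an element needs a low-level cast against Foo's fresh rtt.
class ArrayLoweringDemo {
    static final class Rtt {             // fresh rtt: an identity, no layout inspection
        final String debugName;
        Rtt(String debugName) { this.debugName = debugName; }
    }

    static final class GcObject {        // every lowered object carries its rtt
        final Rtt rtt;
        GcObject(Rtt rtt) { this.rtt = rtt; }
    }

    // Low-level cast; null passes through because Java arrays may hold null.
    static GcObject refCast(Rtt target, Object anyref) {
        if (anyref == null) return null;
        if (anyref instanceof GcObject o && o.rtt == target) return o;
        throw new ClassCastException("not a " + target.debugName);
    }

    // Lowered form of: static Foo getFooFromArray(Foo[] foos, int i) { return foos[i]; }
    static GcObject getFooFromArray(Rtt fooRtt, Object[] foos, int i) {
        return refCast(fooRtt, foos[i]);
    }

    public static void main(String[] args) {
        Rtt fooRtt = new Rtt("Foo");     // obtained via rtt.fresh before Bar is loaded
        Object[] foos = { new GcObject(fooRtt), null };
        System.out.println(getFooFromArray(fooRtt, foos, 0).rtt.debugName); // Foo
        System.out.println(getFooFromArray(fooRtt, foos, 1));               // null
    }
}
```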

@tlively
Member

tlively commented Jul 1, 2021

My understanding is that the primary reason to defer the loading of type definitions is to avoid duplicating the description of their structures in the multiple modules that make up a program.

Hmm, sorry, I must have miscommunicated. The primary reason to defer loading of type definitions is so that the bootstrapping module that kicks off your program does not have to contain the entire contents of all type definitions reachable from your main program.

I think we're saying the same thing here. The alternative to deferred loading of type definitions is to have all the reachable type definitions duplicated in both the bootstrapping module and any secondary modules.

This is possible to support with rtt.fresh, which can successfully generate an rtt for an abstract/(partially-)undefined type. However, it is not possible to support with rtt.canon, which cannot generate an rtt until the type is completely defined.

Can we come to consensus on that last statement?

That makes sense to me, but I would still like to defer consideration of this to post-MVP. Supposing hypothetically that we end up with rtt.canon in the MVP due to other considerations, we could also add rtt.fresh in a follow-up if we find that it is necessary for this feature.

@tlively
Member

tlively commented Feb 17, 2022

Closing this issue because the MVP type system has been settled and the latest conversation about RTTs can be found at #275. I can imagine that we might want to do something to try to eliminate some of the type definition duplication that will be necessary when splitting modules, but I don't think there's interest in solving that problem in the MVP.

@tlively tlively closed this as completed Feb 17, 2022