-
Notifications
You must be signed in to change notification settings - Fork 74
Problems for deferred loading #224
Comments
I did not claim that. I said that recursive linking with nominal types in Java/C# works because they accept unsafe linking. Deferred loading indeed is the key here: the recursive linking semantics relies on the ability to defer loading (and thus checking), plus, a sort of global name space for class names. In Wasm we have neither. Moreover, the "unsoundness" (I didn't actually use that word, Java semantics is sound due to dynamic checks, it just isn't statically type-safe) I'm referring to is not the possibility of a Standard example:
Compile A, compile B, run B. All fine. Now notice that you prefer the method name
It's folklore knowledge that, as far as the class system is concerned, Java essentially is a dynamically-typed language. Hence we cannot expect its compilation model to map onto Wasm modules without casts, at least not without restricting the semantics. |
Sorry, I often put quotes around formal terms I'm aware I'm using very handwavily, which in this case was a poor choice because it comes across as a direct quotation from the referenced comment. I chose that term because it's concise and because I recalled you using it in meetings to describe Java linking. But you're right that you did not use it in that specific comment.
I will respond to this in the appropriate thread shortly (where we are otherwise waiting for you to respond to our requests to improve your example). The short is that, as pointed above, the "unsoundness" issue has nothing to do with nominal types and is entirely due to support for deferred loading. Java could just as easily load everything eagerly and trap if there are any mismatches (while still using a nominal type system) rather than letting the program run as long as it can until a mismatch is necessarily problematic (e.g. a To keep this discussion focused, please refrain from discussing mutual recursion here (as I did above). This post is about (lack of) support for deferred loading (specifically of type definitions). I illustrated how we could enable an application to implemented their own deferred loading policy (in, say, the JS API) by separating type declaration and type definition. But I also illustrated how this would be problematic for |
No it can't, not without fundamental, presumably incompatible, changes to its type, class, loader, and compilation system. Its existing semantics of recursive linking deeply depends on deferred checks.
Hm, you create an issue in reply to a quote I made about recursive linking which you now say is unrelated and tell me to not reply to your misinterpretation of that quote? What is the point you want to make anyway? |
Quoting from WebAssembly/meetings#795:
I recognized that, although there is a relationship between deferred loading and mutual recursion, it would be worthwhile to discuss deferred loading separate from mutual recursion (and then reference that in the mutual recursion thread). I gave an introduction here to give context for the discussion, such as the connection to mutual recursion while also making it clear that elaborations on that connection would be conducted in the appropriate thread, and then made a proactive effort to make sure that everything past that introductory context is not on mutual recursion.
Quoting from above:
Do you have anything to add on topic? (Your comment above is on mutual recursion, and so I'll refrain from correcting it in order to keep this discussion focused on the topic of deferred loading.) |
Perhaps if you want to preserve the liberty of dynamically loading classes through a class loader. But if you accept to get rid of class loaders, one can definitely statically, eagerly link a Java classpath while preserving its type system and class sytem. That's what all existing compilers from JVM languages to JS do. And they can statically report linking errors. |
Well, I'm at a loss what exactly the topic is. Clearly, a type-checking linking mechanism cannot support deferral without the use of casts. You don't even have a concrete type at hand before linking its defining module, so what would an RTT even be referring to? Once you actually need to access a class, and you'll need to load the module, which also gives you access to the concrete type and their RTTs if needed and allow you to perform a suitable downcast. But you seem to be pondering over a rather different setting, and again I can't help the impression that it's based on various hypotheticals and assumptions that have little to do with how Wasm works today. |
@sjrd, I agree, if you only care about whole-program linking at a single point of control, then you can work around the limitation. However, Wasm's module system is supposed to support layering and incremental linking (see e.g. the module-linking proposal), and then the approach of inverting dependencies already falls apart. |
Quoting from the OP's paragraph on rtts:
To clarify,
Quoting from above (again):
If it helps to offer more direction, one thing I would appreciate you contributing on the above topics is confirmation or denial that, whereas |
@RossTate, I think it would help to make this discussion more concrete if you could share an example WebAssembly program demonstrating the split you have in mind between type declarations and definitions and how that split enables safe deferred loading. |
Sure thing. How advanced of an example do you think would be useful? Or do you have an example Java-ish class and deferred-loading policy whose compilation you would find illuminating? (Depending on what functionality it has, I might have to provide a sketch of staging.) P.S. I'm off tomorrow, so I probably won't be able to provide a compilation until Friday. |
Let's start simple and take it from there :) The class |
Got it. That I could write it up real quick, so here's the "surface" code for
Here's the WebAssembly module for
The definition of
If one were to link these statically, the glue module would look like the following:
Important here is that For deferred loading, the JS API would have some I'm aware I'm sweeping a few things under the rug, but what remains should be all boilerplate or uninteresting for the discussion, and the example is already long as is. |
I personally haven't had the bandwidth to read and understand this sample code yet, and I'm guessing that others are in the same boat. Perhaps a walk through of this example and discussion of its implications would be a good candidate for an upcoming meeting. |
Looks like I have enough time to make slides for this. I'll plan on just doing enough to give context on deferred loading, then focus on specifically deferred loading of types, giving this example using |
I wrestled with all these issues and more implementing Jawa, a complete prototype JVM on top of Wasm. The lesson I took away from that is that RTTs should not be exposed across module boundaries, because expecting lowered code to interact across module boundaries always falls short. You cannot faithfully implement Java's linking model this way, and you can't trust lowered code because it can cheat. I wrote a whole report about this a year ago and shared it and gave a presentation on it. Jawa moved the deferred linking of Java types up into the Jawa runtime system. Ross hit on the obvious solution for breaking the recursion between types which is to separate the declaration from the definition. In Jawa, all Java types are referenced and defined by importing them from the I think continued reasoning about how to make lowered, separately-compiled Java modules interoperate just with type imports is futile. Without building a runtime that actually does this, it is more reasoning in a vacuum. Again, I think RTTs are an implementation detail that will eventually be hidden by the proper ability to abstract types across module boundaries. That said, I don't seen how |
Glad to get your thoughts, @titzer! I view this discussion as a first step towards support for things like Jawa. I think the issue you bring up with RTTs is valid, and you made me realize that I was sloppy in my discussion of them above, so I should clarify. I feel like I've mentioned this before, though I can't remember where. RTTs in the current MVP are serving two distinct purposes: layout description (for the garbage collector) and casting (for the dynamic type system). These two purposes can be separated, and doing so is useful for the example above because the layout description cannot be established before the type is defined but the casting structures can be. Such a separation could look like follows:
But
So, in the example above, rather than importing So, while I do see a need for importing/exporting |
This seems to be the source of some reoccurring misunderstanding. The primary purpose of RTT's is neither to serve the GC (that's an implementation detail) nor is it to implement source-level casting (what I called "piggy-backing" on the Wasm casting mechanism).
Everything else, especially implementing possible source-level casts, is not what motivates the cast mechanism, at least not for the MVP. If piggy-backing works, you are lucky, but in most cases it doesn't, neither with nor without the addition of something like rtt.fresh (because that's way too limited for anything but Java). We'll need at least Wasm-level generics to make that possible for a relevant set of languages (yes, short of that you could use a hybrid approach, but it may not be worth it). |
If that is the case, then what I said about Could you speak to |
My understanding is that the primary reason to defer the loading of type definitions is to avoid duplicating the description of their structures in the multiple modules that make up a program. Personally, I don't think this should be a priority for the GC MVP. As long as we have the ability to split functions out of GC modules and more generally have the ability to dynamically link modules that share types, I will be happy even if that means that type descriptions have to be duplicated between modules. I would be happy to investigate de-duplicating type section contents between modules after the MVP, possibly as part of a more general staged import feature. For now, I am satisfied that it should be possible to make improvements in this area in the future. |
An instruction like rtt.fresh is only useful for piggy-backing. It's irrelevant for dealing with layout, because, well that only cares about layout. I can see what you are getting at with your declare/define construct. But as pointed out during your presentation yesterday, for anything but the degenerate case of the example you give (which does not access Also, as somebody who's actually written module papers with exactly a mechanism like the one you sketch, I am not convinced this would even be the right piece in the puzzle. I see multiple problems with it, both technical and practical. But I won't go into them, since I want to resist diverging yet more time and energy into fruitless hypothetical discussions. I want to focus on shipping an MVP, that's already difficult enough. |
Hmm, sorry, I must have miscommunicated. The primary reason to defer loading of type definitions is so that the bootstrapping module that kicks off your program does not have to contain the entire contents of all type definitions reachable from your main program.
That's a fine take for the MVP. But even if we choose not to prioritize it for the MVP, it is useful to know which of the viable approaches for the MVP are forwards compatible with this feature. What I'm pointing out is that
Casting is used to dynamically check that some object satisfies an invariant. For example, consider the following code:
There is no surface-level cast in this code. And this static method could be inside The issue is that, since we do not support generics, the This example may seem silly because there won't be any
Here we run into the same issue where we have to do a low-level cast due to the lossy lowering of arrays. But in this case, it's perfectly possible for there to be This is possible to support with Can we come to consensus on that last statement? |
I think we're saying the same thing here. The alternative to deferred loading of type definitions is to have all the reachable type definitions duplicated in both the bootstrapping module and any secondary modules.
That makes sense to me, but I would still like to defer consideration of this to post-MVP. Supposing hypothetically that we end up with rtt.canon in the MVP due to other considerations, we could also add rtt.fresh in a follow-up if we find that it is necessary for this feature. |
Closing this issue because the MVP type system has been settled and the latest conversation about RTTs can be found at #275. I can imagine that we might want to do something to try to eliminate some of the type definition duplication that will be necessary when splitting modules, but I don't think there's interest in solving that problem in the MVP. |
In #220 (comment), @rossberg claimed that Java/C#'s linking was unsound due to their use of nominal types. Here I'll illustrate that the "unsoundness" is actually due to their use of and support for deferred loading. By deferred loading, I mean deferring loading/validating/compiling parts of a program to some time after the program has begun executing, say in order to achieve faster response time or to reduce resource consumption. There is a connection to mutual recursion in that support for deferred loading makes support for (sound) mutual recursion very easy, but I follow up on that specific point in #220 (comment). There is also a connection to nominal vs. structural typing, or more specifically type canonicalization, in that deferred loading does not work well with things like
rtt.canon
, which I'll illustrate here.To elaborate on deferred loading, consider the following (Java) class:
The question is, does class
someone.else.Bar
need to be loaded (e.g. fetched), compiled, and linked before we can compilemy.code.Foo
? In systems that support deferred loading, the answer is generally "no", at least supposingsomeone.else.Bar
is at least a reference type (I won't go into the more complicated things with C#'sstruct
types).This is perfectly sound. All code using
Foo
is guaranteed to respect the invariant that onlyBar
instances (andnull
) are ever assigned to the fieldb
even without knowing what it means to be aBar
instance. We could add methods likeBar getb() { return b; }
andvoid setb(Bar b) { this.b = b; }
and still compile these methods before loadingBar
. If this were Go, we could set up complex data structures with interior*Bar
pointers to fields ofFoo
instances, without loadingBar
, and be assured that—onceBar
gets loaded and defined—those*Bar
pointers will respect that definition.The key for supporting deferred loading (of types) at the low level is to separate type declaration from type definition. That way a linker/loader can just "declare" the type
my.code.Bar
, without defining it, then pass that declared type as a type import formy.code.Foo
, which can in turn use that imported type in its definition ofmy.code.Foo
and its relevant code/methods. Or, more precisely, in its relevant code/methods that treat it abstractly. But eventually something will need to treat the type more concretely, by which point the linker/loader will need to actually load the code intended formy.code.Bar
, compile it, and instantiate it in order to get a definition for the typemy.code.Bar
. When and how that happens is a matter of policy; the Java Language Specification indicates this must be done on demand, e.g. the exact time a field ofmy.code.Bar
first gets access, but there are many other policies, e.g. load in parallel and link once available. In my opinion, a good design for WebAssembly would not standardize a policy but instead provide the means for programs to implement their own policy.For example, one could compile
my.code.Foo
to a WebAssembly module that imports not just the type forsomeone.else.Bar
, but also the type formy.code.Foo
and the (linear) (not first-class) capability/obligation to define the type formy.code.Foo
—this way the module could itself be loaded after some other module abstractly using the typemy.code.Foo
has already started executing. Then, if someone wanted to use this module in eager-loading setting, they could simplify specify a module that declares the types for bothsomeone.else.Bar
andmy.code.Foo
and passes the types and the definition-capabilities to the relevant modules—all of this would resolve and validate statically, so there's no loss of guarantees for the eager case. But if someone wanted to use this module in a deferred-loading setting, their loader system (written in, say, the JS API) could declare the two types, pass them and the appropriate definition-capability to themy.code.Foo
module, and wait until later to load the module forsomeone.else.Bar
and pass it the appropriate types and definition-capability. Of course, when that time comes, something could go wrong: the code forsomeone.else.Bar
may turn out to be missing or have some error. In this case, the person would specify (again, in the JS API or the like) what to do—for Java, the appropriate thing to do would be to throw aNoClassDefFoundError
at whatever point in the code promptedsomeone.else.Bar
to be loaded. This is the "unsoundness" that @rossberg alluded to. But none of this makes WebAssembly unsound—it's just to-be-expected dynamic "errors" in someone's custom-written deferred-loading code.Now, for this to work well, it is very important that type imports be treated abstractly. That way code behaves correctly regardless of whether the type is still undefined or of how it gets defined. This is where
rtt.canon
(and the like) becomes problematic—it looks past the type import and digs up the type definition in order to determine whichrtt
to return. What happens if, in the module formy.code.Foo
, one asks for thertt.canon
of the imported type forsomeone.else.Bar
that is still undefined? Or if one asks for thertt.canon
of the type definition ofmy.code.Foo
, which is defined in terms of the still-undefinedsomeone.else.Bar
? We could makertt.canon
trap in these cases, but that still means that programs wanting to use deferred loading (of types) cannot utilizertt.canon
. On the other hand, instructions likertt.fresh
do respect type abstraction and can be used even when part of the relevant type is a type variable (unknowingly) representing a (presently) undefined typed.I haven't gone into more advanced use cases, such as letting the module for
my.code.Foo
defer the compilation of a method likedouble getxofb() { return b.x; }
—which accesses a field ofBar
—until after (the relevant parts of)someone.else.Bar
has been loaded. I believe staged modules, where imports/definitions/exports have stages associated with them so that the whole module does not have to compiled/instantiated all at one, would be a good fit for this (as, again, they are sound and do not force any particular loading policy), but I suggest deferring discussion of these more advanced use cases to a later time, understanding that they would be well served by the concepts above.The text was updated successfully, but these errors were encountered: