-
Notifications
You must be signed in to change notification settings - Fork 74
Supporting mutual recursion across module boundaries #220
Comments
The simplest possible example would be two modules like this:
The most direct way to represent these in Wasm would be:
A client could link these by locally duplicating the recursive type definitions and supplying them to each other:
But this requires types being structural as well as supporting "recursion-after-the-fact", which we only have with equi-recursive types, as I mentioned in my presentation. With iso-recursive types, we'd be forced to replace the imported types with anyref and use strategic casting. Another possible translation would be to already duplicate the types in each module as a kind of forward declaration:
Arguably, this isn't really "mutual recursion across module boundaries" anymore. But it works as long as types are structural, including with iso-recursive types. The obvious downside of this approach is that it generally requires duplicating more types (and possibly type imports those types refer to). Depending on how each forward type is used in the remainder of the module, we can get away with a supertype for the forward declaration (at least under an equi-recursive regime), but if it occurs in positive position anywhere (e.g., as a result), we'll have to duplicate the entire type. This is not the case for the import version, where the import bound would only have to cover what's needed locally. Needless to say that none of this works with nominal heap types. With those, you'd inevitably have to define all mutually recursive types in a single module. The only reason why cross-classfile recursion between nominal types works in languages like Java or C# is that they do not have type-safe linking and instead, their semantics conceptually performs a runtime type check on every object access. And those runtime checks cannot be eliminated without giving up on Wasm's AOT compilation model (and without other fundamental extensions to Wasm). |
Thanks, @rossberg!
Can you explain this scheme more? By "locally duplicating" the types, do you mean pulling them out into a separate module to instantiate first and rewriting the two original modules to import the types from that new module? Can you explain how "recursion-after-the-fact" comes into this?
My intuition is that for many use cases where the type export is not meant to hide anything, the necessary type bound will be very close to the actual type because most of the type would end up being needed locally. Does that match your intuition? As the necessary bound becomes more precise, the necessary type duplication approaches the full duplication that is the worst case of the forward declaration approach, right?
This is not needless to say because not everyone has been convinced of this, myself included ;) Once we're all on the same page about the assumptions we're making about the problem space, I'd like to dive into this more.
I agree that a linking approach as dynamic and unsafe as the approaches used by Java and C# would be inappropriate for Wasm 👍 |
Following up on @tlively's request for clarification, could you provide a more practical and complete example? In #217, I gave source code from a realistic surface language (e.g. Java). The example you provided has "coincidences" that are exemplary of taking code that was compiled all together and then artificially splitting it into separate modules. Once we're on the same page, then I can go into how nominal types work and how the issues you've raised with Java and C# have nothing to do with nominal typing. |
I am puzzled about the comment about nominal types:
|
I'm actually questioning myself if mutual recursion across module boundaries is something Wasm should support. Are there languages that need this? As far as I know, most (if not all) functional languages only support recursive data types inside one module and do not support cyclic imports ^[1]. But I don't know how this is for object oriented languages... [1]: GHC does, but in a limited way and it is discouraged. |
Languages like Java and C# support mutual recursion across classes and also use classes as their units of linkage. I think we can safely assume that we will want to support separate compilation of Java programs to WebAssembly as well, and I am assuming that Java implementers will want to treat either classes or source files as the separate compilation units. That means that mutual recursion will need to be supported across compilation units, but whether those compilation units must necessarily correspond to separate Wasm modules (rather than e.g. Wasm object files) is an open question that we need implementer feedback to answer. For the sake of this discussion, let's assume we have a requirement that Wasm modules support cross-module mutual recursion and see where we get, but let's also keep in mind that the results will factor into our balancing of trade offs rather than automatically disqualify any particular type system. |
While we wait for a more realistic example, I'll follow up on #224 and illustrate how nominal type systems can easily support separate compilation and mutual recursion. The key idea in #224 is the separation of type declaration from type definition. Once you've done this, it's really easy to support separate compilation and mutual recursion. To see why, the following expands upon the example in #224:
Like with Note that I declared both the fields of As in #224, I haven't gone into advanced cases such as where |
Hm, what are you asking specifically? It's precisely what I showed in the code snippet doing the linking, which would work as is. No rewriting of, or importing in, the original modules is needed if the types are structurally (equi-recursively) equivalent. The linking module merely duplicates the definitions from each module (which it'll have to do anyway to some extend for the purpose of link-time checking).
Not necessarily. A bound would only need to specify what's used by the module itself. E.g., in a trivial example like Ross's, where it doesn't even access the other class, the bound could be entirely empty. Or if a module imports an instance type with 20 fields, but only uses the first field, then it doesn't need to enumerate the other 19, nor define or import any of the types needed to define them. This is transitive, so if you're lucky, it can make an exponential difference. In other cases it can make none. You can potentially do more pruning if you take advantage of co/contravariance. But of course it's hard to predict how effective this would be in practice.
In what sense is this not a realistic example? I can write this almost literally in, say, Haskell:
Are you implying that only Java examples count as realistic? In any case, this example is enough to demonstrate my claim that cross-module type recursion cannot generally be expressed without equi-recursion.
Well, the nature of nominal types is that you cannot duplicate them. And when you split them between two modules, then each of them would need to import from the other. That would require recursive linking: you'd need an instance of module $B to extract $u and instantiate $A with and simultaneously an instance of $A to extract type $t and instantiate $B. Deadlock. You can't do that in Wasm. And in languages where you can it's usually not type-safe.
To be precise, we need to be able to compile source-level cross-module recursion. But as always, this can be achieved with casts. So there certainly is no requirement that Wasm itself supports cross-module recursion. In any case, that would be far outside the scope of the current proposal -- recursive modules are an order of magnitude more complicated than non-recursive ones, and a deep can of worms, especially if you want them to remain type-safe.
It seems you are diverging into hand-wavy hypotheticals again. There is nothing "easy" about this at all. What do you mean by importing an "obligation" to define a type? How would it propagate through JS? It almost sounds like you are talking about a semantics like this. Because that is what you would need to make this kind of idea precise and sound. But that is the essence of recursive linking, and considerably more complicated than anything we are willing to swallow right now, perhaps ever. And of course, your suggestion still runs afoul the export/import inversion problem, so does vastly alter the dependency topology of the source class graph. That would e.g. show as soon as you were to try to do layering or faithfully support dynamic linking with it (and fail). It's not an adequate nor a scalable solution. |
Your "glue" module contains the entire type definitions of both modules. So it is not clear that that is a realistic example of separate compilation. If you're willing to have the "glue" module contain the type definitions, then you can have the "separate" modules just import (constrained) types from the glue module rather than define the types themselves. This would work both nominally and equirecursively.
For one, I believe Haskell's
Thus So it seems that GHC uses the strategy I suggested, at least at a high-level. Also, I noticed that the paper you linked to makes a proactive effort to avoid using equi-recursive types. It says the "equational theory is not fully understood" for equi-recursive type constructors. Maybe those limitations have since been addressed, but it seems difficult to claim equi-recursive types are necessary for mutual recursion when such works on mutual recursion actively avoid using them.
In addition to
The JS wrapper would guard it with a boolean. When you provide the wrapper as an import to a wasm module, the JS API would check the boolean to see it had already been used. If not, the JS API would mark it as used. If it had been, it would be an error. This ensures it gets defined at most once. This wouldn't ensure that it does get defined. But that's fine. Everything works correctly even if the type never gets defined. That's necessary to support deferred loading of types because the type's definition may never get loaded. UPDATE: Accidentally clicked the button before finishing.
Right now you are forcing everyone to swallow equi-recursive types even though no one's currently trying to support mutually recursive separate compilation. With a nominal type system, we could avoid all the complexity of equi-recursive types, and we would not need to add support for deferred loading or mutually recursive separate compilation until people want it. And if that time comes, what I described above seems much easier to implement, much more useful (as it also supports deferred loading, not just mutually recursive separate compilation), and much more efficient than equi-recursive types. |
There are a lot of sub-discussions here, but I want to pull the focus back to the central claim that only equirecursive typing supports mutual recursion across modules. @rossberg, you wrote:
I still don't see how you arrive at this conclusion, so there must be some difference in the implicit assumptions we're making. Here's how this same example could be expressed using nominal types and
The full type graph is included in both modules, just as it would have to be included in both modules if using equirecursive types in the case where the bounds could not be made less precise. @rossberg, besides the possibility of more general type bounds, am I missing something in this example that equirecursive types supports or provides but the |
@rossberg, I just realized that we have been ignoring the parenthetical attached to your original statement (emphasis added):
If we consider all new linkage mechanisms like |
@tlively, thanks for emphasising what I actually said!
Technically, I don't think that mechanism would be nominal types -- certainly not in the established sense of generative type definitions. Nobody has worked out a proper semantics for this idea (there is no precedent for something like it AFAICT (*)), so it's difficult to say what it actually would be. For example, if I externally project corresponding types from each module, are they equivalent or not? Note that you don't have their importexport names at this point anymore, because they're not part of the types themselves (and if they were, we'd be back to an anti-modular global type namespace). If this is coherently definable at all, I believe it's ultimately some hybrid that is more structural than nominal in nature, though in some odd way tied to module boundaries (which makes me wary, because that typically breaks composability). But even then I don't see how it would solve the recursive linking problem without real recursive linking in Wasm. Concretely, how would you link your two modules given just Wasm's module instantiation constructs? (*) Incidentally, I have co-authored a paper with a similar mechanism, where a module's public type members are indeed of this dual import/export nature and can be merged through linking. This is the only system I am aware of that has something similar to this idea. But there, (1) importexport only applies to type definitions that are equivalent a priori, (2) it depends on the complicated type system machinery I have mentioned before, and (3) it immediately opens the can of worms of recursive linking and is not very useful without. Extending this semantics to nominal types in the way imagined would likely amount to extending that work with what the module literature calls "applicative functor semantics". I have written papers on that as well, and it's fairly complicated in it's own right. We have never figured out how to combine the two into one system without blowing up any reasonable complexity budget, even by advanced type theory standards.
Under any type-safe linking regime it generally has to anyway (at least partially), unless the client uses the types entirely opaquely like in your toy example, which represents the exception, not the common case or a realistic program.
The type system will want to ensure that (1) every type is defined, (2) no type is defined twice, and ideally (3) that no type is defined vacuously by itself. This is (more than) a linearity property, and checking linearity typically requires global tracking in the typing environment. See either of the papers I linked to. Not sure how you envision achieving this across module and language boundaries. In any case, this is highly non-standard, complex, and I think firmly out of scope.
To the contrary, I've proposed iso-recursion as an alternative, based on the argument that this is indeed not a pressing problem. All I pointed out was that if we cared, then equi-recursion would be the only solution known to work under Wasm's existing constraints. |
@rossberg, I responded to your questions and comments in #148 (comment) so as to not clutter this thread with the specifics of the |
So, since @rossberg has clarified that he expects the glue/client module to need to have full knowledge of the type definitions inside the mutually recursive modules, here is how one can encode his Haskell example with nominal types (in the following, I use the same syntax as the MVP, but only assume a very weak type-equivalence relation rather than introducing specialized syntax for nominal types):
Note that If there's any concern about the use of the Thus it is my belief that we have yet to have an example that equirecursive types supports that nominal types does not also support. |
@RossTate, this inverts the import/export dependencies of the program, so is not a faithful translation of the modular structure. You are essentially assuming a whole-program transformation. |
Why should faithful translation of the modular structure be a requirement? |
Also, I disagree that this requires a whole-program transformation for the same reason I gave in #148 (comment). Each strongly-connected component in the type dependency graph (e.g. each separate library) can have its mutually recursive types factored out into a provider module with no knowledge of the downstream users of that strongly-connected component. |
Adding to @tlively's points. First, as I suggested above, if we added support for separation of type declaration and type definition, we can eliminate the inversion:
Second, even if equi-recursive types were the only way to avoid inversion for mutually recursive Haskell modules, Haskell also requires fixpoints of higher-kinded types, and my understanding of the current state of the art of equi-recursive types is that this extension is believed to make type equivalence (at least) super-exponential-time with respect to the size of the types. Given those tradeoffs, I imagine we would accept module inversion over super-exponentional-time type-checking (though, to reiterate, I believe there are options besides equi-recursion that can avoid module inversion). |
As a note for future readers, the bulk of conversation on the inversion of exports into imports has been taking place on #148 starting around here: #148 (comment). |
We've settled on the isorecursive type system in #243 as our solution for sharing types across modules, so I'll close this issue. |
In #217 (comment), @rossberg wrote that of all the type systems we have considered, only equirecursive types allow for mutual recursion across module boundaries without casts. I'm not personally sure of this, so it would be good to work this out and make sure everyone is on the same page. To start off by making sure we're all talking about the same thing, @rossberg, can you share an example of the kind of mutual recursion you're thinking of?
The text was updated successfully, but these errors were encountered: