Skip to content
This repository was archived by the owner on Apr 25, 2025. It is now read-only.

Supporting mutual recursion across module boundaries #220

Closed
tlively opened this issue May 26, 2021 · 20 comments
Closed

Supporting mutual recursion across module boundaries #220

tlively opened this issue May 26, 2021 · 20 comments

Comments

@tlively
Copy link
Member

tlively commented May 26, 2021

In #217 (comment), @rossberg wrote that of all the type systems we have considered, only equirecursive types allow for mutual recursion across module boundaries without casts. I'm not personally sure of this, so it would be good to work this out and make sure everyone is on the same page. To start off by making sure we're all talking about the same thing, @rossberg, can you share an example of the kind of mutual recursion you're thinking of?

@rossberg
Copy link
Member

rossberg commented Jun 2, 2021

The simplest possible example would be two modules like this:

// module A
struct t {x : B.u};
...
// module B
struct u {y : A.t};
...

The most direct way to represent these in Wasm would be:

(module $A
  (type $B.u (import "B" "u"))  ;; may need a bound depending on local use
  (type $t (export "t") (struct (field (ref $B.u))))
  ...
)
(module $B
  (type $A.t (import "A" "t"))  ;; may need a bound depending on local use
  (type $u (export "u") (struct (field (ref $A.t))))
  ...
)

A client could link these by locally duplicating the recursive type definitions and supplying them to each other:

(type $t (struct (field (ref $u))))
(type $u (struct (field (ref $t))))
(instance $A (import "B" "u" (type $u)))
(instance $B (import "A" "t" (type $t)))

But this requires types being structural as well as supporting "recursion-after-the-fact", which we only have with equi-recursive types, as I mentioned in my presentation. With iso-recursive types, we'd be forced to replace the imported types with anyref and use strategic casting.

Another possible translation would be to already duplicate the types in each module as a kind of forward declaration:

(module $A
  (type $B.u (struct (field (ref $t)))) ;; forward declare 
  (type $t (struct (field (ref $B.u))))
  ...
)
(module $B
  (type $A.t (struct (field (ref $u)))) ;; forward declare
  (type $u (struct (field (ref $A.t))))
  ...
)

Arguably, this isn't really "mutual recursion across module boundaries" anymore. But it works as long as types are structural, including with iso-recursive types. The obvious downside of this approach is that it generally requires duplicating more types (and possibly type imports those types refer to). Depending on how each forward type is used in the remainder of the module, we can get away with a supertype for the forward declaration (at least under an equi-recursive regime), but if it occurs in positive position anywhere (e.g., as a result), we'll have to duplicate the entire type. This is not the case for the import version, where the import bound would only have to cover what's needed locally.

Needless to say that none of this works with nominal heap types. With those, you'd inevitably have to define all mutually recursive types in a single module. The only reason why cross-classfile recursion between nominal types works in languages like Java or C# is that they do not have type-safe linking and instead, their semantics conceptually performs a runtime type check on every object access. And those runtime checks cannot be eliminated without giving up on Wasm's AOT compilation model (and without other fundamental extensions to Wasm).

@tlively
Copy link
Member Author

tlively commented Jun 2, 2021

Thanks, @rossberg!

...

A client could link these by locally duplicating the recursive type definitions and supplying them to each other:

(type $t (struct (field (ref $u))))
(type $u (struct (field (ref $t))))
(instance $A (import "B" "u" (type $u)))
(instance $B (import "A" "t" (type $t)))

But this requires types being structural as well as supporting "recursion-after-the-fact", which we only have with equi-recursive types, as I mentioned in my presentation.

Can you explain this scheme more? By "locally duplicating" the types, do you mean pulling them out into a separate module to instantiate first and rewriting the two original modules to import the types from that new module? Can you explain how "recursion-after-the-fact" comes into this?

...

Another possible translation would be to already duplicate the types in each module as a kind of forward declaration:

...

Arguably, this isn't really "mutual recursion across module boundaries" anymore. But it works as long as types are structural, including with iso-recursive types. The obvious downside of this approach is that it generally requires duplicating more types (and possibly type imports those types refer to). Depending on how each forward type is used in the remainder of the module, we can get away with a supertype for the forward declaration (at least under an equi-recursive regime), but if it occurs in positive position anywhere (e.g., as a result), we'll have to duplicate the entire type. This is not the case for the import version, where the import bound would only have to cover what's needed locally.

My intuition is that for many use cases where the type export is not meant to hide anything, the necessary type bound will be very close to the actual type because most of the type would end up being needed locally. Does that match your intuition? As the necessary bound becomes more precise, the necessary type duplication approaches the full duplication that is the worst case of the forward declaration approach, right?

Needless to say that none of this works with nominal heap types.

This is not needless to say because not everyone has been convinced of this, myself included ;) Once we're all on the same page about the assumptions we're making about the problem space, I'd like to dive into this more.

With those, you'd inevitably have to define all mutually recursive types in a single module. The only reason why cross-classfile recursion between nominal types works in languages like Java or C# is that they do not have type-safe linking and instead, their semantics conceptually performs a runtime type check on every object access. And those runtime checks cannot be eliminated without giving up on Wasm's AOT compilation model (and without other fundamental extensions to Wasm).

I agree that a linking approach as dynamic and unsafe as the approaches used by Java and C# would be inappropriate for Wasm 👍

@RossTate
Copy link
Contributor

RossTate commented Jun 2, 2021

Following up on @tlively's request for clarification, could you provide a more practical and complete example? In #217, I gave source code from a realistic surface language (e.g. Java). The example you provided has "coincidences" that are exemplary of taking code that was compiled all together and then artificially splitting it into separate modules.

Once we're on the same page, then I can go into how nominal types work and how the issues you've raised with Java and C# have nothing to do with nominal typing.

@fgmccabe
Copy link

fgmccabe commented Jun 2, 2021

I am puzzled about the comment about nominal types:

Needless to say that none of this works with nominal heap types. With those, you'd inevitably have to define all mutually recursive types in a single module.

  • Why not 'duplicate' nominal type definitions in each module (where the import is intended to be translucent). Then import would verify that the definitions are the same
  • Why is this worse than duplicating types as it would appear to be necessary with equi-recursive or iso-recursive definitions?

@timjs
Copy link

timjs commented Jun 3, 2021

I'm actually questioning myself if mutual recursion across module boundaries is something Wasm should support. Are there languages that need this? As far as I know, most (if not all) functional languages only support recursive data types inside one module and do not support cyclic imports ^[1]. But I don't know how this is for object oriented languages...

[1]: GHC does, but in a limited way and it is discouraged.

@tlively
Copy link
Member Author

tlively commented Jun 3, 2021

Languages like Java and C# support mutual recursion across classes and also use classes as their units of linkage. I think we can safely assume that we will want to support separate compilation of Java programs to WebAssembly as well, and I am assuming that Java implementers will want to treat either classes or source files as the separate compilation units. That means that mutual recursion will need to be supported across compilation units, but whether those compilation units must necessarily correspond to separate Wasm modules (rather than e.g. Wasm object files) is an open question that we need implementer feedback to answer.

For the sake of this discussion, let's assume we have a requirement that Wasm modules support cross-module mutual recursion and see where we get, but let's also keep in mind that the results will factor into our balancing of trade offs rather than automatically disqualify any particular type system.

@RossTate
Copy link
Contributor

RossTate commented Jun 7, 2021

While we wait for a more realistic example, I'll follow up on #224 and illustrate how nominal type systems can easily support separate compilation and mutual recursion. The key idea in #224 is the separation of type declaration from type definition. Once you've done this, it's really easy to support separate compilation and mutual recursion. To see why, the following expands upon the example in #224:

package someone.else;
import my.code.Foo;

public class Bar { public double x; private Foo f; }

Like with my.code.Foo, I can compile this to a WebAssembly module that imports not only the type for my.code.Foo but also the type for someone.else.Bar and the capability/obligation to define this type. Then, for an eager-loading system, I can define a module that declares the types for my.code.Foo and someone.else.Bar, passes the types to both modules and passes the definition-capability to the appropriate module. This will result in two mutually recursive definitions (and everything is sound).

Note that I declared both the fields of Foo and Bar that make them mutually recursive to be private. I did this to highlight the fact that no module in this system has to even know that mutual recursion is occurring. Unlike in @rossberg's example, even the "glue" module is unaware of the mutual recursion! This is indicative of how this approach offers better support for separate compilation. Even after the "glue" module has been compiled, one could add more fields to Foo or Bar, recompile either of them separately, and everything will continue to work (since the imports are the same and the exports have only increased).

As in #224, I haven't gone into advanced cases such as where Foo and Bar have methods that access (public) fields of each other, but as in #224 these advanced cases can be (soundly) addressed with things like staged modules (with or without deferred loading), which again I will defer elaborating on to keep things focused.

@rossberg
Copy link
Member

rossberg commented Jun 8, 2021

@tlively:

Can you explain this scheme more? By "locally duplicating" the types, do you mean pulling them out into a separate module to instantiate first and rewriting the two original modules to import the types from that new module?

Hm, what are you asking specifically? It's precisely what I showed in the code snippet doing the linking, which would work as is. No rewriting of, or importing in, the original modules is needed if the types are structurally (equi-recursively) equivalent. The linking module merely duplicates the definitions from each module (which it'll have to do anyway to some extend for the purpose of link-time checking).

My intuition is that for many use cases where the type export is not meant to hide anything, the necessary type bound will be very close to the actual type because most of the type would end up being needed locally.

Not necessarily. A bound would only need to specify what's used by the module itself. E.g., in a trivial example like Ross's, where it doesn't even access the other class, the bound could be entirely empty. Or if a module imports an instance type with 20 fields, but only uses the first field, then it doesn't need to enumerate the other 19, nor define or import any of the types needed to define them. This is transitive, so if you're lucky, it can make an exponential difference. In other cases it can make none. You can potentially do more pruning if you take advantage of co/contravariance. But of course it's hard to predict how effective this would be in practice.

@RossTate:

Following up on @tlively's request for clarification, could you provide a more practical and complete example?

In what sense is this not a realistic example? I can write this almost literally in, say, Haskell:

module A where
data T = T {x :: B.U}

module B where
data U = U {y :: A.T}

Are you implying that only Java examples count as realistic?

In any case, this example is enough to demonstrate my claim that cross-module type recursion cannot generally be expressed without equi-recursion.

@fgmccabe:

Why not 'duplicate' nominal type definitions in each module (where the import is intended to be translucent). Then import would verify that the definitions are the same

Well, the nature of nominal types is that you cannot duplicate them. And when you split them between two modules, then each of them would need to import from the other. That would require recursive linking: you'd need an instance of module $B to extract $u and instantiate $A with and simultaneously an instance of $A to extract type $t and instantiate $B. Deadlock. You can't do that in Wasm. And in languages where you can it's usually not type-safe.

@tlively:

For the sake of this discussion, let's assume we have a requirement that Wasm modules support cross-module mutual recursion and see where we get

To be precise, we need to be able to compile source-level cross-module recursion. But as always, this can be achieved with casts. So there certainly is no requirement that Wasm itself supports cross-module recursion.

In any case, that would be far outside the scope of the current proposal -- recursive modules are an order of magnitude more complicated than non-recursive ones, and a deep can of worms, especially if you want them to remain type-safe.

@RossTate:

I'll follow up on #224 and illustrate how nominal type systems can easily support separate compilation and mutual recursion.

It seems you are diverging into hand-wavy hypotheticals again. There is nothing "easy" about this at all. What do you mean by importing an "obligation" to define a type? How would it propagate through JS? It almost sounds like you are talking about a semantics like this. Because that is what you would need to make this kind of idea precise and sound. But that is the essence of recursive linking, and considerably more complicated than anything we are willing to swallow right now, perhaps ever.

And of course, your suggestion still runs afoul the export/import inversion problem, so does vastly alter the dependency topology of the source class graph. That would e.g. show as soon as you were to try to do layering or faithfully support dynamic linking with it (and fail). It's not an adequate nor a scalable solution.

@RossTate
Copy link
Contributor

RossTate commented Jun 8, 2021

In what sense is this not a realistic example?

Your "glue" module contains the entire type definitions of both modules. So it is not clear that that is a realistic example of separate compilation.

If you're willing to have the "glue" module contain the type definitions, then you can have the "separate" modules just import (constrained) types from the glue module rather than define the types themselves. This would work both nominally and equirecursively.

I can write this almost literally in, say, Haskell:

module A where
data T = T {x :: B.U}

module B where
data U = U {y :: A.T}

For one, I believe Haskell's data type (constructors) are nominal. But more importantly, @timjs already noted that Haskell compilers have unreliable support for mutual recursion of modules. In this document on how to support mutual recursion with GHC, it explains that to support your example (where A and B would need to be defined in separate .hs files) you also need to provide a .hs-boot file for at least one of the two modules. This file needs to provide an abstraction of the module (say A) that is not mutually recursive but is sufficiently precise to at least type-check the other module (B). Their example is almost identical to yours, so I can reuse the same A.hs-boot file they provide to fit your example:

module A where
data T

Thus B is linked to the abstract declaration of A's T type, and then in compilation of linking of A that abstract declaration gets instantiated with the definition provided by A.

So it seems that GHC uses the strategy I suggested, at least at a high-level.

Also, I noticed that the paper you linked to makes a proactive effort to avoid using equi-recursive types. It says the "equational theory is not fully understood" for equi-recursive type constructors. Maybe those limitations have since been addressed, but it seems difficult to claim equi-recursive types are necessary for mutual recursion when such works on mutual recursion actively avoid using them.

What do you mean by importing an "obligation" to define a type?

In addition to (import "..." "..." (type $t (sub func))), you could have (import "..." "..." (definable-type $t (....type definition....))) (or you could give the type definition elsewhere in the module - just a matter of syntactic encodings). The module linker would ensure that a declared type is only ever exported once as a definable-type and that any definable-type export is only ever given to one definable-type import.

How would it propagate through JS?

The JS wrapper would guard it with a boolean. When you provide the wrapper as an import to a wasm module, the JS API would check the boolean to see it had already been used. If not, the JS API would mark it as used. If it had been, it would be an error. This ensures it gets defined at most once.

This wouldn't ensure that it does get defined. But that's fine. Everything works correctly even if the type never gets defined. That's necessary to support deferred loading of types because the type's definition may never get loaded.

UPDATE: Accidentally clicked the button before finishing.

considerably more complicated than anything we are willing to swallow right now

Right now you are forcing everyone to swallow equi-recursive types even though no one's currently trying to support mutually recursive separate compilation. With a nominal type system, we could avoid all the complexity of equi-recursive types, and we would not need to add support for deferred loading or mutually recursive separate compilation until people want it. And if that time comes, what I described above seems much easier to implement, much more useful (as it also supports deferred loading, not just mutually recursive separate compilation), and much more efficient than equi-recursive types.

@tlively
Copy link
Member Author

tlively commented Jun 8, 2021

There are a lot of sub-discussions here, but I want to pull the focus back to the central claim that only equirecursive typing supports mutual recursion across modules.

@rossberg, you wrote:

module A where
data T = T {x :: B.U}

module B where
data U = U {y :: A.T}

...
In any case, this example is enough to demonstrate my claim that cross-module type recursion cannot generally be expressed without equi-recursion.

I still don't see how you arrive at this conclusion, so there must be some difference in the implicit assumptions we're making. Here's how this same example could be expressed using nominal types and importexport mechanism for weak definitions developed in #148.

;; A.wasm
(module
  (type $B.U (importexport "B" "U" as "B.U") (struct (ref $T))
  (type $T (importexport "B" "A.T" as "T") (struct (ref $B.U))
)
;; B.wasm
(module
  (type $A.T (importexport "A" "T" as "A.T") (struct (ref $U))
  (type $U (importexport "A" "B.U" as "U") (struct (ref $A.T))
)

The full type graph is included in both modules, just as it would have to be included in both modules if using equirecursive types in the case where the bounds could not be made less precise. importexport uses an imported type (using the module and name before as) if it is provided and otherwise defines a new type. importexport also exports the type (using the name after as) in either case. In this example A and B will end up agreeing on the identity of A.T and B.U no matter which of A and B is instantiated first.

@rossberg, besides the possibility of more general type bounds, am I missing something in this example that equirecursive types supports or provides but the importexport scheme does not?

@tlively
Copy link
Member Author

tlively commented Jun 9, 2021

@rossberg, I just realized that we have been ignoring the parenthetical attached to your original statement (emphasis added):

Only equi-recursive types can [handle any mutual recursion across module boundaries without casts] (in Wasm's given framework of AOT compilation and linking).

If we consider all new linkage mechanisms like importexport to be outside of "Wasm's given framework of AOT compilation and linking," then I think I agree with your statement. Is that what you had in mind when you wrote that? Would you consider importexport to meaningfully deviate from Wasm's current compilation and linking model?

@rossberg
Copy link
Member

rossberg commented Jun 9, 2021

@tlively, thanks for emphasising what I actually said!

Here's how this same example could be expressed using nominal types and importexport mechanism for weak definitions developed in #148.

Technically, I don't think that mechanism would be nominal types -- certainly not in the established sense of generative type definitions. Nobody has worked out a proper semantics for this idea (there is no precedent for something like it AFAICT (*)), so it's difficult to say what it actually would be. For example, if I externally project corresponding types from each module, are they equivalent or not? Note that you don't have their importexport names at this point anymore, because they're not part of the types themselves (and if they were, we'd be back to an anti-modular global type namespace). If this is coherently definable at all, I believe it's ultimately some hybrid that is more structural than nominal in nature, though in some odd way tied to module boundaries (which makes me wary, because that typically breaks composability).

But even then I don't see how it would solve the recursive linking problem without real recursive linking in Wasm. Concretely, how would you link your two modules given just Wasm's module instantiation constructs?

(*) Incidentally, I have co-authored a paper with a similar mechanism, where a module's public type members are indeed of this dual import/export nature and can be merged through linking. This is the only system I am aware of that has something similar to this idea. But there, (1) importexport only applies to type definitions that are equivalent a priori, (2) it depends on the complicated type system machinery I have mentioned before, and (3) it immediately opens the can of worms of recursive linking and is not very useful without. Extending this semantics to nominal types in the way imagined would likely amount to extending that work with what the module literature calls "applicative functor semantics". I have written papers on that as well, and it's fairly complicated in it's own right. We have never figured out how to combine the two into one system without blowing up any reasonable complexity budget, even by advanced type theory standards.

@RossTate:

Your "glue" module contains the entire type definitions of both modules. So it is not clear that that is a realistic example of separate compilation.

Under any type-safe linking regime it generally has to anyway (at least partially), unless the client uses the types entirely opaquely like in your toy example, which represents the exception, not the common case or a realistic program.

The JS wrapper would guard it with a boolean.

The type system will want to ensure that (1) every type is defined, (2) no type is defined twice, and ideally (3) that no type is defined vacuously by itself. This is (more than) a linearity property, and checking linearity typically requires global tracking in the typing environment. See either of the papers I linked to. Not sure how you envision achieving this across module and language boundaries. In any case, this is highly non-standard, complex, and I think firmly out of scope.

Right now you are forcing everyone to swallow equi-recursive types even though no one's currently trying to support mutually recursive separate compilation.

To the contrary, I've proposed iso-recursion as an alternative, based on the argument that this is indeed not a pressing problem. All I pointed out was that if we cared, then equi-recursion would be the only solution known to work under Wasm's existing constraints.

@tlively
Copy link
Member Author

tlively commented Jun 9, 2021

@rossberg, I responded to your questions and comments in #148 (comment) so as to not clutter this thread with the specifics of the importexport mechanism. Let's get on the same page there about the semantics of importexport and whether it meaningfully diverges from the current compilation and instantiation model. Then hopefully we will be able to return to this thread and all be on the same page here as well.

@RossTate
Copy link
Contributor

RossTate commented Jun 9, 2021

So, since @rossberg has clarified that he expects the glue/client module to need to have full knowledge of the type definitions inside the mutually recursive modules, here is how one can encode his Haskell example with nominal types (in the following, I use the same syntax as the MVP, but only assume a very weak type-equivalence relation rather than introducing specialized syntax for nominal types):

module A where
data T = T {x :: B.U}

module B where
data U = U {y :: A.T}
(module $A
  (type $B.U (import "B" "U"))  ;; may need a bound depending on local use
  (type $T (import "A" "T") (export "T") (eq (struct (field $x (ref $B.U)))))
  ...
)
(module $B
  (type $A.T (import "A" "T"))  ;; may need a bound depending on local use
  (type $U (import "B" "U") (export "U") (eq (struct (field $y (ref $A.T)))))
  ...
)
(module $Client
  (type $A.T (struct (field $x (ref $A.T))))
  (type $B.U (struct (field $y (ref $B.U))))
  (instance $A.T (import "B" "U" (type $B.U)) (import "A" "T" (type $A.T)))
  (instance $B.U (import "A" "T" (type $A.T)) (import "B" "U" (type $B.U)))
  ...
)

Note that A can be compiled without looking inside B and vice versa. As with @rossberg's solution, only the glue/client looks inside the two.

If there's any concern about the use of the eq constraint, you could also use a sub constraint and import the function for constructing $A.T or $B.U. If there's then any concern about the use of a sub constraint, you could alternatively import the field reference for $x or $y.

Thus it is my belief that we have yet to have an example that equirecursive types supports that nominal types does not also support.

@rossberg
Copy link
Member

@RossTate, this inverts the import/export dependencies of the program, so is not a faithful translation of the modular structure. You are essentially assuming a whole-program transformation.

@tlively
Copy link
Member Author

tlively commented Jun 10, 2021

Why should faithful translation of the modular structure be a requirement?

@tlively
Copy link
Member Author

tlively commented Jun 10, 2021

Also, I disagree that this requires a whole-program transformation for the same reason I gave in #148 (comment). Each strongly-connected component in the type dependency graph (e.g. each separate library) can have its mutually recursive types factored out into a provider module with no knowledge of the downstream users of that strongly-connected component.

@RossTate
Copy link
Contributor

Adding to @tlively's points.

First, as I suggested above, if we added support for separation of type declaration and type definition, we can eliminate the inversion:

(module $A
  (type $B.U (import "B" "U"))  ;; may need a bound depending on local use
  (define-type $T (import "A" "T") (export "T") (struct (field $x (ref $B.U))))
  ...
)
(module $B
  (type $A.T (import "A" "T"))  ;; may need a bound depending on local use
  (define-type $U (import "B" "U") (export "U") (struct (field $y (ref $A.T))))
  ...
)
(module $Client
  (declare-type $A.T)
  (declare-type $B.U)
  (instance $A.T (import "B" "U" (type $B.U)) (import "A" "T" (type $A.T)))
  (instance $B.U (import "A" "T" (type $A.T)) (import "B" "U" (type $B.U)))
  ...
)

Second, even if equi-recursive types were the only way to avoid inversion for mutually recursive Haskell modules, Haskell also requires fixpoints of higher-kinded types, and my understanding of the current state of the art of equi-recursive types is that this extension is believed to make type equivalence (at least) super-exponential-time with respect to the size of the types. Given those tradeoffs, I imagine we would accept module inversion over super-exponentional-time type-checking (though, to reiterate, I believe there are options besides equi-recursion that can avoid module inversion).

@tlively
Copy link
Member Author

tlively commented Jun 17, 2021

As a note for future readers, the bulk of conversation on the inversion of exports into imports has been taking place on #148 starting around here: #148 (comment).

@tlively
Copy link
Member Author

tlively commented Feb 10, 2022

We've settled on the isorecursive type system in #243 as our solution for sharing types across modules, so I'll close this issue.

@tlively tlively closed this as completed Feb 10, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants