Skip to content
This repository was archived by the owner on Apr 25, 2025. It is now read-only.

call_indirect and function subtyping #329

Closed
manoskouk opened this issue Oct 10, 2022 · 60 comments
Closed

call_indirect and function subtyping #329

manoskouk opened this issue Oct 10, 2022 · 60 comments

Comments

@manoskouk
Copy link
Contributor

It is my understanding that function subtyping forces call_indirect to perform a runtime subtyping check. Can we confirm this?
On a related note, this check might impact MVP modules: consider an MVP module defining the type t1 = int32 -> int32 and using it as immediate signature for call_indirect. Despite this type not being nontrivially extensible, call_indirect still has to perform a runtime check: another module might always later define t1' = int32 -> int32 and t2 = int32 -> int32 <: t1'; t1, t1' canonicalize to the same type, therefore t2 <: t1. This should not impact running time of MVP modules (since we can include a fast path that checks for signature equality); however, it will impact binary code size. It would be unfortunate if wasm-gc ends up having impact on existing modules. Would it be reasonable to somehow consider MVP types final by default?

@rossberg
Copy link
Member

So far, the proposal errs on the side conservatism and does not change the semantics of call_indirect, i.e., it still requires equivalent types. At last week's meeting, there seems to have been agreement in principle that we'd rather want to allow subtyping there, but the impact on performance is not sufficiently clear yet.

The fact that the engine can't locally know whether a type has subtypes is one of the reasons for the concern. I mentioned that a notion of final could solve that, but the proposal does not currently have that, so would require extending it.

But I think we should experiment and measure the actual overhead first, otherwise it'd be premature optimisation.

@manoskouk
Copy link
Contributor Author

Thanks for the reply.
I am worried though that the conservative approach breaks subsumption in the presence of table.set.

@rossberg
Copy link
Member

rossberg commented Oct 11, 2022

Technically it doesn't, since runtime typing is separate from static typing. This would be clearer if RTTs weren't swept under the rug. ;) The type check for call_indirect is essentially a cast, and as I have noted in other contexts, there is no reason to expect that runtime casts can always reverse subsumption. The only guarantee you have is that if a dynamic subtype check succeeds, then the corresponding static subtyping relation holds; the inverse is not true.

In your example, you have t2 <: t1 <: func, and a function with runtime type t2 but static type func that you are trying to cast down to t1. For practical and philosophical reasons, I would much prefer if this succeeded, but it's not required for any theoretical or technical reason.

@titzer
Copy link
Contributor

titzer commented Oct 11, 2022

I think we should be complete and make call_indirect do a proper subtyping check. From what I remember of doing the V8 implementation, there are enough instructions generated for the bounds check and signature check that I considered pulling it out into a per-signature stub, in which case the additional slowpath checks for the subtype check would be less of an issue.

@eqrion
Copy link
Contributor

eqrion commented Oct 11, 2022

A proper subtyping check in call_indirect would likely be reasonable for SpiderMonkey to implement. We have the check on the callee side (so binary bloat wouldn't be too bad), and could fast-path the case where signatures are equal.

Regarding final and MVP function types, a mild suggestion could be to implicitly make function-types where all params and results are number-types final, because we'll never add a subtyping relation to those val-types, and so any declared subtyping would seem to be pointless. Then for those function-types, type identity would imply subtyping. I don't know how critical this is though.

@rossberg
Copy link
Member

Mh, if it comes to that, I'd much rather have all function types default to final uniformly than introducing irregular rules.

@manoskouk
Copy link
Contributor Author

@rossberg thanks for the explanation.

Another motivation for having subtypes succeed is that we can drop the type check altogether if the immediate type coincides with the table's type.

@RossTate
Copy link
Contributor

RossTate commented Nov 8, 2022

This seems like it would have a very inefficient slow path, and it might even noticeably slow down the fast path. It's also not clear how it would extend to programs using Post-MVP features (where the subtyping between, e.g. existential types representing objects, has to be explicitly proven by the producer rather than automatically determined by the engine).

I have two alternate solutions, one short-term and one long-term.

The short-term solution is to have a variant of ref.func wherein one specifies a list of signatures the indirect call's signature should match exactly with. This covers the whole-program scenarios the GC MVP is targeting.

The long-term solution is to use call tags with fallback behavior. This covers separate-compilation scenarios.

@rossberg
Copy link
Member

@RossTate, as mentioned various times before, adding new constructs, as useful as they might be, does not address the question at hand, which is concerned with the semantics of an existing construct and its interaction with subtyping.

Also, if we allow subtyping for call_indirect, then it would effectively become an exact macro for table.get + ref.cast + call_ref. So whatever post-MVP questions might arise wouldn't be specific to this instruction – to the contrary, such a uniform semantics would avoid the need for specialised answers.

@titzer
Copy link
Contributor

titzer commented Nov 10, 2022

I agree with @rossberg; without function subtyping, call_indirect has special, surprising semantics.

It's worth noting that the "slowpath" of checking for function subtyping would otherwise be a trap. The fast path remains an identity check, and thus should not slow down any program that exists today, nor any program that doesn't rely on function subtyping.

@RossTate
Copy link
Contributor

If you want it to work with subtyping, then solve the problem the same way you did for structs: use isorecursive types and implement function subtyping according to the explicit hierarchy rather than component-wise. This has the benefits of being more consistent with the rest of the GC proposal's static sub typing and dynamic casting designs, of supporting recursive function types, and of being efficiently implementable even on the slow path.

@jakobkummerow
Copy link
Contributor

I didn't think there was any disagreement that function subtyping for call_indirect, if we allow it, would use the same explicit subtyping mechanism as the rest of the GC proposal...?

@RossTate
Copy link
Contributor

Ah, then I misunderstood some of the earlier comments. Sorry.

Then to address this issue:

On a related note, this check might impact MVP modules: consider an MVP module defining the type t1 = int32 -> int32 and using it as immediate signature for call_indirect.

I would recommend that types not in recursion groups be final.

@askeksa-google
Copy link

I would recommend that types not in recursion groups be final.

Are you recommending that as a difference between types not in recursion groups and types in singleton recursion groups, as per the discussion in #334?

It's certainly useful for types that are not recursive to be used as supertypes. Would this suggestion then mean that in order to be used as supertypes, such types would have to be places in recursion groups? Wouldn't that conflict with other potential distinctions such as the ones @rossberg mentioned in #334 (comment)?

@rossberg
Copy link
Member

I would recommend that types not in recursion groups be final.

That strikes me as completely artificial conflation of concerns. If we wanted final, then we should be introducing it as an explicit feature, as discussed before. The use case for it is wholly unrelated to whether a type is recursive. Also, we already concluded that we need actual performance data before considering final.

@RossTate
Copy link
Contributor

Are you recommending that as a difference between types not in recursion groups and types in singleton recursion groups, as per the discussion in #334?

Yes. Or you could think of a type not in a recursion group as the same thing as a type in singleton recursion group and labelled final.

That strikes me as completely artificial conflation of concerns.

The concern expressed in the OP is that having existing function types not be final will incur costs for programs not using function subtyping. We know it will at least cost memory; it might also cost run-time performance. Having existing function types be final addresses that concern. Another way to think about it is that types are final by default and it is extensibility that needs to be explicit. This is how it works in some systems (e.g. Kotlin), and applying that to the MVP for struct types (i.e. struct types even in a recursion group with no subtypes are not extensible) would enable better casts for MVP GC as well (especially since MVP is focusing on whole-program compilation).

@rossberg
Copy link
Member

If we decide to introduce final then we can also make it the default, but that's still completely unrelated to type recursion.

@RossTate
Copy link
Contributor

Come to think of it, for the MVP, the sub clauses could be restricted to types declared in the current group. Then these groups can be thought of as defining subtyping hierarchies (that happen to permit recursion). (As a Post-MVP feature, you could declare some types as "extensible", in which case they can be referred to by sub clauses, but that seems to be only helpful for separate compilation and so not necessary for the MVP.) That would address the issues here, enable more optimizations for the MVP, and still offer a nice interpretation of what these groups are.

@rossberg
Copy link
Member

rossberg commented Nov 10, 2022

That still conflates unrelated issues and obviously would break even the simplest form of separate compilation that we aimed to support.

@conrad-watt

This comment was marked as resolved.

@RossTate
Copy link
Contributor

I honestly feel like I offered a suggestion that aligns well with how the MVP is currently being used, addresses the concerns multiple people raised here, addresses the concerns you and others raised about an earlier suggestion, and provides a clear extension for separate compilation (which could be integrated now if the group chooses). That seems like a productive thing to do, so I am at a loss as to why I am being attacked for it.

@titzer
Copy link
Contributor

titzer commented Nov 10, 2022

Forcing supertypes to be in the same recursion group conflicts with minimizing the size of recursion groups, which is one of the enablers for separate compilation. I don't understand the motivation for the restriction nor what benefit it would give. It just seems arbitrary, and highly likely something we'd labor to relax post-MVP.

edit to add: In general, given that we've reached Phase 3, I think tweaks to the MVP need to be strongly motivated by an important use case encountered in a real language targeting the existing MVP, or real performance data from a production engine implementing the MVP.

@tlively
Copy link
Member

tlively commented Nov 10, 2022

The J2CL and Dart teams have indicated that they may need to start looking at inter-module coordination sooner rather than later, so we (Google) don't want to place any restrictions on recursion groups that would hinder that use case.

@RossTate
Copy link
Contributor

Forcing supertypes to be in the same recursion group conflicts with minimizing the size of recursion groups, which is one of the enablers for separate compilation.

As I mentioned, a sub can mention any type in the same group or any type marked as "extensible". So you can still do the same minimization that's currently possible; you just have to explicitly mark the relevant types as extensible.

I don't understand the motivation for the restriction nor what benefit it would give.

Every cast to a non-extensible type with no subtypes in the group would be guaranteed to be exact. That would let call_indirect be implemented for all core-wasm function types just as it is now. That would also let casts to such "final" types be optimized.

In general, given that we've reached Phase 3, I think tweaks to the MVP need to be strongly motivated by an important use case encountered in a real language targeting the existing MVP, or real performance data from a production engine implementing the MVP.

This thread is about making a change to call_indirect. This change is not necessary to support the use cases. When a subclass overrides a superclass with a broader signature, an entry for the broader signature can be added to the v-table, and the entry for the superclass's method can be filled with a stub with the narrower signature that simply (tail) calls the implementation with the broader signature.

So if you are going to apply this reasoning about Phase 3, then you should be demanding that people demonstrate this causes substantial performance problems before tweaking the MVP to support a different implementation of the pattern. (I am not saying that should be necessary; just pointing out the inconsistent application of this principle.)

@rossberg

This comment was marked as resolved.

@titzer
Copy link
Contributor

titzer commented Nov 10, 2022

So if you are going to apply this reasoning about Phase 3, then you should be demanding that people demonstrate this causes substantial performance problems before tweaking the MVP to support a different implementation of the pattern. (I am not saying that should be necessary; just pointing out the inconsistent application of this principle.)

No, MVP is the null hypothesis. The burden of proof is on proposed changes to MVP. I am increasingly uncomfortable with how this conversation is starting to repeat patterns we've managed to break out of, particularly with somewhat vague speculations. We're fully in empirical mode, and that's gotten us lots of progress.

@RossTate
Copy link
Contributor

The burden of proof is on proposed changes to MVP.

Then can you point me to the empirical evidence you have suggesting that the MVP needs to be changed to support sub typing in call_indirect, rather than employing the technique I provided above? If so, that would be enlightening.

@titzer
Copy link
Contributor

titzer commented Nov 10, 2022

AFAICT it was always the intention that call_indirect would support subtyping and not including it was an oversight. So we are actually finishing the MVP design, rather than departing from it.

@RossTate
Copy link
Contributor

The reason that wasn't done before is because there were concerns about performance. As such, I'm offering a change to type validation that ensures core wasm performs as before. (And a small extension to support separate compilation where the "later" units can add more types to the hierarchy.)

A fine response is "It is useful to have that option on the table. We will evaluate the impact of not using that option has on core wasm modules, in terms of both memory and time, and present the results to the CG. Should they be dissatisfied, then this might be an option we build upon. But at present we would rather gamble on the changes being unnecessary."

@manoskouk
Copy link
Contributor Author

I ran some benchmarks to see the code-size impact of call_indirect subtyping on linear-wasm modules. I found it to be anything from 0.17% up to 10%.
I think this is unacceptable and we should lean towards interpreting current signature types as final.

@RossTate
Copy link
Contributor

RossTate commented Nov 11, 2022

I think we're on the same page. The important thing is that this discussion now seems to be focusing on how to address the issue in the OP, rather than why to not bother, and y'all are exploring some good pay-as-you-go solutions. Thanks, @manoskouk, for validating the concern!

@conrad-watt
Copy link
Contributor

With respect @RossTate, I hope you can take some lessons from this thread to improve your interactions with the group in future. Your comments generated a lot of noise in a situation where there was no disagreement from the outset that:

  1. we can add a concept of final to the GC proposal
  2. whether or not we should will depend on empirical considerations such as those brought up by @manoskouk

@dcodeIO

This comment was marked as resolved.

@titzer
Copy link
Contributor

titzer commented Nov 11, 2022

For the record, I was expecting that the plan of record was to 1.) add function subtyping to call_indirect and then 2.) gather feedback on the performance/size metrics from real engines. I assume we'll be following that with 3.) investigate alternative implementations to mitigate overhead and 4.) explore design options to mitigate overhead.

Thanks @manoskouk for gathering the data in step 2. AFAICT the empirical evaluation has advanced the discussion and I appreciate it!

@tlively
Copy link
Member

tlively commented Nov 11, 2022

@manoskouk, do you have further work planned to try to bring the cost of subtying in call_indirect down, or will solving this require a spec change?

@manoskouk
Copy link
Contributor Author

I would be very happy with any ideas on how to reduce the cost. Currently, the only idea that has come to my attention is to compile call_indirect for wasm-linear signature types as though they are final, and invalidate the generated code if such type happens to be extended. I think this is an overly complex solution and I don't see us going down that path anytime soon.

@titzer
Copy link
Contributor

titzer commented Nov 11, 2022

Two ideas: 1.) move the entire call_indirect sequence out-of-line to a per-signature shared stub[1]. 2.) move the exact-match-failed slowpath (i.e. subtyping) out-of-line (perhaps into a shared stub).

[1] I wanted to do this in V8 for years.

@conrad-watt
Copy link
Contributor

conrad-watt commented Nov 11, 2022

@manoskouk thanks again for your experiments! I agree that solution seems over-complicated. Do you have a sense of what kind of code is associated with the 10% executable binary size overhead? Was this a deliberately-engineered pathological program, or something from an existing codebase? I'd be particularly interested in knowing how bad the situation is for compiled C++ programs with lots of virtual function calls.

I'm sure others can offer their opinions on this, but if the "representative" overhead were closer to 0.2% than 10% I'd feel more comfortable with delaying a "final" extension to the type system. That being said if everyone is willing to put the effort in upfront, it's probably fine to add "final" now and make it the default.

@RossTate

This comment was marked as off-topic.

@conrad-watt
Copy link
Contributor

conrad-watt commented Nov 12, 2022

@RossTate if you, or others, have concerns about how this discussion is being moderated, please directly contact the chairs or the W3C ombuds.

@manoskouk
Copy link
Contributor Author

@conrad-watt These were existing programs that we use as performance benchmarks for V8. Two of them are real-world programs for which I got 7.5% and 10% for the optimized tier.
Regarding additional effort, unless I'm missing something, the only obligatory change is a check during subtype validation, and the rest is optional optimizations.
@titzer Thanks for the feedback. We could certainly try that, but I am worried that the runtime impact on wasm-gc programs with call_indirect will be significant.

@askeksa-google
Copy link

Regarding additional effort, unless I'm missing something, the only obligatory change is a check during subtype validation, and the rest is optional optimizations.

IIUC, the iso-recursive type canonicalization would also need to distinguish between final and extensible types.

@titzer
Copy link
Contributor

titzer commented Nov 14, 2022

Yes, canonicalization has to distinguish finality, otherwise a final definition could get canonicalized with non-final, and thus later be used as a supertype.

@rossberg
Copy link
Member

@manoskouk:

I ran some benchmarks to see the code-size impact of call_indirect subtyping on linear-wasm modules. I found it to be anything from 0.17% up to 10%.

10% sounds like a lot, I have difficulties explaining a number like that. Can you provide some statistics on the frequency of call_indirect in that code, and background on what code sequence you needed to add?

If a subtype check requires that much code, then I would imagine that other casts lead to a lot of code blow-up, too.

@manoskouk
Copy link
Contributor Author

In this benchmark, call_indirect amounts for about 0.85% of instructions. You are right that due to cacheable code and other technical reasons specific to call_indirect, this code is quite bloated in V8. I would appreciate other engines reporting their own numbers.
However the question is, how much overhead is acceptable? Say we drop it to 2%, would that be preferable over the spec change?

@titzer
Copy link
Contributor

titzer commented Nov 15, 2022

@manoskouk Have you measured any execution time overhead? IIUC the above reported results were code size only.

@rossberg
Copy link
Member

Good question. I'd say that depends more on the average than the worst case. 2% as worst case is probably acceptable, as long as the common case is only a fraction of that.

@manoskouk
Copy link
Contributor Author

@titzer I did not find measurable execution time overhead.

To give some more concrete points on generated-code-size regressions, I present the following for three real-world benchmarks: Baseline code size, baseline and optimized-tier regression, and percentage of call_indirect instructions in the .wasm binary.

Benchmark | Code Size | Baseline | Optimized | % call_indirect
--------------------------------------------------------------
Photoshop |    234.6M |    7.13% |     7.46% | 0.81%
Sqlite    |      3.8M |   10.03% |    10.21% | 0.84%
Unity     |      7.9M |    3.80% |     3.97% | 0.34%

@manoskouk
Copy link
Contributor Author

@askeksa-google and myself also implemented the final-types extension for dart2wasm and V8. We measured the code-size decrease (without accounting for call_indirect subtyping) to be 4.5% in the optimized tier.

@eqrion
Copy link
Contributor

eqrion commented Dec 7, 2022

Apologies for the delay, I've finally gotten code size data for adding the subtyping check for call_indirect. The mean and median code size increase for the corpus of modules we have was ~1.5%, min was 0.7%, max was 2.7%. This was measured on our optimizing tier which is the worst case as it emits the smallest code, causing any increases to be more significant.

We implement the subtyping check in an alternative function prologue that call_indirect uses, so the size regression is basically len(funcs) * len(subtypingInstructions). We have an optimization landing soon that will not emit this prologue if the function is not first class and therefore can't be the target of call_indirect, this would reduce the overhead of this even further. 40% of the median module's functions don't need this extra prologue.

I don't have any performance numbers yet. But it seems actually likely this could cause a perf regression because adding subtyping to call_indirect without final forces us to remove an optimization where we represent call_indirect runtime types for small function types as easy to generate immediates (instead of a load from the instance). It's for this reason (and the code size decreases for general purpose downcasts with final) that I'm in favor of adding final if we have call_indirect subtyping.

@askeksa-google
Copy link

I'm also in favor of final types. This is a convenient thing for producers to be able to express.

From what we have seen, it improves the code generation for casts in general, is straightforward to take advantage of in engines and is easy to emit by producers (the change to dart2wasm to mark appropriate types final is less than 10 loc).

With these, function subtyping in call_indirect seems like a no-brainer, as it makes it truly pay-only-for-what-you-use.

@alexp-sssup
Copy link

Just wanted to add our perspective as Wasm producers. In CheerpX (X86->Wasm JIT compiler) we see a 10% instruction count reduction when eliding call_indirect type checks. This was measured by in V8 using commit 4f0ef8c31de177c1f07ff6797b4d8a71fc49f1d9 vs the immediately preceding one.

It would seem to me that the proposal of marking MVP function types "final" by default does bring back the possibility of optimizing away call_indirect type checks, which is a good call from our point of view.

@rossberg
Copy link
Member

@alexp-sssup, just to clarify, final will not allow the runtime type check to be elided. It will merely allow to keep it as small as without function subtyping, because the engine then still only needs to check for type equality instead of subtyping.

@alexp-sssup
Copy link

@rossberg I will admit that I am confused as to how indirect_call to an MVP typed function (i.e. only using primitive types) cannot elide checks if the table type is known

It seems to me that the main purpose of typed function references in tables was eliding the check to begin with, isn't that the case?

@rossberg
Copy link
Member

@alexp-sssup, ah, indeed, if the table type is a concrete function type then no check is needed. For context, I believe most of the previous discussion wasn't about that case specifically (in theory, that doesn't require call_indirect, since you could equivalently use table.get + call_ref).

@tlively
Copy link
Member

tlively commented Dec 14, 2022

At the meeting today we decided to move forward with final types. We can close this once #339 is merged.

@rossberg
Copy link
Member

#339 is merged, closing.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests