call_indirect and function subtyping #329
Comments
So far, the proposal errs on the side of conservatism and does not change the semantics of The fact that the engine can't locally know whether a type has subtypes is one of the reasons for the concern. I mentioned that a notion of final could solve that, but the proposal does not currently have that, so would require extending it. But I think we should experiment and measure the actual overhead first, otherwise it'd be premature optimisation. |
Thanks for the reply. |
Technically it doesn't, since runtime typing is separate from static typing. This would be clearer if RTTs weren't swept under the rug. ;) The type check for call_indirect is essentially a cast, and as I have noted in other contexts, there is no reason to expect that runtime casts can always reverse subsumption. The only guarantee you have is that if a dynamic subtype check succeeds, then the corresponding static subtyping relation holds; the inverse is not true. In your example, you have |
I think we should be complete and make |
A proper subtyping check in Regarding |
Mh, if it comes to that, I'd much rather have all function types default to final uniformly than introducing irregular rules. |
@rossberg thanks for the explanation. Another motivation for having subtypes succeed is that we can drop the type check altogether if the immediate type coincides with the table's type. |
This seems like it would have a very inefficient slow path, and it might even noticeably slow down the fast path. It's also not clear how it would extend to programs using Post-MVP features (where the subtyping between, e.g. existential types representing objects, has to be explicitly proven by the producer rather than automatically determined by the engine). I have two alternate solutions, one short-term and one long-term. The short-term solution is to have a variant of The long-term solution is to use call tags with fallback behavior. This covers separate-compilation scenarios. |
@RossTate, as mentioned various times before, adding new constructs, as useful as they might be, does not address the question at hand, which is concerned with the semantics of an existing construct and its interaction with subtyping. Also, if we allow subtyping for call_indirect, then it would effectively become an exact macro for table.get + ref.cast + call_ref. So whatever post-MVP questions might arise wouldn't be specific to this instruction – to the contrary, such a uniform semantics would avoid the need for specialised answers. |
I agree with @rossberg; without function subtyping, It's worth noting that the "slowpath" of checking for function subtyping would otherwise be a trap. The fast path remains an identity check, and thus should not slow down any program that exists today, nor any program that doesn't rely on function subtyping. |
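The fast-path/slow-path split described above can be sketched as follows. This is a minimal model rather than any engine's implementation: `FuncType`, `Trap`, and `call_indirect_check` are hypothetical names, and it assumes each canonical type is a unique object with at most one explicitly declared supertype.

```python
from dataclasses import dataclass
from typing import Optional


class Trap(Exception):
    """Stands in for a WebAssembly trap."""


@dataclass(eq=False)  # compare by identity: one object per canonical type
class FuncType:
    name: str
    supertype: Optional["FuncType"] = None  # explicitly declared supertype, if any


def call_indirect_check(expected: FuncType, actual: FuncType) -> str:
    """Return which path accepted the callee's type, or raise Trap."""
    # Fast path: the identity check that programs without subtyping always take.
    if actual is expected:
        return "fast"
    # Slow path: only reached where the call would otherwise have trapped.
    # Walk the explicitly declared supertype chain of the callee's type.
    t = actual.supertype
    while t is not None:
        if t is expected:
            return "slow"
        t = t.supertype
    raise Trap(f"{actual.name} is not a subtype of {expected.name}")


base = FuncType("base")
derived = FuncType("derived", supertype=base)
unrelated = FuncType("unrelated")
```

In this model a call expecting `base` accepts `base` on the fast path and `derived` on the slow path, and traps for `unrelated`. The code-size concern raised in the thread corresponds to the slow-path loop having to be emitted (or called) at every call site.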
If you want it to work with subtyping, then solve the problem the same way you did for structs: use isorecursive types and implement function subtyping according to the explicit hierarchy rather than component-wise. This has the benefits of being more consistent with the rest of the GC proposal's static subtyping and dynamic casting designs, of supporting recursive function types, and of being efficiently implementable even on the slow path. |
I didn't think there was any disagreement that function subtyping for call_indirect, if we allow it, would use the same explicit subtyping mechanism as the rest of the GC proposal...? |
Ah, then I misunderstood some of the earlier comments. Sorry. Then to address this issue:
I would recommend that types not in recursion groups be final. |
Are you recommending that as a difference between types not in recursion groups and types in singleton recursion groups, as per the discussion in #334? It's certainly useful for types that are not recursive to be used as supertypes. Would this suggestion then mean that in order to be used as supertypes, such types would have to be placed in recursion groups? Wouldn't that conflict with other potential distinctions such as the ones @rossberg mentioned in #334 (comment)? |
That strikes me as a completely artificial conflation of concerns. If we wanted final, then we should be introducing it as an explicit feature, as discussed before. The use case for it is wholly unrelated to whether a type is recursive. Also, we already concluded that we need actual performance data before considering final. |
Yes. Or you could think of a type not in a recursion group as the same thing as a type in a singleton recursion group and labelled final.
The concern expressed in the OP is that having existing function types not be final will incur costs for programs not using function subtyping. We know it will at least cost memory; it might also cost run-time performance. Having existing function types be final addresses that concern. Another way to think about it is that types are final by default and it is extensibility that needs to be explicit. This is how it works in some systems (e.g. Kotlin), and applying that to the MVP for struct types (i.e. struct types even in a recursion group with no subtypes are not extensible) would enable better casts for MVP GC as well (especially since MVP is focusing on whole-program compilation). |
If we decide to introduce final then we can also make it the default, but that's still completely unrelated to type recursion. |
Come to think of it, for the MVP, the |
That still conflates unrelated issues and obviously would break even the simplest form of separate compilation that we aimed to support. |
This comment was marked as resolved.
I honestly feel like I offered a suggestion that aligns well with how the MVP is currently being used, addresses the concerns multiple people raised here, addresses the concerns you and others raised about an earlier suggestion, and provides a clear extension for separate compilation (which could be integrated now if the group chooses). That seems like a productive thing to do, so I am at a loss as to why I am being attacked for it. |
Forcing supertypes to be in the same recursion group conflicts with minimizing the size of recursion groups, which is one of the enablers for separate compilation. I don't understand the motivation for the restriction nor what benefit it would give. It just seems arbitrary, and highly likely something we'd labor to relax post-MVP. edit to add: In general, given that we've reached Phase 3, I think tweaks to the MVP need to be strongly motivated by an important use case encountered in a real language targeting the existing MVP, or real performance data from a production engine implementing the MVP. |
The J2CL and Dart teams have indicated that they may need to start looking at inter-module coordination sooner rather than later, so we (Google) don't want to place any restrictions on recursion groups that would hinder that use case. |
As I mentioned, a
Every cast to a non-extensible type with no subtypes in the group would be guaranteed to be exact. That would let
This thread is about making a change to So if you are going to apply this reasoning about Phase 3, then you should be demanding that people demonstrate this causes substantial performance problems before tweaking the MVP to support a different implementation of the pattern. (I am not saying that should be necessary; just pointing out the inconsistent application of this principle.) |
This comment was marked as resolved.
No, MVP is the null hypothesis. The burden of proof is on proposed changes to MVP. I am increasingly uncomfortable with how this conversation is starting to repeat patterns we've managed to break out of, particularly with somewhat vague speculations. We're fully in empirical mode, and that's gotten us lots of progress. |
Then can you point me to the empirical evidence you have suggesting that the MVP needs to be changed to support subtyping in |
AFAICT it was always the intention that |
The reason that wasn't done before is because there were concerns about performance. As such, I'm offering a change to type validation that ensures core wasm performs as before. (And a small extension to support separate compilation where the "later" units can add more types to the hierarchy.) A fine response is "It is useful to have that option on the table. We will evaluate the impact of not using that option has on core wasm modules, in terms of both memory and time, and present the results to the CG. Should they be dissatisfied, then this might be an option we build upon. But at present we would rather gamble on the changes being unnecessary." |
I ran some benchmarks to see the code-size impact of |
I think we're on the same page. The important thing is that this discussion now seems to be focusing on how to address the issue in the OP, rather than why to not bother, and y'all are exploring some good pay-as-you-go solutions. Thanks, @manoskouk, for validating the concern! |
With respect @RossTate, I hope you can take some lessons from this thread to improve your interactions with the group in future. Your comments generated a lot of noise in a situation where there was no disagreement from the outset that:
|
This comment was marked as resolved.
For the record, I was expecting that the plan of record was to 1.) add function subtyping to Thanks @manoskouk for gathering the data in step 2. AFAICT the empirical evaluation has advanced the discussion and I appreciate it! |
@manoskouk, do you have further work planned to try to bring the cost of subtyping in |
I would be very happy with any ideas on how to reduce the cost. Currently, the only idea that has come to my attention is to compile |
Two ideas: 1.) move the entire [1] I wanted to do this in V8 for years. |
@manoskouk thanks again for your experiments! I agree that solution seems over-complicated. Do you have a sense of what kind of code is associated with the 10% executable binary size overhead? Was this a deliberately-engineered pathological program, or something from an existing codebase? I'd be particularly interested in knowing how bad the situation is for compiled C++ programs with lots of virtual function calls. I'm sure others can offer their opinions on this, but if the "representative" overhead were closer to 0.2% than 10% I'd feel more comfortable with delaying a "final" extension to the type system. That being said if everyone is willing to put the effort in upfront, it's probably fine to add "final" now and make it the default. |
This comment was marked as off-topic.
@RossTate if you, or others, have concerns about how this discussion is being moderated, please directly contact the chairs or the W3C ombuds. |
@conrad-watt These were existing programs that we use as performance benchmarks for V8. Two of them are real-world programs for which I got 7.5% and 10% for the optimized tier. |
IIUC, the iso-recursive type canonicalization would also need to distinguish between final and extensible types. |
Yes, canonicalization has to distinguish finality, otherwise a final definition could get canonicalized with non-final, and thus later be used as a supertype. |
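Why finality has to be part of the canonicalization key can be illustrated with a small sketch. The `Canonicalizer` class and its method are hypothetical, and the model ignores recursion groups: it keys each definition on its parameter types, result types, declared supertype, and a `final` flag.

```python
class Canonicalizer:
    """Toy canonicalizer: structurally identical definitions share one id."""

    def __init__(self):
        self._ids = {}      # structural key -> canonical id
        self._final = []    # canonical id -> whether that type is final

    def canonicalize(self, params, results, supertype=None, final=False):
        # A final type must never be usable as a supertype.
        if supertype is not None and self._final[supertype]:
            raise ValueError("cannot declare a subtype of a final type")
        # Finality is part of the key, so a final definition and an
        # extensible one with the same shape get different canonical ids.
        key = (tuple(params), tuple(results), supertype, final)
        if key not in self._ids:
            self._ids[key] = len(self._final)
            self._final.append(final)
        return self._ids[key]


c = Canonicalizer()
final_sig = c.canonicalize(["i32"], ["i32"], final=True)
open_sig = c.canonicalize(["i32"], ["i32"], final=False)
```

If `final` were left out of the key, `final_sig` and `open_sig` would collapse to one id, and a later definition could then name the supposedly final type as its supertype.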
10% sounds like a lot, I have difficulties explaining a number like that. Can you provide some statistics on the frequency of call_indirect in that code, and background on what code sequence you needed to add? If a subtype check requires that much code, then I would imagine that other casts lead to a lot of code blow-up, too. |
In this benchmark, call_indirect accounts for about 0.85% of instructions. You are right that due to cacheable code and other technical reasons specific to call_indirect, this code is quite bloated in V8. I would appreciate other engines reporting their own numbers. |
@manoskouk Have you measured any execution time overhead? IIUC the above reported results were code size only. |
Good question. I'd say that depends more on the average than the worst case. 2% as worst case is probably acceptable, as long as the common case is only a fraction of that. |
@titzer I did not find measurable execution time overhead. To give some more concrete points on generated-code-size regressions, I present the following for three real-world benchmarks: Baseline code size, baseline and optimized-tier regression, and percentage of call_indirect instructions in the .wasm binary.
|
@askeksa-google and I also implemented the final-types extension for dart2wasm and V8. We measured the code-size decrease (without accounting for call_indirect subtyping) to be 4.5% in the optimized tier. |
Apologies for the delay, I've finally gotten code size data for adding the subtyping check for call_indirect. The mean and median code size increase for the corpus of modules we have was We implement the subtyping check in an alternative function prologue that call_indirect uses, so the size regression is basically I don't have any performance numbers yet. But it seems actually likely this could cause a perf regression because adding subtyping to call_indirect without |
I'm also in favor of final types. This is a convenient thing for producers to be able to express. From what we have seen, it improves the code generation for casts in general, is straightforward to take advantage of in engines and is easy to emit by producers (the change to dart2wasm to mark appropriate types final is less than 10 loc). With these, function subtyping in |
Just wanted to add our perspective as Wasm producers. In CheerpX (X86->Wasm JIT compiler) we see a 10% instruction count reduction when eliding call_indirect type checks. This was measured in V8 using commit It would seem to me that the proposal of marking MVP function types "final" by default does bring back the possibility of optimizing away call_indirect type checks, which is a good call from our point of view. |
@alexp-sssup, just to clarify, |
@rossberg I will admit that I am confused as to how call_indirect to an MVP typed function (i.e. only using primitive types) cannot elide checks if the table type is known. It seems to me that the main purpose of typed function references in tables was eliding the check to begin with; isn't that the case? |
@alexp-sssup, ah, indeed, if the table type is a concrete function type then no check is needed. For context, I believe most of the previous discussion wasn't about that case specifically (in theory, that doesn't require call_indirect, since you could equivalently use table.get + call_ref). |
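The static elision discussed here can be sketched as a compile-time decision. This is an illustrative model only: `needs_runtime_check`, `FUNCREF`, and the `DECLARED_SUPER` hierarchy are hypothetical, it assumes call_indirect accepts subtypes of its immediate type, and it sets aside the null check that a nullable table entry would still need.

```python
FUNCREF = "funcref"  # the generic table element type

# Hypothetical declared hierarchy: type name -> declared supertype name.
DECLARED_SUPER = {"$derived": "$base"}


def is_subtype(sub, sup):
    """Walk the declared supertype chain from sub looking for sup."""
    while sub is not None:
        if sub == sup:
            return True
        sub = DECLARED_SUPER.get(sub)
    return False


def needs_runtime_check(table_elem_type, immediate):
    """Decide at compile time whether a call site needs a signature check."""
    if table_elem_type == FUNCREF:
        # Nothing is known statically about the entries' signatures.
        return True
    # Every entry of a concretely typed table is already known to be a
    # subtype of the table's element type, so the check is redundant
    # whenever that element type is itself a subtype of the immediate.
    return not is_subtype(table_elem_type, immediate)
```

Under these assumptions, a table typed `$base` or `$derived` needs no check for a call with immediate `$base`, while a `funcref` table always does.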
At the meeting today we decided to move forward with final types. We can close this once #339 is merged. |
#339 is merged, closing. |
It is my understanding that function subtyping forces call_indirect to perform a runtime subtyping check. Can we confirm this?
On a related note, this check might impact MVP modules: consider an MVP module defining the type t1 = int32 -> int32 and using it as immediate signature for call_indirect. Despite this type not being nontrivially extensible, call_indirect still has to perform a runtime check: another module might always later define t1' = int32 -> int32 and t2 = int32 -> int32 <: t1'; t1, t1' canonicalize to the same type, therefore t2 <: t1. This should not impact running time of MVP modules (since we can include a fast path that checks for signature equality); however, it will impact binary code size. It would be unfortunate if wasm-gc ends up having impact on existing modules.
Would it be reasonable to somehow consider MVP types final by default?
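The cross-module scenario in this post can be replayed in a few lines. The sketch below is a toy model, not the proposal's algorithm: `define` and `is_subtype` are hypothetical helpers, types are canonicalized purely on parameter types, result types, and declared supertype, and there is no notion of final.

```python
canonical = {}       # structural key -> canonical id
declared_super = {}  # canonical id -> canonical id of the declared supertype


def define(params, results, supertype=None):
    """Define a function type and return its canonical id."""
    key = (tuple(params), tuple(results), supertype)
    type_id = canonical.setdefault(key, len(canonical))
    if supertype is not None:
        declared_super[type_id] = supertype
    return type_id


def is_subtype(sub, sup):
    """Walk the declared supertype chain from sub looking for sup."""
    while sub is not None:
        if sub == sup:
            return True
        sub = declared_super.get(sub)
    return False


# MVP module: defines t1 and uses it as a call_indirect immediate.
t1 = define(["int32"], ["int32"])

# A module loaded later: t1' is structurally identical, t2 declares t1' as supertype.
t1_prime = define(["int32"], ["int32"])
t2 = define(["int32"], ["int32"], supertype=t1_prime)
```

Here `t1` and `t1_prime` receive the same canonical id, so `is_subtype(t2, t1)` holds even though the MVP module never declared any subtype. That is why, in this model, the MVP module's call sites cannot drop the subtype check on local knowledge alone.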