Add br_on_cast[_fail] for non-nullable types #342
Comments
+1 to doing something here. It's unfortunate when perfectly non-nullable types go into a
Yes, good point, that's an oversight. Will fix.
Please see #343. The problem doesn't arise for ref.cast, because that has only one result type. Hence, if the operand is non-null, the non-null version of the cast can be used to give the desired result.
Fixed by #343.
It seems I can't reopen tickets, only create new ones, but my concern is a follow-up to this issue. My understanding of the discussion was the following:
Given This might be slightly more complicated on the implementation side, but it doesn't make it harder to understand from the user side, as the engine infers a stricter type and all programs that worked before would still work with this change to the inferred type.
To summarize, it seems like the current typing does the optimal thing when the input type is non-nullable or when the cast type is non-nullable, but there is room for improvement when both are nullable.
Just to note, the typing rules prior to the change of #343 did have the desired property that @Liedtke's latest comment sketches (e.g. see the old The #343 change has discarded this "output null" knowledge (presumably by accident?) in the course of making the instructions polymorphic on the "input null". I don't think there should be any issue with adding this strengthening back in.
Relatedly, I wonder if there's more appetite now to revisit #201, which discusses another place where we are currently unnecessarily losing type information. |
intended to address remaining concerns in #342
Never mind,
I actually considered that, but concluded it was a bit too special-casy, and couldn't think of a particularly good use case to justify the complexity. In fact, now that @tlively points out the relation to #201, I think the typing actually is still too lax even, because it creates the exact same complications that had us reject that. The fix would be to simplify further to
In the light of our decision to maintain explicit instruction types, cast instructions would indeed need a second type annotation if we wanted to allow more general typing.
We have the maximally precise typing suggested by @conrad-watt implemented in Binaryen and we are happy with it.
I believe that, yet it would violate what we agreed upon regarding explicit typing. Is there appetite for having two type annotations?
I support the most accurate (output) typing rules possible without adding a second type annotation. Are there any risks with having the most specific output type?
This seems different from the type annotations on the accessor instructions because the typing does not require looking up the contents of a heap type. I don't think having the output nullability depend on the input nullability should require an extra type annotation, unless I'm missing some other precedent.
@titzer, I am a little confused, as you were among the most vocal proponents of explicit typing. If you do not care in this case, why did you care before? Or asking in a different way, what is the technical property that we should be shooting for, and derived from that, what is the criterion to distinguish which types matter? If we want to resolve this in a non-conservative fashion, then I think we need to formulate and establish some proper principles that answer that, preferably some that are not merely driven by the needs of specific implementations. Otherwise we'll just end up making random decisions each time that invalidate whatever we decided randomly last time. I, for one, assumed that our principle going forward is that instruction types should be derivable from an instruction's immediates (other than for polymorphic operands that are completely ignored). Plain and simple.
The key technical property I am shooting for is to avoid any overloading of semantics[2] based on the input type (i.e. type on the operand stack). That is, the semantics of a cast (i.e. the set of values accepted and rejected) are entirely determined by the type annotation. Of course, an engine is free to optimize a cast based on the input type to narrow down the set of values that need to be tested for. With this property, neither an interpreter nor a baseline JIT[1] need additional information beyond the local type annotation to interpret or generate code for a given cast instruction. AFAICT the above proposal for sharpening the output types based on nullability of the input type does not affect the semantics (i.e. does not introduce overloading). It's purely a validation type rule change.

[1] Thus a baseline compiler need only model value representations (kinds) of stack slots, not full types, to generate code.
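To make this property concrete, here is a small Python toy model (hypothetical names and encoding, a sketch rather than any engine's implementation): the set of values a cast accepts is a function of the annotation alone, while the static input type could only ever be used to optimize the check, never to change its outcome.

```python
# Toy universe: a value is either None (modeling wasm null) or a dict
# carrying its runtime heap type. Subtyping is a simple ancestor chain.
SUPERTYPES = {"eq": None, "struct": "eq", "point": "struct"}

def is_subtype(ht, super_ht):
    # Walk the (toy) supertype chain upward from ht.
    while ht is not None:
        if ht == super_ht:
            return True
        ht = SUPERTYPES[ht]
    return False

def cast_accepts(value, annot_ht, annot_null):
    # The accepted set depends only on the annotation (annot_ht, annot_null),
    # never on the static type of the operand stack slot.
    if value is None:
        return annot_null
    return is_subtype(value["ht"], annot_ht)

# null is accepted exactly when the annotation says "null".
assert cast_accepts(None, "struct", annot_null=True)
assert not cast_accepts(None, "struct", annot_null=False)
# Heap values are accepted by subtyping against the annotated heap type.
assert cast_accepts({"ht": "point"}, "struct", annot_null=False)
assert not cast_accepts({"ht": "eq"}, "struct", annot_null=False)
```

Under this model, the nullability sharpening discussed above only changes which programs validate, not which values any given cast accepts.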
A conservative relaxation of this principle might be that the only part of an instruction type we allow to be polymorphic is the nullability dimension of a reference type. This would cover both the
@titzer, what is the semantic distinction by which "the semantics of a cast is entirely determined by the [target] type annotation" (i.e., the source type is not needed), but the same is not the case for, say, struct.get? It seems to me that this distinction is based on specific implementations rather than on semantics. On paper, neither operation is affected by the static type of the source.

@conrad-watt, that is a reasonable principle. However, it is not sufficient to support the cases in question: these would require polymorphism in the heap type as well, although it is constrained by the target type through the subtype relation. That is, it can neither be chosen freely, nor is it determined by the immediates. See my previous comments here and following. (For
@rossberg I believe the issue is the type rules. The proposed type sharpening means that the output type of a cast depends on the input (stack) type again, which is something we eliminated when we added full annotations and would now be adding back. The motivation for doing so is to accept more programs by having more precise output types. It really doesn't affect interpreter or baseline execution tiers (in Wizard, neither of these tiers model full types), and optimizing compilers are likely to be able to compute at least as precise types in their IR. So AFAICT this is entirely motivated by accepting more programs.
@rossberg I think we are saying the same thing but in different words, though your point about

[1] Though there is an analogous type sharpening that we could add for
Semantically, struct.get is just indexing into a tuple of values, completely agnostic of the tuple's type. Everything beyond that is an unobservable implementation detail – in particular, the field types only matter when you want to optimise the struct representation for space (as, presumably, production implementations want, but e.g. the reference interpreter does not). This is not different from projecting the RTT for a cast, which essentially is getting an implicit field. You may think that you know this field is always there, and always in the same place, so that access never depends on the type. But that again is making certain assumptions about both implementation details and future language extensions. For example, with an extension to descriptors as discussed, it's not unthinkable that an implementation might need to know something about a struct's associated descriptor type to access the RTT correctly. That is, semantically, neither op needs the type. In an implementation, either may need it. If we want to make a distinction between the two ops, some additional argument is needed.
It seems fine to me to make the distinction based on practical realities of how interpreters / base tiers are typically implemented, even if those arguments don't apply to every possible implementation design. We do that often enough informally when designing features, and it's not like the design rationale needs to be part of the spec formalism. The simple decision procedure of "add an annotation if and only if determining the result type of an instruction may require looking at the contents of a heap type definition" seems appropriately simple and addresses what in my mind was the most compelling argument for adding type annotations, which AFAIU was to allow in-place interpreters to avoid looking at separate data structures to determine output types. |
I don't think the result type matters specifically. For example, struct.set has a trivial result type. The only meaningful variation of that principle is whether "validating the instruction requires looking at the contents of a type definition". And that is the case for a cast's source type under the relaxed typing.
Well, I hope my previous comment explained why the static and the dynamic need to know the type are not generally correlated. It depends on both the operation and the implementation strategy. |
Just to check my understanding: the problem here is that we have heap type polymorphism with constraints that come from outside the instruction itself? Specifically for

We talked about the use case of analyzing code snippets without their context and having to figure out their types. Was this hypothetical or was there a connection to implementation problems like having to validate unreachable code? Are there any other reasons implementations would want to require avoiding externally constrained polymorphism? Or is the purpose of the requirement to provide a consistent rationale for our choices of which instructions to annotate without depending on implementation details?
More precisely, the interaction between type polymorphism (for operands/results) with constraints (from immediates). In all other places, polymorphic operand/result types are unconstrained value or heap types.
If by "implementations" you mean more than just engines, then yes: other consumers, e.g., tools for static analysis, refactoring, etc. may need to determine types in less unidirectional manners. As mentioned in my other reply, this is a form of principal type property, which generally is desirable to keep subtyping under control, or things tend to get more complicated/costly eventually. And yes, this was one of the arguments for type annotations before. We could weaken it now, but then it is more difficult to argue why we bother with the other annotations. FWIW, I don't think there's a problem with unreachable code for this one.
Thanks! I'll copy my comment on principal types over here so the conversation can stay in one place:
I am sure that there could exist tools that would benefit from this property, but I'm not sure how painful a workaround would be for them. It would be ideal if we could hear from the implementers of such a tool. It's also unclear to me how much worse having to look at the typing context is than having to look at e.g. the types of locals to type
After talking this over with the V8 team today, we agree that it would be good to have a consistent implementation-independent rule for when to apply type annotations so that we don't have to consider it on a case-by-case basis with the outcome depending entirely on who's in the room. We would prefer to achieve consistency by removing existing type annotations because we care a lot about code size and none of our implementations or tools benefit from the type annotations. We agree that it is important for tools to be able to find the types of arbitrary code snippets, but we think it's sufficient that those tools can precompute the types using a simple linear scan of the code. Since linear validation is a hard requirement of Wasm's design, finding types shouldn't become more expensive in the future. Alternatively, if we do require having a principal type for arbitrary code snippets (as long as you know the types of locals and labels, etc. but not necessarily the types of surrounding instructions), then it seems we would have to add annotations not just to the casts but also to
We could, but there is a good argument why those are in a different, much simpler class: in the typing rule of these instructions, the occurring heap type
The guiding principle applies the same: the type of these instructions, and the relation between their operands and results, cannot be determined (up to type variables) without knowing what struct/array is accessed. Analogous to
FYI, the typing rules with two annotations would be:
where
EDIT: after typing all this out, I believe this idea doesn't actually get us principal types, but I'm leaving it below for posterity. For example in the sketch rule for

I've just realised that there might be another point in the design space which we've not considered, and I think it turns out to be quite pleasant, avoiding the need for a new type annotation. The insight is that we're trying to do two different things with
These uses are mutually exclusive. In the case that the input type is known to be non-null, the presence or absence of the null annotation has no semantic effect, and should have no effect on the label/output types. So I think the need for an additional type annotation is obviated if we add the following additional instructions
where the input type to these instructions is required to be non-null (and consequently there's no need for a null-check toggle). This means that the "nullish" variants (

Sketch rules:
For the null axis, isn't that equivalent to annotating both types? And for the heap type axis, doesn't it still suffer from a lack of principal types (unless resorting to side constraints), because of the subtyping constraints on the implicit
Yes, you're right. As I realised just after posting the initial version of my comment, my idea doesn't actually help with the concern discussed in the meeting. |
Current situation:
`br_on_cast` operates on nullable input types. One can choose which path the `null` value "takes" by either using `br_on_cast` or `br_on_cast null`.

Issue:
For any given non-nullable input type, one of the two output types (either on the branch or on the fall-through case) will be nullable. Ideally both output types (on branch / on non-branch) should be non-nullable for non-nullable inputs, to not require useless `ref.as_non_null` casts afterwards.

Proposal:
This can be fixed by making `br_on_cast` and `br_on_cast_fail` polymorphic on the input type by changing

```
br_on_cast $l null? ht : [t0* (ref null ht')] -> [t0* (ref null2? ht')]
```

to

```
br_on_cast $l null? ht : [t0* (ref null4? ht')] -> [t0* (ref null2? ht')]
```

with `null2` (and `null3` in the label type) not appearing if `null4` is not there, independent of the presence of `null?`.

(The same applies to `br_on_cast_fail`.)

The change doesn't really have any impact on the semantics of the instructions and would only affect the inferred result types.

[Edit] Follow-up question:
Should the same be applied for `ref.cast null?` for consistency? That would mean that `ref.cast null` only produces a nullable type iff the input type is nullable.
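The proposed rule can be sketched as a small Python helper (my own encoding for illustration, not part of the proposal) that computes the nullability of the branch and fall-through result types of `br_on_cast` under both the current typing and the proposed input-polymorphic typing:

```python
def current_nullability(null_flag):
    # Current typing: the input is always treated as nullable, so null
    # always shows up on exactly one side, chosen only by the null? flag:
    # with `null`, null takes the branch; without it, null falls through.
    branch_nullable = null_flag
    fallthrough_nullable = not null_flag
    return branch_nullable, fallthrough_nullable

def proposed_nullability(input_nullable, null_flag):
    # Proposed typing: null2/null3 appear only if null4 (the input
    # nullability) is present, so a non-nullable input yields
    # non-nullable results on both paths.
    branch_nullable = input_nullable and null_flag
    fallthrough_nullable = input_nullable and not null_flag
    return branch_nullable, fallthrough_nullable

# With a non-nullable input, the current rule still leaves one path
# nullable (forcing a useless ref.as_non_null); the proposed rule does not.
assert current_nullability(null_flag=False) == (False, True)
assert proposed_nullability(input_nullable=False, null_flag=False) == (False, False)
# With a nullable input, both rules agree: null goes to exactly one side.
assert proposed_nullability(input_nullable=True, null_flag=True) == (True, False)
assert proposed_nullability(input_nullable=True, null_flag=False) == (False, True)
```

Note how the two functions agree whenever the input is nullable, which is why the change only affects inferred result types, not instruction semantics.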