Skip to content
This repository was archived by the owner on Apr 25, 2025. It is now read-only.

re-strengthen output null knowledge in cast instructions #347

Closed
wants to merge 6 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 23 additions & 19 deletions proposals/gc/MVP.md
Original file line number Diff line number Diff line change
Expand Up @@ -602,35 +602,39 @@ In particular, `ref.null` is typed as before, despite the introduction of `none`

#### Casts

TODO: ensure that this section (and implementations, and tests) is kept in sync with the results of the conversation [here](https://github.com/WebAssembly/gc/issues/342).

Casts work for both abstract and concrete types. In the latter case, they test if the operand's RTT is a sub-RTT of the target type.

* `ref.test <reftype>` tests whether a reference has a given type
- `ref.test rt : [rt'] -> [i32]`
- iff `rt <: trt` and `rt' <: trt` for some `trt`
- if `rt` contains `null`, returns 1 for null, otherwise 0
* `ref.test null? <heaptype>` checks whether a reference has a given heap type
- `ref.test null? ht : [(ref null ht')] -> [i32]`
- iff `ht <: tht` and `ht' <: tht` where `tht` is a common super type
- if `null?` is present, returns 1 for null, otherwise 0

* `ref.cast <reftype>` tries to convert a reference to a given type
- `ref.cast rt : [rt'] -> [rt]`
- iff `rt <: trt` and `rt' <: trt` for some `trt`
* `ref.cast null? <heaptype>` tries to convert to a given heap type
- `ref.cast null? ht : [(ref null ht')] -> [(ref null? ht)]`
- iff `ht <: tht` and `ht' <: tht` where `tht` is a common super type
Comment on lines +615 to +616
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `ref.cast null? ht : [(ref null ht')] -> [(ref null? ht)]`
- iff `ht <: tht` and `ht' <: tht` where `tht` is a common super type
- `ref.cast null? ht : [(ref null ht')] -> [(ref null2? ht)]`
- iff `ht <: tht` and `ht' <: tht` where `tht` is a common super type
- and `null? = null2?`

(Uses of ? are independent per se, they need to be forced equal explicitly.)

- traps if reference is not of requested type
- if `rt` contains `null`, a null operand is passed through, otherwise traps on null
- equivalent to `(block $l (param trt) (result rt) (br_on_cast $l rt) (unreachable))`
- if `null?` is present, a null operand is passed through, otherwise traps on null
- equivalent to `(block $l (param (ref null tht)) (result (ref null? ht)) (br_on_cast $l null? ht) (unreachable))`

* `br_on_cast <labelidx> <reftype>` branches if a reference has a given type
- `br_on_cast $l rt : [t0* rt'] -> [t0* rt']`
* `br_on_cast <labelidx> null? <heaptype>` branches if a reference has a given heap type
- `br_on_cast $l null? ht : [t0* (ref null3? ht')] -> [t0* (ref null2? ht')]`
- iff `$l : [t0* t']`
- and `rt <: t'`
- and `rt <: trt` and `rt' <: trt` for some `trt`
- and `(ref null4? ht) <: t'`
- and `ht <: tht` and `ht' <: tht` where `tht` is a common super type
- and `null? = null4? =/= null2? \/ null2? = null3? = null4? = epsilon`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, this is pretty complicated. I had to write out the full matrix to understand what a forward validation algorithm would have to do:

t' null null3  null2 null4
0 0 0  _ 0
0 0 1  1 0
0 1 0  0 0
0 1 1  invalid
1 0 0  _ 0
1 0 1  1 0
1 1 0  0 _
1 1 1  0 1

where _ means undetermined, in the case of null2 that implies picking 0 as principal. We need good test coverage for this.

This is even more complex if you didn't do a forward algorithm. In particular, there are no principal types in general, i.e., without having null3. I have to say that I feel uncomfortable about that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there isn't a cleaner way to express the side conditions in terms of meets/joins on nullability?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain more about what you mean about principal types and non-forward algorithms here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tlively, in a type system with subtyping you usually want the principal types property, i.e., the property that, given an environment for identifiers, any expression (correspondingly, an instruction sequence in Wasm) has a most precise type. With the above rule for null that is no longer the case: e.g., (br_on_cast $l $t) where $l : (ref $t) has two possible types, (ref $t) -> (ref $t) and (ref null $t) -> (ref null $t), neither of which is more precise than the other. You'll need to already know either the input or the output type to decide. When you do full "forward" validation like in engine you have the former, but that's not necessarily the case in other circumstances or tools doing static analysis.

So, yeah, this is in fact exactly the same problem as with the missing heap type annotation. I was wrong in the meeting when I said that we wouldn't need to annotate the null flag for the source type. If we want to fix the annotation problem, we need two ref types, not just two heap types. Something like:

- `br_on_cast $l rt1 rt2 :  [t0* rt1] -> [t0* rt1']`
     - iff `$l : [t0* t']`
     - and `rt2 <: t'`
     - and `rt1 <: trt` and `rt2 <: trt` where `trt` is a common super type
     - and `rt1 = null1? ht1` and `rt2 = null2? ht2` and `rt1' = (null1? /\ ~null2?) ht1'`

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that makes sense. I guess it would be nice in principle to always have a principle type so that an optimization like outlining could uniquely determine a function type to assign to outlined code fragments. But on the other hand, an optimizer/analyzer could also easily work around the lack of type annotations by looking at the context. In fact, we'll have to (and want to!) do this workaround in Binaryen anyway since we intentionally do not preserve these accessor type annotations in our IR.

- passes operand along with branch under target type, plus possible extra args
- if `rt` contains `null`, branches on null, otherwise does not
- if `null?` is present, branches on null, otherwise does not

* `br_on_cast_fail <labelidx> <reftype>` branches if a reference does not have a given type
- `br_on_cast_fail $l rt : [t0* rt'] -> [t0* rt]`
* `br_on_cast_fail <labelidx> null? <heaptype>` branches if a reference does not have a given heap type
- `br_on_cast_fail $l null? ht : [t0* (ref null3? ht')] -> [t0* (ref null2? ht)]`
- iff `$l : [t0* t']`
- and `rt' <: t'`
- and `rt <: trt` and `rt' <: trt` for some `trt`
- and `(ref null4? ht') <: t'`
- and `ht <: tht` and `ht' <: tht` where `tht` is a common super type
- and `null? = null2? =/= null4? \/ null2? = null3? = null4? = epsilon`
- passes operand along with branch, plus possible extra args
- if `rt` contains `null`, does not branch on null, otherwise does
- if `null?` is present, does not branch on null, otherwise does

Note: Cast instructions do _not_ require the operand's source type to be a supertype of the target type. It can also be a "sibling" in the same hierarchy, i.e., they only need to have a common supertype (in practice, it is sufficient to test that both types share the same top heap type.). Allowing so is necessary to maintain subtype substitutability, i.e., the ability to maintain well-typedness when operands are replaced by subtypes.

Expand Down