Remove type annotation on ref.is_null #99
Comments
Is
A nice property of this would be that it would disallow saying |
@eqrion, disallowing |
Ah sorry, I should have been more clear. I agree that you can't disallow the instruction to operate on subtypes, I was trying to say that it's nice to remove a degree of freedom from the text/binary encoding that can't be meaningfully used correctly. |
* Drop ref.is_null immediate. The reference-types proposal has removed the immediate [1]. [1] WebAssembly/reference-types#99
* Mark some wabt tests as failing.
Pre-emptively implementing this before it's 'officially' part of the reference-types proposal [1]. [1] WebAssembly/reference-types#99 Differential Revision: https://phabricator.services.mozilla.com/D80135
I would prefer to keep the annotation, because I don't agree it's redundant. One of the motivations for splitting Removing the annotation would mean the engine would have to use the input type to the operation to determine what code to generate, which is a form of overloading and goes against the general precedent that an engine can determine exactly what code to generate for every bytecode from only its opcode and immediates, and not the types of its operands. |
@titzer, I wish you could have voiced that opinion a tiny bit earlier. ;) Moving forward, especially with GC and other extensions to reference types, there will potentially be a couple dozen instructions in the same category, with an infinite set of possible types, so fairly heavy annotations throughout. For example, should FWIW, I'd still hold the line on overloading. I see what you're saying when comparing to that, but at least technically, this is not overloading, because unlike e.g. addition, the observable behaviour does not depend on the type. It's polymorphism, but in a heap type as opposed to a value type. Whether an implementation chooses to type-specialise is a different question. But precedent for that already exists in some other cases, e.g., how to compile |
Just to capture some of the discussion from today, I think it's not such a big deal in this scenario. I think we need to be careful about removing type annotations too aggressively in the future. In particular, I think we do need to consider it on a case-by-case basis because a missing type annotation can impact ease of code generation and/or introduce some problems in an interpreter. (and yes, I believe an interpreter tier is increasingly important for big modules). Just off the top of my head for |
When an interpreter parses the wasm module, it will represent the code via some internal data structure. Can't that data structure be updated with whatever type information it needs for execution as the interpreter validates the module? |
See WebAssembly/reference-types#99. This can't land currently, since the testsuite hasn't been updated with these changes (looks like merge failure).
In general there is a very tricky space/time tradeoff here. An interpreter not concerned about space overhead can rewrite the bytecode into a more easily interpretable form (e.g. like wasm3's register machine). The problem is that the space cost is usually more than 2x, which basically defeats the space savings of an interpreter. It also mostly defeats the savings of compile time that an interpreter offers over a JIT, because this is essentially a JIT to a target bytecode. On the other end of the spectrum, an interpreter with no auxiliary data structures at all is trying to minimize space overhead. Such an interpreter interprets the original bytecode without the help of any side data structures other than the instance. That is prohibitively slow once control flow gets complicated because Wasm doesn't have jumps, but br-to-beginning-of-block and br-to-end-of-block, whose offsets are implicit. What we did in the V8 interpreter, which admittedly is not a production tier, but meant only for debugging, is to add a sidetable so that control flow targets can be looked up by the branch pc, so branches are O(log(number of blocks)). That works well enough and is only a 10-20% space overhead, and doesn't require rewriting the original bytecode at all. It's possible to sort that data structure and maintain a second pc so that branches can become O(1). In general I think with only minor consideration going forward we should try to preserve the option for a low-overhead reasonably fast interpreter tier, because some applications just have huge modules (> 50 MB) with lots of cold, and even dead, code. |
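To make the sidetable idea above concrete, here is a minimal sketch. It is not V8's implementation: the class name `BranchSidetable`, the `(branch_pc, target_pc)` entry layout, and the sample offsets are all assumptions made for illustration. It only shows how a sorted table, built once at validation time, lets an interpreter resolve a branch target with a binary search while leaving the original bytecode untouched.

```python
from bisect import bisect_left


class BranchSidetable:
    """Hypothetical sidetable: sorted (branch_pc, target_pc) pairs.

    Built once (e.g. during validation); the bytecode itself is not rewritten.
    """

    def __init__(self, entries):
        # Sort by branch pc so lookups can use binary search.
        pairs = sorted(entries)
        self._branch_pcs = [branch for branch, _ in pairs]
        self._target_pcs = [target for _, target in pairs]

    def target_for(self, branch_pc):
        # O(log n) lookup in the number of recorded branches.
        i = bisect_left(self._branch_pcs, branch_pc)
        if i == len(self._branch_pcs) or self._branch_pcs[i] != branch_pc:
            raise KeyError(branch_pc)
        return self._target_pcs[i]


# Made-up byte offsets: a branch at pc 10 jumps to 42, and so on.
table = BranchSidetable([(10, 42), (3, 7), (25, 3)])
next_pc = table.target_for(10)
```

The O(1) variant mentioned in the comment would replace the search with a second cursor that walks the sorted table in step with the instruction pointer; that part is not shown here.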
Ah, thanks for the explanation. I have a better sense of your concern and use-case now. Could the problem be solved by having the interpreter maintain not just the stack's dynamic value but also the stack's static type (or some sufficient abstraction thereof)? |
@RossTate At least in the wabt interpreter that would be an unfortunate overhead, since the static type isn't needed anywhere else after you validate the wasm module. |
Oh cool. Good to know. In that case, given what already exists in wasm (including unannotated |
@RossTate can you explain what you mean by "non-size-polymorphic" a bit more? I didn't quite follow the examples you gave because I don't know exactly what you're assuming about a hypothetical |
Oops, I got my conversations crossed. I should have said By size polymorphic, I mean the only information you need from the type annotation in order to execute the instruction is the size of type. So for Is that any clearer? |
Yes, thank you. Are there any non-size-polymorphic instructions now or in any proposal? Is it possible to have a non-size-polymorphic instruction without violating our design principle of not having overloaded instructions? I think by definition it is not, in which case this doesn't seem like a useful distinction to make. |
There isn't really a definition of "overloaded". You can always recast an "overloaded" instruction as being "polymorphic over some (small) range of types". So I was trying to get a more concrete/actionable meaning for the term, and currently that appears to be "not size-polymorphic". With that in mind, it's still arguable whether the combination of And since wasm already has
Due to For example, consider Hopefully some of that manages to give some insight into how I'm trying to refine the problem statement and the design guidance it suggests 😅 |
Interesting, that's a fair point. I don't think we have considered that criterion before, but it sounds useful. I'll have to think about how that would affect various instructions.
I think this is mixing up several levels of abstraction and makes a number of hidden assumptions. In particular, Wasm has no notion of size or bits for reference types. Nor should we assume that all actual implementations will always represent null refs by zero, or uniformly across all reference types (if we restrict anyref). Likewise, whether the size of an operand to drop or select is needed is not determined by the language semantics, but by specific implementation details. In an interpreter, for example, which is the case that @titzer is concerned about, it will typically not matter.
If that range is carved out from a larger set that would otherwise be applicable in that place then that's exactly the definition of overloading, a.k.a. ad-hoc polymorphism. |
@rossberg is right that the "implementation size" of values generally doesn't matter for an interpreter because it will use a universal value representation for all values in locals and on the operand stack. That's why |
I didn't say that interpreters had to agree on their size abstraction. An interpreter's own size abstraction would be based on its own implementation details.
In this case,
This is clever, but SIMD is in Phase 3 with a 128-bit value type. Will the efficiency of this scale, especially on a 32-bit machine? I'm concerned that these approaches to interpreting won't scale and will bog down WebAssembly. Consider So would another approach to interpreting be to have the validator phase produce some compact "notes" structure? The interpreter then maintains two pointers: the pointer to the current real instruction, and the pointer into the "notes". After interpreting an instruction like Side note: Out of curiosity, I counted 56 single-byte instructions that would be unnecessary if wasm used "overloading". |
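A rough sketch of the "notes" approach described above, under several loud assumptions: the tuple-based toy bytecode, the `SLOTS` table of 32-bit slot counts, and the choice of an unannotated `drop` as the instruction that needs a note are all hypothetical, and only straight-line code is handled. The validator records the operand width for each `drop`; the interpreter then advances a second cursor into the notes alongside the instruction pointer.

```python
# Hypothetical slot counts on a 32-bit interpreter stack.
SLOTS = {"i32": 1, "i64": 2, "v128": 4}


def validate(code):
    """Type-check straight-line toy code and return the notes list."""
    types, notes = [], []
    for op, *imm in code:
        if op == "const":  # ("const", type, list_of_slot_values)
            ty, slots = imm
            if len(slots) != SLOTS[ty]:
                raise TypeError("wrong number of slots for " + ty)
            types.append(ty)
        elif op == "drop":  # unannotated: width comes from the static type
            if not types:
                raise TypeError("drop on empty stack")
            notes.append(SLOTS[types.pop()])
        else:
            raise ValueError("unknown opcode " + op)
    return notes


def run(code, notes):
    """Interpret validated code with two cursors: pc and note_pc."""
    stack, pc, note_pc = [], 0, 0
    while pc < len(code):
        op, *imm = code[pc]
        if op == "const":
            stack.extend(imm[1])
        elif op == "drop":
            width = notes[note_pc]  # the note replaces a type annotation
            note_pc += 1
            del stack[len(stack) - width:]
        pc += 1
    return stack


code = [("const", "i32", [1]), ("const", "v128", [1, 2, 3, 4]),
        ("drop",), ("const", "i64", [5, 6])]
notes = validate(code)
final_stack = run(code, notes)
```

With branches, the notes cursor would have to be repositioned whenever the instruction pointer jumps, for example by storing it next to each branch target in a sidetable. The sketch leaves that out.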
See WebAssembly/reference-types#99. This change also updates the testsuite, so the spec tests pass too. In addition, the behavior of `br_table` is no longer different from MVP, and has a test to confirm this. That is now fixed in `type-checker.cc` too.
I don't feel strongly about From this perspective, the debate about |
I agree. Given that jump targets and, with Wasm GC, also struct field offsets require additional information that is not present (and in the case of offsets it's platform dependent, so it can't possibly be added), it's pretty much a given that a fast interpreter needs a sidetable of some sort. And if the interpreter already has an additional data structure to store missing immediates, then there is not much additional overhead in having a few more immediates there. So while unnecessary immediates always lead to additional size overhead in the wire format, they probably won't really benefit an interpreter, which needs to store additional information for many instructions anyway. |
This was done. |
Before we removed anyref, the `ref.is_null` instruction had a canonical type:

One piece of the fallout from removing anyref was that this no longer worked. In order to avoid a dependency on the outcome of the wider discussion opened in WebAssembly/function-references#27, I added a type annotation on the instruction, so that it became

(with the understanding that the `<reftype>` would later be refined to a `<heaptype>` as per the typed (function) references proposal).

However, given that the discussion on WebAssembly/function-references#27 seems to show a common sentiment to avoid redundant type annotations -- especially considering the many more affected instructions added in something like the GC proposal -- it would be unfortunate if `ref.is_null` became an outlier. And having adapted all the tests, I can say that it is quite annoying in practice, too (`ref.null` is tedious enough already).

So I propose removing the annotation and changing the instruction to

such that a linear validator simply has to check that there is some `<reftype>` on the stack.

Thoughts?
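As a minimal sketch of what that check could look like in a linear validator: the function name `validate_ref_is_null` and the string-based type stack are assumptions for illustration, and the polymorphic (unreachable) stack case is deliberately ignored. The two reference types used are the ones the reference-types proposal defines, `funcref` and `externref`.

```python
# Reference types in the reference-types proposal.
REF_TYPES = {"funcref", "externref"}


def validate_ref_is_null(type_stack):
    """Check an unannotated ref.is_null against a static type stack.

    Replaces the reference type on top of the stack with i32.
    """
    if not type_stack:
        raise TypeError("ref.is_null: operand stack is empty")
    operand = type_stack[-1]
    if operand not in REF_TYPES:
        raise TypeError("ref.is_null: expected a reference type, got " + operand)
    type_stack[-1] = "i32"  # the result is an i32 flag


stack = ["i32", "funcref"]
validate_ref_is_null(stack)
```

No immediate is consulted: the operand type already on the validator's stack is the only input.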