Type annotations on new instructions #27
More examples of instructions that could or could not have redundant type annotations:
There will likely be many more such instructions in future proposals. For example, input and output type annotations for many of the instructions for stack primitives are redundant.
@RossTate, yeah, but as mentioned elsewhere, block typing has sailed long ago, and changing it now would be rather tough. In any case, it's outside the scope of this proposal.
That discussion indicated that the rationale behind the decision was that eliminating type annotations would complicate the pseudo-code interpreter by requiring …. That decision was also made at a time when only one block-like structure did not need an out-annotation, making the complexity seem unnecessary. But …
These type annotations increase binary size and mean more work for validation, but I don't see any clear benefits. In a fast single-pass compiler, you probably want to do validation together with code generation anyway, meaning the information in these immediates is available regardless. For a slower, optimizing compiler used for tier-up, this information could be cached or should be cheap enough to re-compute. For an interpreter working directly on the wire bytes, it could in principle be helpful to have this type information right there, at least for some operations. But the Wasm binary format is already hard to interpret without additional sidetable information, and what you really want in an interpreter is often platform-specific, like real offsets in the case of Wasm GC object access. So this information needs to be put in a sidetable or engine-specific internal bytecode anyway. So I really can't imagine any situation where these immediates would be useful enough to pay for the increased size in the transport format, especially since this price is paid by all users, independently of whether the respective engine even profits from it.
For those like me who are curious about the history, this pull request appears to be where the blocks got types. It seems like the primary concern at the time was knowing how many stack items are retained by a branch ("arity").
Thanks, @taralx! Historical context is always super useful, and I expect everyone involved in that conversation appreciates being saved from repeating it 😄 Looking through it, it seems the option I am suggesting was never discussed: every block-like construct (i.e. one that has an …)
To clarify: … That's implicit in the typing rule as stated in the proposal, though it may not be super-obvious. I added an explicit note (and fixed a bug on the way :) ).
Yes, I understand …
Unfortunately, there are no notes of that, as far as I'm aware; it was early times. FWIW, I argued against that choice at the time, and explicitly raised the question whether this should then apply to all block constructs that we may add in the future. The resolution was yes, because otherwise the primary motivation (simplicity of decoders and validators, avoiding a separate stack for tracking labels) would have been moot. That doesn't mean that we can't revisit, but none of the technical arguments to do so are specific to let. So such a discussion would probably deserve a more general topic and any resolution should consistently apply to all relevant block instructions. I would hence prefer to separate this particular discussion from the current proposal. Focus helps progress.
+1 to not requiring redundant type immediates on new instructions, in the interest of saving module size. Obviously, instructions that create new values must know a type for them, whereas instructions that consume stack values don't need to encode the expected type of those values. From an engine implementation point of view, redundant type immediates only make us do more work at validation time, because equivalence of the actual stack value type and the expected type (per immediate) must be checked. Having the immediates provides no benefit (for performance, simplicity, or safety). (Direct wire-byte interpreters might be an exception to this "no benefits" statement, but (1) as @tebbi explains above, even that is unclear, and (2) it's also unclear whether that's the implementation strategy Wasm should optimize for -- wouldn't most users of most engines expect to get the speed that comes from compilation to machine code?) From a language point of view, I can see that such additional "consistency checks" might be helpful for a human-written language, but they seem significantly less useful for a compilation target. (For background, I'm mostly looking at this from the GC proposal's perspective, but it's a design decision that should ideally be consistent. For example, ….)
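To make the validation-cost point concrete, here is a minimal sketch — my own illustration, not V8 or spec code, with hypothetical helper names — of the extra comparison a redundant immediate forces on a single-pass validator:

```python
# Minimal sketch of a validator handling a struct.get-like instruction.
# `is_subtype` and `field_type_of` are hypothetical helpers; real engines
# work on interned type indices rather than strings.

def validate_annotated(stack, annotation, field_idx, is_subtype, field_type_of):
    actual = stack.pop()                     # type inferred from preceding code
    if not is_subtype(actual, annotation):   # extra check, needed only because
        raise ValueError("type mismatch")    # the immediate duplicates the info
    stack.append(field_type_of(annotation, field_idx))

def validate_unannotated(stack, field_idx, field_type_of):
    actual = stack.pop()                     # the inferred type is already exact
    stack.append(field_type_of(actual, field_idx))  # no comparison needed
```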
One argument in favor of out annotations on block-like structures is that in a situation where the block leaves multiple values on the stack on the way out, a streaming compiler can pre-allocate space for those values to go into (they won't all fit in registers). As it is, our baseline compiler does not make use of that information and instead ends up emitting a fair amount of data-shuffling code in some cases. I'm inclined to believe that it would be possible to get rid of that shuffling code if we knew something about how much space will be required along the exit edge from the block. I don't know that this is a particularly strong argument for us, because it's a baseline compiler - some inefficiencies are to be expected - and because I've not done the detailed analysis, nor am I likely to. I guess there could also be streaming compilers that are not quite so simplistic and would benefit more from information up-front.
Hmm, interesting idea. But I suspect you'll have to have a bunch of shuffling code with that anyways. That is, even if you preallocate the space, you don't know which instruction outputs end up going into which spots until you reach the end of the block, so either you'll have to add shuffling code or you'll have to backpatch the code, which is probably what you would have done anyways 😕 But now that I understand where @rossberg meant to go with this post (sorry for the confusion, though the opening arguments do apply just as easily to block-like instructions), and that the pull requests for the relevant changes are in progress, it's probably best to let him close the issue when those are done and I'll open up another one to discuss the topic of annotations for blocks elsewhere.
Sgtm, I'd suggest the design repo, since this could be a separate proposal.
Thanks for the suggestion! I was wondering that exact question 😄
Done: WebAssembly/design#1352
I'll repeat here (with some delay) my argument for keeping at least some type annotations that "seem" redundant:
Basically, we should not trade static annotations for dynamic metadata and checks thereof. Even if a system doesn't spend any significant execution time in its interpreter tier, it has to pay the space costs of metadata forever.
We still need to resolve this. Please let's focus on new instructions consuming reference operands, of the sort occurring in this and the GC proposal (e.g. …). AFAICS, the main arguments in this discussion so far have been:
Against annotations: …
For annotations: …
For previous discussion, see also WebAssembly/gc#241.
Type annotations also benefit baseline compilers. Otherwise baseline compilers have to track full types (rather than machine representations) to model the stack effects of instructions like struct.get, as sketched below.
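A sketch of the contrast being drawn here — my illustration, under the assumption (not stated in the thread) that the field's machine representation is what the compiler ultimately needs; names and types are made up:

```python
from dataclasses import dataclass

@dataclass
class StructType:
    fields: list  # field value types, e.g. ["i32", "f64", "ref $Point"]

def machine_kind(field_type: str) -> str:
    # Collapse a full value type to a machine representation (illustrative).
    return "ref" if field_type.startswith("ref") else field_type

# With a type annotation, a baseline compiler can model only machine kinds:
# the immediate itself says which field representation is produced.
def struct_get_annotated(kind_stack, module_types, type_idx, field_idx):
    kind_stack.pop()  # the reference operand; its kind is simply "ref"
    kind_stack.append(machine_kind(module_types[type_idx].fields[field_idx]))

# Without it, the compiler must carry full struct types on its own stack
# to recover the field's representation from the operand:
def struct_get_unannotated(type_stack, field_idx):
    struct_type = type_stack.pop()  # must be a full StructType
    type_stack.append(struct_type.fields[field_idx])
```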
I can offer two data points:
While that sounds good in theory, V8 does not care about this property, for two reasons:
I would not find it surprising if many/most other production-quality engines found themselves in the same situation. In the CalcEngine module that's produced by the current versions of J2Wasm and Binaryen, we have:
The total size of the Code section is 9.7 MB, so saving one byte (a lower bound for dropping type immediates) for each of these instructions would save 173 KB or 1.8%. These are uncompressed numbers, and I don't have an easy way to measure the effect on compressed modules; lacking a concrete measurement, I would guess that the relative savings are similar (which is what we saw for previous questions of "should we drop some bytes here or there"). Note that there is some fairly obvious opportunity for saving some code size in that module, so the relative impact of these type annotations will get bigger after we've picked other low-hanging fruit; it would also grow if we had to add immediates to further instructions. Same table for Dart's "barista3" benchmark:
Saving one byte for each of these would add up to 35 KB out of 746 KB or 4.7%. (To clarify: the "average bytes" are for the entire instruction, so e.g. 4.25 means that the whole instruction, opcode plus immediates, averages 4.25 bytes.)
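For anyone double-checking the percentages above, the arithmetic works out as quoted (my calculation, assuming decimal units, 1 KB = 1000 bytes):

```python
# Savings from dropping one byte per annotated instruction, as quoted above.
modules = {"CalcEngine": (173_000, 9_700_000), "barista3": (35_000, 746_000)}
for name, (saved_bytes, code_bytes) in modules.items():
    print(f"{name}: {saved_bytes / code_bytes:.1%}")
# CalcEngine: 1.8%
# barista3: 4.7%
```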
Just to quickly recap, the actual numbers for barista3 are 5.8% uncompressed and 6.4% compressed (zopfli).
It's already the case that different production engines have made different tradeoffs. For example, the SpiderMonkey baseline compiler has its own dedicated decoding logic, and its abstract stack does not model full types (instead: kinds, constants, register classes, and spill status). That pays off: the SpiderMonkey baseline compiler generates code 4x faster than Liftoff. While current production engine tiering configurations inform designs, we shouldn't allow them to dictate tradeoffs made in other or future designs. This is what motivated my comments above. An entirely reasonable 3-tier design (interpreter, baseline compiler, optimizing compiler), or a design where validation is separate from baseline compilation for another reason, would be penalized by forcing them to model types. There are at least two other instances that I can think of where modeling types (i.e. reconstructing the validation algorithm) is a detriment: the dynamic control stack in …
I don't have an opinion on the larger issue here, but just wanted to clarify this point. Our baseline compiler performs function validation at the same time as code generation. Function validation requires the value stack, so we always have the full typing of the stack available. For code generation, we keep a parallel stack that models kinds/constants/register classes/spill status, as you say. If we were to (for some reason) pre-validate functions before compilation, then theoretically we could try to eliminate the full type tracking. I could see a reason to do this for an interpreter, but I'm unsure of the benefit for a compiler. Even if you compile functions lazily, my understanding is that it's legal to defer validation of a function until the point the function is called for the first time.
I was talking to @kripken about the graph coloring example offline, and we realized that we don't know how adding annotations would fix it. It seems like the only annotations you could actually add to that program would be for type ….
@RossTate, are we missing something here, or does the type system in that example have NP-hardness even in the presence of annotations on struct.get and struct.set?
I think we can sidestep the specifics of the graph colouring construction in that presentation by observing that we can't define a convincing linear scan validation algorithm for struct access instructions consuming an opaque type described only by multiple struct-shaped supertype bounds, if said struct access instructions don't have input type annotations. IIUC that's the "essence" that's important here.
My understanding is that …
Aha, thanks, @RossTate! Before understanding that, it had seemed that the example had no bearing on the present discussion since it would be equally broken with or without annotations.
@RossTate I may be missing something, but doesn't this break the definition of subsumption given above as well?
The side-condition 'type annotation must be a concrete struct type' seems identical in practice to 'input type must be a concrete struct type'?
Good questions. Before getting technical, let me demystify and motivate subsumption a bit for everyone. We like to be able to inline functions. Consider a subtyping scenario: if a callee is inlined at a call site whose arguments have more precise (sub)types than the callee's declared parameters, the instructions of the inlined body now receive more precise inputs and must still type-check. Now for the technical answers.
Subsumption is a property of a fixed instruction: if a given fixed instruction type-checks with certain inputs, that same instruction must necessarily type-check with more precise inputs. A type annotation is part of the instruction, so changing the type annotation results in a different instruction; as such, subsumption says nothing about the relationship between these instructions.
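A toy illustration of that definition (my example; the Cat/Animal hierarchy is hypothetical, nothing from the actual proposal): a fixed instruction that accepts an input type must also accept any subtype of it, while two differently-annotated instructions are simply different instructions.

```python
# Subsumption for a fixed instruction: accepting a type implies accepting
# all of its subtypes. The Cat <: Animal hierarchy is purely illustrative.
SUBTYPING = {("Cat", "Animal"), ("Cat", "Cat"), ("Animal", "Animal")}

def type_checks(declared_input: str, actual_input: str) -> bool:
    return (actual_input, declared_input) in SUBTYPING

assert type_checks("Animal", "Animal")
assert type_checks("Animal", "Cat")   # more precise input still type-checks
# An instruction annotated with "Cat" is a *different* instruction from one
# annotated with "Animal"; subsumption relates inputs, not annotations.
```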
That impression is probably because at present "practice" for WebAssembly does not include programs using type variables. If wasm ever gets to generics (which, admittedly, I'm pretty skeptical of at this point) or to powerful forms of separate compilation like Jawa (which I'm also pretty skeptical of at this point), then you will often have programs where you know a value belongs to some abstract type which you know is at least a struct with some fields, at which point requiring an annotation to be a struct type will be very different from requiring an input to belong to a struct type and not a type variable.
If it helps to clarify things, there's another way the complexity issue can be rectified using annotations. At present, the annotation for ….
As a reminder, I am giving these technical answers not to force a particular decision. Rather, their implication is just to make the group understand that you must make a decision: remove type annotations xor maintain forward compatibility with some features like multiple upper bounds on imported types. My recommendation would be to do the former, but it's your decision.
A third option would be to compromise slightly on substitutability under the assumption that any tool that would benefit from substitutability would also be able to add the type annotation as necessary (under any type system that can otherwise work without annotations).
You can't really compromise slightly on a global principle; it holds or it doesn't hold. Rather, what you're proposing is that the extensions in discussion would not be subtypes but would rather have erasable coercions. For example, a type import would not be a subtype of its additional upper bounds; rather, there would be an explicit (erasable) coercion instruction from the import type to each such bound.
Thanks Ross for the answer, that helps a lot.
So if the abstract syntax of …
Personally, I would agree that it's unclear whether we'll ever have multiple upper-bounded type imports and, if so, what restrictions we could add on them to accommodate design decisions we make now. I also go back and forth on whether a 6% binary size reduction is acceptable for an MVP. If current Wasm-GC ports are ~2x the size of equivalent JS, 5% would need to be part of a larger story of getting to parity. It would help to have that larger discussion to be able to prioritize this 6%.
I did some more measurements to estimate the additional overhead of adding a type index to call_ref:
The additional overhead looks tiny here, but I think these numbers underestimate the typical overhead of adding a type index to call_ref …
It would be very interesting to see similar numbers from a compiler that uses …
As a quick estimate for …: overall a size improvement of ~4% seems significant here, and I think we should remove the annotations. My mind could be changed by a plausible example that could not be fixed by annotations later, but so far everything mentioned seems like it could work (e.g. with type imports, we could say that in the coloring example above …).
@eqrion I'm not entirely sure I understand the option you're asking me for feedback on, so here's my best guess. Using a fallible inference algorithm (in the future) to recover missing annotations has the same issues as using an incomplete validation algorithm. That said, I'm not sure we need more options to consider. It's been about a week since someone posed the question of whether there were concrete features people wanted to maintain forwards compatibility with. Since then, while I've articulated why you can't always simply add annotations later on to fix an incompatibility, my own answer to the question was that the features I could think of already have compatibility issues with other features of WebAssembly, and I don't believe anyone else has provided any specific features to consider. So maybe y'all are at a point where you can decide to go with removing annotations, which seemed to be the prevailing sentiment of comments in the past week?
@eqrion I also feel like a 5% space savings doesn't really solve the issue if 2x is what is needed. Also, that 5% space is compressed size. My comment in the meeting a couple weeks back about including a general purpose compression algorithm in our calculations wasn't fully articulated. The compression schemes we have right now are 5% smaller without annotations, but if a different compression algorithm is invented that does better, our cost model is outdated. That's why I think it's odd that we include GP compression schemes in a criterion we are trying to optimize for. To wit, a layer 1 compression algorithm that is aware of Wasm's type system can and should completely outclass GP compression. In particular, I suspect that the type annotations necessary for locals are also a big chunk of binary size. Layer 1 compression could probably eliminate those for definitely-initialized locals.
There is still disagreement about this. I don't think the discussion has turned up anything that has fundamentally changed the calculus, or any opinions, yet.
Agreed. If anything, it turned up how slippery the slope would be. ;)
I encourage you to develop and/or present such an algorithm. On its own, "we shouldn't do anything now because I'm sure someone else will invent a rounder wheel in the future, which will solve all our problems" is not a convincing argument.
In the Sheets module measured above, 200 KB are used for defining 61K locals across 28K functions, accounting for 2.1% of the uncompressed code section size. |
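Breaking those figures down (my arithmetic, assuming the ~9.7 MB code section from the earlier measurement and decimal units):

```python
# Per-local and per-function breakdown of the figures quoted above.
local_decl_bytes, num_locals, num_functions = 200_000, 61_000, 28_000
print(f"{local_decl_bytes / num_locals:.1f} bytes per local declared")  # ~3.3
print(f"{num_locals / num_functions:.1f} locals per function")          # ~2.2
print(f"{local_decl_bytes / 9_700_000:.1%} of the code section")        # ~2.1%
```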
I had thought that the notable lack of motivating examples benefiting from annotations might have been a fundamental change to the decision calculus. If it would help move the discussion forward, I can explain why the one example that has been mentioned—Jawa—does not actually need multiple upper bounds on type imports, leaving us with no motivating examples. (The short of it is that, due to the complications caused by multiple inheritance, there's no significant run-time benefit to knowing an object implements an interface, so there's no need to encode subtyping between interfaces when compiling to wasm.)
Can you explain what you mean by this? On the larger issue of binary size, my understanding is that j2wasm (and maybe dart2wasm) is using aggressive closed-world AOT optimizations to make performance competitive with j2cl. Is that understanding correct? If it is, such performance optimizations can substantially increase code size, so what happens to binary size (with and without annotations) and performance if instead one optimizes for size rather than performance?
Indeed dart2wasm performs closed-world optimizations, but most of these are ones that reduce code size, e.g. tree shaking, parameter elimination, constant propagation, devirtualization.
Ah, I would have thought it did inlining as well largely due to its symbiotic role with devirtualization (devirtualization provides an inlining opportunity, which moves virtual calls to a more specialized context, which enables more devirtualization).
About the importance of binary size, I look at it like this: if we could easily reduce the size of all images on the Web by 5% then we'd want to do that. Likewise for video, JS, etc. The scale of the Web is so big that even a 5% win on one format matters a lot! Maybe one user can't easily notice it by themselves, but plenty of data shows that even small changes to download size have real-world impacts on average user experience. And 5% less total bandwidth across the entire Web is a significant amount of work and electricity over billions of downloads. Because of that there are various projects working to shrink images, video, JS etc., both on the spec side and tools side. As wasm rises in popularity its size will matter more and get closer to those. In fact we already do a lot of work for code size on the tools level, in Binaryen and elsewhere - that has been motivated by existing important real-world use cases that care about binary size today. So even 5% is worth thinking about. Yes, there may be other ways to shrink size (L1 compression), and maybe we'll get to them, but that's uncertain. And, yes, maybe we are 2x larger than JS now, and maybe 5% won't fix that, but it's still 5% better. Whenever we have a good practical way to reduce size that's worth considering. Of course, we need to consider the other side of the tradeoff as well.
@kripken Thanks for that, I think that's well said.
I just want to clarify my comments regarding the '5% vs 2x' point. I definitely agree that improvements are worth considering even if they don't solve the larger issue completely. I would just like to better understand why wasm-GC binary size is 2x JS binary size. This would help me to evaluate the relative importance of a 5% improvement vs any spec concerns. Are the wasm-GC toolchains doing more inlining? Are call sequences or prologues larger? Is there a runtime library required with wasm that JS doesn't need? Is the data/elem section significant? For example, if there's low hanging fruit elsewhere that doesn't have spec concerns then that seems like the first step before dropping annotations. If the other required fixes are likely to be difficult too, then it may make sense to start with dropping annotations. But without more context on the '2x' number, it's hard for me to weigh the improvement against theoretical spec concerns.
Because JS is a high-level language and Wasm isn't. Respective size comparisons are mostly apples to oranges. Native code also tends to be much larger than JS source code, probably by more than 2x.
I'm aware of the difference between JS and Wasm. All of the items I listed as possible reasons for a size difference would be due to Wasm being a lower-level language. What I'm asking for is quantification to help understand what, if anything, can be done about the difference.
One thing that might help move the conversation forward is if we could get an example of a type system extension for which the existence of unannotated struct.get and struct.set (even in the presence of additional annotated versions) would be a problem. In other words, it would be nice to have an example where the decision procedure for determining whether an annotation is required is greater than constant time (amortized). Lacking such an example, it has been hard to convince ourselves that the risk of encountering such a situation is anything but vanishingly small and that the expected benefit of keeping annotations could outweigh the benefit of reducing code size. Note that for the example with multiple supertypes on type imports, I believe an efficient and sufficient decision procedure for requiring annotations would be to require them when the type of the stack is a subtype of a type import, which would be amortized constant time after preprocessing the types.
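A sketch of the decision procedure suggested in that last sentence — my rendering, with hypothetical type representations and a stand-in `is_subtype`; the point is only that the per-instruction decision is a lookup after a one-time pass over the types:

```python
# Preprocess once: which declared types are subtypes of some type import?
def build_annotation_table(all_types, type_imports, is_subtype):
    return {t: any(is_subtype(t, imp) for imp in type_imports)
            for t in all_types}

# Then, per struct access, requiring an annotation is an O(1) lookup on the
# inferred type of the operand at the top of the stack.
def annotation_required(stack_top_type, table):
    return table[stack_top_type]
```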
Things are still evolving so any specific measurement may end up changing. But I think these are some of the main factors, which include things you mentioned:
Those are sorted by my current best guess at relative importance, from most to least, but again, that's imprecise and may change. I'm not sure about the data/elem sections. J2Wasm doesn't emit them, but it does emit equivalent stuff in arrays and structs. It's hard to measure that since it's interleaved in the code.
That reminds me, I actually measured JS compared to native code over a decade ago 😄 yes, as you said, JS is surprisingly compact - and it's only gotten smaller since then! Yes, this is somewhat expected. But the bottom line is that this matters. If wasm is (say) 2x larger than JS then that will prevent some people from being able to use it. The smaller it is, the more people can use it, and the better it will work for those that do. I wouldn't go as far as to say that "if we make wasm 5% smaller it will be 5% more impactful", but there is a positive connection there.
Adding a note here that this was discussed in the GC subgroup meeting earlier this week (PR for notes here). The high order bit is that the poll of the room showed that the consensus is to retain type annotations on new instructions. Continuing the discussion for next steps here, some questions:
Let's move forward with keeping the existing annotations and adding annotations to call_ref.
Closing via #76. |
So far, Wasm has been consciously designed such that the type of every instruction is self-contained. That is, every operand's type is either fully known (possibly with the help of a type immediate) or fully polymorphic.
With instructions like call_ref or func.bind this design choice starts to become a nuisance, and will be even more so under future extensions like those of the GC proposal. Redundant type annotations can also increase the cost of validation, since more types need to be compared against each other (although it's not clear whether this extra cost is relevant, given the relative infrequency of most affected instructions).
Currently, the proposal omits the type on call_ref and the source type on func.bind. It requires constype immediates on null-related instructions, though, to be consistent with the instructions in the reference types proposal. Should more annotations be added for consistency and ease of type checking, or more removed for convenience/compactness?
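To illustrate the size tradeoff being weighed here: a type immediate on call_ref costs one LEB128-encoded type index per call site. A rough sketch of that cost (my example; the opcode byte is a placeholder, not the proposal's actual encoding):

```python
def uleb128(n: int) -> bytes:
    # Unsigned LEB128, the variable-length integer encoding Wasm uses.
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | (0x80 if n else 0))
        if n == 0:
            return bytes(out)

CALL_REF = b"\x14"  # placeholder opcode byte
without_annotation = CALL_REF
with_annotation = CALL_REF + uleb128(300)      # hypothetical type index 300
print(len(without_annotation), len(with_annotation))  # 1 vs 3 bytes
```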