
Type annotations on new instructions #27

Closed
rossberg opened this issue May 27, 2020 · 99 comments

Comments

@rossberg
Member

rossberg commented May 27, 2020

So far, Wasm has been consciously designed such that the type of every instruction is self-contained. That is, every operand's type is either fully known (possibly with the help of a type immediate) or fully polymorphic.

With instructions like call_ref or func.bind this design choice starts to become a nuisance, and will be even more so under future extensions like those of the GC proposal. Redundant type annotations can also increase the cost of validation, since more types need to be compared against each other (although it's not clear whether this extra cost is relevant, given the relative infrequency of most affected instructions).

Currently, the proposal omits the type on call_ref and the source type on func.bind. It requires constype immediates on null-related instructions, though, to be consistent with the instructions in the reference types proposal. Should more annotations be added for consistency and ease of type checking, or more removed for convenience/compactness?
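For concreteness, a minimal sketch of the two encodings in the text format (names invented; the annotated spelling of call_ref is just one possible form):

    (type $fn (func (param i32) (result i32)))

    ;; Unannotated, as currently proposed: validation learns the callee's
    ;; type from the (ref $fn) it finds on the operand stack.
    local.get $x      ;; i32 argument
    local.get $f      ;; (ref $fn)
    call_ref

    ;; Annotated: the immediate restates the callee's type, which validation
    ;; must then compare against the inferred stack type.
    local.get $x
    local.get $f
    call_ref $fn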

@RossTate

More examples of instructions that could or could not have redundant type annotations:

  • input type annotation of block
  • input type annotation of if
  • output type annotation of loop
  • input type annotation of try
  • input and output type annotations of let

There will likely be many more such instructions in future proposals. For example, input and output type annotations for many of the instructions for stack primitives are redundant.
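To illustrate the first bullet above, a minimal sketch in the text format (assumes multi-value block types):

    i32.const 1
    i32.const 2
    block (param i32 i32) (result i32)  ;; the (param i32 i32) annotation
      i32.add                           ;; repeats what the operand stack
    end                                 ;; already tells the validator here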

@rossberg
Member Author

@RossTate, yeah, but as mentioned elsewhere, block typing has sailed long ago, and changing it now would be rather tough. In any case, it's outside the scope of this proposal.

@RossTate

That discussion indicated that the rationale behind the decision was that eliminating type annotations would complicate the pseudo-code interpreter by requiring end to check if the construct had an associated out-annotation and, if so, to restrict the outgoing types to that annotation and check them for compatibility.

That decision was also made at a time when only one block-like structure did not need an out-annotation, making the complexity seem unnecessary. But let will be another example where such an out-annotation is unnecessary, and there will likely be more examples down the line. let also introduces new complexities to end, such as cleaning up the local-variable stack, meaning that end will become more complicated anyways. So the landscape is changing in a way that suggests the decision should be reevaluated, and this proposal is the first sign of that change.

@tebbi

tebbi commented May 29, 2020

These type annotations increase binary size and mean more work for validation, but I don't see any clear benefits. In a fast single-pass compiler, you probably want to do validation together with code generation anyway, so the information in these immediates is available regardless. For a slower, optimizing compiler used for tier-up, this information could be cached, or should be cheap enough to re-compute.

For an interpreter working directly on the wire bytes, it could in principle be helpful to have this type information right there, at least for some operations. But the Wasm binary format is already hard to interpret without additional sidetable information, and what you really want in an interpreter is often platform-specific, like real offsets in the case of Wasm GC object access. So this information needs to be put in a sidetable or engine-specific internal bytecode anyway.

So I really can't imagine any situation where these immediates would be useful enough to pay for the increased size in the transport format, especially since that price is paid by all users, regardless of whether their engine even profits from it.

@taralx

taralx commented May 30, 2020

For those like me who are curious about the history, this pull request appears to be where the blocks got types. It seems like the primary concern at the time was knowing how many stack items are retained by a branch ("arity").

@RossTate

Thanks, @taralx! Historical context is always super useful, and I expect everyone involved in that conversation appreciates being saved from repeating it 😄 Looking through it, it seems the option I am suggesting was never discussed: every block-like construct (i.e., one that has an end) specifies an arity to know how many values to leave on the stack (I suspect 0 is extremely common), and if the block-like construct provides a branch target (e.g. block, if, and loop) then it specifies a "label" type for the target (e.g. [i32, i32]). Now, given what already exists, it might just be easier to say that block-like constructs providing a branch target use block types, and those not providing a branch target (like let) use an arity. But we should try to reduce how much we rely on the compression granted by using type indices—as WebAssembly's type system grows there will be fewer and fewer identical block types within a module.
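A sketch of the suggested split, with the arity form's syntax invented for illustration:

    ;; Provides a branch target: carries a label type for br to check against.
    block (result i32)
      i32.const 1
      br 0
    end

    ;; Provides no branch target: a bare arity would say how many values the
    ;; body leaves behind (hypothetical syntax for the suggestion above).
    f64.const 3.14
    let (arity 0) (local $x f64)  ;; "arity 0" replaces the block type;
      nop                         ;; the body leaves no values on the stack
    end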

@rossberg
Member Author

rossberg commented Jun 2, 2020

To clarify, let is consistent with other block constructs and binds a label -- as mentioned elsewhere, treating all block constructs uniformly as branch targets was the group's decision long ago.

That's implicit in the typing rule as stated in the proposal, though it may not be super-obvious. I added an explicit note (and fixed a bug on the way :) ).

@RossTate

RossTate commented Jun 2, 2020

Yes, I understand let currently binds a label solely due to a decision made long ago when all block constructs naturally bound a label. Above I gave reasons to revisit that decision. Could you link to the relevant discussion behind that decision so that we can see if/how its rationale carries over to let and other non-control-flow constructs (in addition to the one linked above, which I've already tried to incorporate into the discussion here)?

@rossberg
Member Author

rossberg commented Jun 2, 2020

Unfortunately, there are no notes of that, as far as I'm aware; it was early times.

FWIW, I argued against that choice at the time, and explicitly raised the question whether this should then apply to all block constructs that we may add in the future. The resolution was yes, because otherwise the primary motivation (simplicity of decoders and validators, avoiding a separate stack for tracking labels) would have been moot.

That doesn't mean that we can't revisit, but none of the technical arguments to do so are specific to let. So such a discussion would probably deserve a more general topic and any resolution should consistently apply to all relevant block instructions. I would hence prefer to separate this particular discussion from the current proposal. Focus helps progress.

@jakobkummerow
Contributor

+1 to not requiring redundant type immediates for new instructions, in the interest of saving module size. Obviously, instructions that create new values must know a type for them, whereas instructions that consume stack values don't need to encode the expected type of those values.

From an engine implementation point of view, redundant type immediates only make us do more work at validation time, because equivalence of actual stack value type and expected type (per immediate) must be checked. Having the immediates provides no (performance or simplicity or safety) benefit. (Direct wire-byte interpreters might be an exception to this "no benefits" statement, but (1) as @tebbi explains above even that is unclear, and (2) it's also unclear whether that's the implementation strategy that Wasm should optimize for -- wouldn't most users of most engines expect to get the speed that comes from compilation to machine code?)

From a language point of view, I can see that such additional "consistency checks" might be helpful for a human-written language, but they seem significantly less useful for a compilation target.

(For background, I'm mostly looking at this from the GC proposal's perspective, but it's a design decision that should ideally be consistent. For example, struct.get could skip its type parameter and only have a field index, the semantics being "retrieve the n-th field from whatever struct is currently on top of the value stack; fail to validate if that struct doesn't have enough fields". In this proposal, ref.as_non_null could easily be polymorphic and simply return a non-null reference for whatever nullable reference type it found on the stack.)
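The shapes described above, sketched in the text format ($t and the unannotated spelling are illustrative):

    struct.get $t 1    ;; annotated: the operand must be a (ref null $t)
    struct.get 1       ;; hypothetical unannotated form: "field 1 of whatever
                       ;; struct type is currently on top of the value stack"
    ref.as_non_null    ;; polymorphic: (ref null $h) -> (ref $h) for whatever
                       ;; heap type $h validation finds on the stack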

@lars-t-hansen
Contributor

One argument in favor of out-annotations on block-like structures is that in a situation where the block leaves multiple values on the stack on the way out, a streaming compiler can pre-allocate space for those values to go into (they won't all fit in registers). As it is, our baseline compiler does not make use of that information and instead ends up emitting a fair amount of data-shuffling code in some cases. I believe it would probably be possible to get rid of that shuffling code if we knew something about how much space will be required along the exit edge from the block.

I don't know that this is a particularly strong argument for us, because it's a baseline compiler - some inefficiencies are to be expected - and because I've not done the detailed analysis, nor am I likely to. I guess there could also be streaming compilers that are not quite so simplistic and would benefit more from information up-front.

@RossTate

Hmm, interesting idea. But I suspect you'll have to have a bunch of shuffling code with that anyways. That is, even if you preallocate the space, you don't know which instruction outputs end up going into which spots until you reach the end of the block, so either you'll have to add shuffling code or you'll have to backpatch the code, which is probably what you would have done anyways 😕

But now that I understand where @rossberg meant to go with this post (sorry for the confusion, though the opening arguments do apply just as easily to block-like instructions), and that the pull requests for the relevant changes are in progress, it's probably best to let him close the issue when those are done and I'll open up another one to discuss the topic of annotations for blocks elsewhere.

@binji
Member

binji commented Jun 10, 2020

it's probably best to let him close the issue when those are done and I'll open up another one to discuss the topic of annotations for blocks elsewhere.

Sgtm, I'd suggest the design repo, since this could be a separate proposal.

@RossTate

Thanks for the suggestion! I was wondering that exact question 😄

@RossTate

Done: WebAssembly/design#1352

@titzer
Contributor

titzer commented Jul 25, 2020

I'll repeat here (with some delay) my argument for keeping at least some type annotations that "seem" redundant:

Just off the top of my head for call_ref, an interpreter could benefit from a function type annotation to know how many arguments to pop, without having to inspect the value and get to the function's (dynamic) type. In general, the reliance on types only known from the operand stack can manifest in the interpreter as instead requiring dynamic type information from a value operand. (select was different because the interpreter can cheat with a universal value representation and doesn't need to inspect the true/false values.) So I would use this as a guide going forward: don't unnecessarily introduce a dependency on having dynamic type information attached to values when run on an interpreter.

Basically, we should not trade static annotations for dynamic metadata and checks thereof. Even if a system doesn't spend any significant execution time in its interpreter tier, it has to pay the space costs of metadata forever.
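A minimal illustration of the interpreter concern in the text format (type and locals invented):

    (type $fn (func (param i32 f64) (result i32)))

    local.get $a   ;; i32
    local.get $b   ;; f64
    local.get $f   ;; (ref $fn)
    call_ref       ;; unannotated: to learn that it must pop two arguments,
                   ;; the interpreter has to fetch the function type from
                   ;; the runtime value itself
    ;; "call_ref $fn" would make the arity a static property of the bytecode.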

@rossberg
Member Author

rossberg commented Jul 22, 2022

We still need to resolve this. Please let's focus on new instructions consuming reference operands, of the sort occurring in this and the GC proposal, e.g. call_ref, array.get, etc. Currently, some of these annotate the reference type, some don't. I think there is agreement that we should be consistent, but less so in which way.

AFAICS, the main arguments in this discussion so far have been:

Against annotations:

  • smaller code size
  • lower validation cost

For annotations:

  • less work for interpreters
  • lower risk of future issues with the type system

For previous discussion, see also WebAssembly/gc#241.

@titzer
Contributor

titzer commented Jul 22, 2022

Type annotations also benefit baseline compilers. Otherwise, baseline compilers have to track full types (rather than machine representations) to model the stack effects of instructions like call_ref, or to know the register class of the result of struct.get.

@jakobkummerow
Contributor

I can offer two data points:

Type annotations also benefit baseline compilers. Otherwise, baseline compilers have to track full types (rather than machine representations) to model the stack effects of instructions like call_ref, or to know the register class of the result of struct.get.

While that sounds good in theory, V8 does not care about this property, for two reasons:

  • We wouldn't want the code duplication of having one decoder implementation that tracks types (which is required for validation anyway), and another (for compiling) that doesn't. So we use a single (templatized) implementation; as a consequence there's practically no cost to having the compilers track static types of value stack entries.
  • In order to apply simple ("peephole", non-flow-dependent) optimizations, we need to track types on the stack. For example, for a sequence (struct.new ...) (struct.set ...) (or, now that we have 1a-NNLs, for a sequence (local.get $nnl) (struct.get ...)), in order to skip the null-check in struct.set, we need to track the type of the value stack entry. There's no benefit to tracking nullability-but-not-type of the value. (Both cases are sketched below.)
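A minimal sketch of those two peephole cases in the text format (type $pair and local $nnl are hypothetical):

    ;; Case 1: the struct operand comes straight from struct.new, so its
    ;; tracked stack type is a non-null (ref $pair) and the null check
    ;; in struct.set can be elided.
    (struct.set $pair 0
      (struct.new_default $pair)
      (i32.const 42))

    ;; Case 2: $nnl is a non-nullable local, so the tracked type of the
    ;; struct.get operand is already (ref $pair) rather than (ref null $pair).
    (struct.get $pair 0
      (local.get $nnl))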

I would not find it surprising if many/most other production-quality engines found themselves in the same situation.


In the CalcEngine module that's produced by the current versions of J2Wasm and Binaryen, we have:

instruction    count    average bytes   fraction of code section
struct.get     138138   4.25            6.1%
array.get       22708   3.02            0.7%
struct.set       8844   4.55            0.4%
struct.get_s     1714   4.68            0.1%
array.set         987   3.32            0.0%
array.get_u       677   3.00            0.0%
array.get_s        75   4.00            0.0%
struct.get_u       30   5.00            0.0%

The total size of the Code section is 9.7 MB, so saving one byte (a lower bound for what dropping type immediates saves) on each of these instructions would save 173 KB, or 1.8%. These are uncompressed numbers, and I don't have an easy way to measure the effect on compressed modules; lacking a concrete measurement, I would guess that the relative savings are similar (which is what we saw for previous questions of "should we drop some bytes here or there"). Note that there is some fairly obvious opportunity for saving code size in that module, so the relative impact of these type annotations will get bigger after we've picked other low-hanging fruit; also, if we had to add immediates to call_ref (and maybe others?), that would further increase the delta.

Same table for Dart's "barista3" benchmark:

instruction    count   average bytes   fraction of code section
struct.get     23957   4.35            14.1%
struct.set     11040   4.63            6.9%
array.get        222   3.00            0.1%
array.set        189   3.06            0.1%
array.get_u      140   3.03            0.1%
array.get_s        6   3.00            0.0%

Saving one byte for each of these would add up to 35 KB out of 746 KB or 4.7%.

(To clarify: the "average bytes" are for the entire instruction, so e.g. 4.25 means that struct.get, on average, needs two bytes 0xfb 0x03 for the instruction itself and another 2.25 bytes for type immediate and field index immediate.)

@askeksa-google
Copy link

Just to quickly recap, the actual numbers for barista3 are 5.8% uncompressed and 6.4% compressed (zopfli).

@titzer
Contributor

titzer commented Jul 22, 2022

I would not find it surprising if many/most other production-quality engines found themselves in the same situation.

It's already the case that different production engines have made different tradeoffs. For example, the SpiderMonkey baseline compiler has its own dedicated decoding logic and its abstract stack does not model full types (instead, kinds, constants, register classes, and spill status). That pays off: SpiderMonkey baseline compiler generates code 4x faster than Liftoff.

While current production engine tiering configurations inform designs, we shouldn't allow them to dictate tradeoffs made in other, or future designs. This is what motivated my comments above. An entirely reasonable 3-tier design (interpreter, baseline compiler, optimizing compiler), or a design where validation is separate from baseline compilation for another reason, would be penalized by forcing them to model types.

There are at least two other instances that I can think of where modeling types (i.e. reconstructing the validation algorithm) is a detriment: the dynamic control stack in wamr-classic, and lazily generating stack maps for GC. In general, though we are focused on execution tiers here, any consumer that wants to model the value stack, even just for its height, would benefit from the property that I outlined: the immediates are enough to correctly model the stack height/kind at every instruction.

@eqrion
Contributor

eqrion commented Jul 22, 2022

I would not find it surprising if many/most other production-quality engines found themselves in the same situation.

It's already the case that different production engines have made different tradeoffs. For example, the SpiderMonkey baseline compiler has its own dedicated decoding logic and its abstract stack does not model full types (instead, kinds, constants, register classes, and spill status).

I don't have an opinion on the larger issue here, but just wanted to clarify this point.

Our baseline compiler performs function validation at the same time as code generation. Function validation requires the value stack, so we always have the full typing of the stack available. For code generation, we keep a parallel stack that models kinds/constants/register classes/spill status as you say.

If we were to (for some reason) pre-validate functions before compilation, then theoretically we could try to eliminate the full type tracking. I could see a reason to do this for an interpreter, but I'm unsure of the benefit for a compiler. Even if you compile functions lazily, my understanding is that it's legal to defer validation of a function until the point it is called for the first time.

@tlively
Member

tlively commented Aug 4, 2022

I was talking to @kripken about the graph coloring example offline, and we realized that we don't know how adding annotations would fix it. It seems like the only annotations you could actually add to that program would be for type $V, but that doesn't give you any new information.

@RossTate, are we missing something here, or does the type system in that example have NP-hardness even in the presence of annotations on struct.get and struct.set?

@conrad-watt
Contributor

I think we can sidestep the specifics of the graph colouring construction in that presentation by observing that we can't define a convincing linear scan validation algorithm for struct access instructions consuming an opaque type described only by multiple struct-shaped supertype bounds, if said struct access instructions don't have input type annotations. IIUC that's the "essence" that's important here.

@RossTate

RossTate commented Aug 4, 2022

My understanding is that $V would not be allowable as the annotation for struct.get; the annotation has to be specifically a struct type. That would mean each of the initial struct.get instructions would have to be annotated with the specific struct supertype of $V that makes the program type-check, so each such annotation would in turn specify the color of the respective vertex that demonstrates the graph is 3-colorable.

@tlively
Member

tlively commented Aug 4, 2022

Aha, thanks, @RossTate! Before understanding that, it had seemed that the example had no bearing on the present discussion since it would be equally broken with or without annotations.

@eqrion
Contributor

eqrion commented Aug 5, 2022

My understanding is that $V would not be allowable as the annotation for struct.get; the annotation has to be specifically a struct type. That would mean each of the initial struct.get instructions would have to be annotated with the specific struct supertype of $V that makes the program type-check, so each such annotation would in turn specify the color of the respective vertex that demonstrates the graph is 3-colorable.

@RossTate I may be missing something, but doesn't this break the definition of subsumption given above as well?

Substitutability/subsumption, by definition, says that if an instruction type-checks with a certain (list of) input type, then it also type-checks with all subtypes of that input type. So if it type-checks with an input type that is a struct with some field, then it must also type-check with a type variable whose upper bound is that struct type—even if that type variable also has another upper bound that is a struct with a differently typed field.

struct.get supertype-of-$V would validate, but struct.get $V would not in the above rules for annotations.

The side-condition 'type annotation must be a concrete struct type' seems identical in practice to 'input type must be a concrete struct type'?

@RossTate

RossTate commented Aug 5, 2022

Good questions. Before getting technical, let me demystify and motivate subsumption a bit for everyone.

We like to be able to inline functions. Consider a subtyping A <: B and the obvious function $upcast_A_to_B: [A] -> [B] that simply returns its input, i.e. is the identity function except for a loss in type precision. Given what this function does, you'd expect to be able to inline a call $upcast_A_to_B (local.get $a) to just local.get $a. But, if your type system does not satisfy subsumption, then that increase in type precision could cause a subsequent instruction to fail. So, if you want inlining, you need subsumption.
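The same example in the text format (types and names invented; $A is declared a subtype of $B):

    (func $upcast_A_to_B (param (ref $A)) (result (ref $B))
      (local.get 0))   ;; identity, except for the loss in type precision

    ;; call site:
    (call $upcast_A_to_B (local.get $a))   ;; leaves a (ref $B) on the stack
    ;; after inlining:
    (local.get $a)                         ;; leaves a (ref $A) on the stack
    ;; Subsumption is exactly the guarantee that whatever instruction
    ;; consumes this value still type-checks with the more precise (ref $A).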

Now for the technical answers.

My understanding is that $V would not be allowable as the annotation for struct.get; the annotation has to be specifically a struct type. That would mean each of the initial struct.get instructions would have to be annotated with the specific struct supertype of $V that makes the program type-check, so each such annotation would in turn specify the color of the respective vertex that demonstrates the graph is 3-colorable.

@RossTate I may be missing something, but doesn't this break the definition of subsumption given above as well?

Subsumption is a property of a fixed instruction: if a given fixed instruction type-checks with certain inputs, that same instruction must necessarily type-check with more precise inputs. A type annotation is part of the instruction, so changing the type annotation results in a different instruction; as such, subsumption says nothing about the relationship between these instructions.

The side-condition 'type annotation must be a concrete struct type' seems identical in practice to 'input type must be a concrete struct type'?

That impression is probably because at present "practice" for WebAssembly is not including programs using type variables. If wasm ever gets to generics (which, admittedly, I'm pretty skeptical of at this point) or to powerful forms of separate compilation like Jawa (which I'm also pretty skeptical of at this point), then you will often have programs where you know a value belongs to some abstract type which you know is at least a struct with some fields, at which point requiring an annotation to be a struct type will be very different from requiring an input to belong to a struct type and not a type variable.

If it helps to clarify things, there's another way the complexity issue can be rectified using annotations. At present, the annotation for struct.get essentially specifies what to upcast the input to, and the restriction that the annotation be a struct type guarantees that the retrieved value has a principal type to use. Instead, the annotation for struct.get could be just the (upcast) type of the field at hand, and validation would just check that the input has a field of (at least) that type.
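A sketch of that alternative with invented syntax, where the annotation names only the field's (upcast) type rather than a whole struct type:

    local.get $v               ;; (ref $V), $V bounded by struct-shaped supertypes
    struct.get (field i32) 0   ;; hypothetical: validation checks only that the
                               ;; input's field 0 can be read at type i32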


As a reminder, I am giving these technical answers not to force a particular decision. Rather, their point is just to make the group understand that it must make a decision: remove type annotations xor maintain forward compatibility with some features like multiple upper bounds on imported types. My recommendation would be to do the former, but it's your decision.

@tlively
Member

tlively commented Aug 5, 2022

A third option would be to compromise slightly on substitutability under the assumption that any tool that would benefit from substitutability would also be able to add the type annotation as necessary (under any type system that can otherwise work without annotations).

@RossTate

RossTate commented Aug 5, 2022

You can't really compromise slightly on a global principle; it holds or it doesn't hold. Rather, what you're proposing is that the extensions in discussion would not be subtypes but would rather have erasable coercions. For example, a type import would not be a subtype of its additional upper bounds; rather, there would be an upcast_to_bound <n> : [(ref $V)] -> [(ref $bound_n_plus_1)] instruction, which is what call $upcast_V_to_BoundNPlus1 would inline to. This has a number of downsides but, given that I find it unlikely wasm will get to this point, that's fine with me.
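Spelled out in the text format, using the hypothetical instruction from above:

    ;; $V is a type import with struct-shaped upper bounds $Bound1 and $Bound2:
    local.get $v          ;; (ref $V)
    upcast_to_bound 1     ;; [(ref $V)] -> [(ref $Bound2)]; erased at run time,
                          ;; i.e. what call $upcast_V_to_Bound2 would inline to
    struct.get $Bound2 0  ;; ordinary annotated access on a concrete struct type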

@eqrion
Contributor

eqrion commented Aug 8, 2022

Thanks Ross for the answer, that helps a lot.

Subsumption is a property of a fixed instruction: if a given fixed instruction type-checks with certain inputs, that same instruction must necessarily type-check with more precise inputs. A type annotation is part of the instruction, so changing the type annotation results in a different instruction; as such, subsumption says nothing about the relationship between these instructions.

So if the abstract syntax of struct.get included a type annotation, but we introduced a fallible type inference algorithm (similar to how @titzer framed the issue above) to generate that annotation from a binary format that doesn't include it, that would not strictly violate subsumption? (In the world where multiple upper-bounded type imports are a thing.) I'm just trying to understand the options here for the future; a fallible type inference algorithm could probably be introduced at the time it would solve an issue.

Personally, I would agree that it's unclear whether we'll have multiple upper-bounded type imports ever and if so, what restrictions we could add on them to accommodate design decisions we make now.

I also go back and forth on whether a 6% binary size reduction is acceptable for an MVP. If current Wasm-GC ports are ~2x the size of equivalent JS, 5% would need to be part of a larger story of getting to parity. It would help to have that larger discussion to be able to prioritize this 6%.

@askeksa-google

askeksa-google commented Aug 9, 2022

I did some more measurements to estimate the additional overhead of adding a type index to call_ref. Here's the table from earlier with an extra "Full" row that includes these (edit: brotli columns added):

                    Module,        Module,   Module,   Code section,   Code section,   Code section,
                    uncompressed   zopfli    brotli    uncompressed    zopfli          brotli
Full                1695174        521675    399958    1031085         373826          281649
Current             1694335        521514    399999    1030246         373537          281625
Without             1635418        495679    380343    971329          348648          261510
Saving vs Current   3.48%          4.95%     4.91%     5.72%           6.66%           7.14%
Saving vs Full      3.53%          4.98%     4.90%     5.80%           6.74%           7.15%

The additional overhead looks tiny here, but I think these numbers underestimate the typical overhead of adding a type index to call_ref, for two reasons:

  • In dart2wasm, call_ref is only used for closure calls, so there are very few call_ref in the program (less than 0.1% of instructions).
  • The function types for closure calls use only one type per arity, so the entropy of the call_ref type index is small.

It would be very interesting to see similar numbers from a compiler that uses call_ref more extensively and variedly, such as J2Wasm.

@kripken
Member

kripken commented Aug 9, 2022

As a quick estimate for call_ref specifically in J2Wasm, there are 9 K call_refs in a 9 MB binary, and the number of types is around 18 K so type LEBs are probably 2-3 bytes, so the code size increase would be 0.2%-0.3%. There are over an order of magnitude more struct.gets, so those matter a lot more than call_ref for size.

Overall a size improvement of ~4% seems significant here and I think we should remove the annotations. My mind could be changed by a plausible example that could not be fixed by annotations later, but so far everything mentioned seems like it could work (e.g. with type imports, we could say that in the coloring example above $V - a type import - would not be allowable as an input to struct.get, which is basically parallel to what @RossTate said we need to do anyhow; and adding annotations later could be done in a single new instruction as @manoskouk mentioned in the meeting, which is not burdensome).

@RossTate

RossTate commented Aug 9, 2022

@eqrion I'm not entirely sure I understand the option you're asking for my feedback on, so here's my best guess. Using a fallible inference algorithm (in the future) to recover missing annotations has the same issues as using an incomplete validation algorithm.

That said, I'm not sure we need more options to consider. It's been about a week since someone posed the question of whether there were concrete features people wanted to maintain forward compatibility with. Since then, while I've articulated why you can't always simply add annotations later on to fix an incompatibility, my own answer to the question was that the features I could think of already have compatibility issues with other features of WebAssembly, and I don't believe anyone else has provided any specific features to consider. So maybe y'all are at a point where you can decide to go with removing annotations, which seemed to be the prevailing sentiment of comments in the past week?

@titzer
Contributor

titzer commented Aug 9, 2022

@eqrion I also feel like a 5% space savings doesn't really solve the issue if 2x is what is needed. Also, that 5% figure is for compressed size. My comment in the meeting a couple weeks back about including a general-purpose compression algorithm in our calculations wasn't fully articulated. With the compression schemes we have right now, modules are 5% smaller without annotations; but if a different compression algorithm is invented that does better, our cost model is outdated. That's why I think it's odd that we include GP compression schemes in the criteria we are trying to optimize for.

To wit, a layer 1 compression algorithm that is aware of Wasm's type system can and should completely outclass GP compression. In particular, I suspect that the type annotations necessary for locals are also a big chunk of binary size. Layer 1 compression could probably eliminate those for definitely-initialized locals.

which seemed to be the prevailing sentiment of comments in the past week?

There is still disagreement about this. I don't think the discussion has turned up anything that has fundamentally changed the calculus, or any opinions, yet.

@rossberg
Member Author

I don't think the discussion has turned up anything that has fundamentally changed the calculus, or any opinions, yet.

Agreed. If anything, it turned up how slippery the slope would be. ;)

@jakobkummerow
Contributor

@titzer:

a layer 1 compression algorithm that is aware of Wasm's type system can and should completely outclass GP compression.

I encourage you to develop and/or present such an algorithm. On its own, "we shouldn't do anything now because I'm sure someone else will invent a rounder wheel in the future, which will solve all our problems" is not a convincing argument.

I suspect that the type annotations necessary for locals are also a big chunk of binary size

In the Sheets module measured above, 200 KB are used for defining 61K locals across 28K functions, accounting for 2.1% of the uncompressed code section size.

@RossTate

I don't think the discussion has turned up anything that has fundamentally changed the calculus, or any opinions, yet.

I had thought that the notable lack of motivating examples benefiting from annotations might have been a fundamental change to the decision calculus. If it would help move the discussion forward, I can explain why the one example that has been mentioned—Jawa—does not actually need multiple upper bounds on type imports, leaving us with no motivating examples. (The short of it is that, due to the complications caused by multiple inheritance, there's no significant run-time benefit to knowing an object implements an interface, so there's no need to encode subtyping between interfaces when compiling to wasm.)

If anything, it turned up how slippery the slope would be. ;)

Can you explain what you mean by this?


On the larger issue of binary size, my understanding is that j2wasm (and maybe dart2wasm) is using aggressive closed-world AOT optimizations to make performance competitive with j2cl. Is that understanding correct? If it is, such performance optimizations can substantially increase code size, so what happens to binary size (with and without annotations) and performance if instead one optimizes for size rather than performance?

@askeksa-google

Indeed dart2wasm performs closed-world optimizations, but most of these are ones that reduce code size, e.g. tree shaking, parameter elimination, constant propagation, devirtualization.

@RossTate

Ah, I would have thought it did inlining as well largely due to its symbiotic role with devirtualization (devirtualization provides an inlining opportunity, which moves virtual calls to a more specialized context, which enables more devirtualization).

@kripken
Member

kripken commented Aug 11, 2022

About the importance of binary size, I look at it like this: if we could easily reduce the size of all images on the Web by 5% then we'd want to do that. Likewise for video, JS, etc. The scale of the Web is so big that even a 5% win on one format matters a lot! Maybe one user can't easily notice it by themselves, but plenty of data shows that even small changes to download size have real-world impacts on average user experience. And 5% less total bandwidth across the entire Web is a significant amount of work and electricity over billions of downloads.

Because of that there are various projects working to shrink images, video, JS etc., both on the spec side and tools side. As wasm rises in popularity its size will matter more and get closer to those. In fact we already do a lot of work for code size on the tools level, in Binaryen and elsewhere - that has been motivated by existing important real-world use cases that care about binary size today.

So even 5% is worth thinking about. Yes, there may be other ways to shrink size (L1 compression), and maybe we'll get to them, but that's uncertain. And, yes, maybe we are 2x larger than JS now, and maybe 5% won't fix that, but it's still 5% better. Whenever we have a good practical way to reduce size that's worth considering. Of course, we need to consider the other side of the tradeoff as well.

@eqrion
Contributor

eqrion commented Aug 11, 2022

@kripken Thanks for that, I think that's well said.

So even 5% is worth thinking about. Yes, there may be other ways to shrink size (L1 compression), and maybe we'll get to them, but that's uncertain. And, yes, maybe we are 2x larger than JS now, and maybe 5% won't fix that, but it's still 5% better. Whenever we have a good practical way to reduce size that's worth considering. Of course, we need to consider the other side of the tradeoff as well.

I just want to clarify my comments regarding the '5% vs 2x' point. I definitely agree that improvements are worth considering even if they don't solve the larger issue completely.

I would just like to better understand why wasm-GC binary size is 2x JS binary size. This would help me to evaluate the relative importance of a 5% improvement vs any spec concerns. Are the wasm-GC toolchains doing more inlining? Are call sequences or prologues larger? Is there a runtime library required with wasm that JS doesn't need? Is the data/elem section significant?

For example, if there's low hanging fruit elsewhere that doesn't have spec concerns then that seems like the first step before dropping annotations. If the other required fixes are likely to be difficult too, then it may make sense to start with dropping annotations.

But without more context on the '2x' number, it's hard for me to weigh the improvement against theoretical spec concerns.

@rossberg
Member Author

@eqrion:

I would just like to better understand why wasm-GC binary size is 2x JS binary size.

Because JS is a high-level language and Wasm isn't. Respective size comparisons are mostly apples to oranges. Native code also tends to be much larger than JS source code, probably by more than 2x.

@eqrion
Contributor

eqrion commented Aug 11, 2022

@eqrion:

I would just like to better understand why wasm-GC binary size is 2x JS binary size.

Because JS is a high-level language and Wasm isn't. Respective size comparisons are mostly apples to oranges. Native code also tends to be much larger than JS source code, probably by more than 2x.

I'm aware of the difference between JS and Wasm. All of the items I listed as possible reasons for a size difference would be due to Wasm being a lower-level language. What I'm asking for is quantification to help understand what, if anything, can be done about the difference.

@tlively
Member

tlively commented Aug 11, 2022

One thing that might help move the conversation forward is if we could get an example of a type system extension for which the existence of unannotated struct.get and struct.set (even in the presence of additional annotated versions) would be a problem. In other words, it would be nice to have an example where the decision procedure for determining whether an annotation is required is greater than constant time (amortized).

Lacking such an example, it has been hard to convince ourselves that the risk of encountering such a situation is anything but vanishingly small and that the expected benefit of keeping annotations could outweigh the benefit of reducing code size.

Note that for the example with multiple supertypes on type imports, I believe an efficient and sufficient decision procedure for requiring annotations would be to require them when the type on the stack is a subtype of a type import, which would be amortized constant time after preprocessing the types.

@kripken
Member

kripken commented Aug 12, 2022

@eqrion

I would just like to better understand why wasm-GC binary size is 2x JS binary size.

Things are still evolving so any specific measurement may end up changing. But I think these are some of the main factors, which include things you mentioned:

  • Inlining. While engines can inline at runtime, I don't think we can assume they'll do as well as JS (JS is higher level so there is more to work with). Also runtime inlining takes time to warm up and may differ between VMs. So we inline more than JS toolchains do, and this helps benchmarks.
  • Library support. Atm we're shipping Java library code for strings, for example - the Strings proposal should remove some of that, but not all. JS has a great standard library that helps lots of things. Sometimes we can call out to JS (e.g. to do Math.log, rather than bundle a log impl) but that has runtime costs.
  • Repeating code patterns like "if null, throw an exception". In JS the code can rely on the VM, but in wasm we'd get a nonrecoverable trap, and J2Wasm needs the ability to catch some of these. Lots of other patterns exist; e.g., x.push in JS is just a few bytes, but in wasm it's going to need to handle reallocating the backing array etc.
  • Lots of small stuff like a.x in minified JS is a read from a field in 3 bytes (in wasm that takes more with or without annotations), JS not having type annotations in general (regardless of the current topic, the annotations we definitely will keep are a significant source of entropy), JS having "overloading" ("+" works on numbers or strings), etc. etc.

Those are sorted in my current best guess at relative importance from the most to the least, but again, that's imprecise and may change.

I'm not sure about the data/elem sections. J2Wasm doesn't emit them, but it does emit equivalent stuff in arrays and structs. It's hard to measure that since it's interleaved in the code.

@rossberg

Native code also tends to be much larger than JS source code, probably by more than 2x.

That reminds me, I actually measured JS compared to native code over a decade ago 😄 yes, as you said, JS is surprisingly compact - and it's only gotten smaller since then!

Yes, this is somewhat expected. But the bottom line is that this matters. If wasm is (say) 2x larger than JS, then that will prevent some people from being able to use it. The smaller it is, the more people can use it, and the better it will work for those that do. I wouldn't go as far as to say that "if we make wasm 5% smaller it will be 5% more impactful", but there is a positive connection there.

@dtig
Member

dtig commented Aug 25, 2022

Adding a note here that this was discussed in the GC subgroup meeting earlier this week (PR for notes here). The high order bit is that the poll of the room showed that the consensus is to retain type annotations on new instructions. Continuing the discussion for next steps here, some questions:

  • Is the consensus in the meeting sufficient to move forward?
  • If so, what should the next steps be for the OP of this issue w.r.t. call_ref, func.bind?
  • Other thoughts from people who didn't attend the meeting?

@tlively
Member

tlively commented Aug 25, 2022

Let's move forward with keeping the existing annotations and adding annotations to call_ref and friends as well. I don't see a need to hold a further vote. @rossberg, can you prepare a spec PR for that? We'll work on updating V8 and Binaryen, too.

@rossberg
Member Author

@tlively, done: #76.

@rossberg
Member Author

Closing via #76.
