This repository was archived by the owner on Apr 25, 2025. It is now read-only.

Remove redundant typeidx on gc instructions #241

Closed · wants to merge 3 commits

Conversation

rossberg (Member) commented Sep 8, 2021

This removes the remaining redundant type indices on GC instructions, following the discussion in WebAssembly/function-references#27 and #238. It makes GC instructions consistent with what we already decided on for other ref instructions, via WebAssembly/reference-types#99 and WebAssembly/function-references#31.

See MVP.md for the relevant changes. Interpreter and test suite are adjusted accordingly.

@jakobkummerow, PTAL.

@titzer, I know that you have argued against this change. However, we already do the same for other instructions, so do you think an interpreter can actually save much complexity if there are some instructions that have annotations while others don't? It seems like the general infrastructure for side tables is needed either way.

According to @askeksa-google's measurements, this change saves 3.5-5% in code size. It should also reduce validation time, since it avoids redundant checks.

Removing these annotations from static code and instead deriving them during validation could benefit interpreter performance as well. In particular, the side table could then store more relevant information directly, e.g., field offset plus field type, instead of just the index of a struct type that the interpreter would have to look up and inspect on each access, which would make accesses rather expensive.
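For concreteness, here is a minimal sketch of what the change means for the wire format; the `0xFB 0x03` opcode for `struct.get` follows the encoding discussed later in this thread, and the index values are made up for illustration:

```c
/* Illustrative struct.get encodings; opcode bytes and index values are
 * assumptions for this sketch, not normative. */
static const unsigned char struct_get_before[] = {
    0xFB, 0x03, /* struct.get */
    0x10,       /* typeidx (LEB128) -- the redundant immediate this PR removes */
    0x02        /* fieldidx (LEB128) */
};
static const unsigned char struct_get_after[] = {
    0xFB, 0x03, /* struct.get */
    0x02        /* fieldidx (LEB128) */
};
```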

titzer (Contributor) commented Sep 8, 2021

I wasn't aware that we reached consensus on this issue, so I think this PR is a little premature, though I do appreciate the amount of work done here.

do you think an interpreter can actually save much complexity if there are some instructions that have annotations while others don't? It seems like the general infrastructure for side tables is needed either way.

Currently Wizard only uses side-table entries for control flow, and I think it's best to avoid requiring new kinds of entries if possible. Keep in mind that side-table entries cost space per instruction, as opposed to other choices like module-relative field indices, which keep the space cost per field declaration.

The only other option (I mentioned in #238) is to put field metadata hanging off object headers. In my opinion, looking at the whole system, that's a mixup of static and dynamic information that points to a clear design mistake in the bytecode. I think it makes more sense to keep metadata hanging off the module (instance) or the code (sidetable), whenever possible, and not objects themselves. (Just imagine what happens when an engine completes tier-up from interpreter to all-JIT for a module; at that point it should be able to free the space for interpreter side tables and module metadata. This would be impossible to do in a multi-module setting if metadata hangs off objects, because other modules that have not completed tiering up need to indirect through objects to find field metadata--that's a clear design flaw IMHO).

As for the other instructions like ref.is_null that eschewed static information, the consequence has been that it impacts runtime representations; eventually Wizard just gave up and settled on one representation for all null references. I don't know if V8 has to do something weird because of JavaScript's null; maybe it just chose JS null as Wasm null.

titzer (Contributor) commented Sep 8, 2021

Also, just for the record, field access for GC objects in the interpreter is not going to be pretty. Since we want to pack objects and have unboxed representations, and interpreters usually have value tags, there are a number of things that need to be dealt with, such as (a) the field's actual offset into the object, (b) the field's size in bytes (typically 4 or 8 for non-packed fields), and (c) the field type's value tag. There'll be branches. It won't be pretty :-)
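A minimal sketch of the per-access metadata this implies, with assumed names (not Wizard's actual data structures):

```c
#include <stdint.h>

/* Hypothetical descriptor an interpreter might precompute for each field
 * access: (a) byte offset, (b) size in bytes, (c) value tag. */
typedef struct {
    uint32_t offset;    /* (a) field's actual offset into the object */
    uint8_t  size;      /* (b) field's size in bytes, e.g. 4 or 8 (less if packed) */
    uint8_t  value_tag; /* (c) interpreter value tag for the field's type */
} FieldAccessInfo;
```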

So please interpret (no pun intended) my above comment in terms of proper placing of metadata, as it probably makes only 1 or 2 loads' difference for an interpreter, no matter where you put that metadata.

jakobkummerow (Contributor) left a comment

@titzer: I don't understand why you say that maintaining type information for an interpreter's purposes requires per-instruction annotations in a side table (or maybe I'm misunderstanding that entirely). Doesn't your interpreter keep a value stack, and can't it store type information for the entries on this stack? I understand that it'll still be an additional thing that an interpreter has to do, but at least the memory cost should be per-value-stack-entry, right?
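As a sketch of what that could look like (names are assumptions, not any engine's actual layout):

```c
#include <stdint.h>

/* Hypothetical value-stack slot that carries its static type alongside the
 * value, so type information costs memory per stack entry rather than
 * requiring per-instruction side-table entries. */
typedef struct {
    uint64_t bits; /* the (unboxed) value itself */
    uint32_t type; /* static type recorded when the slot was pushed */
} StackSlot;
```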

About this PR: I'm generally in favor, and I don't feel strongly about the timing either way. Considering that it's a fairly superficial change to make, we could postpone it until more substantial questions have been settled; but conversely, it's also not much of a burden to make this change now. Breaking changes are always somewhat inconvenient, but I've just discussed with @tlively that we can introduce temporary flags to both V8 and Binaryen to keep both behaviors for a while and thereby enable a smoother transition path. Exact timing of these implementation changes is TBD, but that doesn't block us from updating this repository right now.

I'm not sufficiently familiar with the reference interpreter to do a meaningful review of those changes.

Regarding the "Field Accessors" idea proposed in #238 (which I don't have a whole lot of an opinion on myself), I don't think making this change now ends that discussion; if consensus emerges that we should do (or at least experiment with) something like Field Accessors, we can do so at any time. (Depending on what goals we ultimately settle on, we may even want/need both: classic simple struct.get for simple types from the same module, and struct.get_with_field_accessor for generics and/or unknown-supertype imported types or whatnot, with some binary size and/or performance penalty traded for improved flexibility.)

@@ -246,55 +246,55 @@ This can compile to machine code that (1) reads the RTT from `$x`, (2) checks th

#### Structures

* `struct.new_with_rtt <typeidx>` allocates a structure with RTT information determining its [runtime type](#values) and initialises its fields with given values
- `struct.new_with_rtt $t : [t'* (rtt n $t)] -> [(ref $t)]`
* `struct.new` allocates a structure with RTT information determining its [runtime type](#values) and initialises its fields with given values
Contributor:

We need both instructions, with distinct names to make the .wat format unique:

  • struct.new has a static type parameter and no RTT on the stack (performs an implicit rtt.canon)
  • struct.new_with_rtt has no static type parameter and takes an RTT from the stack.

Same for the _default variants, and arrays.

So for this patch, I suggest undoing this renaming, because otherwise you'll have to redo it very soon, and the churn just creates confusion.

rossberg (Member Author):

I moved this to a separate PR, but note that this is just fixing outdated info in the overview doc – test suite and interpreter have long changed the names (which you suggested yourself IIRC).

because otherwise you'll have to redo it very soon

Hm, there currently is no plan or concrete suggestion for having RTT-less instructions, other than some experiments? (FWIW, the discussion about that seems based on misconceptions about the motivation for having explicit RTTs. But let's not go into that here, since there are already too many off-topic tangents on this PR.)

Member:

Hm, there currently is no plan or concrete suggestion for having RTT-less instructions, other than some experiments?

This depends on what you mean by experiments. In the August 24 meeting we had consensus to investigate a hybrid nominal-structural design throughout the ecosystem, including in the spec interpreter and MVP docs. Before updating this repo, we are iterating on the design in this doc, which currently describes new RTT-less instructions for operating on the nominal types. Happy to discuss this further in a different thread or on that doc, but I wanted to make sure we were all on the same page about this.

)
(func (export "get") (param $i i32) (result f32)
(call $get (local.get $i)
(array.new_default $vec (i32.const 3) (rtt.canon $vec))
(array.new_default (i32.const 3) (rtt.canon $vec))
Contributor:

Hmmm, so this also doesn't match what Binaryen currently accepts (or produces when disassembling), it'd have to be array.new_default_with_rtt. I guess the cleanup effort would be significant, and hence better be done separately.

titzer (Contributor) commented Sep 8, 2021

@jakobkummerow Hmm, that's an interesting idea to use the value type tags, but after a little bit of thinking, I think it has other downsides. Wizard does use value type tags for value stack slots, but they are broad classes (i32, i64, f32, f64, ref). I don't think it's good to rely on them for anything other than distinguishing value representations for when Wizard's GC scans value stacks, because that creates complexity elsewhere (e.g. a given reference could now have multiple different type tags, depending on where it came from, so comparing tagged values for equality in the engine becomes more complex). Requiring dynamic type tags to recover lost static information also eliminates an advantage of a conservative GC, if an engine chooses that. (E.g. I am talking to JSC folks about experimenting with Wizard's interpreter design in JSC, which does support conservative stack scanning and would remove the need for type tags altogether.)

In short, I'll repeat what I've said before. Removing static type information from bytecodes because it is (or can only be) recovered from validation is a design mistake. We made that design mistake once with select and I think this change would be even worse. It removes many alternatives in runtime system design.

RossTate (Contributor) commented Sep 8, 2021

Field Accessors solved multiple problems. One of those problems can be addressed by adding more instructions in the future. But another problem cannot be fixed by adding instructions. This change makes type-checking exponential time when combined with support for multiple upper bounds, which is important to support for a number of Post-MVP features. So, this PR should probably include an accompanying change (such as one of the ones described in #238) to preserve the extensibility of the MVP. I can write that up, possibly tomorrow, to give a sense of what that would look like.

jakobkummerow (Contributor):

@titzer: In V8, our solution is that we've refactored the former ValueType (which was as coarse as you describe) into a bit field: now we have 5 bits for i32|...|ref|rtt (we call it "kind"), 20 bits for the referenced heap type if kind==ref|optref|rtt|rtt_with_depth, and another 6 bits for the depth if kind==rtt_with_depth. Depending on use case, we can check for just the kind (e.g. when finding references to scan with the GC) or the other fields as well (e.g. when we want to know which specific struct type it is). I assume that a similar design would work for an interpreter, but I haven't tried it.
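A sketch of the bit layout just described, with assumed field names (not V8's actual identifiers):

```c
/* Hypothetical 31-bit packing of V8's refactored ValueType as described
 * above: 5 bits of kind, 20 bits of heap type, 6 bits of RTT depth. */
typedef struct {
    unsigned int kind      : 5;  /* i32 | i64 | f32 | f64 | ref | optref | rtt | ... */
    unsigned int heap_type : 20; /* referenced heap type, if kind is ref/optref/rtt */
    unsigned int depth     : 6;  /* inheritance depth, if kind == rtt_with_depth */
} ValueTypeSketch;
```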

@RossTate: That's all fair; all I'm saying is that landing this PR now-ish doesn't prevent us from adding Field Accessors to the instructions that are losing their static type annotations here, if we decide (next week or next month or whenever) to do so. Regarding exponential-time checking in particular (which of course nobody wants), I'd assume that that, too, can be avoided with new instructions: the old instructions could simply not validate with types that have multiple upper bounds.

RossTate (Contributor) commented Sep 8, 2021

I'd assume that that, too, can be avoided with new instructions: the old instructions could simply not validate with types that have multiple upper bounds.

Unfortunately your suggestion is essentially to break subsumption, a property that's important for keeping transformations like function inlining type-preserving. In fact, removing type indices only works because you have subsumption, so your fix might actually make type-checking exponential time itself.

So I don't think it's safe to assume we can solve the problem by adding new instructions. The accompanying change I have in mind is simpler anyways.

RossTate (Contributor) commented Sep 9, 2021

So WebAssembly has the subsumption rule/property that implies that if an instruction type-checks with ref A on the (input) stack, then it also type-checks with ref B on the (input) stack whenever B is a subtype of A, i.e. B <: A. Besides formalizing an intuition for what subtyping means, subsumption is also very important for transformability of code (e.g. inlining) and reduction of code size (otherwise you need explicit upcasts all over the place).
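In standard notation, the input half of that subsumption rule reads roughly as follows (a sketch, eliding the output side):

```latex
\frac{C \vdash \mathit{instr} : [t_1^\ast] \to [t_2^\ast]
      \qquad C \vdash [{t'_1}^\ast] <: [t_1^\ast]}
     {C \vdash \mathit{instr} : [{t'_1}^\ast] \to [t_2^\ast]}
```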

But subsumption can be problematic for decidability (and yet, at the same time, help with decidability). The issue here is that if you have a ref X on the stack, where X is some "abstract" type variable, and you have some instruction like struct.get that expects some more "concrete" type, then subsumption requires you to enumerate through all the concrete supertypes of X in order to see if any of them make the instruction type-check. And, since the output type of struct.get depends on the choice of concrete type, you get exponential type-checking if there is no "principal" choice.

The way we can fix this is to encode how iTalX addressed the problem: ensure that either every abstract type variable has a principal concrete supertype or is uninstantiable (i.e. there is no concrete type that satisfies the constraints on the type variable, which implies the current code location is unreachable). How do we do these? Well, if our concrete types (i.e. types providing fields) are nominal types, we can restrict definitions of new nominal types to specify at most one nominal supertype. This makes the nominal subtyping hierarchy into a tree, which ensures a nice property: a (finite) collection of nominal types has a common nominal subtype if and only if one nominal type in the collection is a subtype of all the others.

As a consequence, to type-check struct.get when there's a ref X on the stack, we can look through the concrete upper bounds of X: if one of them is a subtype of the rest, then we use the field type that one specifies; otherwise, X is uninstantiable and we deem the instruction's return type to be unreachable.
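A sketch of that check, using placeholder names (`Type`, `subtype_of`). It is sound for a tree-shaped nominal hierarchy because a finite collection of nominal types has a common subtype if and only if one member is a subtype of all the others:

```c
#include <stddef.h>

typedef struct Type Type;                             /* placeholder nominal type */
extern int subtype_of(const Type *a, const Type *b);  /* assumed, reflexive */

/* Return the principal concrete upper bound among X's bounds, or NULL if
 * there is none (then X is uninstantiable and the code is unreachable). */
const Type *principal_bound(const Type *bounds[], size_t n) {
    for (size_t i = 0; i < n; i++) {
        int below_all = 1;
        for (size_t j = 0; j < n; j++) {
            if (!subtype_of(bounds[i], bounds[j])) { below_all = 0; break; }
        }
        if (below_all) return bounds[i]; /* use this bound's field types */
    }
    return NULL;
}
```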

Now, it's already the case that we restrict nominal type definitions to declaring at most one supertype. So why does this require a change? Well, the issue is that the above type-checking rule for struct.get is only sound if we have full knowledge of the subtyping relationships between the concrete supertypes of X. But if we can arbitrarily import two separate concrete types A and B, then an X that is a subtype of both A and B might be instantiable even if neither is a subtype of the other according to what the importing module knows of them. So we need to restrict the arbitrariness of imports.

One way you might think to do so is to say that if you import both A and B without any subtyping relation between them, then they must be unrelated in the exporting module. That's along the right line, and it does make type-checking for struct.get sound, but unfortunately it breaks module composition. For example, if you combine one module importing A with another module importing B, the combined module importing both A and B now is restricted in how they could be instantiated even though the original separate modules had no such restriction.

To solve this, we can add a notion of "hierarchies". A module can define/import hierarchies, and every hierarchy provides a root/top nominal type. When you define/import a nominal type, you always specify a nominal supertype, which ensures that all nominal types belong to some hierarchy. If you import two nominal types from the same hierarchy, then if your imports don't explicitly say they're related, it must be the case that they're not related in the exporting module. If you import/parameterize-by an abstract type variable, you specify the hierarchy that it belongs to along with as many upper bounds of abstract/concrete types from that hierarchy as you want (but you don't specify any fields, as all fields are derived from the upper bounds).

This keeps everything efficiently decidable, keeps modules composable, and makes room for some useful Post-MVP extensions (and one thing regarding v-tables that might make sense for the MVP, but I'll leave that for later if its utility comes up).


With the rationale done, here are the accompanying changes to this document:

  • eliminate the non-_extending type definitions
  • add a hierarchy type definition code whose immediates are 1, 0, vec(fieldtype)

If a u32 is used to denote a nominal type and references the hierarchy type definition, it denotes the "top" nominal type of the hierarchy, whose fields are the specified vector.

The reason for the immediates 1 and 0 is that, in the Post-MVP, it will be useful to associate more things with a nominal name (in order to support things like generics). So the 1 indicates that there's one element in the vector, and 0, vec(fieldtype) defines that element. The 0 then is a code indicating that the element specifies a list of fields.

@jakobkummerow Do those accompanying changes seem feasible? (There are also changes to type imports, but my understanding is that type imports is not currently implemented, so I'm not going into those.)

(As a separate note, I would suggest renaming _extending to _refining, since those codes are letting you refine the existing fields of the supertype, and since there will be uses for just extending a supertype with new fields.)

jakobkummerow (Contributor):

@RossTate

Do those accompanying changes seem feasible?

No, not at this time. The stated goal (see #234) is to maintain the "structural" type definitions we've had so far, in order to allow comparative evaluations, so we're not going to eliminate them for now.

As for the more distant future, nothing is set in stone. We can add hierarchy whenever we want. (Personally I'd like to see a somewhat more complete description of its semantics, and some discussion of its merits, before starting to implement.)

After reading your post, I also still think that we could (theoretically; not making a statement on likelihood) ship 0xfb03 = struct.get (without type immediate, as specified in this PR) in the MVP, and afterwards (all at once):
(1) refine its definition (in a backwards-compatible manner, i.e. preserving existing semantics) to require an element with unique supertype chain on the value stack ("a ref X with concrete type X", as you call it)
(2) introduce new types that wouldn't fulfill this property
(3) introduce a new 0xfbYZ = struct.get_new_fancy with the ability to deal with these types.
Alternatively, these things could happen at some point between now and shipping the MVP, in which case we could re-evaluate whether the old struct.get is still needed then; I'd guess that simple accesses to simple types will likely still be useful for the (probably?) common case of entirely intra-module-contained operations.

Put differently: before this PR here, the type immediate of struct.get is/was 100% redundant information: the value actually found on the value stack had to match it (i.e. its type must be a subtype of the immediate), else validation failure. I don't see how dropping this redundant information paints us into any unrecoverable corner.

RossTate (Contributor):

Oh, I forgot about keeping the structural stuff around. In that case, the change would be to require that the u32 immediate for _extending type definitions must refer to either hierarchy or other _extending type definitions.

Put differently: before this PR here, the type immediate of struct.get is/was 100% redundant information: the value actually found on the value stack had to match it (i.e. its type must be a subtype of the immediate), else validation failure. I don't see how dropping this redundant information paints us into any unrecoverable corner.

If "match it" meant "be equal to", then it would be redundant. But, you said "subtype of". The effect of that is that the output type of struct.get with an explicit type is determined by that explicit type rather than the type of the value on the stack in potentially a non-deterministic manner (due to subtyping). If you remove that immediate, then the output type of struct.get is determined by the type of the value on the stack. Last year, I presented a proof (slides) that (1) this change to index-only instructions, combined with (2) this core WebAssembly typing rule and (3) multiple arbitrary upper bounds on type variables/imports, makes type-checking NP-Hard (as type-checking solves vertex coloring).

I can't really do anything about the consequence of that theorem. It just exists as a truth we have to deal with. What I can do is prevent one of the three assumptions of that theorem from holding. Field accessors work to prevent (1), while also providing additional functionality at the same time. Hierarchies work to prevent (3), by ensuring that either there's necessarily a principal upper bound (if any) that provides fields or the instruction is provably unreachable. Any solution that does not prevent either (1) or (3) necessarily must prevent (2), i.e. must remove a key typing rule from core WebAssembly.

jakobkummerow (Contributor):

Do you agree that with the current type system, (3) can't happen?

If so, then what I'm saying is that we can put mitigations in place against (1)+(2)+(3) occurring together if and when we're making (3) a possibility.

Pragmatically: right now, in a single pass over any given function, we can determine, in O(1) time and unambiguously, the type of any value on the stack, so we can also deterministically and efficiently determine the type of any field loaded by a struct.get.
For the sake of argument, suppose we ship the current rules in the MVP, and then afterwards want to add the ability for modules to import or define types with multiple upper bounds. Backwards-compatibility means we can't break MVP-conformant modules. So at that time, we can prevent (1) by changing the validation rules of struct.get to include "if the type of the n-th field of the value on the stack cannot be uniquely and deterministically determined in constant time, the instruction fails to validate" (or similar). That would mean that old modules continue to behave as they used to; while newly created modules that want to make use of the new types can be built to use struct.get_new_fancy instead.

Another thing that's becoming clear from these discussions is that struct.get and struct.new_with_rtt fall into slightly different categories. IIUC, @RossTate's concerns only apply to the former; so one way to address them would be to limit the type-annotation-dropping change to just the {struct,array}.new_{,default_}with_rtt instructions, where we currently require exact equality of the type immediate and the RTT's type. (That said, I still insist that that whole discussion doesn't need to block this PR from landing; if needed we can revert half of it later, as opposed to landing only half of it now.)

RossTate (Contributor):

Another thing that's becoming clear from these discussions is that struct.get and struct.new_with_rtt fall into slightly different categories. IIUC, @RossTate's concerns only apply to the former; so one way to address them would be to limit the type-annotation-dropping change to just the {struct,array}.new_{,default_}with_rtt instructions, where we currently require exact equality of the type immediate and the RTT's type.

Yes, this is fine. Because rtt is invariant, it uniquely determines the type that directs type-checking.

So at that time, we can prevent (1) by changing the validation rules of struct.get to include "if the type of the n-th field of the value on the stack cannot be uniquely and deterministically determined in constant time, the instruction fails to validate" (or similar).

Let's suppose ref A has the property that the type of its n-th field is uniquely/deterministically determined to be tA, and as such struct.get accepts it. Let's suppose an instruction instr preceding struct.get outputs ref X, and ref X is a subtype of ref A but does not have a uniquely determined n-th field. According to this core WebAssembly rule, the instruction sequence instr; struct.get must still type-check with output type tA despite the field not being uniquely determined for the type resulting from instr. This applies to every supertype of ref X that has a uniquely determined n-th field. So if ref X is also a subtype of ref B whose n-th field's type is uniquely tB, then instr; struct.get must also be accepted with output type tB. This makes your restricted type system still NP-Hard.

That's the gravity of subsumptive subtyping, which is the property that core rule captures. C# threw out that rule in order to be efficiently type-checkable. As a consequence, it is quite a pain to safely perform code transformations on C# code. That's not a huge problem for C#, though, because most transformations are actually done at the bytecode level, which uses field accessors to keep type-checking efficiently decidable. (And our typed assembly language for C# used hierarchies to keep type-checking efficiently decidable.)

Java had to throw out declarative typing rules altogether: people were having difficulty porting their code between Java compilers, since each compiler was necessarily an incomplete implementation of the type system. For Java 8, they got all the compiler writers together and standardized a type-checking algorithm they would all use, which became part of the spec. In the process, they developed a superalgorithm that was more precise than all the other algorithms. But they found that the improvement actually broke some key codebases, because it discovered ambiguities (along the lines of non-uniquely determined field types, though really method types) in them that none of the previous systems had noticed. So they broke backwards compatibility and worked with those teams to change the code to conform to the new standard. None of this affected the Java bytecode, though, because the bytecode uses field accessors.

So I understand the appeal of the simplicity of this change, and I recognize that it's unintuitive how problematic the change is permanently, but throughout my decade of consulting with language teams I have seen many times how these seemingly benign little things end up biting you in the ass.

askeksa-google:

The measurements mentioned by @rossberg which indicated a code size saving of 3.5% uncompressed and 5% compressed just removed the type from struct.get and struct.set. With the type removed from all of the instructions in this PR, I see a saving of 5.8% uncompressed and 6.4% compressed (zopfli).

titzer (Contributor) commented Sep 10, 2021

@askeksa-google Do you have numbers if the field index were not type-relative but module relative (i.e. bigger)?

askeksa-google:

That comparison, without the change in this PR, is described here (2% smaller uncompressed, 0.2% bigger zopfli-compressed for the version that avoids the redundancy of superclass fields). With this change on top (for all the other instructions), I see a total saving of 4.2% uncompressed and 1.1% zopfli-compressed. So compared to leaving out the type, module-global field indices cost more than 5% on the compressed size in both cases.

tlively (Member) commented Sep 10, 2021

Let's suppose ref A has the property that the type of its n-th field is uniquely/deterministically determined to be tA, and as such struct.get accepts it. Let's suppose an instruction instr preceding struct.get outputs ref X, and ref X is a subtype of ref A but does not have a uniquely determined n-th field.

@RossTate, could you provide a (hypothetical) concrete example for this situation? I follow your logic from here, but I'm not sure where this premise would apply. Could we add extra restrictions to disallow X from being a subtype of A for all X and A that would otherwise satisfy this premise?

RossTate (Contributor):

Sure thing, but can you first confirm that you are asking for a Post-MVP application where support for the described situation (i.e. support for type variables/imports with multiple upper bounds) would be helpful? I want to make sure I answer the intended question 😃

tlively (Member) commented Sep 10, 2021

Yes, I am asking for a hypothetical Post-MVP extension here :) I don't think we should spend a lot of time trying to predict whether we will or will not want to add any particular post-MVP feature, but I would appreciate a taste of how these problems might come up.

I was also chatting with @jakobkummerow about this offline, and he pointed out that in the worst case we could completely disallow struct.get in modules that use some fancy new kind of type and mandate that struct.get_new_fancy be used instead. That would be stronger than just disallowing struct.get on the new kinds of types and should work around any typing issues AFAICT.

RossTate (Contributor):

I would appreciate a taste of how these problems might come up.

Happy to oblige. There are two use cases that I know of. The first is multiple upper bounds on type imports, which are useful for things like Jawa. Now, bounds on type imports are static, so we have more control here and might be able to come up with a means to restrict the collection of upper bounds in order to ensure a principal "concrete" upper bound, though it might not be possible to do so in a way that supports useful module operations like partial instantiation.

Regardless, the second use case is more difficult. This use case is user-defined casting. Right now we're building in a specific casting model, which is essentially the JVM's model for classes. That's perfectly good for our pressing use cases, but it's not great for a number of languages. And, if we ever get to generics, I think it's unlikely that any baked-in casting model will work well. For example, I know of 4 casting models that are used in practice for Pair<A,B>/A * B, each of which has very different memory-efficiency/time-efficiency/expressiveness characteristics that make them each better suited to different languages.

Type-checking user-defined casting is all about reasoning about type variables and bounds. As a first step, rather than using ref Foo to denote the type of an instance of Foo, you use exactref X, where X is a type variable representing the (unknown) exact type of the instance and has Foo as a (concrete) upper bound. If the user wants to implement something like rtt-style casts, then the reference's v-table/object-descriptor has a table of run-time identifiers of supertypes of X, i.e. an array (exists y :> X. Identifier(y)). If the user wants to cast the instance to, say, Bar, they load an element of this array. This element has an existential type, so the type-checker (or a pseudo-instruction) "opens/unpacks" the existential by introducing a fresh type variable with the respective constraint. This results in, say, a type variable Z, which is now a new upper bound on X. This means that we're "dynamically" introducing upper bounds on type variables, meaning we can't rely on solutions that constrain the collection of upper bounds. Next, we check that the resulting unknown identifier, whose type is Identifier(Z), is equal to some run-time identifier for Bar, whose type is Identifier(Bar). If so, on the "equal" branch we deduce that Z = Bar, and consequently that X <: Bar. This means we know that the reference has all the fields provided by Bar and can access them in this branch (yay!). It also means that X now has two upper bounds: Foo and Bar. Plus, you can show that subsumption implies that this needs to work regardless of what Foo is, which means we really need a way to type-check struct.get/set on a ref X even for arbitrary combinations of concrete upper bounds on X.

Phew, sorry for the big dense paragraph. Note that in the above I used existentials because I know you, specifically, are familiar with them. I've been working on a more engine-friendly formulation of the ideas based on feedback from other discussions. But regardless of the specifics, type variables or abstract types and multiple concrete upper bounds seem to be fundamental to the space, which is why I'm pressing to maintain forwards compatibility with them.

we could completely disallow struct.get in modules that use some fancy new kind of type and mandate that struct.get_new_fancy be used instead

That seems heavy-handed, especially compared to declaring types slightly differently. But if what's prompting that (correct) suggestion is a desire to pull this in now, then I'm fine with pulling this in now and collaborating in a separate thread on how best to make the accompanying change, since there now seems to be an informed understanding that doing so comes at the cost of a further change (ideally to the MVP, so that the change can be confined to type definitions rather than instructions). (Though first we should also conclude the discussion of the other issue, regarding interpreters, if this one is resolved.)

tlively (Member) commented Sep 13, 2021

Thanks for the examples, @RossTate! My curiosity is satisfied.

It sounds good to me to pull this in once (or if) we resolve the other discussions about it, with the knowledge that it is a potential future compatibility hazard and with a willingness to roll it back if that proves beneficial.

jakobkummerow (Contributor):

I suppose it boils down to how likely we think it is that we'll need a variant of struct.get with static type annotation in the future. Apparently @RossTate thinks that this is very likely :-)

As an incremental way to make progress, @rossberg, how do you feel about limiting this change to the new family of instructions for now? That part seems to be uncontroversial (and will at least save some bytes, though likely far fewer than the whole thing), and that way we can allow ourselves some more time to come to a conclusion on the get/set instructions. Going back and forth does have a cost, so while it's certainly possible to land everything now and revert partially later if needed, we can also optimize for reducing churn if we think that that partial revert is going to happen more likely than not.

RossTate (Contributor):

I'm confused. I said it's fine to not ever have any annotation on struct.get so long as we slightly change how types are defined. That is, the (future) accompanying change (for the MVP) would be a change to the Types section and nowhere else.

jakobkummerow (Contributor):

@RossTate Oh, sorry, then I misunderstood "we really need a way to type-check struct.get/set on a ref X even for arbitrary combinations of concrete upper bounds on X". If you're happy with this change after all then I guess we can land it as it is.

RossTate (Contributor) commented Sep 13, 2021

Ah. That way could either be to have an annotation (to prevent ambiguities when there would otherwise be multiple incomparable applicable concrete types) or to impose some more structure (such as hierarchies) in the MVP on how concrete types are defined/imported (so that one of the applicable concrete types is always guaranteed to subsume the rest).

I'm happy with either way, and at present it seems the latter has significant impact on reducing compressed binary size.

titzer (Contributor) commented Sep 14, 2021

I think we should not optimize for binary size until we are almost finished with the design, and then we should do so carefully and in a coordinated fashion. There are too many moving parts right now.

rossberg (Member Author) left a comment

@titzer, I'm a bit confused regarding the space usage argument for the side table.

Is it a given that a side table requires space proportional to the number of instructions? Can't it be represented more compactly?

Furthermore, my understanding was that you need a side table anyway. So haven't you already sunk that cost, and would just be filling it with additional entries?

Assuming this can be represented compactly in an interpreter, it seems like a clear win not to require this information to be present in all code, and only have it take up space in engines that require it.

(I'd assume it's also a clear win in terms of performance, because there is less to check. That's even for an interpreter that needs it, because it still costs less to store the implicit information from validation than to validate explicit information.)

The main motivation for making this PR is to make the proposal consistent, with itself and preceding proposals. Right now, there are instructions that still have these annotations, and others that don't. Clearly, that doesn't make sense. So something needs changing either way.

@jakobkummerow:

As an incremental way to make progress, @rossberg, how do you feel about limiting this change to the new family of instructions for now?

Yeah, not sure that's worth doing. As just said, the main goal here is to make things consistent, and that wouldn't get us there. If we can't get consensus about this being the right direction, then I'd rather park it for a while.

@RossTate, AFAICT, nobody has suggested giving up principal types. I obviously agree that it is a crucial property. Nobody has yet made a concrete proposal for multiple bounds on abstract types either. We can productively discuss how to maintain principal types in the presence of multiple bounds when we add the latter. There is more than one way that can be solved, regardless of this PR.


RossTate (Contributor):

We can productively discuss how to maintain principal types in the presence of multiple bounds when we add the latter. There is more than one way that can be solved, regardless of this PR.

#160 has been open for nearly a year, with no particularly concrete suggestions on how to address the issue, let alone in a way that is expressive enough to address the use case I outlined above (based on how the only typed assembly languages that have scaled to major languages support casts and generics). At present, the only solution I know of requires changing the (type section of the) MVP (sometime after this PR), and the reasoning above strongly suggests that without some such change the MVP would not be suitably extensible. I would be happy to learn of alternatives in #160, if you would like to offer them, but in the meanwhile I think we should keep the door open for at least one solution that is known to work in this space.

As I already stated, I am fine with this PR being pulled now, given that others have expressed enough understanding of the issue.

rossberg (Member Author):

@RossTate:

#160 has been open for nearly a year, with no particularly concrete suggestions on how to address the issue

That's not unexpected, because it's a Post-MVP question, and we have more immediate problems to work on.

RossTate (Contributor) commented Sep 14, 2021

That's fine. I'm just pointing out that, as a consequence of not working on that issue, there is currently only one known annotation-free solution to it, and so we should keep the door open for that solution. It's risk mitigation.

titzer (Contributor) commented Sep 16, 2021

Is it a given that a side table requires space proportional to the number of instructions? Can't it be represented more compactly?

Just to be clear, it's space proportional to the number of instructions where annotations would now be missing (not to unrelated, existing instructions). So it basically undoes any over-the-wire savings gained here, and then some, as the side-table entries are bigger. We could drill into the details and options, but just to summarize: I think the most space-efficient solution would be to have a side-table entry that is just an index into a module-local table that contains the field offset, field size (or maybe mask), and value tag. So that'd basically end up making the LEB struct-relative field index immediate redundant, which makes it fairly obvious to me that that's the wrong choice. But we could just as well put in the module-relative field index as the LEB immediate.
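(A sketch of that most-space-efficient layout, with illustrative names:)

```c
#include <stdint.h>

/* Hypothetical two-level layout: each struct.get/set gets a side-table
 * entry that is just an index into a module-local table of descriptors,
 * each holding field offset, size (or mask), and value tag. */
typedef struct {
    uint32_t offset;    /* field offset into the object */
    uint8_t  size;      /* field size in bytes (or a mask) */
    uint8_t  value_tag; /* interpreter value tag */
} FieldDesc;

typedef struct {
    const FieldDesc *field_table; /* module-local, one entry per field */
    const uint32_t  *side_table;  /* one field_table index per field access */
} ModuleMeta;
```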

Furthermore, my understanding was that you need a side table anyway. So haven't you already sunk that cost, and would just be filling it with additional entries?

Again, it adds new types of entries and they're encoded differently. It just makes things more complicated.

Assuming this can be represented compactly in an interpreter, it seems like a clear win not to require this information to be present in all code, and only have it take up space in engines that require it.

I don't agree that it is a clear win, even for space. It's only a win for binary size, and that ignores memory footprint. A major selling point of an interpreter is the memory savings, and this works against that. It just inflates the additional metadata you need besides the bytecode.

Besides, I think removing "redundant" type annotations is fraught with problems, as we already saw with select and others have pointed out in this thread.

(I'd assume it's also a clear win in terms of performance, because there is less to check. That's even for an interpreter that needs it, because it still costs less to store the implicit information from validation than to validate explicit information.)

I don't think we need to optimize for interpreter performance, as memory is far more important (barring really poor design choices). There are going to be some dozen instructions for field access no matter what. But given that, there's at least one LEB immediate to decode (or skip); adding additional side-table operations is just additional work. From an interpreter perspective, the fewer things to decode, the better, so the LEB in the immediate might as well be worth something.

The main motivation for making this PR is to make the proposal consistent, with itself and preceding proposals. Right now, there are instructions that still have these annotations, and others that don't. Clearly, that doesn't make sense. So something needs changing either way.

Yes, we should be consistent. I think annotations are better future-proofing against unforeseen disadvantages (e.g. select), and given that, with care we can easily make annotations useful for all tiers.

I am also coming around to the idea of importing fields across modules, and using a module-relative field index for field access instructions is a path to that.

rossberg (Member Author):

Just to be clear, it's space proportional to the instructions where annotations would now be missing (not unrelated, existing instructions).

Ah, okay, that's what I would have hoped. Thanks for the clarification.

Assuming this can be represented compactly in an interpreter, it seems like a clear win not to require this information to be present in all code, and only have it take up space in engines that require it.

I don't agree that it is a clear win, even for space. It's only a win for binary size, and that ignores memory footprint. A major selling point of an interpreter is the memory savings, and this works against that. It just inflates the additional metadata you need besides the bytecode.

Hm, frankly, this sounds as if the reasoning is: the extra price for this space overhead should be paid everywhere, rather than just in interpreters that need it, so that these interpreters don't lose in comparison. I'm not sure that is the best global criterion for optimising the design. :)

Besides, I think removing "redundant" type annotations is fraught with problems, as we already saw with select and others have pointed out in this thread.

True, but we addressed that without mandatory annotations, by requiring them only in the relevant cases. The same strategy would work if a comparable problem should ever come up with the instructions in question here (essentially what @jakobkummerow called struct.get_new_fancy above).

That said, I'm not too concerned that this situation would arise for the instructions in question (select is rather different as a generic data flow join), unless we screw up badly elsewhere.

(I'd assume it's also a clear win in terms of performance, because there is less to check. That's even for an interpreter that needs it, because it still costs less to store the implicit information from validation than to validate explicit information.)

I don't think we need to optimize for interpreter performance

Agreed, my point above was that every implementation will have a performance benefit, even interpreters.

I am also coming around to the idea of importing fields across modules, and using a module-relative field index for field access instructions is a path to that.

Yeah, I fail to see the benefit so far, other than obfuscating operational aspects that should naturally be explicit in an assembly-level language. But that's for another discussion thread.

titzer (Contributor) commented Sep 16, 2021

I don't agree that it is a clear win, even for space. It's only a win for binary size, and that ignores memory footprint. A major selling point of an interpreter is the memory savings, and this works against that. It just inflates the additional metadata you need besides the bytecode.

Hm, frankly, this sounds as if the reasoning is: the extra price for this space overhead should be paid everywhere, rather than just in interpreters that need it, so that these interpreters don't lose in comparison. I'm not sure that is the best global criterion for optimising the design. :)

No, an interpreter will easily win over any JIT tier, which inflates code anywhere from 2x to 5x. This isn't an attempt to "make interpreters look better" by making the other tiers worse. In fact, I never think of tiers as being in competition. They are complementary. JITs can ditch the bytecode altogether, so it's actually no space overhead for them, whichever choice we make here. Static compilers definitely can and will dump the bytecode, so it makes no difference there either. Rather, my point is that one of the major advantages of interpreters is space, and choices like this one reduce that advantage, and with it interpreters' complementary value in a multi-tier system.

I foresee that many systems will likely employ 3 tiers; in fact, JSC already does. I think Wizard will end up with 3 tiers in the end, too. The performance results from its fast interpreter are compelling enough that I think engines will consider adopting its in-place design.

In general, I think we should reason about whole systems and not just one aspect like binary size.

I don't think we need to optimize for interpreter performance
Agreed, my point above was that every implementation will have a performance benefit, even interpreters.

I think we both agree that it makes no difference for the code generated, but codegen speed for baseline JITs is also affected by how much metadata they must reconstruct. In particular, a consequence of this change would be that a baseline JIT like Liftoff would also have to model types to determine fields. Liftoff has already been tuned to turn off several validation checks when re-walking bytecode (using advanced templating tricks in the common function body decoder), and that does make a measurable codegen speed difference.

This is what I mean by taking a wider view of systems. "Redundant" annotations are only redundant w.r.t. the specific validation algorithm that reconstructs types in its abstract interpretation. There are a lot of other things that process bytecode, and changes like this force them to replicate parts of the validation algorithm just to recover information that should be present in the bytecode but is not encoded properly, e.g. in immediates. E.g. even basic questions like "where is this field used in the code" could not be answered by simply iterating bytecode-to-bytecode. This information is present in Java class files, DEX files, C# assemblies, and every other bytecode format I can think of offhand. So this decision would be a deviation. That feels like a mistake. Binary size does not seem worth it.

titzer (Contributor) commented Sep 16, 2021

I am also coming around to the idea of importing fields across modules, and using a module-relative field index for field access instructions is a path to that.

Yeah, I fail to see the benefit so far, other than obfuscating operational aspects that should naturally be explicit in an assembly-level language. But that's for another discussion thread.

I agree it deserves a different thread, but just briefly, it's pretty straightforward from an engine perspective: field accesses to fields of fully-defined structs in this module are a single, static offset, whereas field accesses to imported fields are another indirection away (a table hanging off the instance). This is a close analog of how low-level linking of object files works.
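(A sketch of those two access paths, with assumed names:)

```c
#include <stdint.h>

/* Hypothetical access paths: a locally-defined field compiles to a single
 * static offset, while an imported field costs one extra load through a
 * table hanging off the instance. */
static inline int32_t read_local_field(const char *obj, uint32_t static_offset) {
    return *(const int32_t *)(obj + static_offset);
}

static inline int32_t read_imported_field(const char *obj,
                                          const uint32_t *instance_field_offsets,
                                          uint32_t imported_field_index) {
    uint32_t offset = instance_field_offsets[imported_field_index]; /* extra load */
    return *(const int32_t *)(obj + offset);
}
```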

titzer mentioned this pull request Feb 11, 2022
tlively (Member) commented Jul 21, 2022

@titzer, what is your latest thinking here? Would you be ok with removing the type immediates from these instructions?

titzer (Contributor) commented Jul 21, 2022

I don't support completely removing type indexes. That position is not motivated primarily by in-place interpretation; for various reasons, Wizard will probably want to have side-table entries for GC instructions anyway. It is motivated by other tiers like baseline compilers. I don't think we want all tiers to need to do full type reconstruction to generate code. Rather, the immediates to instructions should be enough to generate code.

rossberg (Member Author):

I think this is one of the open issues we still need to resolve. At this point I don't care one way or the other, but some change is needed. As is, it's simply inconsistent. For example, if we keep the annotations, then instructions like call_ref should have them, too. Adding that would be the smaller change, but I'm fine going the other way as well if somebody wants to argue against redundant annotations, e.g., on the grounds of code size or validation overhead.

titzer (Contributor) commented Jul 21, 2022

I think this is one of the open issues we still need to resolve.

Indeed.

To a first approximation, the main thing I care about is not requiring every tier to model types in order to model the abstract stack. For compiler tiers, the abstract stack need only track machine representations in order to know register classes; the immediates should be enough to know how many values to push/pop off the operand stack and their register classes.

call_ref would need an immediate for this, and struct.get and struct.set would need to have an immediate that is specific enough to identify the declared field.

rossberg (Member Author):

I suggest discussing this on WebAssembly/function-references#27.

tlively (Member) commented Aug 25, 2022

@rossberg, looks like we don't need this PR any more.

rossberg closed this Aug 25, 2022