
Field Accessors #238

Closed

RossTate opened this issue Aug 25, 2021 · 30 comments

@RossTate
Contributor

This is a suggestion for a different way to design the instructions for getting/setting fields, which for lack of a better term I'll refer to as "field accessors".

Design

After processing the types section of the module, we'll get two lists: the list of types and the list of field accessors. When processing a (nominal) type in the types section, for every field of the type a corresponding field accessor is automatically added to the field-accessor list. The type of the field accessor depends on the qualities of the field, e.g. its value type and whether it is mutable. The "receiver" type of the field accessor is whatever (nominal) type it was generated from.

So, if one defines a nominal type, say $Foo, with a mutable i32 field followed by an immutable i64 field, then there will also be two field accessors, say $x and $y, where the type of $x is "receiver type is $Foo, value type is i32, and mutability is mutable" and the type of $y is "receiver type is $Foo, value type is i64, and mutability is immutable". The text format could provide room for declaring the field-accessor names $x and $y, which get bound to their corresponding indices in the generated field-accessor list.

Then the instructions for getting and setting a field simply include a field-accessor index, rather than both a type and an index. The type of the corresponding field accessor determines the receiver type that must be on the stack and the value type that either is returned (for getting) or must be on the stack (for setting). For setting, the field accessor must also be mutable.
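
A rough text-format sketch (the accessor declarations and the single-immediate get/set are invented syntax, purely to illustrate the idea):

(type $Foo (struct
  (field $x (mut i32))  ;; generates field accessor $x: receiver $Foo, value i32, mutable
  (field $y i64)))      ;; generates field accessor $y: receiver $Foo, value i64, immutable

;; [ref $Foo] -> [i32]
struct.get $x
;; [ref $Foo, i32] -> [], valid only because $x is mutable
struct.set $x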

To clarify, this is not the same as the field members described in the Post-MVP. Field members in the Post-MVP are essentially the first-class variant on field accessors.

Motivation

Short-term reasons to use field accessors:

  • Primary: Reduces code size without introducing any type-checking complexity: the get/set instructions now have only one index rather than two.
  • The names section can give a name to each field accessor, which is helpful for debugging.
  • I also wouldn't be surprised if it's more straightforward/efficient to compile and interpret.

Longer-term reasons to use field accessors:

  • Another way to reduce code size is to simply synthesize the receiver's type rather than explicitly specify its index in the get/set instruction, but that runs into type-checking complexity issues when combined with extensions like multiple upper-bounds (as described in Complications with multiple upper bounds #160), whereas field accessors retain efficient type-checking even in the presence of such extensions.
  • Field accessors make it easy for one to import just the fields of another module's types that it actually depends on, and similarly to give other modules access to just the public fields of one's own types without having to also give them access to private fields.

Accompanying Extension

Field accessors would also facilitate another extension or variation. When one defines a nominal type that extends another nominal type, we could avoid repeating the fields of the other nominal type and simply state the new fields of the new type. In the short term, this would reduce the size of the types section. In the long term, this would facilitate separate compilation of subclasses (particularly in the presence of private fields of the superclass).

Whether we actually want this accompanying extension should be discussed later and somewhere else; I just mention it here as another motivation for field accessors.

Transfer

Although I came up with this for nominal types, the idea likely transfers to structural types.

@tlively
Member

tlively commented Aug 25, 2021

Thanks for the write-up! It should be fairly simple to measure the code size benefit of switching from two indices to one index on accessor instructions. Since the precise semantics of this change would depend on what core type system we settle on, and because we have agreed not to consider field privacy in determining that core type system, I suggest that we consider this change after we have settled the type system question.

@RossTate
Contributor Author

Um, none of the short-term motivations mention privacy, and I mentioned that this would probably be workable for structural types. Also, I thought the plan everyone just agreed to was not to settle the type-system question for quite a while. So now I'm confused; I had thought I had already taken all of yesterday's discussion into account in my post.

@titzer
Contributor

titzer commented Aug 25, 2021

Afaict, this is just an additional (derived) index space for fields. It would actually be useful for an interpreter to have only one LEB index in struct.get and struct.set instructions. It's completely orthogonal to nominal vs. structural types.

@tlively
Member

tlively commented Aug 25, 2021

Sorry @RossTate, I stated my reasoning imprecisely and too strongly. I agree that this is orthogonal to the type system question and I agree this is worth considering for the MVP. I would like to consider this after settling the type system question only because choosing a type system is the most important design problem right now and does not depend on this immediate choice, nor on the longer-term consequences of this choice. If anyone wants to investigate this in parallel to ongoing type system work, that is fine by me.

@RossTate
Contributor Author

Thanks for clarifying, @tlively! I see your reasoning now. My concern is that, as I understood yesterday's discussion, we were not going to choose a nominal vs. structural type system for a while, so I'm worried about deferring improvements until after that choice is made (unless the plan is "officially" changed). So I figured that we could incorporate this into the "official" nominal system while we're creating that anew anyways. This, along with the accompanying extension, were the two biggest core deltas that came to mind (beyond the switch to nominal), which is why I mentioned them now.

@titzer's summary seems accurate. It occurs to me there is one more observation to make regarding the accompanying extension. If we were to only allow "extending" types to add more fields—not allowing them to redeclare/refine fields—then every field is defined exactly once. This would make the field-accessor list essentially just the field list (plus imported fields). A nice thing about this is that it means a debugger could use the dynamic type of the object to uniquely determine the names of its fields (specified in the names section of the unique module defining the field).

While, upon reflection, the other pros and cons of field accessors seem to apply equally well to structural and nominal types, this pro seems specific to nominal types, where we have an obvious "owning" module of the field definition whose name section we should clearly use. (Note that I wouldn't consider that a significant enough reason to favor nominal over structural; I'm just trying to be transparent/complete.)

@rossberg
Member

FWIW, removing redundant type index annotations is orthogonal. I believe I mentioned before that I'd like to remove them on GC instructions, per discussions like WebAssembly/function-references#27 and consistent with what already happened in WebAssembly/reference-types#99 and WebAssembly/function-references#31. I have delayed making this change for GC instructions, though, since V8 folks expressed their desire to not make breaking changes for the time being. @jakobkummerow, is that still the current status?

@jakobkummerow
Contributor

> I have delayed making this change for GC instructions, though, since V8 folks expressed their desire to not make breaking changes for the time being. @jakobkummerow, is that still the current status?

@rossberg: I think right now is a particularly good time for making such breaking changes, as everyone is getting ready to implement new things anyway (nominal and iso-recursive+explicit-inheritance types).

> Reduces code size without introducing any type-checking complexity: the get/set instructions now have only one index rather than two.

@RossTate: It would be good to measure this hypothesis on production-scale examples. Since we use LEB encoding, two smaller numbers don't necessarily (though certainly sometimes) take more space in the binary than a single bigger number.
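
For a rough sense of the arithmetic: unsigned LEB128 encodes values 0–127 in 1 byte, 128–16383 in 2 bytes, and 16384–2097151 in 3 bytes. So a type index of 20 plus a field index of 3 cost 2 bytes total, the same as a single module-wide accessor index of, say, 5000, while an accessor index of 20000 would cost 3 bytes.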

@RossTate
Contributor Author

Ah, good point. Is there a public collection of production-scale binaries to examine?

Of course, there are motivations besides smaller size that are worth discussing in parallel. I remember @jakobkummerow and @tlively were expressing interest in names for fields for debugging, so I'm interested to hear your thoughts on whether this helps with that (especially with respect to the ability to report the name based on the dynamic type of the reference). As another topic, it sounds like @titzer agrees this would likely be helpful for interpreters; I'm curious whether @jakobkummerow or other engine implementers think it might help with faster compilation. I figured engines could have an array that simply maps a field-accessor index to its corresponding offset within the object, and suspected that would be simpler than the current type-dependent mapping-to-offset calculation. Then again, maybe this is small in the grand scale of things. There are also the long-term considerations, but my understanding is that we're trying to focus discussions on more immediate concerns.

One question I have: has anyone thought of a reason this would be substantially worse than the current approach? It would be good to know of potential issues sooner rather than later.

@titzer
Contributor

titzer commented Aug 26, 2021

The issue with interpreters is that if the index to struct.get and struct.set is just a field index within the struct (0...struct.numfields), then an interpreter lacks the necessary metadata to figure out (a) the actual field offset, (b) the runtime value tag of the value (e.g. i32, i64, f32, f64, ref, and others), which is necessary for interpreters with tagged values, and (c) the size of the field (32 or 64 bits for non-packed types). Without enough information in the bytecode, an interpreter has to put it elsewhere; in Wizard, e.g., some additional metadata could be inserted into the side-table for bytecode. If a side-table is not possible, then the only remaining place to put such metadata is an indirection off of the actual runtime object.

This is why I posted on #27. Removing information from bytecodes that can only be derived through validation complicates interpreter designs.

A single field index that is in the index space of all fields declared (or perhaps imported, that's a post-MVP discussion) in a module is a better choice for an interpreter design, because it'd then be easy to have a single array of field metadata hanging off the module representation.
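
For instance, struct.get in the interpreter loop could then be a single metadata lookup, along these lines (hypothetical pseudo-code; the names and layout are made up, not Wizard's actual representation):

// field_index decoded from the instruction's single LEB immediate
meta = Load(module, kFieldMetadataOffset)  // flat per-module array, one entry per field
entry = meta[field_index]                  // {offset, value_tag, size}
Push(LoadTagged(obj, entry.offset, entry.value_tag, entry.size))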

@askeksa-google

I did an experiment on the Dart barista3 benchmark, where I tried out three different encodings for struct.get and struct.set:

  • The current one: type id followed by field index
  • Single value, enumerating all fields
  • Single value, enumerating only fields added relative to the superclass

The results are as follows:

              Uncompressed   zopfli   zstd -19
Current            1767168   584314     484815
All fields         1734066   587983     489575
Added fields       1732614   585543     487459

The new encoding does give a roughly 2% size reduction for the uncompressed module. But when the module is compressed, the picture is reversed: the new encoding is bigger than the current one, presumably due to the irregularity of the combined encoding compared to the separate one. With a modern compression format, the difference is even more pronounced, with even the "Added fields" version being about 0.5% bigger than the current encoding.

@RossTate
Contributor Author

Wow, thanks @askeksa-google for performing the experiment! That's interesting how compression cancels out the savings.

I had hoped the numbers would give a clear win one way or the other, as we've found numbers more quickly bring us to group consensus, but it seems it's about a wash here. Given that we're now building something for nominal types, I'd still prefer for us to build on this over the type+index approach for the various other advantages that have been discussed above (though I'd still take that over the index-only approach).

I'm going to throw one more long-term advantage in the mix as well. We know that a number of languages will benefit from support for nested structures. There's also the special case of #222. The Post-MVP outlines support for nested structures. But this design relies on using engine-supported interior references and a sequence of instructions to actually get at the contents of these nested structures. While interior references are useful, they're also quite challenging, and applications can implement (wide) interior references with field members anyways (though this has caveats with multithreading), so I'm concerned about nested structures depending on engine-supported interior references.

With field accessors, when you specify a structure containing a nested structure, you get a field accessor that, when given access to the outer struct, gives you access to the inner struct. We can then provide a way to declare a field accessor that is the composition of two field accessors, so that if one goes from Outer to Inner and the other goes from Inner to i64, their composition is a field accessor from Outer to i64. This then gives you a single-instruction means to get the i64 nested inside Outer without any interior references.
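
For instance (all syntax here is invented purely for illustration):

(type $Inner (struct (field $n i64)))
(type $Outer (struct (field $inner $Inner)))  ;; $Inner stored inline, not as a reference

;; compose the generated accessors $inner : $Outer -> $Inner and $n : $Inner -> i64
(accessor $outer_n (compose $inner $n))       ;; $outer_n : $Outer -> i64

;; [ref $Outer] -> [i64], one instruction, no interior reference needed
struct.get $outer_n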

Now, besides supporting languages like C#/Go, there's another benefit to supporting nested structures and field accessors. For (presumably) the Post-MVP, I've been investigating how to support separate compilation for Java-like languages without essentially baking direct support for it into the engine (or relying on dynamically generating wasm modules). I've found that the "a subclass's definition specifies all the fields of its superclass" approach doesn't work very well. On the other hand, if we have "C extends B extends A", then by modeling an object as a sequence of nested structures "(A's new fields) followed by (B's new fields) followed by (C's new fields)" we can define C's structure without depending on the full details of A's or B's structures. The module for C can just import the (fixed-size) structures for A's and B's (new) fields, along with the field accessors of A's and B's (public) fields (punting on some details I've worked out on how to make C not have to know which of A or B declares the field). Looking beyond linear inheritance, this approach also extends well to multiple inheritance, traits, mixins, and the like.

So, if all's roughly even numbers-wise, I'd prefer this approach for the new nominal system because, in addition to (I suspect) easier compilation/interpretation and better debugging support, my investigations so far suggest that it also lays better foundations for various Post-MVP extensions.

@askeksa-google

For completeness, the index-only version added to the table (first three rows are the same as above):

                  Uncompressed   zopfli   zstd -19
Current                1767168   584314     484815
All fields             1734066   587983     489575
Added fields           1732614   585543     487459
Field index only       1704611   554750     460848

Saves 3.5% uncompressed and 5% compressed.

@RossTate
Contributor Author

Ah, that is compelling. For completeness, I believe there is a way to solve the multiple-bounds problem for the index-only approach. What you have to do is—rather than a bunch of separate imports of types—import an entire hierarchy, where as part of that import you specify a DAG/tree of (imported) types and subtypings that must be in the hierarchy. The caveat is that, when checking that an exported hierarchy+types matches an imported hierarchy+types, in addition to checking the given subtypings one must also check that joins and meets are preserved as well (which is trivial to do for DAGs/trees). Doing so essentially enables the solution that iTalX used for this problem. Of course, it's worth noting that this does not address other problems with index-only (such as for interpreters) nor solve other problems that field accessors manage to solve (such as separate compilation and nested structs), but it is an option that apparently takes less binary space, and design is about considering tradeoffs.

@RossTate
Contributor Author

I had a chance to consider the hierarchy approach in more depth and was able to make it much simpler than what I described above. That said, I can't figure out how to make the index-only approach work well with separate compilation. Separate compilation for many object-oriented languages fundamentally relies on being able to build the layout of an object by composing portions of the object that are specified across multiple modules. The "index field from start of object" approach just doesn't accommodate that.

But I think I found a way to synthesize some of the variations above. When you define a new type, you specify just its new fields (relative to the type it extends, if any). To get/set a field, you specify the type and the index of the field among that type's new fields. So, if $C extends $B extends $A, and $B declares three fields "$x: i32, $y: f64, $z: externref", then to get the field $y of a $C reference you say get $B $y (where $y is 1). You use 1 for $y regardless of how many $A fields there are. So it's a slight variation on the current design that works even if $A is imported from another module.

It's also extensible with more general field accessors. Essentially, when you define a type, its fields are its first field accessors. But you can add more field accessors (though not more fields) later in order to handle nested structures. Similarly, you can import field accessors for an imported type. So the instruction really is get <type index> <field-accessor index>, and it happens to be the case that the field-accessor index for a type's newly defined fields is simply their field index. So we're essentially building per-type field-accessor tables. I think that would still be interpreter/compiler-friendly, and should compress similarly to (or even better than) the current approach.
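
In sketch form (the extends/field syntax is invented for illustration):

(type $A (struct (field $a0 (mut i32))))  ;; could equally be imported from another module
(type $B (extends $A) (struct
  (field $x (mut i32)) (field $y (mut f64)) (field $z (mut externref))))
(type $C (extends $B) (struct (field $w (mut i32))))

;; field $y of a $C reference: index 1 among $B's new fields,
;; independent of how many fields $A declares
struct.get $B 1  ;; equivalently: struct.get $B $y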

Does that make sense?

@jakobkummerow
Contributor

There is indeed a fundamental incompatibility between fast(est possible) field accesses and separate compilation (or more specifically: unknown supertypes), and we'll have to decide eventually which compromise we want to pick.

The fastest machine code a field load like struct.get $B $y can possibly be compiled to (and currently will be compiled to!) is a single instruction:

// assume RAX holds the struct
// assume we want the property in RBX
// assume $y's offset from object start is 40
mov RBX, [RAX + 40]

When generating this code, the compiler must obviously know the full layout of $A and $B so that it can compute the final byte offset of field $y from the beginning of the object.
If we want to support unknown supertypes, i.e. generate code before the supertype's size is known and not have to recompile it when the supertype changes size, then we have to add an indirection there, which will have a performance cost. We can no longer embed a simple constant like 40. We may end up with a sequence like:

// assume RCX can be used as temp register
mov RCX, [RAX + 0]  // load type info
mov RCX, [RCX + 8]  // load supertype size
// assume $y's offset from $B start is 4
mov RBX, [RAX + RCX + 4]

That'd be 3 (chained!) loads instead of 1. I'm not sure this is exactly the code we'd end up with; it's meant to be a representative illustration. I doubt it could be more efficient than this. I also doubt that we'd be happy with a design where every field access has to pay this cost (but that's just my gut feeling).

@RossTate
Contributor Author

I agree that we don't want 3 loads for a field access (especially chained ones!). But I don't believe what I'm suggesting here makes that necessary. Let me break it down into cases for get $B $y.

First, if you know the full supertype chain for $B and all the fields of those supertypes at compile time, you can precompute the offset as you did above. This is the only situation which is currently expressible. That means there is no performance loss incurred by this design relative to the current design.

So the real consideration is whether the flexibility granted by this design can be implemented efficiently enough that people would want to use it. Remember that the alternatives are not great. Either you compile the subclass according to a fixed representation of its superclass, or you JIT your own WebAssembly module at load time. And in either case, the module implementing the superclass has to export all the internals of the superclass module just so that you can use its indexing scheme and the linker can check you used it correctly. (Note that both of these are fine for the MVP, as we might not even support separate compilation at all for the MVP, but I'd like an approach that at least is compatible with a good Post-MVP option for separate compilation.) So people might be fine with some pay-as-you-go performance hit for the significant improvement in flexibility—the key word being some (we already have pay-as-you-go per the observation above).

Now let's consider the worst-case scenario. Unfortunately, I'm not sure how you arrived at your first "load type info" instruction, so I can't speak to that. But here's what I would expect. When you import an (extensible/fixed-size) type like $A, the module-instance structure would store the size of that type. So first you'd have to load that size from the module instance, then you'd have to add that size, plus the offset of field $y within $B, to RAX, and load from that address. Thus the worst-case scenario seems to be 2 chained loads.
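
Concretely, in the same illustrative style as the sequences above (the register choices and instance layout are made up):

// assume RAX holds the struct and RSI holds the module instance
// assume the instance stores $A's size at offset 16
// assume $y's offset from the start of $B's own fields is 4
mov RCX, [RSI + 16]       // load $A's size from the instance
mov RBX, [RAX + RCX + 4]  // load field $y: 2 loads, chained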

The first thing to consider is that the first load is from the module instance. Presumably this is in the cache (even the L1 cache?). As such, this first load is likely to be (very) fast.

The second thing to consider is that the first load is of immutable data. Thus it can be amortized across loads and writes of various fields of extensions of $A in the function. The main cost of this will be register pressure. But, again, since the value is likely in the L1 cache it's not unreasonable to forget it to relieve register pressure and reread it later. Nonetheless, I suspect that in the common case this value will already be in a register, in which case we implement get $B $y with a single load, as before (though with a dynamic offset, which I understand is pretty cheap).

Lastly, I'm still not convinced that WebAssembly can efficiently support even basic ecosystems without some form of instance-specific compilation at the least in the optimizing tier, whether it's inlining, "fusion" (a la Interface Types), or—in this case—filling in immediates. But even putting that aside, by the above reasoning I suspect people would find the pay-as-you-go costs to be pretty minor and well worth the flexibility granted, especially with respect to the alternatives. (And, to repeat, it might be that we decide the MVP does not support any separate compilation whatsoever; this design just leaves a path forward for separate compilation in the Post-MVP, with no performance cost to MVP programs, and with other advantages for specifically the MVP discussed above.)

@jakobkummerow
Contributor

> the module-instance structure would store the size of that type. So first you'd have to load that size from the module instance

I see two implementation options that both lead to three chained loads.
The first is what I described: store all type information on the object (in the RTT/"shape descriptor"). You'd have to get the RTT, then load the offset from it, then do the actual load.
The second is what you've described: keeping a pointer to the instance around, type sizes can be stored on there; realistically that'll be in a separate array though, so the first load would get the "supertypes sizes list" from the instance, the second load would get an entry from that array, and again the third load would do the actual struct field access.

Hardware caching and/or compiler-side de-duping can certainly help; whether it'll help enough to make this viable/desirable will likely have to be determined experimentally at some point.

I didn't mean to imply that the Field Accessors idea in itself requires any of that; I was commenting specifically on separate compilation, with or without Field Accessors.

> I'm still not convinced that WebAssembly can efficiently support even basic ecosystems without some form of instance-specific compilation at the least in the optimizing tier,

Yes, I'm not convinced of that either; giving up on separate/ahead-of-instantiation-time optimized compilation is one of the possible compromises we could choose.

@RossTate
Contributor Author

Ah, thanks for the clarifications. But now I have a question about how things are currently implemented. I had thought that WebAssembly modules were designed so that things like globals and imports could be stuck directly in the instance rather than indirectly through arrays (in which case supertype sizes would be in the instance). Is that not the case then? Do gets of globals and calls to imported functions require 2 chained loads in the worst case?

@jakobkummerow
Contributor

The instance object (like the module object) has a fixed size, so everything whose length is specified by the module is an array: globals, functions, tables, types, you name it. At least we can stick the pointers to those arrays directly into the instance object :-)

Direct-calling an imported function takes 4 loads (two pairs of two chained loads, both best and worst case). Indirect-calling takes 7 (longest chain 3) or 9 (longest chain 4) loads depending on whether the table index is 0 or non-zero, plus two comparisons; whether the callee is imported makes no difference. call_ref is currently at 6 loads best case (longest chain 4). (None of this is expected to change in the foreseeable future.)

@RossTate
Contributor Author

Oh wow, that's grim. Thanks for the improved understanding (and for clarifying load-count vs. chain-count).

@titzer
Contributor

titzer commented Sep 1, 2021

Keep in mind that globals can be imported/exported, so their storage needs to be indirected from instances.

I don't understand why call_ref is 6 loads; can you explain?

@aardappel

9 loads worst case, wow! The cost of just even a small bit of abstraction/indirection is intense!

I presume part of this is because we must accommodate an imported function being changed at any moment? What if we had a way to "seal" a Wasm module after instantiation?

@jakobkummerow
Contributor

@titzer Pseudo-code for call_ref is:

// Given (dynamic): func_ref, current_instance
if (func_ref == null) { Trap() }  // optimized away if statically provable
shared_info = Load(func_ref, kSharedFunctionInfoOffset)
func_data = Load(shared_info, kFunctionDataOffset)
instance = Load(func_data, kRefOffset)
instance_map = Load(instance, kMapOffset)
instance_type = Load(instance_map, kTypeOffset)
if (instance_type == Pair) {
  Store(instance, kValue1Offset, current_instance)
}
target = Load(func_data, kTargetOffset)
if (target == kNull) {
  target = Load(func_data, kWrapperOffset) + kInstructionStart
}
Call(target, instance)

The conditional Store (which is needed for calling funcrefs referring to JS functions) is particularly ugly; but it's much better than the allocation that was there before I optimized it to its current state.

A lot of this complexity is due to the fact that V8 is both a Wasm and a JS engine (and a funcref can come from either world); in a pure-Wasm engine this could be a lot simpler.

@titzer
Contributor

titzer commented Sep 2, 2021

@aardappel Imported functions can't be changed, but tables can. call_indirect is complex because of signature checking and the possibility of a cross-module call. TurboFan generates all of that inline, but most of it could be moved out-of-line into a per-signature stub.

@titzer
Contributor

titzer commented Sep 2, 2021

@jakobkummerow Ah, yes, I hadn't considered JS functions flowing into a wasm call_ref and then presumably escaping again via anyref. There are several ways, of increasing intrusiveness, that can make that sequence better:

  1. Add a field to JSFunction objects that is essentially a pointer to an entrypoint specifically for wasm call-ref.
  2. At the boundary between JS and Wasm, i.e. all wrappers, coerce JSFunctions into a different representation (e.g. by having an additional field that points to a wasm closure, which also points back, for identity).

The second one implies doing work for every reference (not statically known not to be potentially a JSFunction) that crosses between JS and Wasm, either at a function call, or being written or read from a table.

Thanks for the info, though I did drag this a bit off topic.

@jakobkummerow
Contributor

re 1: Yes, putting another pointer (or two) into JSFunction could save a load (or maybe two?), but given all the efforts we've put into making JSFunctions as small as possible (because there are typically many of them), I'm hesitant to make them bigger.
re 2: Interesting idea, I'll have to think through that (including whether that can be pulled off without making JSFunctions bigger).

@rossberg
Member

rossberg commented Sep 3, 2021

My assumption for function references has always been approach (2), i.e., coerce at the boundary, like you do for imports. With that, call_ref should be no slower than a regular call to an imported function. In fact, shouldn't it be faster, because it saves the load from the import/func array?

@tlively
Member

tlively commented Jul 21, 2022

Is anyone interested in investigating this option further? If not, we can close this issue.

@titzer
Contributor

titzer commented Jul 21, 2022

I don't think object models will work exactly along the lines in this thread, but I can see a use case for importing field accessors. That said, there are several issues in this thread, and I think the call_ref overhead has been addressed elsewhere. I'd be OK closing this issue.

@tlively
Member

tlively commented Nov 1, 2022

Now that we're in phase 3, the design is largely stable, so we won't be adding field accessors to the MVP. If it would make sense as a post-MVP feature, PRs adding it to the post-MVP doc would be very welcome.

tlively closed this as completed Nov 1, 2022