This repository was archived by the owner on Apr 25, 2025. It is now read-only.

Initializing array from data or element segment #260

Closed
askeksa-google opened this issue Dec 8, 2021 · 32 comments

Comments

@askeksa-google

In various discussions, we have touched upon the subject of having a version of the array.init instruction which initializes the array from a data or element segment rather than from the stack. Most recently, @rossberg suggested some variations of the operation in #250 (comment).

Of the various proposed instructions for this purpose, I think the ones that create a new array are the most important, as these provide new capabilities for global initializers. These would be (staying with the init nomenclature):

array.init_from_data $a (data $d) : [i32 i32 (rtt $a)] -> [(ref $a)]
array.init_from_elem $a (elem $e) : [i32 i32 (rtt $a)] -> [(ref $a)]

each taking an offset (in bytes or elements, respectively) and the array length from the stack.
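As a concrete illustration, here is a hedged sketch of how the proposed array.init_from_data instruction might appear in a global initializer. The syntax is pre-standard and hypothetical (it follows the signatures above, including the rtt operand); the segment name $str and type name $bytes are made up for this example.

```wat
;; Hypothetical sketch (proposed, pre-standard syntax).
(module
  (type $bytes (array i8))
  (data $str "hello")
  ;; Create a 5-element i8 array from bytes 0..5 of segment $str.
  (global $hello (ref $bytes)
    (array.init_from_data $bytes $str
      (i32.const 0)            ;; byte offset into the data segment
      (i32.const 5)            ;; array length
      (rtt.canon $bytes))))
```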

I did an experiment in dart2wasm, replacing the use of array.init for string constants with array.init_from_data as specified above.

For a module with 34k of string data, it reduced the module size by 48k uncompressed. Compressed (zopfli), the size reduction was less dramatic: 300 bytes if the instruction takes the offset first, then the length, or 1200 bytes if it takes the length followed by the offset.

But this is not merely a matter of module size reduction. Due to the limitation of array.init that there is a maximum length it can accept, these instructions are essential to enable arrays of any length to be initialized in global initializers. This is important for the implementation of Dart constant expressions, for instance.

@tlively
Member

tlively commented Dec 11, 2021

SGTM, and I think the names are appropriate as well. The memory.init instruction uses [dest_offset, source_offset, length] order for its operands, so it would be good to use [source_offset, length] in these instructions. Interesting that the compression was better the other way around, though.

@askeksa-google
Author

One thing to keep in mind is the interaction with data.drop and elem.drop.

Ordinarily, constant expressions are independent of all side effects, which means an engine is free to postpone initialization of a global until e.g. the first time a function using the global is compiled. However, if we specify that these instructions will trap when referring to a dropped data or element segment (which would be a natural thing to do), then such postponing becomes observable.

To restore the illusion that global initialization happens fully up front, the engine would need to perform at least a quick pass over the global section to identify references to data and element segments. It can then instantiate those arrays up front or keep the segments alive internally until all uses in initializers have been executed.
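To make the observability concrete, here is a hedged wat sketch (using the proposed, pre-standard syntax) of the situation described above; all names are hypothetical:

```wat
;; If $g is evaluated lazily, the elem.drop in $start could make the
;; first later use of $g trap, whereas eager evaluation at
;; instantiation time would succeed.
(module
  (type $a (array funcref))
  (elem $e funcref (ref.null func))
  (global $g (ref $a)
    (array.init_from_elem $a $e
      (i32.const 0) (i32.const 1) (rtt.canon $a)))
  (func $start (elem.drop $e))
  (start $start))
```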

@rossberg
Member

Ah, I was actually assuming that these would be copying operations analogous to memory/table.init. I guess it also makes sense to have versions that are fused with allocation, but those I would probably name array.new_from_data/elem.

@askeksa-google
Author

Right, with other instructions with init in their name being copying instructions without allocation, it makes sense to distinguish by calling the allocating ones something with new. But then we should also rename array.init to array.new_something.

@rossberg
Member

Well, true, but array.init is not in the proposal yet. :)

@manoskouk
Contributor

manoskouk commented Dec 21, 2021

I see a couple of issues here:

  • We introduce circular dependencies between the global and element sections through global.get and array.init_from_elem.
  • We introduce dependencies from any section that contains constant expressions to the data section. This could be resolved by somehow moving the data count section earlier in the module.
  • We introduce dependencies between element segments. We could resolve this like we plan to do with globals, i.e. by only allowing an element segment to reference previous ones.

@askeksa-google
Author

Those are very good points. For the last two, I agree with your suggestions on how to solve them.

The first one is more tricky. One approach could be to let the global initialization allocate these arrays with uninitialized elements, and then fill them in when reading the element section. The engine would need to remember the element index and offset for each array.init_from_elem instruction along with a reference to the array to be initialized. To make it easier for the engine to allocate space for this auxiliary information, we should provide the number of array.init_from_elem instructions occurring in the globals section somewhere before that section.

We can make the process slightly easier for the engine by requiring that element indices occurring in array.init_from_elem instructions throughout the globals section are monotonically (though not necessarily strictly) increasing. However, given that this just saves a single sort by element index, it may not be worth the reduced flexibility. In any case, processing of the element section would walk through a queue of array.init_from_elem instructions, matching the element index at the head of the queue to each newly processed element.

If it's important to validate the array.init_from_elem instructions as they are encountered, we will also need an element count before the globals section. Alternatively, out-of-bounds element indices can be rejected during processing of the element section.

This approach allows the construction of cyclic constants. This can be considered a feature. If cyclic constants would cause trouble for some reason, we can avoid them by requiring any global.get occurring inside an element to refer to a global whose index is strictly lower than the one containing the next array.init_from_elem instruction in the queue. This will require streaming loaders to also remember which global contained each array.init_from_elem instruction. On the other hand, this (in conjunction with the requirement that element indices in array.init_from_elem instructions are monotonically increasing) would enable non-streaming loaders to avoid the auxiliary queue of array.init_from_elem instructions by processing the global and element sections side by side.
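The queue-based scheme described above can be sketched as follows. This is a hedged, engine-internal illustration, not V8's or any engine's actual implementation; all names and data structures are hypothetical. While parsing the globals section, the engine records for each array.init_from_elem the referenced segment index, the operands, and the uninitialized array to fill in later; the element section is then processed against that queue.

```python
# Hedged sketch of the queue-based processing described above.
from collections import deque

def queue_init_requests(requests):
    """Sort pending (elem_index, offset, length, array) requests by
    segment index. This makes the single sort explicit; if element
    indices were required to be monotonically increasing, it could
    be skipped."""
    return deque(sorted(requests, key=lambda r: r[0]))

def process_element_section(segments, queue):
    """Fill each pending array once its element segment is decoded."""
    for idx, segment in enumerate(segments):
        while queue and queue[0][0] == idx:
            _, offset, length, array = queue.popleft()
            array[:] = segment[offset:offset + length]
    if queue:
        # Out-of-bounds indices rejected here, as suggested above.
        raise ValueError("array.init_from_elem refers to an "
                         "out-of-bounds element segment index")

# Example: two globals initialized from segments 1 and 0.
a0, a1 = [None] * 2, [None] * 3
q = queue_init_requests([(1, 0, 2, a0), (0, 1, 3, a1)])
process_element_section([["f0", "f1", "f2", "f3"],
                         ["g0", "g1", "g2"]], q)
```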

@manoskouk
Contributor

Another question for array.init_from_data is: How should integer values bigger than one byte be interpreted in the data section? We could interpret them as LEB128 to be consistent with other parts of the binary format, but that would prevent us from using memcpy to initialize arrays. I think we should interpret them as full-sized values.

@tlively
Member

tlively commented Jan 13, 2022

Full-sized values based on the type of the array makes sense. So an i8 array would have an entry for each source byte and an i32 array would have an entry for each little-endian set of four bytes. Should the size be given in terms of array length rather than byte length to avoid having leftover bytes?

@askeksa-google
Author

For consistency with other instructions that specify an array length, I'd say specify the array length rather than the byte length.

@rossberg
Member

Mutual dependencies between sections are a recurring problem, especially with respect to globals. Some might remember that that's also what caused us to introduce the declare segments for first-class references. As I argued at the time, that was a hack, and I would have preferred a cleaner and more general solution that scales to other instructions, like we have now.

The simplest way out would be requiring dependency ordering between sections, as you say. But that also requires allowing multiple occurrences of each section type to handle all cases. This keeps coming up with almost any interesting addition to the module scope, so perhaps we should just bite the bullet?

Another question for array.init_from_data is: How should integer values bigger than one byte be interpreted in the data section?

A data segment is a sequence of bytes, so I don't think there is much leeway in interpreting it differently. And that instruction should amount to a memcpy anyway. So yes, the data segment has to lay out the values accordingly, as if they were read with the analogue of sequential load instructions of the right size.
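The agreed interpretation above can be sketched in a few lines: an array.init_from_data on an i32 array reads full-width little-endian values from the raw segment bytes, exactly as sequential i32.load instructions would. This is an illustrative model, not an engine implementation; the function name is made up.

```python
# Sketch: decode full-width little-endian i32 values from raw segment
# bytes, as sequential i32.load instructions would.
import struct

def init_i32_array_from_data(segment: bytes, byte_offset: int,
                             length: int):
    """Decode `length` little-endian i32 values at byte_offset."""
    end = byte_offset + 4 * length
    if end > len(segment):
        raise IndexError("out of bounds for data segment")  # would trap
    return list(struct.unpack_from("<%di" % length, segment,
                                   byte_offset))

data = bytes([1, 0, 0, 0, 0x78, 0x56, 0x34, 0x12])
arr = init_i32_array_from_data(data, 0, 2)  # -> [1, 0x12345678]
```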

@manoskouk
Contributor

manoskouk commented Jan 26, 2022

Update: array.init_from_data is now available in V8. As usual, it is specified in our V8 spec document.
Note: For now we implemented the version which automatically includes the canonical rtt for the specified array type.

@titzer
Contributor

titzer commented Jan 26, 2022

FWIW I implemented unrestricted section repetition in Wizard. It required some readjustment to module representations to keep a sane order for initialization, but otherwise I don't see a major blocker to doing this. For this proposal I suggest we at least consider repeating the type section as the mechanism to denote recursion groups for iso-recursive typing.

@rossberg
Member

rossberg commented Jan 28, 2022

For this proposal I suggest we at least consider repeating the type section as the mechanism to denote recursion groups for iso-recursive typing.

I thought about this, but it turns out it's not quite that simple. For one, it would not be backwards-compatible to make the existing type section iso-recursive, since all existing types would then suddenly be reinterpreted as part of a recursive group (even if they aren't actually recursive), which drastically affects type equivalence. That would break all uses of call_indirect.

So, at a minimum, we'd need to introduce a new kind of recursive type section, that is distinct from the one we already have. So it's a new construct either way.

I'd also like a coherent story for sections in general. Currently, they have at least the following properties:

  1. Sections can only refer to declarations from previous sections, but are not recursive themselves.
  2. Concatenating sections preserves semantics of the contained definitions (modulo index shifts).

For this new type section we'd need to abandon these and make that section a special case. That might complicate some tooling. Now it's no longer just ordering of definitions that's relevant, but section boundaries themselves – two consecutive sections of the same kind would not be equivalent to their concatenation in this case. For example, a static linker would have to treat the new section differently when merging modules.

All this considered, it's not clear that using section boundaries to delimit recursion would be simpler overall. Being more explicit and uniform has benefits.

@manoskouk
Contributor

If we allow these expressions as constant, we must be careful how we specify their interaction with dropped segments, which could allow them to observe mutable state.

@titzer
Contributor

titzer commented Jun 14, 2022

Isn't it the case that all initializers must run before the start function, which is the earliest that a segment could be dropped?

@manoskouk
Contributor

Conceptually yes, but so far we have had only pure constant expressions, so evaluation time did not matter. Since it is now possible to invoke a constant instance of array.new_elem/data from a function body (e.g. through table.init where the element segment contains array.new_elem), we have to clarify how this interacts with dropped segments.

@rossberg
Member

rossberg commented Jun 14, 2022

I'm not sure I follow. What do you mean by "invoking a constant instance"? Can you give a code example?

Just to be sure: the fact that some instruction can be (part of) a constant expression does not imply that it is a constant expression in all places.

Edit: I think I see what you mean, technically, the behaviour now depends on mutable state. But FWIW, even without that, evaluation time would still matter, since these instructions can also trap (e.g., if the init range is out of bounds for the segment). That is (externally) observable. So they aren't fully "pure" anyway. Neither would be something like integer division, if we allowed it. That said, I think it would make sense to require that all constant expressions are at least pure up to failure.

@manoskouk
Contributor

manoskouk commented Jun 14, 2022

To make things more clear, consider this example:

(type $t (array funcref))
(table $table 10 10 (ref null $t))
(elem $e1 funcref (ref.null func))
(elem $e2 (ref null $t) (array.new_elem $t $e1 (i32.const 0) (i32.const 1)))

(func ...
  (table.init $table $e2 (i32.const 0) (i32.const 1))
  (elem.drop $e1)
  (table.init $table $e2 (i32.const 0) (i32.const 1))
)

In this example, table.init will invoke the evaluation of the constant expression array.new_elem, once before and once after dropping the element segment e1 that defines the array's contents. These two invocations should not produce different results IMO.

@rossberg Right, "pure" is too strong. We should at least specify this so that a constant expression always produces the same value or always traps.

@rossberg
Member

@manoskouk, thanks for the example. I think I can answer that one quite easily: as far as the semantics is concerned, it's not table.init that triggers the evaluation of the elements; rather, that happens during module instantiation already. (Of course an engine is free to optimise that to something lazy, but only as long as it doesn't change the observable behaviour.)

@titzer
Contributor

titzer commented Jun 15, 2022

Yes, that matches my understanding too. Element segments (which themselves contain constants) are evaluated at instantiation time, before the start function.

...well technically speaking, they could be lazily evaluated and their results cached.

@manoskouk
Contributor

manoskouk commented Jun 15, 2022

Thanks for the clarification. It is still worth noting that these are the first instructions where lazy evaluation of elements (which we do use in V8 for these instructions) might change observable behavior for non-trapping programs.

@rossberg
Member

@manoskouk, yes, fair point.

@manoskouk
Contributor

Here is a concrete proposal to make the data and element sections visible in constant expressions:

  • Introduce an "element type section" after the function section. It contains a vector of types and predeclares the types of element segments. If it is present, segments in the element section have to coincide in number and type with the predeclaration.
  • Introduce an "early data count section" between the element type and the table sections. It is identical to the data count section, and its count has to coincide with both the data count and the data section.

I understand having two data count sections is not the most elegant design, but it is more consistent than having a section with flexible position.

@rossberg
Member

Yeah, throwing in a bunch of new sections just for this may not be the best way to go. We discussed allowing repeated sections on various occasions, and they are desirable for a number of reasons. With those, the problem is solved neatly. So I think we should rather wait for, or move forward with, those.

@manoskouk
Contributor

@rossberg Generalizing what I suggested (which I think you suggested yourself at some point), we can have a declaration and a definition part of each section other than types and imports. Declarations should come after the import section and before all definition sections. We are already doing this in essence for functions/code and data segments. I find generalizing it to other sections more elegant than introducing repeated sections.
One issue is that we are stuck with a late data count section, but I think we can tolerate that.

@rossberg
Member

rossberg commented Jun 27, 2022

Yes, I agree that would have been a better design choice, but it's too late to do that without creating a mess now – global, table, memory, and element sections essentially already combine declaration and definition.

Also, such a design is no substitute for repeated sections. It wouldn't help to disentangle the circularities between type and import sections, as they'll arise with type imports, or similar circularities we'd get from the module linking proposal. Not to mention other problems for which repeated sections have been suggested.

@tlively
Member

tlively commented Jun 27, 2022

@askeksa-google, do I remember correctly that at one point you had performance data showing the benefit of initializing these constants in globals as well?

Generally having these as constant instructions seems like an important and common enough use case that it would be unfortunate to not support it in the MVP because we are waiting for a follow-on section repetition proposal that has not been started yet. Allowing declarations and definitions to be split up more generally seems like the simpler solution overall and although it would be messy in the binary format, the translation from the old style to new style encoding should be straightforward so I don't think it would be that bad.

@conrad-watt
Contributor

conrad-watt commented Jun 27, 2022

Is it possible for us to add the "early" data count section to the binary format in such a way that we can reinterpret it as an example of a more general repeated sections proposal once/if this is standardised - e.g. by giving it the same section id as the current data count section?

EDIT: also, in this case would we want the values of the two data count sections to accumulate, or would we require them to be equal?

EDIT2: I guess the two sections can have different ids for now, and when the more general proposal drops, we merge these ids together and say that both are permitted values to denote the data count section. We'd still need to double check we're picking the right semantics now wrt accumulation.

@tlively
Member

tlively commented Jun 27, 2022

I think for the data count section we can frame it as allowing the existing data count section (with the existing section id) to appear in one of multiple possible locations. I don't think there's any reason we would want multiple data count sections right now.

The more interesting change would be to introduce an element type section, since that's not just a count of the element segments, but also their types (maybe run-length encoded like locals or something like that). I don't see a simple way of recasting that as a repeated element section unless we introduce a new form of element segment declaration that just gives a type and still requires a later definition.

@askeksa-google
Author

@askeksa-google, do I remember correctly that at one point you had performance data showing the benefit of initializing these constants in globals as well?

The numbers I had were that generally moving Dart constants to global initializers rather than initializing them lazily improved both code size and performance by 18-20%.

For the discussed instructions specifically, it's mainly about getting rid of the size limitation imposed by the instruction formerly known as array.init (now called array.new_fixed). With that limitation (i.e. without the discussed instructions), constants involving large arrays are forced to be initialized lazily, and by extension all constants depending on them. It's doable, but messy.

For string constants specifically, there's also a code size benefit, as mentioned in the initial comment. Possibly a startup time difference as well, I guess, though I haven't measured that specifically.

@tlively
Member

tlively commented Nov 1, 2022

I propose that we not consider array.new_elem or array.new_data constant in the MVP to avoid having to figure out repeated sections or some other mechanism. Globals can still be initialized in constant expressions using array.new_fixed. I'll go ahead and close the issue, but please reopen it if you think this merits more discussion before phase 4.
