Immutable arrays? #250

kripken · 2021-10-13T15:16:38Z

The spec does not allow immutable arrays atm. When optimizing things for j2wasm we noticed that it seems like we could support them, if we have array.init which takes the values to initialize with, which is also being experimented with.

It seems like the rules for an immutable array could be like those of a struct, that is, subtypes would need to match it precisely on everything? Structs can at least add more fields, but not arrays, so this makes subtyping less useful in the array case, of course. Still, immutable arrays can be useful because of things like itables: An itable call does a read from an array that is in practice immutable, so if the toolchain and VM know it is immutable that can help.

Opening this issue for thoughts as we look into experimenting with it.

The text was updated successfully, but these errors were encountered:

RossTate · 2021-10-13T16:46:49Z

Making itables immutable is also important for enabling better speculative optimization. (I'll give some more context in another post.)

array.init only handles the case where the array's size is known at generation time, but that's probably sufficient for supporting at least the immutable arrays used in runtimes.

Note, though, that i-tables will be a particularly inefficient implementation of interface-method dispatch in WebAssembly. Besides the i-table being immutable, normally the i-table is flattened directly into the object descriptor (or v-table), as are the method tables for each of the interfaces in the i-table. That locality has a big impact on performance. And then there's the fact that, in WebAssembly, you'll have to cast the method table for the interface once you find it. Personally, I'd recommend interface-method tables (using call tags) over i-tables, which we've recently observed to implement interface-method dispatch just as efficiently as class-method dispatch, and which can probably be much more easily implemented than providing generators a way to specify flattened data structures.

jakobkummerow · 2021-10-13T16:54:51Z

I support this.

I think you have a mixup in the subtyping description: for mutable arrays, subtypes have to match the field type exactly. For immutable arrays, subtypes of the array can specify a non-trivial (non-identical) subtype for the field type. The mutability attribute itself must be identical in a subtype and its supertype. The summary is what you said: arrays would then behave for subtyping like structs with a single field.

kripken · 2021-10-13T18:09:36Z

@jakobkummerow Oh right, yes, I had that part wrong.

rossberg · 2021-10-14T10:15:59Z

To clarify, subtyping on immutable arrays is already allowed. The following module validates just fine in the current MVP and the interpreter:

  (module
    (type $t (struct (field i32)))
    (type $u (struct (field i32 i32)))
    (type $at (array (ref $t)))
    (type $au (array (ref $u)))
    (func $f (param (ref $at)))
    (func $g (param (ref $au)) (call $f (local.get 0)))
  )

In fact, I just noticed that the interpreter even allows allocating immutable arrays, in contrast to what the MVP doc says. This runs fine:

      (call $f
        (array.new $au
          (struct.new $u (i32.const 1) (i32.const 2) (rtt.canon $u))
          (i32.const 100)
          (rtt.canon $au)
        )
      )

However, that's not particularly useful, which is why I changed it in the doc. I should adjust the interpreter.

What's missing for immutable arrays to be useful is the ability to allocate an immutable array with different elements. Technically, we could of course add an instruction like

array.new_with_multi $a N : [t^N (rtt n $a)] -> [(ref $a)]
  where $a = (array (mut t))

However, obviously N would have to be a producer-time constant. That makes it mostly redundant, because in most cases where N is known, the producer could just as well use a struct type instead, which is cheaper to access as well. The remaining cases are probably fairly niche. (Turns out I happen to have a use case in the Wob compiler, but it's somewhat benign.)

My main motivation for not adding this was that most practical cases of initialising arrays need something more general anyway, and can really only be handled with readonly fields. So if we think (quasi-)immutable arrays are important for the MVP, then we should perhaps rather consider up-prioritising readonly, which wouldn't be difficult (and has other benefits).

(Another reason I'm hesitant about an array.new_with_multi instruction is the concern that it may encourage the use of N = 1000 or worse in real code, which might turn out bad in terms of validation cost, and possibly other compilation phases, compared to straightline assignments. We could of course impose an arbitrarily low limit on N, but that would feel ad-hoc as well.)

manoskouk · 2021-10-14T10:45:58Z

The instruction you are describing is pretty much the same as array.init that we are experimenting with in V8. The only difference is that we do not allow an additional length argument. I think its use is not that niche, as you can define constants for types which are generally represented as arrays of arbitrary length, e.g. strings.
We have set the limit of N to 999, which, as you suggested, is arbitrary. To circumvent this limitation for string constants, we are discussing loading elements directly from raw bytes as opposed to a series of i32.const, through some sort of data segment, or maybe immediate arguments.

jakobkummerow · 2021-10-14T10:50:22Z

subtyping on immutable arrays is already allowed

That statement appears to be a somewhat misleading technicality given that MVP.md clearly states:

Note: in the MVP, all arrays must be defined as mutable

Regarding the limit for N: we've reused the same value we're currently using for the number of fields in a struct. We expect the specific value to be standardized eventually; the current value is just an initial guess. In practice, the array.init occurrences I've seen so far have a handful of elements, but I haven't run any thorough analysis.

rossberg · 2021-10-14T11:41:57Z

@manoskouk, ah, the i32 operand was c&p, fixed.

I agree that the ability to initialise an array from a data segment is missing. I have run into that myself, and having to go through an aux memory for string constants is kind of stupid. Especially for the string use case, that might actually be the more appropriate solution than array.new_multi.

The main functionality would be something like:

array.init_data $a (data $d) : [(ref null $a) i32 i32 i32] -> []
array.init_elem $a (elem $e) : [(ref null $a) i32 i32 i32] -> []

analogous to memory.init and table.init, where $d and $e are a data or element segment index, respectively. The former is usable for arrays over number types, the latter for arrays over reference types.

On top of that, we could have

array.new_data $a (data $d) : [i32 i32 i32 (rtt $a)] -> [(ref $a)]
array.new_elem $a (elem $e) : [i32 i32 i32 (rtt $a)] -> [(ref $a)]

But it is a slippery slope to adding ever more variants of array.new, e.g., to initialise from memories, tables or another array. All these variants are unnecessary in the presence of readonly.

OTOH, I could imagine

array.copy $a1 $a2 : [(ref null $a1) i32 (ref null $a2) i32 i32] -> []
array.copy_from_memory $a (memory $m) : [(ref null $a) i32 i32 i32] -> []
array.copy_from_table $a (table $t) : [(ref null $a) i32 i32 i32] -> []
array.copy_to_memory $a (memory $m) : [(ref null $a) i32 i32 i32] -> []
array.copy_to_table $a (table $t) : [(ref null $a) i32 i32 i32] -> []

as operations analogous to memory.copy and table.copy.

@jakobkummerow:

That statement appears to be a somewhat misleading technicality given that > MVP.md clearly states:

Note: in the MVP, all arrays must be defined as mutable

Fair enough, that's rather ambiguous. I believe it was meant to talk about arrays, not array types per se, otherwise there would have been an explicit condition about <fieldtype>. But then it should better say "created" not "defined". That said, I'd be fine to resolve it either way.

kripken · 2021-10-14T17:11:56Z

@rossberg Oh, thanks, I had read the statement @jakobkummerow quoted and also understood it to mean this was disallowed. Good to hear that this is already ok then.

tlively · 2022-04-05T00:10:11Z

Note: in the MVP, all arrays must be defined as mutable

... I believe it was meant to talk about arrays, not array types per se, otherwise there would have been an explicit condition about <fieldtype>. But then it should better say "created" not "defined". That said, I'd be fine to resolve it either way.

@rossberg, I still don't understand what this note would mean if it read "all arrays must be created as mutable." It would be great to clarify this language in the doc; I think that's the only thing preventing us from closing this issue.

jakobkummerow · 2022-04-05T11:26:07Z

Now that we have array.new_fixed and array.new_data, we should allow immutable arrays. We all agreed that the only reason not to have them was that there was no way to create them (with non-default values), but that's no longer the case.

IIRC they are useful in particular for itable implementations, where the immutability helps toolchains/engines apply optimizations/inferences.

rossberg · 2022-04-05T14:43:36Z

What @jakobkummerow said. The note was obsolete and I just forgot to remove it. Done now.

tlively · 2022-04-05T16:26:26Z

Great, I'll close this issue, then.

Previously, we used mutable array-based interface tables (itables) without a specific reason. However, accessing mutable arrays can lead to missed optimization opportunities. For instance, loop-invariant code motion (LICM) won't be able to move certain parts of interface dispatching operations out of loops: When we invoke a method using interface table dispatching, it requires retrieving an itable from the class's itable array, which is considered non-pure. In fact, reading from an array is marked as unsafe to move in wasm-opt's LICM implementation: https://github.com/WebAssembly/binaryen/blob/34ad6a7598e662e9ff357987f2c81fde1e05c522/src/ir/effects.h#L207-L210 https://github.com/WebAssembly/binaryen/blob/34ad6a7598e662e9ff357987f2c81fde1e05c522/src/passes/LoopInvariantCodeMotion.cpp#L125-L128 (Note: We may not run wasm-opt's LICM in favor of the VM's LICM, as discussed in WebAssembly/binaryen#6924) This commit changes itables to be immutable structs, making itable retrieval a pure operation. Here's a comparison of benchmark results between commit 4c9494e and this change: | | array itables | struct itables | | ----------- | ------------- | -------------- | | sha512 | 12128.73213 | 12107.94142 | | queens | 3.632068223 | 3.655815179 | | list | 84.61223471 | 65.69730564 | | richards | 92.12833022 | 91.19028905 | | cd | 46267.10318 | 46574.81215 | | gcbench | 116540.568 | 112594.324 | | tracerFloat | 2569.675034 | 2551.227714 | | tracer | 1725.591249 | 1697.907466 | | sudoku | 14863.33909 | 12066.23882 | | nbody | 366388.3292 | 199625.971 | | permute | 1006.215249 | 983.9858283 | | deltaBlue | 2205.002528 | 2320.038418 | | kmeans | 1752457.829 | 1754425.973 | | brainfuck | 36983.97583 | 36093.7837 | | json | 1205.92764 | 1191.768986 | | bounce | 21.14001572 | 20.8851082 | While most benchmark results remain largely unchanged, some tests such as `list`, `sudoku`, and `nbody` show improved performance compared to the array-based itables. This improvement may be attributed to the VM's optimization capabilities. **immutable array?** In this change, we implement itables using immutable structs However, we can use immutable arrays as an alternative. We implemented and benchmarked itables using immutable arrays, and the results show nearly same performance as the immutable struct-based implementation. WebAssembly/gc#250 v8/v8@6e36e3e Let's proceed with the immutable struct-based implementation for now because - While V8 seems to optimize immutable array-related operations, other VMs may not offer similar optimizations (?) - Currently, wasm-opt doesn't take into account the mutability of arrays in its optimization process.

Previously, we used mutable array-based interface tables (itables) without a specific reason. However, accessing mutable arrays can lead to missed optimization opportunities. For instance, loop-invariant code motion (LICM) won't be able to move certain parts of interface dispatching operations out of loops: When we invoke a method using interface table dispatching, it requires retrieving an itable from the class's itable array, which is considered non-pure. In fact, reading from an array is marked as unsafe to move in wasm-opt's LICM implementation: https://github.com/WebAssembly/binaryen/blob/34ad6a7598e662e9ff357987f2c81fde1e05c522/src/ir/effects.h#L207-L210 https://github.com/WebAssembly/binaryen/blob/34ad6a7598e662e9ff357987f2c81fde1e05c522/src/passes/LoopInvariantCodeMotion.cpp#L125-L128 (Note: We may not run wasm-opt's LICM in favor of the VM's LICM, as discussed in WebAssembly/binaryen#6924) This commit changes itables to be immutable structs, making itable retrieval a pure operation. Here's a comparison of benchmark results between commit 4c9494e and this change: | | array itables | struct itables | | ----------- | ------------- | -------------- | | sha512 | 12128.73213 | 12107.94142 | | queens | 3.632068223 | 3.655815179 | | list | 84.61223471 | 65.69730564 | | richards | 92.12833022 | 91.19028905 | | cd | 46267.10318 | 46574.81215 | | gcbench | 116540.568 | 112594.324 | | tracerFloat | 2569.675034 | 2551.227714 | | tracer | 1725.591249 | 1697.907466 | | sudoku | 14863.33909 | 12066.23882 | | nbody | 366388.3292 | 199625.971 | | permute | 1006.215249 | 983.9858283 | | deltaBlue | 2205.002528 | 2320.038418 | | kmeans | 1752457.829 | 1754425.973 | | brainfuck | 36983.97583 | 36093.7837 | | json | 1205.92764 | 1191.768986 | | bounce | 21.14001572 | 20.8851082 | While most benchmark results remain same, some tests such as `list`, `sudoku`, and `nbody` show improved performance compared to the array-based itables (may be thanks to the VM's optimization capabilities). **immutable array?** In this change, we implement itables using immutable structs However, we can use immutable arrays as an alternative. We implemented and benchmarked itables using immutable arrays, and the results show nearly same performance as the immutable struct-based implementation. WebAssembly/gc#250 v8/v8@6e36e3e Let's proceed with the immutable struct-based implementation for now because - While V8 seems to optimize immutable array-related operations, other VMs may not offer similar optimizations (?) - Currently, wasm-opt doesn't take into account the mutability of arrays in its optimization process.

askeksa-google mentioned this issue Dec 8, 2021

Initializing array from data or element segment #260

Closed

tlively closed this as completed Apr 5, 2022

tanishiking mentioned this issue Sep 17, 2024

Wasm: Make itables immutable struct scala-js/scala-js#5038

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Immutable arrays? #250

Immutable arrays? #250

kripken commented Oct 13, 2021

RossTate commented Oct 13, 2021

Uh oh!

jakobkummerow commented Oct 13, 2021

Uh oh!

kripken commented Oct 13, 2021

Uh oh!

rossberg commented Oct 14, 2021 •

edited

Loading

Uh oh!

manoskouk commented Oct 14, 2021

Uh oh!

jakobkummerow commented Oct 14, 2021

Uh oh!

rossberg commented Oct 14, 2021 •

edited

Loading

Uh oh!

kripken commented Oct 14, 2021

Uh oh!

tlively commented Apr 5, 2022

Uh oh!

jakobkummerow commented Apr 5, 2022

Uh oh!

rossberg commented Apr 5, 2022

Uh oh!

tlively commented Apr 5, 2022

Uh oh!

Immutable arrays? #250

Immutable arrays? #250

Comments

kripken commented Oct 13, 2021

RossTate commented Oct 13, 2021

Uh oh!

jakobkummerow commented Oct 13, 2021

Uh oh!

kripken commented Oct 13, 2021

Uh oh!

rossberg commented Oct 14, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

manoskouk commented Oct 14, 2021

Uh oh!

jakobkummerow commented Oct 14, 2021

Uh oh!

rossberg commented Oct 14, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kripken commented Oct 14, 2021

Uh oh!

tlively commented Apr 5, 2022

Uh oh!

jakobkummerow commented Apr 5, 2022

Uh oh!

rossberg commented Apr 5, 2022

Uh oh!

tlively commented Apr 5, 2022

Uh oh!

rossberg commented Oct 14, 2021 •

edited

Loading

rossberg commented Oct 14, 2021 •

edited

Loading