Add other shuffles back? #70

penzn · 2019-03-07T19:54:13Z

Following up on #30: shuffle instructions other than v8x16 may be useful. The pros are simpler runtime code generation (some hardware instructions are a direct match) and space savings (fewer indices to store). The savings can be increased further by packing the indices. The cost is more complexity in the spec (multiple instructions instead of one).
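To make the "fewer indices" point concrete, here is a minimal sketch of how a wider-lane shuffle maps onto the 16 byte indices of v8x16.shuffle. The helper name `expand_to_v8x16` is hypothetical, and the sketch assumes the wider shuffles follow the same convention as v8x16 (indices select lanes from the concatenation of the two inputs).

```python
def expand_to_v8x16(lane_indices, lane_bytes):
    """Expand the lane indices of a wider shuffle into 16 byte indices.

    Hypothetical helper: assumes indices 0..2*lanes-1 select lanes from the
    concatenation of the two input vectors, as v8x16.shuffle does for bytes.
    """
    lanes = 16 // lane_bytes
    assert len(lane_indices) == lanes
    assert all(0 <= i < 2 * lanes for i in lane_indices)
    # Each selected lane becomes lane_bytes consecutive byte indices.
    return [i * lane_bytes + b for i in lane_indices for b in range(lane_bytes)]


# A 64-bit-lane shuffle needs 2 indices; the equivalent v8x16 form needs 16.
v64x2_indices = [1, 2]
v8x16_indices = expand_to_v8x16(v64x2_indices, lane_bytes=8)
print(len(v64x2_indices), len(v8x16_indices))
```

The expansion is mechanical for a producer, which is why the wider shuffles add no new functionality; the open question in this issue is only whether the shorter encoding is worth extra opcodes.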


penzn · 2019-03-07T19:55:27Z

FYI, the other shuffles were removed in 8a1f98c

tlively · 2019-03-07T21:55:50Z

I feel that this isn't really necessary. Yes, it could have some space savings, but it does not add any new functionality or optimization opportunity. I could be convinced that adding these extra shuffles would be worth the effort by real-world data showing a non-negligible code size win.

lemaitre · 2019-03-08T08:37:33Z

I want to express my position on this, as I'm the one who proposed putting them back.

First, I don't think adding new instructions increases the complexity of the spec (it will definitely be longer, but not harder to understand).
So the question is not about spec complexity, but about the number of opcodes.

Is it worth "wasting" 3 opcodes for that?

Pros:

It definitely saves binary space.
The maximum expected gain would be 15 bytes per shuffle if you need v64x2.shuffle (this would be "only" 9 bytes if immediates are packed, see Packed lane indices #69).
For a horizontal add of floats (for example), you could then save 15 + 5 bytes in total (for a single reduction).
It actually reduces the complexity of the virtual machine by avoiding the need for complex pattern matching on the shuffle indices.
Indeed, it conveys more semantics from the compiler: more semantics means a simpler efficient translation.
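The byte counts above can be checked with a small sketch. The packing scheme here is an assumption (each index stored in the minimum number of bits, the whole immediate rounded up to full bytes); it is one reading that reproduces the 15-byte and 9-byte figures, not necessarily the exact scheme discussed in #69. The function name `immediate_bytes` is hypothetical.

```python
def immediate_bytes(lanes, packed):
    """Size in bytes of a shuffle immediate with `lanes` lane indices.

    Unpacked: one byte per index.
    Packed (assumed scheme): each index uses just enough bits to address
    2 * lanes input lanes, and the total is rounded up to whole bytes.
    """
    if not packed:
        return lanes
    bits_per_index = (2 * lanes - 1).bit_length()
    return (lanes * bits_per_index + 7) // 8


v8x16_unpacked = immediate_bytes(16, packed=False)  # 16 one-byte indices
v8x16_packed = immediate_bytes(16, packed=True)     # 16 indices of 5 bits
v64x2_packed = immediate_bytes(2, packed=True)      # 2 indices of 2 bits

print(v8x16_unpacked - v64x2_packed)  # saving against today's v8x16.shuffle
print(v8x16_packed - v64x2_packed)    # saving if v8x16 indices were packed too
```

Under these assumptions the two printed savings are 15 and 9 bytes, matching the numbers quoted for v64x2.shuffle.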

Cons:

3 more opcodes consumed (might be an issue in the future, but definitely not now)
Larger spec (do we really care about this one?)

In my opinion, it is worth specifying those 3 extra instructions.
Also, please remember that shuffling is an important part of most non-regular SIMD code (i.e. basically everything that is not pure linear algebra).
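To illustrate the pattern matching mentioned in the pros, here is a rough sketch of what an engine would have to do to recover a wider-lane shuffle from v8x16 byte indices. The function name `widest_lane_shuffle` is hypothetical and this is not taken from any actual engine; a real implementation would also have to match the recovered lane pattern against the available hardware instructions.

```python
def widest_lane_shuffle(byte_indices):
    """Find the widest lane size for which 16 byte indices form a lane shuffle.

    Returns (lane_bytes, lane_indices). Falls back to (1, byte_indices)
    when no wider lane structure is present.
    """
    assert len(byte_indices) == 16
    for lane_bytes in (8, 4, 2):
        lane_indices = []
        for start in range(0, 16, lane_bytes):
            group = list(byte_indices[start:start + lane_bytes])
            first = group[0]
            # A whole lane is selected only if the group starts on a lane
            # boundary and covers consecutive bytes.
            if first % lane_bytes != 0 or group != list(range(first, first + lane_bytes)):
                break
            lane_indices.append(first // lane_bytes)
        else:
            # No break: every group matched at this lane size.
            return lane_bytes, lane_indices
    return 1, list(byte_indices)


# Bytes 8..23 select the high half of the first input and the low half of the second.
print(widest_lane_shuffle(list(range(8, 24))))
```

With dedicated wider shuffles, the lane size would be stated by the opcode and this detection step would not be needed.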

penzn · 2019-03-08T17:27:08Z

See @binji's comment in #69; the WASM spec hasn't tried to do this type of optimization yet:

"In general we haven't tried too hard to minimize the size of uncompressed wasm. For example, every load/store has an extra two bytes at least for the alignment and offset."

It would make sense to do space optimizations across all WASM, not just SIMD.
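As a small illustration of the load/store overhead mentioned in the quote, here is a sketch that encodes an `i32.load` in the WebAssembly binary format: the opcode byte (0x28) followed by the alignment and offset immediates, each an unsigned LEB128 integer. The helper names `uleb128` and `encode_i32_load` are my own.

```python
def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


I32_LOAD = 0x28  # opcode of i32.load


def encode_i32_load(align_log2, offset):
    """Encode i32.load: opcode, then alignment (as log2) and offset immediates."""
    return bytes([I32_LOAD]) + uleb128(align_log2) + uleb128(offset)


encoded = encode_i32_load(align_log2=2, offset=0)
print(encoded.hex(), len(encoded))
```

Even with a zero offset and natural alignment, the instruction carries two immediate bytes after the opcode, which is the overhead the quote refers to.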

penzn mentioned this issue Mar 7, 2019

Shuffle with immediate indices specification #30

Closed

penzn mentioned this issue Aug 14, 2019

Inefficient x64 codegen for swizzle #93

Closed
