This repository was archived by the owner on Dec 22, 2021. It is now read-only.

Add other shuffles back? #70

Open
penzn opened this issue Mar 7, 2019 · 4 comments

Comments

@penzn
Contributor

penzn commented Mar 7, 2019

As discussed in #30, shuffle instructions other than v8x16.shuffle may be useful. The pros are simpler runtime code generation (some hardware instructions are a direct match) and space savings (fewer indices to store). The space savings could be increased further by packing the indices. The cost is more complexity in the spec (multiple instructions instead of one).
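To make the code-generation point concrete: with only v8x16.shuffle, an engine that wants to emit a native 32-bit or 64-bit lane shuffle must first recognize that the 16 byte indices actually move whole, aligned, wider lanes. The Python function below is a hypothetical sketch of that check (the name `as_wide_shuffle` is illustrative, not taken from any engine). It assumes the usual two-input convention where indices 0..15 select bytes from the first operand and 16..31 from the second. Real engines may match many more patterns than this.

```python
def as_wide_shuffle(mask, lane_bytes):
    """Try to express a 16-byte shuffle mask as a shuffle of wider lanes.

    mask: 16 byte-lane indices in 0..31 (0..15 pick from the first input,
          16..31 from the second).
    lane_bytes: width of the candidate wide lane (2, 4 or 8 bytes).
    Returns the list of wide-lane indices, or None if the mask does not
    copy whole, aligned, in-order groups of `lane_bytes` bytes.
    """
    if len(mask) != 16 or lane_bytes not in (2, 4, 8):
        raise ValueError("expected 16 indices and a lane width of 2, 4 or 8")
    if any(not 0 <= i < 32 for i in mask):
        raise ValueError("byte-lane indices must be in 0..31")
    wide = []
    for start in range(0, 16, lane_bytes):
        first = mask[start]
        # The group must start on a wide-lane boundary of some source lane...
        if first % lane_bytes != 0:
            return None
        # ...and copy that lane's bytes contiguously and in order.
        for k in range(1, lane_bytes):
            if mask[start + k] != first + k:
                return None
        wide.append(first // lane_bytes)
    return wide


swap_halves = list(range(8, 16)) + list(range(0, 8))
print(as_wide_shuffle(swap_halves, 8))  # a 64x2 shuffle
print(as_wide_shuffle(swap_halves, 4))  # the same mask viewed as 32x4
print(as_wide_shuffle(list(range(15, -1, -1)), 2))  # byte reversal: no wide form
```

A dedicated v64x2.shuffle would hand the engine the two wide-lane indices directly, so this recognition step would not be needed for that case.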

@penzn
Contributor Author

penzn commented Mar 7, 2019

FYI, the other shuffles were removed in commit 8a1f98c.

@tlively
Member

tlively commented Mar 7, 2019

I feel that this isn't really necessary. Yes, it could have some space savings, but it does not add any new functionality or optimization opportunity. I could be convinced that adding these extra shuffles would be worth the effort by real-world data showing a non-negligible code size win.

@lemaitre

lemaitre commented Mar 8, 2019

I want to explain my position on this, since I'm the one who proposed putting them back.

First, I don't think adding new instructions increases the complexity of the spec (it will definitely be longer, but not harder).
So the question is not about spec complexity, but about the number of opcodes.

Is it worth "wasting" 3 opcodes on that?

Pros:

  • It definitely saves binary space.
    The maximum expected gain would be 15 bytes per shuffle when you need v64x2.shuffle (or "only" 9 bytes if the immediates are packed; see Packed lane indices #69).
    For example, a single horizontal add (reduction) of floats could then save 15 + 5 bytes in total.
  • It actually reduces the complexity of the virtual machine by avoiding the need for complex pattern matching of the shuffle mask.
    Indeed, it conveys more semantics from the compiler: more semantics => simpler, efficient translation.
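The byte counts in the first bullet can be reproduced with some simple arithmetic. The sketch below assumes a packing scheme in which each lane index uses just enough bits to select one of the 2 × lanes source lanes, with the total rounded up to whole bytes. This scheme is our assumption for illustration; #69 may specify something different. The function name `shuffle_immediate_bytes` is hypothetical.

```python
def shuffle_immediate_bytes(lanes, packed):
    """Immediate size, in bytes, of a two-input shuffle producing `lanes` output lanes.

    Each lane index selects one of 2 * lanes source lanes. Unpacked, every
    index takes a full byte, as in v8x16.shuffle. Packed (hypothetical scheme),
    each index takes just enough bits, and the total is rounded up to bytes.
    """
    if not packed:
        return lanes
    bits_per_index = (2 * lanes - 1).bit_length()  # e.g. 32 source lanes -> 5 bits
    return (lanes * bits_per_index + 7) // 8


for name, lanes in [("v8x16", 16), ("v16x8", 8), ("v32x4", 4), ("v64x2", 2)]:
    print(f"{name}.shuffle: {shuffle_immediate_bytes(lanes, False)} bytes unpacked, "
          f"{shuffle_immediate_bytes(lanes, True)} bytes packed")
```

Under this scheme a packed v64x2.shuffle needs 1 byte of immediates, which matches the 15-byte (16 - 1) and 9-byte (10 - 1) figures quoted above.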

Cons:

  • 3 more opcodes consumed (might be an issue in the future, but definitely not now)
  • Larger spec (do we really care about this one?)

In my opinion, it is worth specifying those 3 extra instructions.
Also, please remember that shuffling is an important part of most irregular SIMD code (i.e., basically everything that is not pure linear algebra).

@penzn
Contributor Author

penzn commented Mar 8, 2019

See @binji's comment in #69: the WASM spec hasn't tried to do this type of optimization yet:

In general we haven't tried too hard to minimize the size of uncompressed wasm. For example, every load/store has an extra two bytes at least for the alignment and offset.

It would make sense to do space optimizations across all of WASM, not just SIMD.
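For context on the quoted remark: in the WASM binary format every load/store carries a memarg immediate, consisting of an alignment hint (stored as a power-of-two exponent) followed by an offset, each encoded as unsigned LEB128. Even the common case therefore costs two bytes beyond the opcode. Below is a minimal Python sketch of that encoding. The helper names are illustrative, and only `i32.load` (opcode 0x28) is shown.

```python
def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128 (7 bits per byte, low bits first)."""
    if value < 0:
        raise ValueError("unsigned LEB128 needs a non-negative value")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


I32_LOAD = 0x28  # i32.load opcode in the WASM binary format


def encode_i32_load(align_log2, offset):
    """i32.load followed by its memarg: alignment exponent, then offset, both as uLEB128."""
    return bytes([I32_LOAD]) + uleb128(align_log2) + uleb128(offset)


print(encode_i32_load(2, 0).hex())  # naturally aligned 4-byte load, zero offset
```

The naturally aligned, zero-offset load encodes as `28 02 00`: one opcode byte plus the two immediate bytes mentioned in the quote.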
