This repository was archived by the owner on Dec 22, 2021. It is now read-only.

Alternative to Swizzle / Shuffle #8

@billbudge

These operations will be very difficult to implement efficiently.

Even on Intel (SSE, AVX), which has native shuffle instructions, code generation for general shuffles is very complex. The Intel instruction selection lowering in LLVM contains a large amount of code just to implement the shuffle intrinsic:

X86ISelLowering.cpp

The ARM version is worse, since many cases require two 'vtbl' instructions, in addition to materializing the two constant shuffle masks in d-registers. LLVM uses a pre-generated table of 6,561 entries (roughly 26 KB) to generate fast instruction sequences for the 32x4 shuffles.

ARMISelLowering.cpp

This is a lot of complexity for any compiler, and too much for WebAssembly translators. Most of these swizzles and shuffles will never be used. It is a hazard to provide this feature if we can't guarantee that all shuffles will be fast on all platforms.

An alternative is to implement a small set of primitive permutations that we know can be implemented efficiently, without lots of work in the translator. I think these should cover most real-world cases, and can be composed by the programmer or compiler for other shuffles. By being similar to a real ISA, it should also be straightforward to modify toolchains to support WASM SIMD.

I'm recommending a set along the lines of the ARM permutation instructions:

  • Interleave(low, high) (merge elements from two source vectors into a single vector, with low and high modifiers so we only have a single result vector)
  • De-interleave(low, high) (inverse shuffle from interleave)
  • Transpose(low, high) (swap even elements from first source with odd elements from second source, with low and high modifiers)
  • Concatenate(k) (concatenate two source vectors, top k bytes from first, bottom 16-k bytes from second, 0 < k < 16, AKA "slide" or "window" shuffle)
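
To make the list above concrete, here is a rough scalar model of the four primitives, written in Python with lane 0 as the lowest lane. The function names, the lane ordering, and the exact meaning of the low/high modifiers are my assumptions, loosely modeled on ARM's vzip, vuzp, vtrn and vext; none of this is proposed spec text. The last function sketches how a 4x4 transpose could be composed from interleaves alone.

```python
# Scalar reference model; vectors are plain Python lists of lanes.
# All names and lane conventions here are illustrative assumptions.

def interleave_low(a, b):
    """Zip the low halves: [a0, b0, a1, b1, ...]."""
    half = len(a) // 2
    return [x for pair in zip(a[:half], b[:half]) for x in pair]

def interleave_high(a, b):
    """Zip the high halves: [a(n/2), b(n/2), ...]."""
    half = len(a) // 2
    return [x for pair in zip(a[half:], b[half:]) for x in pair]

def deinterleave_low(a, b):
    """Even-indexed lanes of a followed by even-indexed lanes of b."""
    return (a + b)[0::2]

def deinterleave_high(a, b):
    """Odd-indexed lanes of a followed by odd-indexed lanes of b."""
    return (a + b)[1::2]

def transpose_low(a, b):
    """Pair up even lanes: [a0, b0, a2, b2, ...]."""
    return [x for pair in zip(a[0::2], b[0::2]) for x in pair]

def transpose_high(a, b):
    """Pair up odd lanes: [a1, b1, a3, b3, ...]."""
    return [x for pair in zip(a[1::2], b[1::2]) for x in pair]

def concatenate(a, b, k):
    """Top k bytes of a, then bottom 16-k bytes of b (0 < k < 16)."""
    assert len(a) == len(b) == 16 and 0 < k < 16
    return a[16 - k:] + b[:16 - k]

def transpose_4x4(r0, r1, r2, r3):
    """4x4 matrix transpose composed from two rounds of interleaves."""
    t0, t1 = interleave_low(r0, r2), interleave_low(r1, r3)
    t2, t3 = interleave_high(r0, r2), interleave_high(r1, r3)
    return (interleave_low(t0, t1), interleave_high(t0, t1),
            interleave_low(t2, t3), interleave_high(t2, t3))
```

Under these conventions, de-interleaving the two interleave results recovers the original sources, which is the "inverse shuffle" relationship mentioned in the list.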

Additionally, we may want to have shuffles that reverse the lanes in various patterns like the ARM vrev instructions.
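
As a sketch of what such lane-reversing shuffles would compute, here is a scalar model in Python. The helper name `reverse_in_groups` and its `group` parameter are hypothetical; the idea is that reversing within groups of 2, 4 or 8 lanes roughly corresponds to the different vrev variants, depending on the lane width.

```python
def reverse_in_groups(lanes, group):
    """Reverse lane order inside each consecutive group of `group` lanes.

    Hypothetical helper: `lanes` is a list, `group` must divide its length.
    """
    assert group > 0 and len(lanes) % group == 0
    out = []
    for start in range(0, len(lanes), group):
        # Take one group and append it back to front.
        out.extend(reversed(lanes[start:start + group]))
    return out
```

For example, with four 32-bit lanes, `reverse_in_groups([0, 1, 2, 3], 2)` swaps the lanes within each 64-bit half.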

On Intel these can be implemented using the pshuf instructions. We're assuming SSE 4.1 as a baseline for SIMD support right now. POWER and MIPS have similar primitive shuffles.
