This repository was archived by the owner on Dec 22, 2021. It is now read-only.

Packed shift #110

Open
@penzn

I looked into the single-precision Mersenne Twister after @ngzhian did the double-precision port. Its core function relies on "shift bytes" behavior, which is provided by PSLLDQ/PSRLDQ on x86 and VEXT on Arm:

https://github.com/penzn/SFMT-wasm/blob/master/SFMT-sse2.h#L37
https://github.com/penzn/SFMT-wasm/blob/master/SFMT-neon.h#L36
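
For readers unfamiliar with "shift bytes", here is a minimal portable sketch of the semantics: the whole 128-bit value moves by a number of bytes and zeros are shifted in. The `v128` struct and the function names are hypothetical, introduced only for illustration; they are not taken from the SFMT sources or from the proposal.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit vector: 16 bytes, byte 0 is the
   least significant one. */
typedef struct { uint8_t b[16]; } v128;

/* Shift the whole vector right by n bytes, filling with zeros.
   This models what PSRLDQ does; n >= 16 yields all zeros. */
static v128 shift_bytes_right(v128 v, unsigned n) {
    v128 r;
    memset(r.b, 0, sizeof r.b);
    for (unsigned i = 0; n < 16 && i < 16 - n; ++i)
        r.b[i] = v.b[i + n];
    return r;
}

/* Shift the whole vector left by n bytes, filling with zeros.
   This models what PSLLDQ does; n >= 16 yields all zeros. */
static v128 shift_bytes_left(v128 v, unsigned n) {
    v128 r;
    memset(r.b, 0, sizeof r.b);
    for (unsigned i = n; i < 16; ++i)
        r.b[i] = v.b[i - n];
    return r;
}
```

Unlike the per-lane shifts, these operations carry bytes across lane boundaries, which is why they cannot be expressed with the existing lane-wise shift instructions.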

In the current state of this proposal, the only way to achieve this result is to use shuffle, which lowers to hardware shuffle instructions. Hardware "shift bytes" instructions might yield better throughput, though this should be prototyped to know for sure. If they do, there are two options:

  • Weaken shuffle to a byte shift in the code generator if the indices are consecutive
  • Add a "shift bytes" instruction

I am not sure how easy the former is. I remember there was talk about doing the same to take advantage of shuffles with wider lanes, but I am not sure any implementation does that.
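
To make the first option concrete, here is a rough sketch of the check a code generator could perform. It assumes the two-operand byte shuffle from the proposal, where indices 0..15 select from the first operand and 16..31 from the second. The `v128` struct and the function names are hypothetical and do not come from any existing implementation.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit vector: 16 bytes, byte 0 is the
   least significant one. */
typedef struct { uint8_t b[16]; } v128;

/* Two-operand byte shuffle: an index in 0..15 selects a byte of a,
   an index in 16..31 selects a byte of b. Indices are assumed to be
   below 32, as validation requires. */
static v128 shuffle_bytes(v128 a, v128 b, const uint8_t idx[16]) {
    v128 r;
    for (int i = 0; i < 16; ++i)
        r.b[i] = idx[i] < 16 ? a.b[idx[i]] : b.b[idx[i] - 16];
    return r;
}

/* If idx is n, n+1, ..., n+15 for some n in 0..16, return n;
   otherwise return -1. A match means the shuffle just extracts 16
   consecutive bytes from the concatenation of the two operands. */
static int consecutive_offset(const uint8_t idx[16]) {
    if (idx[0] > 16)
        return -1;
    for (int i = 1; i < 16; ++i)
        if (idx[i] != idx[0] + i)
            return -1;
    return idx[0];
}
```

When `consecutive_offset` returns `n` and the second operand is known to be zero, the shuffle is a right shift by `n` bytes. With a non-zero second operand it is still a byte extract from the concatenated pair, which is the shape VEXT handles on Arm. Whether this pattern match pays off in practice is exactly what would need prototyping.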

I have been working on a Mersenne Twister Wasm port that uses shuffle to get the same results. I will try to post it online, stay tuned.
