This repository was archived by the owner on Dec 22, 2021. It is now read-only.
Lanes with an out-of-range selector become 0 in the output vector.
According to the Intel manual (and some experiments I ran), PSHUFB uses the four least significant bits of each index byte to decide which lane to grab from a vector. If the most significant bit is set (e.g. 0b10000000), the result lane is zeroed. But index values between 0x0f and 0x80 (that is, 0x10 through 0x7f) use only the four least significant bits as an index and do not zero the lane. To correctly implement the spec as it currently reads, we would need to copy the swizzle mask to another register, do a greater-than comparison to set the most significant bit of every out-of-range index, and OR the result with the original swizzle mask before using the PSHUFB instruction: four instructions instead of one.
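To make the mismatch concrete, here is a minimal scalar sketch in Python that models the three behaviors described above on 16 byte lanes. The function names (`pshufb`, `spec_swizzle`, `swizzle_via_pshufb`) are hypothetical and only for illustration. The sketch also assumes the greater-than comparison is the signed byte compare (as PCMPGTB is), which still works here because indices of 0x80 and above already have their top bit set.

```python
def pshufb(src, mask):
    """Scalar model of PSHUFB: zero the lane if bit 7 of the index is set,
    otherwise select using only the low four bits of the index."""
    return [0 if m & 0x80 else src[m & 0x0F] for m in mask]


def spec_swizzle(src, mask):
    """Scalar model of the spec text: any index outside 0..15 yields 0."""
    return [src[m] if m < 16 else 0 for m in mask]


def swizzle_via_pshufb(src, mask):
    """The workaround sketched above: compare, OR, then PSHUFB."""
    fixed = []
    for m in mask:
        # Assumption: signed byte compare, so 0x80..0xff read as negative.
        signed = m - 256 if m >= 0x80 else m
        gt = 0xFF if signed > 15 else 0x00  # all ones where index > 15
        fixed.append(m | gt)                # sets bit 7 for 0x10..0x7f
    return pshufb(src, fixed)


src = list(range(100, 116))
mask = [0x13] * 16  # in the problematic range 0x10..0x7f

print(pshufb(src, mask)[0])              # 103: low four bits select lane 3
print(spec_swizzle(src, mask)[0])        # 0: out of range per the spec
print(swizzle_via_pshufb(src, mask)[0])  # 0: matches the spec
```

Only indices in the range 0x10 through 0x7f distinguish a bare PSHUFB from the spec behavior; for every other index value the two models agree.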
Should v128.swizzle change to allow more optimal implementations? Are there considerations for other architectures that I am not aware of?