In the SIMD meeting yesterday, @gnzlbg encouraged me to file an issue suggesting what aarch64 intrinsics to do first in the interest of letting encoding_rs's SIMD functionality migrate to stable Rust.
Assuming that LLVM portable shuffles become available on stable together with the rest of portable 128-bit SIMD, my aarch64 wishlist has just two items:
```rust
extern "platform-intrinsic" {
    fn aarch64_vmaxvq_u8(x: u8x16) -> u8;
    fn aarch64_vmaxvq_u16(x: u16x8) -> u16;
}
```
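
For readers unfamiliar with these two intrinsics, here is a rough scalar model of what they compute: the unsigned maximum across all lanes of the vector. It is only a sketch. Plain arrays stand in for `u8x16` and `u16x8`, the function names are hypothetical, and the ASCII check at the end is an assumed example of how such a reduction might be used, not something taken from encoding_rs.

```rust
/// Scalar model of an across-lane unsigned maximum over 16 `u8` lanes.
/// `[u8; 16]` stands in for `u8x16`; the name is hypothetical.
fn max_across_u8x16(x: [u8; 16]) -> u8 {
    x.iter().copied().fold(0, u8::max)
}

/// Same reduction over 8 `u16` lanes (`[u16; 8]` stands in for `u16x8`).
fn max_across_u16x8(x: [u16; 8]) -> u16 {
    x.iter().copied().fold(0, u16::max)
}

/// Assumed use case: every lane is ASCII exactly when the maximum
/// lane value is below 0x80.
fn is_ascii_block(x: [u8; 16]) -> bool {
    max_across_u8x16(x) < 0x80
}
```

The point of a single across-lane reduction is that a whole-vector test like the one above collapses to one scalar comparison.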
If we don't get portable shuffles on stable for a while, then my wishlist has two additional items:
- The intrinsic(s) that enable expanding a single `u8x16` into two `u16x8`s such that each `u8` lane is zero-extended to a `u16` lane. I believe this would be zipping the `u8x16` with an all-zero vector.
- The intrinsic(s) that take two `u16x8`s and, in the situation where the high half of each lane is zeroed, produce a `u8x16` consisting of the low half of each lane. (If the high halves aren't zero, I don't care what garbage ends up in the result `u8x16` as long as it doesn't trap. For example, on SSE2, I use `x86_mm_packus_epi16`, which saturates each signed 16-bit lane to the unsigned 8-bit range instead of just discarding the high halves.) I believe on aarch64 this would be some kind of unzip operation with the other output vector (the one with the high halves) ignored afterwards.
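
To make the two requested operations concrete, here is a scalar sketch of the zip-with-zero and unzip ideas described above. It assumes little-endian lane layout, uses plain arrays in place of `u8x16` and `u16x8`, and every function name is hypothetical. It models the intended semantics only and says nothing about which intrinsics would actually be exposed.

```rust
/// Interleave the low 8 bytes of `a` and `b` (models a "zip low" step).
fn zip_lo(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..8 {
        out[2 * i] = a[i];
        out[2 * i + 1] = b[i];
    }
    out
}

/// Interleave the high 8 bytes of `a` and `b` (models a "zip high" step).
fn zip_hi(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..8 {
        out[2 * i] = a[8 + i];
        out[2 * i + 1] = b[8 + i];
    }
    out
}

/// Reinterpret 16 bytes as eight little-endian `u16` lanes (assumption).
fn as_u16x8(bytes: [u8; 16]) -> [u16; 8] {
    let mut out = [0u16; 8];
    for i in 0..8 {
        out[i] = u16::from_le_bytes([bytes[2 * i], bytes[2 * i + 1]]);
    }
    out
}

/// Zero-extend each `u8` lane to a `u16` lane by zipping with zeros.
fn unpack(x: [u8; 16]) -> ([u16; 8], [u16; 8]) {
    let zero = [0u8; 16];
    (as_u16x8(zip_lo(x, zero)), as_u16x8(zip_hi(x, zero)))
}

/// Keep only the low byte of each `u16` lane, which is what taking the
/// even-indexed bytes of the two inputs (an unzip) would produce.
/// Non-zero high halves are silently discarded; nothing traps.
fn pack(first: [u16; 8], second: [u16; 8]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..8 {
        out[i] = first[i] as u8;
        out[8 + i] = second[i] as u8;
    }
    out
}
```

In this model `pack` undoes `unpack` exactly, and inputs with non-zero high halves simply lose them, which is within the "any garbage is fine as long as it doesn't trap" requirement.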