Closed
Description
pub unsafe fn _mm256_aesdec_epi128(a: __m256i, round_key: __m256i) -> __m256i {
It seems that the stdarch library unconditionally puts the VAES-related intrinsics under the AVX512VL target feature, even though VAES is actually available on platforms without AVX512 support (AMD Zen3).
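To check which situation a given machine is in, the two features can be queried separately at runtime. The following is only a minimal sketch: the `VaesSupport` struct and both function names are hypothetical helpers invented for illustration, and it assumes a toolchain whose `is_x86_feature_detected!` macro accepts the feature names "vaes" and "avx512vl".

```rust
/// Hypothetical helper type: the two CPU features relevant to this report.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VaesSupport {
    pub vaes: bool,
    pub avx512vl: bool,
}

impl VaesSupport {
    /// True when the CPU reports VAES but not AVX512VL, which is the
    /// combination this report describes for AMD Zen3.
    pub fn vaes_without_avx512vl(&self) -> bool {
        self.vaes && !self.avx512vl
    }
}

/// Query both features independently at runtime (x86 / x86_64 only).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn detect_vaes_support() -> VaesSupport {
    VaesSupport {
        // Assumption: both names are accepted by the detection macro.
        vaes: is_x86_feature_detected!("vaes"),
        avx512vl: is_x86_feature_detected!("avx512vl"),
    }
}

/// On other architectures neither feature exists, so report both as absent.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn detect_vaes_support() -> VaesSupport {
    VaesSupport { vaes: false, avx512vl: false }
}
```

If `detect_vaes_support().vaes_without_avx512vl()` returns true, the machine is one where gating the 256-bit VAES intrinsics on AVX512VL would matter.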
I spotted the issue while investigating VAES with ahash: tkaitchuck/aHash#85
When using _mm256_aesenc_epi128, instead of generating something like
2cf7e: c5 d1 6c ee vpunpcklqdq %xmm6,%xmm5,%xmm5
2cf82: c4 c1 f9 6e f2 vmovq %r10,%xmm6
2cf87: c5 c9 6c f7 vpunpcklqdq %xmm7,%xmm6,%xmm6
2cf8b: c4 e2 7d dc c4 vaesenc %ymm4,%ymm0,%ymm0
2cf90: c4 e2 75 dc cb vaesenc %ymm3,%ymm1,%ymm1
2cf95: c4 e3 7d 38 f6 01 vinserti128 $0x1,%xmm6,%ymm0,%ymm6
2cf9b: c4 e3 55 02 ee f0 vpblendd $0xf0,%ymm6,%ymm5,%ymm5
2cfa1: c4 e2 55 00 ea vpshufb %ymm2,%ymm5,%ymm5
2cfa6: c5 e5 d4 dd vpaddq %ymm5,%ymm3,%ymm3
2cfaa: c4 e2 65 00 da vpshufb %ymm2,%ymm3,%ymm3
2cfaf: c5 dd d4 db vpaddq %ymm3,%ymm4,%ymm3
2cfb3: c4 e3 7d 39 dc 01 vextracti128 $0x1,%ymm3,%xmm4
2cfb9: c4 e3 f9 16 d8 01 vpextrq $0x1,%xmm3,%rax
2cfbf: c4 e1 f9 7e d9 vmovq %xmm3,%rcx
2cfc4: c4 c3 f9 16 e1 01 vpextrq $0x1,%xmm4,%r9
2cfca: c4 c1 f9 7e e2 vmovq %xmm4,%r10
the compiler generates an out-of-line call to the intrinsic, with register spills around it:
2cf8d: c5 fc 29 84 24 80 00 vmovaps %ymm0,0x80(%rsp)
2cf94: 00 00
2cf96: c5 fd 7f 94 24 20 01 vmovdqa %ymm2,0x120(%rsp)
2cf9d: 00 00
2cf9f: c5 fc 29 8c 24 40 01 vmovaps %ymm1,0x140(%rsp)
2cfa6: 00 00
2cfa8: c5 f8 77 vzeroupper
2cfab: e8 c0 db ff ff call 2ab70 <core::core_arch::x86::avx512vaes::_mm256_aesenc_epi128::hdf0bd31f9011eecd>
2cfb0: c5 fd 6f 84 24 a0 00 vmovdqa 0xa0(%rsp),%ymm0
2cfb7: 00 00
2cfb9: c5 fd 6f 1c 24 vmovdqa (%rsp),%ymm3
2cfbe: c5 fd d4 44 24 60 vpaddq 0x60(%rsp),%ymm0,%ymm0
2cfc4: c4 e2 7d 00 05 d3 ad vpshufb 0x8add3(%rip),%ymm0,%ymm0 # b7da0 <_fini+0x10a0>
2cfcb: 08 00
2cfcd: c5 fd d4 44 24 40 vpaddq 0x40(%rsp),%ymm0,%ymm0
2cfd3: c4 c3 f9 16 c7 01 vpextrq $0x1,%xmm0,%r15
2cfd9: c4 e1 f9 7e c3 vmovq %xmm0,%rbx
2cfde: c4 e3 7d 39 c0 01 vextracti128 $0x1,%ymm0,%xmm0
2cfe4: c4 c3 f9 16 c6 01 vpextrq $0x1,%xmm0,%r14
2cfea: c4 e1 f9 7e c6 vmovq %xmm0,%rsi
It is very strange that the compiler does not inline the function call under the release profile, even when target-cpu=native is set.
However, when I explicitly declare the LLVM intrinsic and call it directly:
extern "C" {
#[link_name = "llvm.x86.aesni.aesenc.256"]
fn aesenc_256(a: __m256i, round_key: __m256i) -> __m256i;
}
unsafe {
transmute(aesenc_256(transmute(value), transmute(xor)))
}
the compiler produces the first assembly listing above (with vaesenc inlined), as expected.
I suspect this is because the intrinsic is marked as requiring the avx512vl target feature, which is not enabled for this CPU.
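The suspicion above matches a general rule: a function compiled with extra target features is normally only inlined into callers that have all of those features enabled. The sketch below is a deliberately simplified model of that subset check, not LLVM's actual implementation. The function name `inline_compatible` is hypothetical, and the feature lists are illustrative assumptions rather than the exact attributes used in stdarch.

```rust
use std::collections::HashSet;

/// Simplified model of the inlining rule: the callee may be inlined only if
/// every target feature it requires is also enabled in the caller.
pub fn inline_compatible(caller_features: &[&str], callee_features: &[&str]) -> bool {
    let caller: HashSet<&str> = caller_features.iter().copied().collect();
    callee_features.iter().all(|feature| caller.contains(feature))
}
```

Under this model, a caller built for a Zen3-like feature set (assumed here as "avx2", "aes", "vaes") cannot inline a callee that requires both "vaes" and "avx512vl", but can inline one that requires only "vaes". That would be consistent with the call instruction seen in the second listing.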