VAES should not be restricted to AVX512

https://github.com/rust-lang/stdarch/blob/790411f93c4b5eada3c23abb4c9a063fb0b24d99/crates/core_arch/src/x86/avx512vaes.rs#L65

It seems that `stdarch` unconditionally puts the VAES intrinsics under the `avx512vl` target feature, even though VAES is also available on platforms without AVX512 support (e.g. AMD Zen 3).

I spotted the issue while investigating VAES with aHash: https://github.com/tkaitchuck/aHash/issues/85

When using `_mm256_aesenc_epi128`, instead of generating something like
```
   2cf7e:       c5 d1 6c ee             vpunpcklqdq %xmm6,%xmm5,%xmm5
   2cf82:       c4 c1 f9 6e f2          vmovq  %r10,%xmm6
   2cf87:       c5 c9 6c f7             vpunpcklqdq %xmm7,%xmm6,%xmm6
   2cf8b:       c4 e2 7d dc c4          vaesenc %ymm4,%ymm0,%ymm0
   2cf90:       c4 e2 75 dc cb          vaesenc %ymm3,%ymm1,%ymm1
   2cf95:       c4 e3 7d 38 f6 01       vinserti128 $0x1,%xmm6,%ymm0,%ymm6
   2cf9b:       c4 e3 55 02 ee f0       vpblendd $0xf0,%ymm6,%ymm5,%ymm5
   2cfa1:       c4 e2 55 00 ea          vpshufb %ymm2,%ymm5,%ymm5
   2cfa6:       c5 e5 d4 dd             vpaddq %ymm5,%ymm3,%ymm3
   2cfaa:       c4 e2 65 00 da          vpshufb %ymm2,%ymm3,%ymm3
   2cfaf:       c5 dd d4 db             vpaddq %ymm3,%ymm4,%ymm3
   2cfb3:       c4 e3 7d 39 dc 01       vextracti128 $0x1,%ymm3,%xmm4
   2cfb9:       c4 e3 f9 16 d8 01       vpextrq $0x1,%xmm3,%rax
   2cfbf:       c4 e1 f9 7e d9          vmovq  %xmm3,%rcx
   2cfc4:       c4 c3 f9 16 e1 01       vpextrq $0x1,%xmm4,%r9
   2cfca:       c4 c1 f9 7e e2          vmovq  %xmm4,%r10
```
the compiler generates
```
  2cf8d:       c5 fc 29 84 24 80 00    vmovaps %ymm0,0x80(%rsp)
   2cf94:       00 00 
   2cf96:       c5 fd 7f 94 24 20 01    vmovdqa %ymm2,0x120(%rsp)
   2cf9d:       00 00 
   2cf9f:       c5 fc 29 8c 24 40 01    vmovaps %ymm1,0x140(%rsp)
   2cfa6:       00 00 
   2cfa8:       c5 f8 77                vzeroupper
   2cfab:       e8 c0 db ff ff          call   2ab70 <core::core_arch::x86::avx512vaes::_mm256_aesenc_epi128::hdf0bd31f9011eecd>
   2cfb0:       c5 fd 6f 84 24 a0 00    vmovdqa 0xa0(%rsp),%ymm0
   2cfb7:       00 00 
   2cfb9:       c5 fd 6f 1c 24          vmovdqa (%rsp),%ymm3
   2cfbe:       c5 fd d4 44 24 60       vpaddq 0x60(%rsp),%ymm0,%ymm0
   2cfc4:       c4 e2 7d 00 05 d3 ad    vpshufb 0x8add3(%rip),%ymm0,%ymm0        # b7da0 <_fini+0x10a0>
   2cfcb:       08 00 
   2cfcd:       c5 fd d4 44 24 40       vpaddq 0x40(%rsp),%ymm0,%ymm0
   2cfd3:       c4 c3 f9 16 c7 01       vpextrq $0x1,%xmm0,%r15
   2cfd9:       c4 e1 f9 7e c3          vmovq  %xmm0,%rbx
   2cfde:       c4 e3 7d 39 c0 01       vextracti128 $0x1,%ymm0,%xmm0
   2cfe4:       c4 c3 f9 16 c6 01       vpextrq $0x1,%xmm0,%r14
   2cfea:       c4 e1 f9 7e c6          vmovq  %xmm0,%rsi
```

It is very strange that the compiler does not inline the function call under the release profile, even when `target-cpu=native` is set.
However, when I explicitly write
```rust
    extern "C" {
        #[link_name = "llvm.x86.aesni.aesenc.256"]
        fn aesenc_256(a: __m256i, round_key: __m256i) -> __m256i;
    }

    unsafe {
        transmute(aesenc_256(transmute(value), transmute(xor)))
    }
```
the compiler produces the first (inlined) assembly listing above, as expected.

I suspect that this is because the intrinsic is marked with the `avx512vl` target feature.
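
A minimal sketch of the mechanism that would explain this: LLVM only inlines a callee into a caller whose enabled target features include all of the callee's. On a CPU without AVX512, `target-cpu=native` does not enable `avx512vl`, so a wrapper that requires it stays an out-of-line call. The example below uses AVX2 as a stand-in for the VAES/AVX512VL pair so that it builds on stable Rust; `add_lanes`, `sum_avx2`, and `sum` are hypothetical names for illustration, not part of `stdarch`.

```rust
// Sketch only: AVX2 stands in for the feature set discussed in this issue.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Callee compiled with AVX2 enabled, similar in shape to an intrinsic wrapper.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add_lanes(a: __m256i, b: __m256i) -> __m256i {
    _mm256_add_epi64(a, b)
}

/// Caller with the same feature enabled, so `add_lanes` is eligible for
/// inlining here. A caller lacking the feature would have to call it
/// out of line, as in the second listing above.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(x: [u64; 4], y: [u64; 4]) -> [u64; 4] {
    // Unaligned loads: the arrays are only guaranteed 8-byte alignment.
    let a = _mm256_loadu_si256(x.as_ptr() as *const __m256i);
    let b = _mm256_loadu_si256(y.as_ptr() as *const __m256i);
    let mut out = [0u64; 4];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, add_lanes(a, b));
    out
}

/// Safe entry point: checks for AVX2 at runtime, otherwise uses scalar code.
pub fn sum(x: [u64; 4], y: [u64; 4]) -> [u64; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Sound because the required feature was just detected.
            return unsafe { sum_avx2(x, y) };
        }
    }
    // Portable fallback with the same wrapping semantics as `vpaddq`.
    let mut out = [0u64; 4];
    for i in 0..4 {
        out[i] = x[i].wrapping_add(y[i]);
    }
    out
}
```

If this is indeed the cause, gating the 256-bit VAES intrinsics on a feature set that AVX512-less CPUs can satisfy should let them inline under `target-cpu=native`.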
