Closed
Description
Normally, the constraint x allows passing SIMD vectors as X86 inline assembly operands, as documented here for LLVM inline assembly:
https://llvm.org/docs/LangRef.html#inline-assembler-expressions
This Compiler Explorer link shows that this works for __m512 but not for __m512bh:
https://godbolt.org/z/Wrrnr6E4d
Pasted here:
#include <immintrin.h>
#include <stdint.h>

__m512bh broadcast_load_2xbf16(const void *src) {
  __m512bh dst;
  asm("vbroadcastss %[dst], dword ptr [%[src]]" :[dst]"=x"(dst) : [src]"r"(src) :);
  return dst;
}

__m512 broadcast_load_f32(const void *src) {
  __m512 dst;
  asm("vbroadcastss %[dst], dword ptr [%[src]]" :[dst]"=x"(dst) : [src]"r"(src) :);
  return dst;
}
Compile with flags: -O3 -masm=intel -mavx -mavx2 -mfma -mf16c -mavx512f -mavx512vl -mavx512cd -mavx512bw -mavx512dq -mavx512bf16
The function broadcast_load_f32 compiles, but the function broadcast_load_2xbf16 fails with this error:
<source>:6:7: error: couldn't allocate output register for constraint 'x'
    6 |   asm("vbroadcastss %[dst], dword ptr [%[src]]" :[dst]"=x"(dst) : [src]"r"(src) :);
      |       ^
1 error generated.
Compiler returned: 1