efriedma-quic
Collaborator

SVE depends on a combination of host support and operating system support. Sometimes those don't line up with the detected host CPU name; make sure SVE is disabled when it isn't available. Implement this for both Windows and Linux. (We don't have a codepath for other operating systems. If someone wants to implement this, it should be possible to adapt the FMV (function multi-versioning) code from compiler-rt.)

While I'm here, also add support for detecting other Windows CPU features.

For Windows, declare constants ourselves so the code builds on older SDKs; we also do this in compiler-rt.
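
For readers unfamiliar with the pattern, here is a minimal sketch of what "declare constants ourselves" could look like. It is not the code from this patch: `hostReportsFeature` is a hypothetical helper, and the numeric values are my recollection of recent `winnt.h` contents and should be verified against the SDK header before being relied on.

```cpp
#if defined(_WIN32)
#include <windows.h>
#endif

// Fallback definitions for SDKs that predate these constants.
// ASSUMPTION: the values below are quoted from memory of recent SDK
// headers; check them against winnt.h before use.
#ifndef PF_ARM_V83_JSCVT_INSTRUCTIONS_AVAILABLE
#define PF_ARM_V83_JSCVT_INSTRUCTIONS_AVAILABLE 44
#endif
#ifndef PF_ARM_SVE_INSTRUCTIONS_AVAILABLE
#define PF_ARM_SVE_INSTRUCTIONS_AVAILABLE 46
#endif

// Hypothetical helper: asks the OS whether a processor feature is usable.
// On non-Windows hosts it conservatively reports "not available".
bool hostReportsFeature(unsigned Id) {
#if defined(_WIN32)
  return IsProcessorFeaturePresent(Id) != 0;
#else
  (void)Id;
  return false;
#endif
}
```

Because each definition is guarded by `#ifndef`, a newer SDK that already provides the constant takes precedence and the fallback is ignored.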

Currently untested; help testing for both Windows and Linux would be appreciated.

github-actions bot commented Sep 23, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@dpaoliello
Contributor

On my Surface Pro 11 with a Qualcomm Snapdragon X I get:

> .\build\bin\clang.exe -mcpu=native -print-enabled-extensions
clang version 22.0.0git (https://github.com/dpaoliello/llvm-project.git fc5d883c7269c0182c99ee0962722cc22c749b88)
Target: aarch64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\repos\llvm\build\bin
Build config: +assertions
'+jscvt' is not a recognized feature for this target (ignoring feature)
Extensions enabled for the given AArch64 target

    Architecture Feature(s)                                Description
    FEAT_AES, FEAT_PMULL                                   Enable AES support
    FEAT_AMUv1                                             Enable Armv8.4-A Activity Monitors extension
    FEAT_AMUv1p1                                           Enable Armv8.6-A Activity Monitors Virtualization support
    FEAT_AdvSIMD                                           Enable Advanced SIMD instructions
    FEAT_BF16                                              Enable BFloat16 Extension
    FEAT_BTI                                               Enable Branch Target Identification
    FEAT_CCIDX                                             Enable Armv8.3-A Extend of the CCSIDR number of sets
    FEAT_CRC32                                             Enable Armv8.0-A CRC-32 checksum instructions
    FEAT_CSV2_2                                            Enable architectural speculation restriction
    FEAT_DIT                                               Enable Armv8.4-A Data Independent Timing instructions
    FEAT_DPB                                               Enable Armv8.2-A data Cache Clean to Point of Persistence
    FEAT_DPB2                                              Enable Armv8.5-A Cache Clean to Point of Deep Persistence
    FEAT_DotProd                                           Enable dot product support
    FEAT_ECV                                               Enable enhanced counter virtualization extension
    FEAT_FCMA                                              Enable Armv8.3-A Floating-point complex number support
    FEAT_FGT                                               Enable fine grained virtualization traps extension
    FEAT_FHM                                               Enable FP16 FML instructions
    FEAT_FP                                                Enable Armv8.0-A Floating Point Extensions
    FEAT_FP16                                              Enable half-precision floating-point data processing
    FEAT_FRINTTS                                           Enable FRInt[32|64][Z|X] instructions that round a floating-point number to an integer (in FP format) forcing it to fit into a 32- or 64-bit int
    FEAT_FlagM                                             Enable Armv8.4-A Flag Manipulation instructions
    FEAT_FlagM2                                            Enable alternative NZCV format for floating point comparisons
    FEAT_I8MM                                              Enable Matrix Multiply Int8 Extension
    FEAT_JSCVT                                             Enable Armv8.3-A JavaScript FP conversion instructions
    FEAT_LOR                                               Enable Armv8.1-A Limited Ordering Regions extension
    FEAT_LRCPC                                             Enable support for RCPC extension
    FEAT_LRCPC2                                            Enable Armv8.4-A RCPC instructions with Immediate Offsets
    FEAT_LSE                                               Enable Armv8.1-A Large System Extension (LSE) atomic instructions
    FEAT_LSE2                                              Enable Armv8.4-A Large System Extension 2 (LSE2) atomicity rules
    FEAT_MPAM                                              Enable Armv8.4-A Memory system Partitioning and Monitoring extension
    FEAT_NV, FEAT_NV2                                      Enable Armv8.4-A Nested Virtualization Enchancement
    FEAT_PAN                                               Enable Armv8.1-A Privileged Access-Never extension
    FEAT_PAN2                                              Enable Armv8.2-A PAN s1e1R and s1e1W Variants
    FEAT_PAuth                                             Enable Armv8.3-A Pointer Authentication extension
    FEAT_PMUv3                                             Enable Armv8.0-A PMUv3 Performance Monitors extension
    FEAT_RAS, FEAT_RASv1p1                                 Enable Armv8.0-A Reliability, Availability and Serviceability Extensions
    FEAT_RDM                                               Enable Armv8.1-A Rounding Double Multiply Add/Subtract instructions
    FEAT_RNG                                               Enable Random Number generation instructions
    FEAT_SB                                                Enable Armv8.5-A Speculation Barrier
    FEAT_SEL2                                              Enable Armv8.4-A Secure Exception Level 2 extension
    FEAT_SHA1, FEAT_SHA256                                 Enable SHA1 and SHA256 support
    FEAT_SHA3, FEAT_SHA512                                 Enable SHA512 and SHA3 support
    FEAT_SM4, FEAT_SM3                                     Enable SM3 and SM4 support
    FEAT_SPE                                               Enable Statistical Profiling extension
    FEAT_SPECRES                                           Enable Armv8.5-A execution and data prediction invalidation instructions
    FEAT_SSBS, FEAT_SSBS2                                  Enable Speculative Store Bypass Safe bit
    FEAT_TLBIOS, FEAT_TLBIRANGE                            Enable Armv8.4-A TLB Range and Maintenance instructions
    FEAT_TRF                                               Enable Armv8.4-A Trace extension
    FEAT_UAO                                               Enable Armv8.2-A UAO PState
    FEAT_VHE                                               Enable Armv8.1-A Virtual Host extension

> .\build\bin\clang.exe -mcpu=native -### empty.c -c -nostdinc
clang version 22.0.0git (https://github.com/dpaoliello/llvm-project.git fc5d883c7269c0182c99ee0962722cc22c749b88)
Target: aarch64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\repos\llvm\build\bin
Build config: +assertions
 (in-process)
 "C:\\repos\\llvm\\build\\bin\\clang.exe" "-cc1" "-triple" "aarch64-pc-windows-msvc19.44.35213" "-emit-obj" "-mincremental-linker-compatible" "-disable-free" "-clear-ast-before-backend" "-main-file-name" "empty.c" "-mrelocation-model" "pic" "-pic-level" "2" "-mframe-pointer=reserved" "-relaxed-aliasing" "-fmath-errno" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-funwind-tables=2" "-enable-tlsdesc" "-target-cpu" "oryon-1" "-target-feature" "-sve-sm4" "-target-feature" "-sve2" "-target-feature" "-sve-sha3" "-target-feature" "-sve" "-target-feature" "+jscvt" "-target-feature" "-f32mm" "-target-feature" "-sve-aes" "-target-feature" "-f64mm" "-target-feature" "+v8.6a" "-target-feature" "+aes" "-target-feature" "+bf16" "-target-feature" "+ccidx" "-target-feature" "+complxnum" "-target-feature" "+crc" "-target-feature" "+dotprod" "-target-feature" "+fp-armv8" "-target-feature" "+i8mm" "-target-feature" "+jsconv" "-target-feature" "+lse" "-target-feature" "+neon" "-target-feature" "+pauth" "-target-feature" "+perfmon" "-target-feature" "+rand" "-target-feature" "+ras" "-target-feature" "+rcpc" "-target-feature" "+rdm" "-target-feature" "+sha2" "-target-feature" "+sha3" "-target-feature" "+sm4" "-target-feature" "+spe" "-target-feature" "+ssbs" "-target-abi" "aapcs" "-fdebug-compilation-dir=C:\\repos\\llvm" "-fcoverage-compilation-dir=C:\\repos\\llvm" "-nostdsysteminc" "-nobuiltininc" "-resource-dir" "C:\\repos\\llvm\\build\\lib\\clang\\22" "-ferror-limit" "19" "-fmessage-length=120" "-fno-use-cxa-atexit" "-fms-extensions" "-fms-compatibility" "-fms-compatibility-version=19.44.35213" "-fskip-odr-check-in-gmf" "-fdelayed-template-parsing" "-fcolor-diagnostics" "-target-feature" "-fmv" "-faddrsig" "-o" "empty.o" "-x" "c" "empty.c"

@efriedma-quic
Collaborator Author

Is that the same as what you get without the patch?

I messed up the feature string for jscvt; I'll push a fix.

(On a side note, I just downloaded the newest SDK, and there are a bunch of new feature flags defined; we probably want to look at that at some point... but probably not in this patch.)

@dpaoliello
Contributor

Before the patch:

> .\build\bin\clang.exe -mcpu=native -print-enabled-extensions
clang version 22.0.0git (https://github.com/dpaoliello/llvm-project.git d136fbdf8cf626a446cc53345810d3f59b7e433c)
Target: aarch64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\repos\llvm\build\bin
Build config: +assertions
Extensions enabled for the given AArch64 target

    Architecture Feature(s)                                Description
    FEAT_AES, FEAT_PMULL                                   Enable AES support
    FEAT_AMUv1                                             Enable Armv8.4-A Activity Monitors extension
    FEAT_AMUv1p1                                           Enable Armv8.6-A Activity Monitors Virtualization support
    FEAT_AdvSIMD                                           Enable Advanced SIMD instructions
    FEAT_BF16                                              Enable BFloat16 Extension
    FEAT_BTI                                               Enable Branch Target Identification
    FEAT_CCIDX                                             Enable Armv8.3-A Extend of the CCSIDR number of sets
    FEAT_CRC32                                             Enable Armv8.0-A CRC-32 checksum instructions
    FEAT_CSV2_2                                            Enable architectural speculation restriction
    FEAT_DIT                                               Enable Armv8.4-A Data Independent Timing instructions
    FEAT_DPB                                               Enable Armv8.2-A data Cache Clean to Point of Persistence
    FEAT_DPB2                                              Enable Armv8.5-A Cache Clean to Point of Deep Persistence
    FEAT_DotProd                                           Enable dot product support
    FEAT_ECV                                               Enable enhanced counter virtualization extension
    FEAT_FCMA                                              Enable Armv8.3-A Floating-point complex number support
    FEAT_FGT                                               Enable fine grained virtualization traps extension
    FEAT_FHM                                               Enable FP16 FML instructions
    FEAT_FP                                                Enable Armv8.0-A Floating Point Extensions
    FEAT_FP16                                              Enable half-precision floating-point data processing
    FEAT_FRINTTS                                           Enable FRInt[32|64][Z|X] instructions that round a floating-point number to an integer (in FP format) forcing it to fit into a 32- or 64-bit int
    FEAT_FlagM                                             Enable Armv8.4-A Flag Manipulation instructions
    FEAT_FlagM2                                            Enable alternative NZCV format for floating point comparisons
    FEAT_I8MM                                              Enable Matrix Multiply Int8 Extension
    FEAT_JSCVT                                             Enable Armv8.3-A JavaScript FP conversion instructions
    FEAT_LOR                                               Enable Armv8.1-A Limited Ordering Regions extension
    FEAT_LRCPC                                             Enable support for RCPC extension
    FEAT_LRCPC2                                            Enable Armv8.4-A RCPC instructions with Immediate Offsets
    FEAT_LSE                                               Enable Armv8.1-A Large System Extension (LSE) atomic instructions
    FEAT_LSE2                                              Enable Armv8.4-A Large System Extension 2 (LSE2) atomicity rules
    FEAT_MPAM                                              Enable Armv8.4-A Memory system Partitioning and Monitoring extension
    FEAT_NV, FEAT_NV2                                      Enable Armv8.4-A Nested Virtualization Enchancement
    FEAT_PAN                                               Enable Armv8.1-A Privileged Access-Never extension
    FEAT_PAN2                                              Enable Armv8.2-A PAN s1e1R and s1e1W Variants
    FEAT_PAuth                                             Enable Armv8.3-A Pointer Authentication extension
    FEAT_PMUv3                                             Enable Armv8.0-A PMUv3 Performance Monitors extension
    FEAT_RAS, FEAT_RASv1p1                                 Enable Armv8.0-A Reliability, Availability and Serviceability Extensions
    FEAT_RDM                                               Enable Armv8.1-A Rounding Double Multiply Add/Subtract instructions
    FEAT_RNG                                               Enable Random Number generation instructions
    FEAT_SB                                                Enable Armv8.5-A Speculation Barrier
    FEAT_SEL2                                              Enable Armv8.4-A Secure Exception Level 2 extension
    FEAT_SHA1, FEAT_SHA256                                 Enable SHA1 and SHA256 support
    FEAT_SHA3, FEAT_SHA512                                 Enable SHA512 and SHA3 support
    FEAT_SM4, FEAT_SM3                                     Enable SM3 and SM4 support
    FEAT_SPE                                               Enable Statistical Profiling extension
    FEAT_SPECRES                                           Enable Armv8.5-A execution and data prediction invalidation instructions
    FEAT_SSBS, FEAT_SSBS2                                  Enable Speculative Store Bypass Safe bit
    FEAT_TLBIOS, FEAT_TLBIRANGE                            Enable Armv8.4-A TLB Range and Maintenance instructions
    FEAT_TRF                                               Enable Armv8.4-A Trace extension
    FEAT_UAO                                               Enable Armv8.2-A UAO PState
    FEAT_VHE                                               Enable Armv8.1-A Virtual Host extension

> .\build\bin\clang.exe -mcpu=native -### empty.c -c -nostdinc
clang version 22.0.0git (https://github.com/dpaoliello/llvm-project.git d136fbdf8cf626a446cc53345810d3f59b7e433c)
Target: aarch64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\repos\llvm\build\bin
Build config: +assertions
 (in-process)
 "C:\\repos\\llvm\\build\\bin\\clang.exe" "-cc1" "-triple" "aarch64-pc-windows-msvc19.44.35213" "-emit-obj" "-mincremental-linker-compatible" "-disable-free" "-clear-ast-before-backend" "-main-file-name" "empty.c" "-mrelocation-model" "pic" "-pic-level" "2" "-mframe-pointer=reserved" "-relaxed-aliasing" "-fmath-errno" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-funwind-tables=2" "-enable-tlsdesc" "-target-cpu" "oryon-1" "-target-feature" "+v8.6a" "-target-feature" "+aes" "-target-feature" "+bf16" "-target-feature" "+ccidx" "-target-feature" "+complxnum" "-target-feature" "+crc" "-target-feature" "+dotprod" "-target-feature" "+fp-armv8" "-target-feature" "+i8mm" "-target-feature" "+jsconv" "-target-feature" "+lse" "-target-feature" "+neon" "-target-feature" "+pauth" "-target-feature" "+perfmon" "-target-feature" "+rand" "-target-feature" "+ras" "-target-feature" "+rcpc" "-target-feature" "+rdm" "-target-feature" "+sha2" "-target-feature" "+sha3" "-target-feature" "+sm4" "-target-feature" "+spe" "-target-feature" "+ssbs" "-target-abi" "aapcs" "-fdebug-compilation-dir=C:\\repos\\llvm" "-fcoverage-compilation-dir=C:\\repos\\llvm" "-nostdsysteminc" "-nobuiltininc" "-resource-dir" "C:\\repos\\llvm\\build\\lib\\clang\\22" "-ferror-limit" "19" "-fmessage-length=120" "-fno-use-cxa-atexit" "-fms-extensions" "-fms-compatibility" "-fms-compatibility-version=19.44.35213" "-fskip-odr-check-in-gmf" "-fdelayed-template-parsing" "-fcolor-diagnostics" "-target-feature" "-fmv" "-faddrsig" "-o" "empty.o" "-x" "c" "empty.c"

Looks like the patch adds the following target feature args:

-sve-sm4
-sve2
-sve-sha3
-sve
+jscvt
-f32mm
-sve-aes
-f64mm

@StDymphna
StDymphna commented Sep 30, 2025

Does not appear to have fixed feature detection on Linux or Android.

clang version 20.1.8 (https://github.com/termux/termux-packages 78664dcf247c8738a5297582a2a36f1884d5099c)
Target: aarch64-unknown-linux-android35
Thread model: posix
InstalledDir: /data/data/com.termux/files/home/llvm_test/bin
Build config: +unoptimized
 (in-process)
 "/data/data/com.termux/files/home/llvm_test/bin/clang-20" "-cc1" "-triple" "aarch64-unknown-linux-android35" "-E" "-disable-free" "-clear-ast-before-backend" "-disable-llvm-verifier" "-discard-value-names" "-main-file-name" "-" "-mrelocation-model" "pic" "-pic-level" "2" "-pic-is-pie" "-mframe-pointer=non-leaf" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-funwind-tables=2" "-target-cpu" "generic" "-target-feature" "+v9a" "-target-feature" "+bti" "-target-feature" "+ccidx" "-target-feature" "+complxnum" "-target-feature" "+crc" "-target-feature" "+dit" "-target-feature" "+dotprod" "-target-feature" "+flagm" "-target-feature" "+fp-armv8" "-target-feature" "+fullfp16" "-target-feature" "+jsconv" "-target-feature" "+lse" "-target-feature" "+neon" "-target-feature" "+pauth" "-target-feature" "+predres" "-target-feature" "+ras" "-target-feature" "+rcpc" "-target-feature" "+rdm" "-target-feature" "+sb" "-target-feature" "+ssbs" "-target-feature" "+sve" "-target-feature" "+sve2" "-target-feature" "+fix-cortex-a53-835769" "-target-abi" "aapcs" "-debugger-tuning=gdb" "-fdebug-compilation-dir=/data/data/com.termux/files/home/llvm_test" "-fcoverage-compilation-dir=/data/data/com.termux/files/home/llvm_test" "-resource-dir" "/data/data/com.termux/files/home/llvm_test/lib/clang/20" "-isysroot" "/data/data/com.termux/files/home/llvm_test/bin/../.." "-internal-isystem" "/data/data/com.termux/files/home/llvm_test/lib/clang/20/include" "-internal-isystem" "/data/data/com.termux/files/home/llvm_test/bin/../../usr/local/include" "-internal-externc-isystem" "/data/data/com.termux/files/home/llvm_test/bin/../../include" "-internal-externc-isystem" "/data/data/com.termux/files/home/llvm_test/bin/../../usr/include" "-ferror-limit" "19" "-fno-signed-char" "-fgnuc-version=4.2.1" "-fskip-odr-check-in-gmf" "-target-feature" "+outline-atomics" "-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o" "/dev/null" "-x" "c" "-"

I'm not sure you're actually doing a feature check? I know cpuinfo does not have sve:

Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bf16 bti

@efriedma-quic
Collaborator Author

If the code inside the if statement is reached, it should disable SVE, I think.

There are a few different ways we might not get there, I guess:

  • Can you show the command line you're using?
  • Can you add breakpoints/debug statements to show whether getProcCpuinfoContent() succeeds, and whether the !Features.contains("sve") check succeeds?
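
To make the expected behaviour concrete, here is a standalone sketch of the check being discussed. It is not the code from the patch: `cpuinfoHasFeature` and `sveOverrides` are hypothetical names, the cpuinfo text is passed in as a string rather than read via `getProcCpuinfoContent()`, and it matches whole tokens where the actual check may be written differently. The list of features to disable is taken from the Windows output reported earlier in this thread.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: true if the first "Features" line of a /proc/cpuinfo
// dump lists Name as a whole, whitespace-separated token.
bool cpuinfoHasFeature(const std::string &Cpuinfo, const std::string &Name) {
  std::istringstream Lines(Cpuinfo);
  std::string Line;
  while (std::getline(Lines, Line)) {
    if (Line.compare(0, 8, "Features") != 0)
      continue;
    std::string::size_type Colon = Line.find(':');
    if (Colon == std::string::npos)
      continue;
    std::istringstream Tokens(Line.substr(Colon + 1));
    std::string Tok;
    while (Tokens >> Tok)
      if (Tok == Name)
        return true;
    return false; // only the first Features line is inspected
  }
  return false;
}

// Hypothetical helper: flags to force off when the kernel does not report SVE.
std::vector<std::string> sveOverrides(const std::string &Cpuinfo) {
  if (cpuinfoHasFeature(Cpuinfo, "sve"))
    return {};
  return {"-sve",     "-sve2",  "-sve-aes", "-sve-sha3",
          "-sve-sm4", "-f32mm", "-f64mm"};
}
```

Fed the `Features` line quoted above, which has no `sve` token, `sveOverrides` returns all seven flags. If the real code path produces `+sve` for the same input, that suggests the check is not being reached at all.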
