
Reapply "[AArch64][SVE] Improve fixed-length addressing modes. (#130263)" #130625

Merged: 4 commits merged into llvm:main from rjj/aarch64-sve-vls-addressing-modes-fix on Mar 19, 2025

Conversation

rj-jesus
Contributor

@rj-jesus rj-jesus commented Mar 10, 2025

This restores commit f01e760.

The original patch from #129732 exposed what seems to be a bug in SelectAddrModeIndexedSVE.

Currently, the offset returned by SelectAddrModeIndexedSVE is computed by dividing a VL-based offset (MulImm) by the known minimum width of MemVT. This works when MemVT is a scalable vector type because scalable types are intrinsically VL-based. However, for fixed vector types, MemVT is not scaled to the SVE vector length, which may lead to inaccurate results. For example, for vscale * 32, I expect the offset returned to be 2*VL, irrespective of the width of MemVT (unless the latter is an unpacked SVE type). VLA codegen agrees with this. However, for <8 x i32> vectors, VLS codegen (which uses SelectAddrModeIndexedSVE) returns 1*VL: https://godbolt.org/z/7149fejGo.
Is this intentional?
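
To make the arithmetic concrete, here is a minimal standalone sketch of the division described above (my own simplified illustration, not the actual LLVM code):

#include <cassert>
#include <cstdint>

// Immediate for "[xN, #imm, mul vl]": the VL-scaled byte offset (MulImm) is
// divided by the known-minimum width of MemVT in bytes, or -1 if it doesn't
// divide evenly.
static int64_t immMulVL(int64_t MulImm, int64_t MemWidthBytesMin) {
  if (MulImm % MemWidthBytesMin != 0)
    return -1;
  return MulImm / MemWidthBytesMin;
}

int main() {
  // Scalable MemVT <vscale x 4 x i32>: known minimum width is 16 bytes (one
  // 128-bit block), so an offset of vscale * 32 bytes becomes #2, mul vl.
  assert(immMulVL(32, 16) == 2);
  // Fixed MemVT <8 x i32>: its 32-byte width is not scaled by VL but is still
  // used as the divisor, so the same vscale * 32 offset becomes #1, mul vl
  // instead of the expected #2, mul vl.
  assert(immMulVL(32, 32) == 1);
  return 0;
}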

Although this seems to affect both VSCALE-based and Constant-based offsets, I believe we didn't come across it earlier because we don't generate combinations of VSCALE offsets + fixed vectors often. Enabling the Constant-based path made the problem (assuming it is a problem) obvious because combinations of Constant offsets + fixed vectors are more common.

To work around the issue temporarily, I added an early exit to the Constant-based path for fixed vector types.
The VSCALE path is left unchanged because I first wanted to confirm whether the current behaviour is intentional.

I think the long-term solution is to set MemWidthBytes = 16 for fixed vectors, which should fix the address calculation for both paths. I'm happy to do this here or open a separate PR, but first I wanted to confirm whether this is a viable solution (hence the more conservative workaround for the time being).
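
For reference, this is roughly the shape of the change I have in mind (an assumption about a possible follow-up, not code from this PR):

#include <cstdint>

// Possible divisor selection: scalable MemVTs keep their known-minimum width,
// while fixed-length vectors are measured in whole 16-byte (128-bit) VL
// blocks, so the MulImm division yields a VL-based immediate on both paths.
static int64_t memWidthBytes(bool MemVTIsScalable, int64_t KnownMinSizeInBytes) {
  return MemVTIsScalable ? KnownMinSizeInBytes : 16;
}

With that, the <8 x i32> example above would divide 32 by 16 and produce the expected 2*VL.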

What do you think?

@llvmbot added the clang (Clang issues not falling into any other category) and backend:AArch64 labels on Mar 10, 2025
@llvmbot
Member

llvmbot commented Mar 10, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Ricardo Jesus (rj-jesus)

Changes

This restores commit f01e760.

The original patch from #129732 exposed what seems to be a bug in SelectAddrModeIndexedSVE.

Currently, the offset returned by SelectAddrModeIndexedSVE is computed by dividing a VL-based offset (MulImm) by the known minimum width of MemVT. This works when MemVT is a scalable vector type because scalable types are intrinsically VL-based. However, for fixed vector types, MemVT is not scaled to the SVE vector length, which may seemingly lead to inaccurate results. For example, for vscale * 32, I expect the offset returned to be 2*VL, irrespective of the width of MemVT (unless the latter is an unpacked SVE type). VLA codegen seems to agree with this. However, for <8 x i32> vectors, VLS codegen (which uses SelectAddrModeIndexedSVE) returns 1*VL: https://godbolt.org/z/7149fejGo.
Is this intentional?

Although this seems to affect both VSCALE-based and Constant-based offsets, I believe we didn't come across it earlier because we don't generate combinations of VSCALE offsets + fixed vectors often. Enabling the Constant-based path made the problem (assuming it is a problem) obvious because combinations of Constant offsets + fixed vectors are common.

To work around the issue temporarily, I added an early exit to the Constant-based path for fixed vector types.
This doesn't affect the VSCALE path because I wanted to confirm whether the current behaviour is intentional or not.

I think the long-term solution is to set MemWidthBytes = 16 for fixed vectors, which should fix the address calculation for both paths. I'm happy to do this here or open a separate PR, but first I wanted to confirm whether this is a viable solution (hence why I added a more conservative solution for the time being).

What do you think?


Full diff: https://github.com/llvm/llvm-project/pull/130625.diff

4 Files Affected:

  • (modified) clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c (+3-6)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (+14-2)
  • (modified) llvm/lib/Target/AArch64/AArch64Subtarget.h (+11-1)
  • (added) llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll (+472)
diff --git a/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c b/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
index 0ed14b4b3b793..1391a1b09fbd1 100644
--- a/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
+++ b/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
@@ -13,12 +13,9 @@
 
 void func(int *restrict a, int *restrict b) {
 // CHECK-LABEL: func
-// CHECK256-COUNT-1: str
-// CHECK256-COUNT-7: st1w
-// CHECK512-COUNT-1: str
-// CHECK512-COUNT-3: st1w
-// CHECK1024-COUNT-1: str
-// CHECK1024-COUNT-1: st1w
+// CHECK256-COUNT-8: str
+// CHECK512-COUNT-4: str
+// CHECK1024-COUNT-2: str
 // CHECK2048-COUNT-1: st1w
 #pragma clang loop vectorize(enable)
   for (int i = 0; i < 64; ++i)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
index 3ca9107cb2ce5..d338c22267885 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
@@ -7380,12 +7380,24 @@ bool AArch64DAGToDAGISel::SelectAddrModeIndexedSVE(SDNode *Root, SDValue N,
     return false;
 
   SDValue VScale = N.getOperand(1);
-  if (VScale.getOpcode() != ISD::VSCALE)
+  int64_t MulImm = std::numeric_limits<int64_t>::max();
+  if (VScale.getOpcode() == ISD::VSCALE) {
+    MulImm = cast<ConstantSDNode>(VScale.getOperand(0))->getSExtValue();
+  } else if (auto C = dyn_cast<ConstantSDNode>(VScale)) {
+    int64_t ByteOffset = C->getSExtValue();
+    const auto KnownVScale =
+        Subtarget->getSVEVectorSizeInBits() / AArch64::SVEBitsPerBlock;
+
+    if (!KnownVScale || ByteOffset % KnownVScale != 0 ||
+        !MemVT.isScalableVector())
+      return false;
+
+    MulImm = ByteOffset / KnownVScale;
+  } else
     return false;
 
   TypeSize TS = MemVT.getSizeInBits();
   int64_t MemWidthBytes = static_cast<int64_t>(TS.getKnownMinValue()) / 8;
-  int64_t MulImm = cast<ConstantSDNode>(VScale.getOperand(0))->getSExtValue();
 
   if ((MulImm % MemWidthBytes) != 0)
     return false;
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.h b/llvm/lib/Target/AArch64/AArch64Subtarget.h
index c6eb77e3bc3ba..f5ffc72cae537 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -391,7 +391,7 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
   void mirFileLoaded(MachineFunction &MF) const override;
 
   // Return the known range for the bit length of SVE data registers. A value
-  // of 0 means nothing is known about that particular limit beyong what's
+  // of 0 means nothing is known about that particular limit beyond what's
   // implied by the architecture.
   unsigned getMaxSVEVectorSizeInBits() const {
     assert(isSVEorStreamingSVEAvailable() &&
@@ -405,6 +405,16 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
     return MinSVEVectorSizeInBits;
   }
 
+  // Return the known bit length of SVE data registers. A value of 0 means the
+  // length is unkown beyond what's implied by the architecture.
+  unsigned getSVEVectorSizeInBits() const {
+    assert(isSVEorStreamingSVEAvailable() &&
+           "Tried to get SVE vector length without SVE support!");
+    if (MinSVEVectorSizeInBits == MaxSVEVectorSizeInBits)
+      return MaxSVEVectorSizeInBits;
+    return 0;
+  }
+
   bool useSVEForFixedLengthVectors() const {
     if (!isSVEorStreamingSVEAvailable())
       return false;
diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll
new file mode 100644
index 0000000000000..84ab5493b03ee
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll
@@ -0,0 +1,472 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=128 -aarch64-sve-vector-bits-max=128 < %s | FileCheck %s --check-prefix=CHECK-128
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=256 -aarch64-sve-vector-bits-max=256 < %s | FileCheck %s --check-prefix=CHECK-256
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=512 -aarch64-sve-vector-bits-max=512 < %s | FileCheck %s --check-prefix=CHECK-512
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=1024 -aarch64-sve-vector-bits-max=1024 < %s | FileCheck %s --check-prefix=CHECK-1024
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=2048 -aarch64-sve-vector-bits-max=2048 < %s | FileCheck %s --check-prefix=CHECK-2048
+
+define void @nxv16i8(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv16i8:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.b
+; CHECK-NEXT:    mov w8, #256 // =0x100
+; CHECK-NEXT:    ld1b { z0.b }, p0/z, [x0, x8]
+; CHECK-NEXT:    st1b { z0.b }, p0, [x1, x8]
+; CHECK-NEXT:    ret
+;
+; CHECK-128-LABEL: nxv16i8:
+; CHECK-128:       // %bb.0:
+; CHECK-128-NEXT:    ldr z0, [x0, #16, mul vl]
+; CHECK-128-NEXT:    str z0, [x1, #16, mul vl]
+; CHECK-128-NEXT:    ret
+;
+; CHECK-256-LABEL: nxv16i8:
+; CHECK-256:       // %bb.0:
+; CHECK-256-NEXT:    ldr z0, [x0, #8, mul vl]
+; CHECK-256-NEXT:    str z0, [x1, #8, mul vl]
+; CHECK-256-NEXT:    ret
+;
+; CHECK-512-LABEL: nxv16i8:
+; CHECK-512:       // %bb.0:
+; CHECK-512-NEXT:    ldr z0, [x0, #4, mul vl]
+; CHECK-512-NEXT:    str z0, [x1, #4, mul vl]
+; CHECK-512-NEXT:    ret
+;
+; CHECK-1024-LABEL: nxv16i8:
+; CHECK-1024:       // %bb.0:
+; CHECK-1024-NEXT:    ldr z0, [x0, #2, mul vl]
+; CHECK-1024-NEXT:    str z0, [x1, #2, mul vl]
+; CHECK-1024-NEXT:    ret
+;
+; CHECK-2048-LABEL: nxv16i8:
+; CHECK-2048:       // %bb.0:
+; CHECK-2048-NEXT:    ldr z0, [x0, #1, mul vl]
+; CHECK-2048-NEXT:    str z0, [x1, #1, mul vl]
+; CHECK-2048-NEXT:    ret
+  %ldoff = getelementptr inbounds nuw i8, ptr %ldptr, i64 256
+  %stoff = getelementptr inbounds nuw i8, ptr %stptr, i64 256
+  %x = load <vscale x 16 x i8>, ptr %ldoff, align 1
+  store <vscale x 16 x i8> %x, ptr %stoff, align 1
+  ret void
+}
+
+define void @nxv8i16(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv8i16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.h
+; CHECK-NEXT:    mov x8, #128 // =0x80
+; CHECK-NEXT:    ld1h { z0.h }, p0/z, [x0, x8, lsl #1]
+; CHECK-NEXT:    st1h { z0.h }, p0, [x1, x8, lsl #1]
+; CHECK-NEXT:    ret
+;
+; CHECK-128-LABEL: nxv8i16:
+; CHECK-128:       // %bb.0:
+; CHECK-128-NEXT:    ldr z0, [x0, #16, mul vl]
+; CHECK-128-NEXT:    str z0, [x1, #16, mul vl]
+; CHECK-128-NEXT:    ret
+;
+; CHECK-256-LABEL: nxv8i16:
+; CHECK-256:       // %bb.0:
+; CHECK-256-NEXT:    ldr z0, [x0, #8, mul vl]
+; CHECK-256-NEXT:    str z0, [x1, #8, mul vl]
+; CHECK-256-NEXT:    ret
+;
+; CHECK-512-LABEL: nxv8i16:
+; CHECK-512:       // %bb.0:
+; CHECK-512-NEXT:    ldr z0, [x0, #4, mul vl]
+; CHECK-512-NEXT:    str z0, [x1, #4, mul vl]
+; CHECK-512-NEXT:    ret
+;
+; CHECK-1024-LABEL: nxv8i16:
+; CHECK-1024:       // %bb.0:
+; CHECK-1024-NEXT:    ldr z0, [x0, #2, mul vl]
+; CHECK-1024-NEXT:    str z0, [x1, #2, mul vl]
+; CHECK-1024-NEXT:    ret
+;
+; CHECK-2048-LABEL: nxv8i16:
+; CHECK-2048:       // %bb.0:
+; CHECK-2048-NEXT:    ldr z0, [x0, #1, mul vl]
+; CHECK-2048-NEXT:    str z0, [x1, #1, mul vl]
+; CHECK-2048-NEXT:    ret
+  %ldoff = getelementptr inbounds nuw i16, ptr %ldptr, i64 128
+  %stoff = getelementptr inbounds nuw i16, ptr %stptr, i64 128
+  %x = load <vscale x 8 x i16>, ptr %ldoff, align 2
+  store <vscale x 8 x i16> %x, ptr %stoff, align 2
+  ret void
+}
+
+define void @nxv4i32(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv4i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    mov x8, #64 // =0x40
+; CHECK-NEXT:    ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
+; CHECK-NEXT:    st1w { z0.s }, p0, [x1, x8, lsl #2]
+; CHECK-NEXT:    ret
+;
+; CHECK-128-LABEL: nxv4i32:
+; CHECK-128:       // %bb.0:
+; CHECK-128-NEXT:    ldr z0, [x0, #16, mul vl]
+; CHECK-128-NEXT:    str z0, [x1, #16, mul vl]
+; CHECK-128-NEXT:    ret
+;
+; CHECK-256-LABEL: nxv4i32:
+; CHECK-256:       // %bb.0:
+; CHECK-256-NEXT:    ldr z0, [x0, #8, mul vl]
+; CHECK-256-NEXT:    str z0, [x1, #8, mul vl]
+; CHECK-256-NEXT:    ret
+;
+; CHECK-512-LABEL: nxv4i32:
+; CHECK-512:       // %bb.0:
+; CHECK-512-NEXT:    ldr z0, [x0, #4, mul vl]
+; CHECK-512-NEXT:    str z0, [x1, #4, mul vl]
+; CHECK-512-NEXT:    ret
+;
+; CHECK-1024-LABEL: nxv4i32:
+; CHECK-1024:       // %bb.0:
+; CHECK-1024-NEXT:    ldr z0, [x0, #2, mul vl]
+; CHECK-1024-NEXT:    str z0, [x1, #2, mul vl]
+; CHECK-1024-NEXT:    ret
+;
+; CHECK-2048-LABEL: nxv4i32:
+; CHECK-2048:       // %bb.0:
+; CHECK-2048-NEXT:    ldr z0, [x0, #1, mul vl]
+; CHECK-2048-NEXT:    str z0, [x1, #1, mul vl]
+; CHECK-2048-NEXT:    ret
+  %ldoff = getelementptr inbounds nuw i32, ptr %ldptr, i64 64
+  %stoff = getelementptr inbounds nuw i32, ptr %stptr, i64 64
+  %x = load <vscale x 4 x i32>, ptr %ldoff, align 4
+  store <vscale x 4 x i32> %x, ptr %stoff, align 4
+  ret void
+}
+
+define void @nxv2i64(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv2i64:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.d
+; CHECK-NEXT:    mov x8, #32 // =0x20
+; CHECK-NEXT:    ld1d { z0.d }, p0/z, [x0, x8, lsl #3]
+; CHECK-NEXT:    st1d { z0.d }, p0, [x1, x8, lsl #3]
+; CHECK-NEXT:    ret
+;
+; CHECK-128-LABEL: nxv2i64:
+; CHECK-128:       // %bb.0:
+; CHECK-128-NEXT:    ldr z0, [x0, #16, mul vl]
+; CHECK-128-NEXT:    str z0, [x1, #16, mul vl]
+; CHECK-128-NEXT:    ret
+;
+; CHECK-256-LABEL: nxv2i64:
+; CHECK-256:       // %bb.0:
+; CHECK-256-NEXT:    ldr z0, [x0, #8, mul vl]
+; CHECK-256-NEXT:    str z0, [x1, #8, mul vl]
+; CHECK-256-NEXT:    ret
+;
+; CHECK-512-LABEL: nxv2i64:
+; CHECK-512:       // %bb.0:
+; CHECK-512-NEXT:    ldr z0, [x0, #4, mul vl]
+; CHECK-512-NEXT:    str z0, [x1, #4, mul vl]
+; CHECK-512-NEXT:    ret
+;
+; CHECK-1024-LABEL: nxv2i64:
+; CHECK-1024:       // %bb.0:
+; CHECK-1024-NEXT:    ldr z0, [x0, #2, mul vl]
+; CHECK-1024-NEXT:    str z0, [x1, #2, mul vl]
+; CHECK-1024-NEXT:    ret
+;
+; CHECK-2048-LABEL: nxv2i64:
+; CHECK-2048:       // %bb.0:
+; CHECK-2048-NEXT:    ldr z0, [x0, #1, mul vl]
+; CHECK-2048-NEXT:    str z0, [x1, #1, mul vl]
+; CHECK-2048-NEXT:    ret
+  %ldoff = getelementptr inbounds nuw i64, ptr %ldptr, i64 32
+  %stoff = getelementptr inbounds nuw i64, ptr %stptr, i64 32
+  %x = load <vscale x 2 x i64>, ptr %ldoff, align 8
+  store <vscale x 2 x i64> %x, ptr %stoff, align 8
+  ret void
+}
+
+define void @nxv4i8(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv4i8:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.s
+; CHECK-NEXT:    mov w8, #32 // =0x20
+; CHECK-NEXT:    ld1b { z0.s }, p0/z, [x0, x8]
+; CHECK-NEXT:    st1b { z0.s }, p0, [x1, x8]
+; CHECK-NEXT:    ret
+;
+; CHECK-128-LABEL: nxv4i8:
+; CHECK-128:       // %bb.0:
+; CHECK-128-NEXT:    ptrue p0.s
+; CHECK-128-NEXT:    mov w8, #32 // =0x20
+; CHECK-128-NEXT:    ld1b { z0.s }, p0/z, [x0, x8]
+; CHECK-128-NEXT:    st1b { z0.s }, p0, [x1, x8]
+; CHECK-128-NEXT:    ret
+;
+; CHECK-256-LABEL: nxv4i8:
+; CHECK-256:       // %bb.0:
+; CHECK-256-NEXT:    ptrue p0.s
+; CHECK-256-NEXT:    ld1b { z0.s }, p0/z, [x0, #4, mul vl]
+; CHECK-256-NEXT:    st1b { z0.s }, p0, [x1, #4, mul vl]
+; CHECK-256-NEXT:    ret
+;
+; CHECK-512-LABEL: nxv4i8:
+; CHECK-512:       // %bb.0:
+; CHECK-512-NEXT:    ptrue p0.s
+; CHECK-512-NEXT:    ld1b { z0.s }, p0/z, [x0, #2, mul vl]
+; CHECK-512-NEXT:    st1b { z0.s }, p0, [x1, #2, mul vl]
+; CHECK-512-NEXT:    ret
+;
+; CHECK-1024-LABEL: nxv4i8:
+; CHECK-1024:       // %bb.0:
+; CHECK-1024-NEXT:    ptrue p0.s
+; CHECK-1024-NEXT:    ld1b { z0.s }, p0/z, [x0, #1, mul vl]
+; CHECK-1024-NEXT:    st1b { z0.s }, p0, [x1, #1, mul vl]
+; CHECK-1024-NEXT:    ret
+;
+; CHECK-2048-LABEL: nxv4i8:
+; CHECK-2048:       // %bb.0:
+; CHECK-2048-NEXT:    ptrue p0.s
+; CHECK-2048-NEXT:    mov w8, #32 // =0x20
+; CHECK-2048-NEXT:    ld1b { z0.s }, p0/z, [x0, x8]
+; CHECK-2048-NEXT:    st1b { z0.s }, p0, [x1, x8]
+; CHECK-2048-NEXT:    ret
+  %ldoff = getelementptr inbounds nuw i8, ptr %ldptr, i64 32
+  %stoff = getelementptr inbounds nuw i8, ptr %stptr, i64 32
+  %x = load <vscale x 4 x i8>, ptr %ldoff, align 1
+  store <vscale x 4 x i8> %x, ptr %stoff, align 1
+  ret void
+}
+
+define void @nxv2f32(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv2f32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.d
+; CHECK-NEXT:    mov x8, #16 // =0x10
+; CHECK-NEXT:    ld1w { z0.d }, p0/z, [x0, x8, lsl #2]
+; CHECK-NEXT:    st1w { z0.d }, p0, [x1, x8, lsl #2]
+; CHECK-NEXT:    ret
+;
+; CHECK-128-LABEL: nxv2f32:
+; CHECK-128:       // %bb.0:
+; CHECK-128-NEXT:    ptrue p0.d
+; CHECK-128-NEXT:    mov x8, #16 // =0x10
+; CHECK-128-NEXT:    ld1w { z0.d }, p0/z, [x0, x8, lsl #2]
+; CHECK-128-NEXT:    st1w { z0.d }, p0, [x1, x8, lsl #2]
+; CHECK-128-NEXT:    ret
+;
+; CHECK-256-LABEL: nxv2f32:
+; CHECK-256:       // %bb.0:
+; CHECK-256-NEXT:    ptrue p0.d
+; CHECK-256-NEXT:    ld1w { z0.d }, p0/z, [x0, #4, mul vl]
+; CHECK-256-NEXT:    st1w { z0.d }, p0, [x1, #4, mul vl]
+; CHECK-256-NEXT:    ret
+;
+; CHECK-512-LABEL: nxv2f32:
+; CHECK-512:       // %bb.0:
+; CHECK-512-NEXT:    ptrue p0.d
+; CHECK-512-NEXT:    ld1w { z0.d }, p0/z, [x0, #2, mul vl]
+; CHECK-512-NEXT:    st1w { z0.d }, p0, [x1, #2, mul vl]
+; CHECK-512-NEXT:    ret
+;
+; CHECK-1024-LABEL: nxv2f32:
+; CHECK-1024:       // %bb.0:
+; CHECK-1024-NEXT:    ptrue p0.d
+; CHECK-1024-NEXT:    ld1w { z0.d }, p0/z, [x0, #1, mul vl]
+; CHECK-1024-NEXT:    st1w { z0.d }, p0, [x1, #1, mul vl]
+; CHECK-1024-NEXT:    ret
+;
+; CHECK-2048-LABEL: nxv2f32:
+; CHECK-2048:       // %bb.0:
+; CHECK-2048-NEXT:    ptrue p0.d
+; CHECK-2048-NEXT:    mov x8, #16 // =0x10
+; CHECK-2048-NEXT:    ld1w { z0.d }, p0/z, [x0, x8, lsl #2]
+; CHECK-2048-NEXT:    st1w { z0.d }, p0, [x1, x8, lsl #2]
+; CHECK-2048-NEXT:    ret
+  %ldoff = getelementptr inbounds nuw i8, ptr %ldptr, i64 64
+  %stoff = getelementptr inbounds nuw i8, ptr %stptr, i64 64
+  %x = load <vscale x 2 x float>, ptr %ldoff, align 4
+  store <vscale x 2 x float> %x, ptr %stoff, align 4
+  ret void
+}
+
+define void @nxv4f64(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv4f64:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ptrue p0.d
+; CHECK-NEXT:    mov x8, #16 // =0x10
+; CHECK-NEXT:    add x9, x0, #128
+; CHECK-NEXT:    ldr z1, [x9, #1, mul vl]
+; CHECK-NEXT:    add x9, x1, #128
+; CHECK-NEXT:    ld1d { z0.d }, p0/z, [x0, x8, lsl #3]
+; CHECK-NEXT:    st1d { z0.d }, p0, [x1, x8, lsl #3]
+; CHECK-NEXT:    str z1, [x9, #1, mul vl]
+; CHECK-NEXT:    ret
+;
+; CHECK-128-LABEL: nxv4f64:
+; CHECK-128:       // %bb.0:
+; CHECK-128-NEXT:    add x8, x0, #128
+; CHECK-128-NEXT:    ldr z1, [x0, #8, mul vl]
+; CHECK-128-NEXT:    ldr z0, [x8, #1, mul vl]
+; CHECK-128-NEXT:    add x8, x1, #128
+; CHECK-128-NEXT:    str z0, [x8, #1, mul vl]
+; CHECK-128-NEXT:    str z1, [x1, #8, mul vl]
+; CHECK-128-NEXT:    ret
+;
+; CHECK-256-LABEL: nxv4f64:
+; CHECK-256:       // %bb.0:
+; CHECK-256-NEXT:    add x8, x0, #128
+; CHECK-256-NEXT:    ldr z1, [x0, #4, mul vl]
+; CHECK-256-NEXT:    ldr z0, [x8, #1, mul vl]
+; CHECK-256-NEXT:    add x8, x1, #128
+; CHECK-256-NEXT:    str z0, [x8, #1, mul vl]
+; CHECK-256-NEXT:    str z1, [x1, #4, mul vl]
+; CHECK-256-NEXT:    ret
+;
+; CHECK-512-LABEL: nxv4f64:
+; CHECK-512:       // %bb.0:
+; CHECK-512-NEXT:    add x8, x0, #128
+; CHECK-512-NEXT:    ldr z1, [x0, #2, mul vl]
+; CHECK-512-NEXT:    ldr z0, [x8, #1, mul vl]
+; CHECK-512-NEXT:    add x8, x1, #128
+; CHECK-512-NEXT:    str z0, [x8, #1, mul vl]
+; CHECK-512-NEXT:    str z1, [x1, #2, mul vl]
+; CHECK-512-NEXT:    ret
+;
+; CHECK-1024-LABEL: nxv4f64:
+; CHECK-1024:       // %bb.0:
+; CHECK-1024-NEXT:    add x8, x0, #128
+; CHECK-1024-NEXT:    ldr z1, [x0, #1, mul vl]
+; CHECK-1024-NEXT:    ldr z0, [x8, #1, mul vl]
+; CHECK-1024-NEXT:    add x8, x1, #128
+; CHECK-1024-NEXT:    str z0, [x8, #1, mul vl]
+; CHECK-1024-NEXT:    str z1, [x1, #1, mul vl]
+; CHECK-1024-NEXT:    ret
+;
+; CHECK-2048-LABEL: nxv4f64:
+; CHECK-2048:       // %bb.0:
+; CHECK-2048-NEXT:    ptrue p0.d
+; CHECK-2048-NEXT:    mov x8, #16 // =0x10
+; CHECK-2048-NEXT:    add x9, x0, #128
+; CHECK-2048-NEXT:    ldr z1, [x9, #1, mul vl]
+; CHECK-2048-NEXT:    add x9, x1, #128
+; CHECK-2048-NEXT:    ld1d { z0.d }, p0/z, [x0, x8, lsl #3]
+; CHECK-2048-NEXT:    st1d { z0.d }, p0, [x1, x8, lsl #3]
+; CHECK-2048-NEXT:    str z1, [x9, #1, mul vl]
+; CHECK-2048-NEXT:    ret
+  %ldoff = getelementptr inbounds nuw i8, ptr %ldptr, i64 128
+  %stoff = getelementptr inbounds nuw i8, ptr %stptr, i64 128
+  %x = load <vscale x 4 x double>, ptr %ldoff, align 8
+  store <vscale x 4 x double> %x, ptr %stoff, align 8
+  ret void
+}
+
+define void @v8i32(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: v8i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    ldp q0, q1, [x0, #64]
+; CHECK-NEXT:    ldp q3, q2, [x0, #32]
+; CHECK-NEXT:    stp q0, q1, [x1, #64]
+; CHECK-NEXT:    stp q3, q2, [x1, #32]
+; CHECK-NEXT:    ret
+;
+; CHECK-128-LABEL: v8i32:
+; CHECK-128:       // %bb.0:
+; CHECK-128-NEXT:    ldp q0, q1, [x0, #64]
+; CHECK-128-NEXT:    ldp q3, q2, [x0, #32]
+; CHECK-128-NEXT:    stp q0, q1, [x1, #64]
+; CHECK-128-NEXT:    stp q3, q2, [x1, #32]
+; CHECK-128-NEXT:    ret
+;
+; CHECK-256-LABEL: v8i32:
+; CHECK-256:       // %bb.0:
+; CHECK-256-NEXT:    ptrue p0.s
+; CHECK-256-NEXT:    mov x8, #16 // =0x10
+; CHECK-256-NEXT:    mov x9, #8 // =0x8
+; CHECK-256-NEXT:    ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
+; CHECK-256-NEXT:    ld1w { z1.s }, p0/z, [x0, x9, lsl #2]
+; CHECK-256-NEXT:    st1w { z0.s }, p0, [x1, x8, lsl #2]
+; CHECK-256-NEXT:    st1w { z1.s }, p0, [x1, x9, lsl #2]
+; CHECK-256-NEXT:    ret
+;
+; CHECK-512-LABEL: v8i32:
+; CHECK-512:       // %bb.0:
+; CHECK-512-NEXT:    ptrue p0.s
+; CHECK-512-NEXT:    mov x8, #8 // =0x8
+; CHECK-512-NEXT:    ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
+; CHECK-512-NEXT:    st1w { z0.s }, p0, [x1, x8, lsl #2]
+; CHECK-512-NEXT:    ret
+;
+; CHECK-1024-LABEL: v8i32:
+; CHECK-1024:       // %bb.0:
+; CHECK-1024-NEXT:    ptrue p0.s, vl16
+; CHECK-1024-NEXT:    mov x8, #8 // =0x8
+; CHECK-1024-NEXT:    ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
+; CHECK-1024-NEXT:    st1w { z0.s }, p0, [x1, x8, lsl #2]
+; CHECK-1024-NEXT:    ret
+;
+; CHECK-2048-LABEL: v8i32:
+; CHECK-2048:       // %bb.0:
+; CHECK-2048-NEXT:    ptrue p0.s, vl16
+; CHECK-2048-NEXT:    mov x8, #8 // =0x8
+; CHECK-2048-NEXT:    ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
+; CHECK-2048-NEXT:    st1w { z0.s }, p0, [x1, x8, lsl #2]
+; CHECK-2048-NEXT:    ret
+  %ldoff = getelementptr inbounds nuw i8, ptr %ldptr, i64 32
+  %stoff = getelementptr inbounds nuw i8, ptr %stptr, i64 32
+  %x = load <16 x i32>, ptr %ldoff, align 4
+  store <16 x i32> %x, ptr %stoff, align 4
+  ret void
+}
+
+; FIXME: This is wrong for VLS.
+define void @v8i32_vscale(ptr %0) {
+; CHECK-LABEL: v8i32_vscale:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    movi v0.4s, #1
+; CHECK-NEXT:    rdvl x8, #2
+; CHECK-NEXT:    add x8, x0, x8
+; CHECK-NEXT:    stp q0, q0, [x8]
+; CHECK-NEXT:    ret
+;
+; CHECK-128-LABEL: v8i32_vscale:
+; CHECK-128:       // %bb.0:
+; CHECK-128-NEXT:    movi v0.4s, #1
+; CHECK-128-NEXT:    rdvl x8, #2
+; CHECK-128-NEXT:    add x8, x0, x8
+; CHECK-128-NEXT:    stp q0, q0, [x8]
+; CHECK-128-NEXT:    ret
+;
+; CHECK-256-LABEL: v8i32_vscale:
+; CHECK-256:       // %bb.0:
+; CHECK-256-NEXT:    mov z0.s, #1 // =0x1
+; CHECK-256-NEXT:    ptrue p0.s
+; CHECK-256-NEXT:    st1w { z0.s }, p0, [x0, #1, mul vl]
+; CHECK-256-NEXT:    ret
+;
+; CHECK-512-LABEL: v8i32_vscale:
+; CHECK-512:       // %bb.0:
+; CHECK-512-NEXT:    mov z0.s, #1 // =0x1
+; CHECK-512-NEXT:    ptrue p0.s, vl8
+; CHECK-512-NEXT:    st1w { z0.s }, p0, [x0, #1, mul vl]
+; CHECK-512-NEXT:    ret
+;
+; CHECK-1024-LABEL: v8i32_vscale:
+; CHECK-1024:       // %bb.0:
+; CHECK-1024-NEXT:    mov z0.s, #1 // =0x1
+; CHECK-1024-NEXT:    ptrue p0.s, vl8
+; CHECK-1024-NEXT:    st1w { z0.s }, p0, [x0, #1, mul vl]
+; CHECK-1024-NEXT:    ret
+;
+; CHECK-2048-LABEL: v8i32_vscale:
+; CHECK-2048:       // %bb.0:
+; CHECK-2048-NEXT:    mov z0.s, #1 // =0x1
+; CHECK-2048-NEXT:    ptrue p0.s, vl8
+; CHECK-2048-NEXT:    st1w { z0.s }, p0, [x0, #1, mul vl]
+; CHECK-2048-NEXT:    ret
+  %vl = call i64 @llvm.vscale()
+  %vlx = shl i64 %vl, 5
+  %2 = getelementptr inbounds nuw i8, ptr %0, i64 %vlx
+  store <8 x i32> splat (i32 1), ptr %2, align 4
+  ret void
+}

@llvmbot
Member

llvmbot commented Mar 10, 2025

@llvm/pr-subscribers-clang


@paulwalker-arm
Collaborator

paulwalker-arm commented Mar 12, 2025

Sorry for the delay and thanks for the investigation @rj-jesus. This is sooooo not intentional behaviour. VLS-based auto-vectorisation was implemented before the VLS ACLE extensions, and at that time fixed-length calls to llvm.vscale() were likely constant-folded away and so never made it to code generation, which is why we've survived this long without being bitten.

I think the problem sits in getMemVTFromNode(), which is implemented too literally when considering fixed-length vectors. The function should return the memory footprint of the operation, and for that only the element type of the MemVT matters, because that is what governs when extension/truncation must be considered.

I've quickly tested the following, which corrects the behaviour for your test case, but I've not investigated which other nodes need to be handled so as not to hit the unreachable.

 static EVT getMemVTFromNode(LLVMContext &Ctx, SDNode *Root) {
-  if (isa<MemSDNode>(Root))
-    return cast<MemSDNode>(Root)->getMemoryVT();
+  if (isa<MemSDNode>(Root)) {
+    EVT MemVT = cast<MemSDNode>(Root)->getMemoryVT();
+
+    EVT DataVT;
+    if (auto *Load = dyn_cast<LoadSDNode>(Root))
+      DataVT = Load->getValueType(0);
+    else if (auto *Load = dyn_cast<MaskedLoadSDNode>(Root))
+      DataVT = Load->getValueType(0);
+    else if (auto *Store = dyn_cast<StoreSDNode>(Root))
+      DataVT = Store->getValue().getValueType();
+    else if (auto *Store = dyn_cast<MaskedStoreSDNode>(Root))
+      DataVT = Store->getValue().getValueType();
+    else
+      llvm_unreachable("Unexpected MemSDNode!");
+
+    return DataVT.changeVectorElementType(MemVT.getVectorElementType());
+  }

Do you mind running with such an approach for your PR? If not, I'm happy to finish it off and push a PR for yours to build upon.

@rj-jesus
Contributor Author

Thank you very much for the explanation, @paulwalker-arm - that makes a lot of sense! I'll try your suggestion tomorrow. I'll let you know how it goes. :)

@rj-jesus
Contributor Author

Hi @paulwalker-arm, thanks again for your suggestion. I think the only node missing was MemIntrinsicSDNode, which was seemingly handled after the isa<MemSDNode>(Root) check in the original code (although I'm not sure it was reachable). I've moved it before the main MemSDNode path to avoid hitting the unreachable.

As far as I could tell, the only MemIntrinsicSDNode nodes that the function handles are for Intrinsic::aarch64_sve_st2, st3 and st4, so getMemoryVT() should be okay to use, I believe. We could also move these intrinsics to the last switch statement and avoid having that dedicated path. I'm not sure what approach is preferable, so I've kept the original code, but please let me know if you'd like me to make that change.
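
Roughly, the reordering looks like this (a simplified sketch of the idea rather than the exact final code); since MemIntrinsicSDNode derives from MemSDNode, it has to be checked first or it would now hit the unreachable:

static EVT getMemVTFromNode(LLVMContext &Ctx, SDNode *Root) {
  // The only memory intrinsics handled here are aarch64_sve_st2/st3/st4,
  // whose MemoryVT already describes the footprint we want.
  if (isa<MemIntrinsicSDNode>(Root))
    return cast<MemIntrinsicSDNode>(Root)->getMemoryVT();

  if (isa<MemSDNode>(Root)) {
    EVT MemVT = cast<MemSDNode>(Root)->getMemoryVT();

    EVT DataVT;
    if (auto *Load = dyn_cast<LoadSDNode>(Root))
      DataVT = Load->getValueType(0);
    else if (auto *Load = dyn_cast<MaskedLoadSDNode>(Root))
      DataVT = Load->getValueType(0);
    else if (auto *Store = dyn_cast<StoreSDNode>(Root))
      DataVT = Store->getValue().getValueType();
    else if (auto *Store = dyn_cast<MaskedStoreSDNode>(Root))
      DataVT = Store->getValue().getValueType();
    else
      llvm_unreachable("Unexpected MemSDNode!");

    // Take the data operand's type (for VLS this is the scalable container)
    // but keep the in-memory element type, so extension/truncation is still
    // visible to the caller.
    return DataVT.changeVectorElementType(MemVT.getVectorElementType());
  }

  // ... existing handling of the remaining root nodes is unchanged.
}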

Please let me know if you have any other comments!

@rj-jesus rj-jesus merged commit 74f5a02 into llvm:main Mar 19, 2025
11 checks passed
@rj-jesus rj-jesus deleted the rjj/aarch64-sve-vls-addressing-modes-fix branch March 19, 2025 08:25