Reapply "[AArch64][SVE] Improve fixed-length addressing modes. (#130263)" #130625
Conversation
@llvm/pr-subscribers-backend-aarch64

Author: Ricardo Jesus (rj-jesus)

Changes

This restores commit f01e760.

The original patch from #129732 exposed what seems to be a bug in `SelectAddrModeIndexedSVE`.

Currently, the offset returned by `SelectAddrModeIndexedSVE` is computed by dividing a VL-based offset (`MulImm`) by the known minimum width of `MemVT`. This works when `MemVT` is a scalable vector type because scalable types are intrinsically VL-based. However, for fixed vector types, `MemVT` is not scaled to the SVE vector length, which may lead to inaccurate results. For example, for `vscale * 32`, I expect the offset returned to be `2*VL`, irrespective of the width of `MemVT` (unless the latter is an unpacked SVE type). VLA codegen agrees with this. However, for `<8 x i32>` vectors, VLS codegen (which uses `SelectAddrModeIndexedSVE`) returns `1*VL`: https://godbolt.org/z/7149fejGo. Is this intentional?

Although this seems to affect both VSCALE-based and Constant-based offsets, I believe we didn't come across it earlier because we don't generate combinations of VSCALE offsets + fixed vectors often. Enabling the Constant-based path made the problem (assuming it is a problem) obvious because combinations of Constant offsets + fixed vectors are more common.

To work around the issue temporarily, I added an early exit to the Constant-based path for fixed vector types. This doesn't affect the VSCALE path because I wanted to confirm whether the current behaviour is intentional or not.

I think the long-term solution is to set `MemWidthBytes = 16` for fixed vectors, which should fix the address calculation for both paths. I'm happy to do this here or open a separate PR, but first I wanted to confirm whether this is a viable solution (hence why I added a more conservative solution for the time being).

What do you think?

Full diff: https://github.com/llvm/llvm-project/pull/130625.diff

4 Files Affected:

- (modified) clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c (+3-6)
- (modified) llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (+14-2)
- (modified) llvm/lib/Target/AArch64/AArch64Subtarget.h (+11-1)
- (added) llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll (+472)
diff --git a/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c b/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
index 0ed14b4b3b793..1391a1b09fbd1 100644
--- a/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
+++ b/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
@@ -13,12 +13,9 @@
void func(int *restrict a, int *restrict b) {
// CHECK-LABEL: func
-// CHECK256-COUNT-1: str
-// CHECK256-COUNT-7: st1w
-// CHECK512-COUNT-1: str
-// CHECK512-COUNT-3: st1w
-// CHECK1024-COUNT-1: str
-// CHECK1024-COUNT-1: st1w
+// CHECK256-COUNT-8: str
+// CHECK512-COUNT-4: str
+// CHECK1024-COUNT-2: str
// CHECK2048-COUNT-1: st1w
#pragma clang loop vectorize(enable)
for (int i = 0; i < 64; ++i)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
index 3ca9107cb2ce5..d338c22267885 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
@@ -7380,12 +7380,24 @@ bool AArch64DAGToDAGISel::SelectAddrModeIndexedSVE(SDNode *Root, SDValue N,
return false;
SDValue VScale = N.getOperand(1);
- if (VScale.getOpcode() != ISD::VSCALE)
+ int64_t MulImm = std::numeric_limits<int64_t>::max();
+ if (VScale.getOpcode() == ISD::VSCALE) {
+ MulImm = cast<ConstantSDNode>(VScale.getOperand(0))->getSExtValue();
+ } else if (auto C = dyn_cast<ConstantSDNode>(VScale)) {
+ int64_t ByteOffset = C->getSExtValue();
+ const auto KnownVScale =
+ Subtarget->getSVEVectorSizeInBits() / AArch64::SVEBitsPerBlock;
+
+ if (!KnownVScale || ByteOffset % KnownVScale != 0 ||
+ !MemVT.isScalableVector())
+ return false;
+
+ MulImm = ByteOffset / KnownVScale;
+ } else
return false;
TypeSize TS = MemVT.getSizeInBits();
int64_t MemWidthBytes = static_cast<int64_t>(TS.getKnownMinValue()) / 8;
- int64_t MulImm = cast<ConstantSDNode>(VScale.getOperand(0))->getSExtValue();
if ((MulImm % MemWidthBytes) != 0)
return false;
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.h b/llvm/lib/Target/AArch64/AArch64Subtarget.h
index c6eb77e3bc3ba..f5ffc72cae537 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -391,7 +391,7 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
void mirFileLoaded(MachineFunction &MF) const override;
// Return the known range for the bit length of SVE data registers. A value
- // of 0 means nothing is known about that particular limit beyong what's
+ // of 0 means nothing is known about that particular limit beyond what's
// implied by the architecture.
unsigned getMaxSVEVectorSizeInBits() const {
assert(isSVEorStreamingSVEAvailable() &&
@@ -405,6 +405,16 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
return MinSVEVectorSizeInBits;
}
+ // Return the known bit length of SVE data registers. A value of 0 means the
+ // length is unknown beyond what's implied by the architecture.
+ unsigned getSVEVectorSizeInBits() const {
+ assert(isSVEorStreamingSVEAvailable() &&
+ "Tried to get SVE vector length without SVE support!");
+ if (MinSVEVectorSizeInBits == MaxSVEVectorSizeInBits)
+ return MaxSVEVectorSizeInBits;
+ return 0;
+ }
+
bool useSVEForFixedLengthVectors() const {
if (!isSVEorStreamingSVEAvailable())
return false;
diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll
new file mode 100644
index 0000000000000..84ab5493b03ee
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll
@@ -0,0 +1,472 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=128 -aarch64-sve-vector-bits-max=128 < %s | FileCheck %s --check-prefix=CHECK-128
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=256 -aarch64-sve-vector-bits-max=256 < %s | FileCheck %s --check-prefix=CHECK-256
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=512 -aarch64-sve-vector-bits-max=512 < %s | FileCheck %s --check-prefix=CHECK-512
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=1024 -aarch64-sve-vector-bits-max=1024 < %s | FileCheck %s --check-prefix=CHECK-1024
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=2048 -aarch64-sve-vector-bits-max=2048 < %s | FileCheck %s --check-prefix=CHECK-2048
+
+define void @nxv16i8(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv16i8:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.b
+; CHECK-NEXT: mov w8, #256 // =0x100
+; CHECK-NEXT: ld1b { z0.b }, p0/z, [x0, x8]
+; CHECK-NEXT: st1b { z0.b }, p0, [x1, x8]
+; CHECK-NEXT: ret
+;
+; CHECK-128-LABEL: nxv16i8:
+; CHECK-128: // %bb.0:
+; CHECK-128-NEXT: ldr z0, [x0, #16, mul vl]
+; CHECK-128-NEXT: str z0, [x1, #16, mul vl]
+; CHECK-128-NEXT: ret
+;
+; CHECK-256-LABEL: nxv16i8:
+; CHECK-256: // %bb.0:
+; CHECK-256-NEXT: ldr z0, [x0, #8, mul vl]
+; CHECK-256-NEXT: str z0, [x1, #8, mul vl]
+; CHECK-256-NEXT: ret
+;
+; CHECK-512-LABEL: nxv16i8:
+; CHECK-512: // %bb.0:
+; CHECK-512-NEXT: ldr z0, [x0, #4, mul vl]
+; CHECK-512-NEXT: str z0, [x1, #4, mul vl]
+; CHECK-512-NEXT: ret
+;
+; CHECK-1024-LABEL: nxv16i8:
+; CHECK-1024: // %bb.0:
+; CHECK-1024-NEXT: ldr z0, [x0, #2, mul vl]
+; CHECK-1024-NEXT: str z0, [x1, #2, mul vl]
+; CHECK-1024-NEXT: ret
+;
+; CHECK-2048-LABEL: nxv16i8:
+; CHECK-2048: // %bb.0:
+; CHECK-2048-NEXT: ldr z0, [x0, #1, mul vl]
+; CHECK-2048-NEXT: str z0, [x1, #1, mul vl]
+; CHECK-2048-NEXT: ret
+ %ldoff = getelementptr inbounds nuw i8, ptr %ldptr, i64 256
+ %stoff = getelementptr inbounds nuw i8, ptr %stptr, i64 256
+ %x = load <vscale x 16 x i8>, ptr %ldoff, align 1
+ store <vscale x 16 x i8> %x, ptr %stoff, align 1
+ ret void
+}
+
+define void @nxv8i16(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv8i16:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.h
+; CHECK-NEXT: mov x8, #128 // =0x80
+; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0, x8, lsl #1]
+; CHECK-NEXT: st1h { z0.h }, p0, [x1, x8, lsl #1]
+; CHECK-NEXT: ret
+;
+; CHECK-128-LABEL: nxv8i16:
+; CHECK-128: // %bb.0:
+; CHECK-128-NEXT: ldr z0, [x0, #16, mul vl]
+; CHECK-128-NEXT: str z0, [x1, #16, mul vl]
+; CHECK-128-NEXT: ret
+;
+; CHECK-256-LABEL: nxv8i16:
+; CHECK-256: // %bb.0:
+; CHECK-256-NEXT: ldr z0, [x0, #8, mul vl]
+; CHECK-256-NEXT: str z0, [x1, #8, mul vl]
+; CHECK-256-NEXT: ret
+;
+; CHECK-512-LABEL: nxv8i16:
+; CHECK-512: // %bb.0:
+; CHECK-512-NEXT: ldr z0, [x0, #4, mul vl]
+; CHECK-512-NEXT: str z0, [x1, #4, mul vl]
+; CHECK-512-NEXT: ret
+;
+; CHECK-1024-LABEL: nxv8i16:
+; CHECK-1024: // %bb.0:
+; CHECK-1024-NEXT: ldr z0, [x0, #2, mul vl]
+; CHECK-1024-NEXT: str z0, [x1, #2, mul vl]
+; CHECK-1024-NEXT: ret
+;
+; CHECK-2048-LABEL: nxv8i16:
+; CHECK-2048: // %bb.0:
+; CHECK-2048-NEXT: ldr z0, [x0, #1, mul vl]
+; CHECK-2048-NEXT: str z0, [x1, #1, mul vl]
+; CHECK-2048-NEXT: ret
+ %ldoff = getelementptr inbounds nuw i16, ptr %ldptr, i64 128
+ %stoff = getelementptr inbounds nuw i16, ptr %stptr, i64 128
+ %x = load <vscale x 8 x i16>, ptr %ldoff, align 2
+ store <vscale x 8 x i16> %x, ptr %stoff, align 2
+ ret void
+}
+
+define void @nxv4i32(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv4i32:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.s
+; CHECK-NEXT: mov x8, #64 // =0x40
+; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
+; CHECK-NEXT: st1w { z0.s }, p0, [x1, x8, lsl #2]
+; CHECK-NEXT: ret
+;
+; CHECK-128-LABEL: nxv4i32:
+; CHECK-128: // %bb.0:
+; CHECK-128-NEXT: ldr z0, [x0, #16, mul vl]
+; CHECK-128-NEXT: str z0, [x1, #16, mul vl]
+; CHECK-128-NEXT: ret
+;
+; CHECK-256-LABEL: nxv4i32:
+; CHECK-256: // %bb.0:
+; CHECK-256-NEXT: ldr z0, [x0, #8, mul vl]
+; CHECK-256-NEXT: str z0, [x1, #8, mul vl]
+; CHECK-256-NEXT: ret
+;
+; CHECK-512-LABEL: nxv4i32:
+; CHECK-512: // %bb.0:
+; CHECK-512-NEXT: ldr z0, [x0, #4, mul vl]
+; CHECK-512-NEXT: str z0, [x1, #4, mul vl]
+; CHECK-512-NEXT: ret
+;
+; CHECK-1024-LABEL: nxv4i32:
+; CHECK-1024: // %bb.0:
+; CHECK-1024-NEXT: ldr z0, [x0, #2, mul vl]
+; CHECK-1024-NEXT: str z0, [x1, #2, mul vl]
+; CHECK-1024-NEXT: ret
+;
+; CHECK-2048-LABEL: nxv4i32:
+; CHECK-2048: // %bb.0:
+; CHECK-2048-NEXT: ldr z0, [x0, #1, mul vl]
+; CHECK-2048-NEXT: str z0, [x1, #1, mul vl]
+; CHECK-2048-NEXT: ret
+ %ldoff = getelementptr inbounds nuw i32, ptr %ldptr, i64 64
+ %stoff = getelementptr inbounds nuw i32, ptr %stptr, i64 64
+ %x = load <vscale x 4 x i32>, ptr %ldoff, align 4
+ store <vscale x 4 x i32> %x, ptr %stoff, align 4
+ ret void
+}
+
+define void @nxv2i64(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv2i64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.d
+; CHECK-NEXT: mov x8, #32 // =0x20
+; CHECK-NEXT: ld1d { z0.d }, p0/z, [x0, x8, lsl #3]
+; CHECK-NEXT: st1d { z0.d }, p0, [x1, x8, lsl #3]
+; CHECK-NEXT: ret
+;
+; CHECK-128-LABEL: nxv2i64:
+; CHECK-128: // %bb.0:
+; CHECK-128-NEXT: ldr z0, [x0, #16, mul vl]
+; CHECK-128-NEXT: str z0, [x1, #16, mul vl]
+; CHECK-128-NEXT: ret
+;
+; CHECK-256-LABEL: nxv2i64:
+; CHECK-256: // %bb.0:
+; CHECK-256-NEXT: ldr z0, [x0, #8, mul vl]
+; CHECK-256-NEXT: str z0, [x1, #8, mul vl]
+; CHECK-256-NEXT: ret
+;
+; CHECK-512-LABEL: nxv2i64:
+; CHECK-512: // %bb.0:
+; CHECK-512-NEXT: ldr z0, [x0, #4, mul vl]
+; CHECK-512-NEXT: str z0, [x1, #4, mul vl]
+; CHECK-512-NEXT: ret
+;
+; CHECK-1024-LABEL: nxv2i64:
+; CHECK-1024: // %bb.0:
+; CHECK-1024-NEXT: ldr z0, [x0, #2, mul vl]
+; CHECK-1024-NEXT: str z0, [x1, #2, mul vl]
+; CHECK-1024-NEXT: ret
+;
+; CHECK-2048-LABEL: nxv2i64:
+; CHECK-2048: // %bb.0:
+; CHECK-2048-NEXT: ldr z0, [x0, #1, mul vl]
+; CHECK-2048-NEXT: str z0, [x1, #1, mul vl]
+; CHECK-2048-NEXT: ret
+ %ldoff = getelementptr inbounds nuw i64, ptr %ldptr, i64 32
+ %stoff = getelementptr inbounds nuw i64, ptr %stptr, i64 32
+ %x = load <vscale x 2 x i64>, ptr %ldoff, align 8
+ store <vscale x 2 x i64> %x, ptr %stoff, align 8
+ ret void
+}
+
+define void @nxv4i8(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv4i8:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.s
+; CHECK-NEXT: mov w8, #32 // =0x20
+; CHECK-NEXT: ld1b { z0.s }, p0/z, [x0, x8]
+; CHECK-NEXT: st1b { z0.s }, p0, [x1, x8]
+; CHECK-NEXT: ret
+;
+; CHECK-128-LABEL: nxv4i8:
+; CHECK-128: // %bb.0:
+; CHECK-128-NEXT: ptrue p0.s
+; CHECK-128-NEXT: mov w8, #32 // =0x20
+; CHECK-128-NEXT: ld1b { z0.s }, p0/z, [x0, x8]
+; CHECK-128-NEXT: st1b { z0.s }, p0, [x1, x8]
+; CHECK-128-NEXT: ret
+;
+; CHECK-256-LABEL: nxv4i8:
+; CHECK-256: // %bb.0:
+; CHECK-256-NEXT: ptrue p0.s
+; CHECK-256-NEXT: ld1b { z0.s }, p0/z, [x0, #4, mul vl]
+; CHECK-256-NEXT: st1b { z0.s }, p0, [x1, #4, mul vl]
+; CHECK-256-NEXT: ret
+;
+; CHECK-512-LABEL: nxv4i8:
+; CHECK-512: // %bb.0:
+; CHECK-512-NEXT: ptrue p0.s
+; CHECK-512-NEXT: ld1b { z0.s }, p0/z, [x0, #2, mul vl]
+; CHECK-512-NEXT: st1b { z0.s }, p0, [x1, #2, mul vl]
+; CHECK-512-NEXT: ret
+;
+; CHECK-1024-LABEL: nxv4i8:
+; CHECK-1024: // %bb.0:
+; CHECK-1024-NEXT: ptrue p0.s
+; CHECK-1024-NEXT: ld1b { z0.s }, p0/z, [x0, #1, mul vl]
+; CHECK-1024-NEXT: st1b { z0.s }, p0, [x1, #1, mul vl]
+; CHECK-1024-NEXT: ret
+;
+; CHECK-2048-LABEL: nxv4i8:
+; CHECK-2048: // %bb.0:
+; CHECK-2048-NEXT: ptrue p0.s
+; CHECK-2048-NEXT: mov w8, #32 // =0x20
+; CHECK-2048-NEXT: ld1b { z0.s }, p0/z, [x0, x8]
+; CHECK-2048-NEXT: st1b { z0.s }, p0, [x1, x8]
+; CHECK-2048-NEXT: ret
+ %ldoff = getelementptr inbounds nuw i8, ptr %ldptr, i64 32
+ %stoff = getelementptr inbounds nuw i8, ptr %stptr, i64 32
+ %x = load <vscale x 4 x i8>, ptr %ldoff, align 1
+ store <vscale x 4 x i8> %x, ptr %stoff, align 1
+ ret void
+}
+
+define void @nxv2f32(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv2f32:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.d
+; CHECK-NEXT: mov x8, #16 // =0x10
+; CHECK-NEXT: ld1w { z0.d }, p0/z, [x0, x8, lsl #2]
+; CHECK-NEXT: st1w { z0.d }, p0, [x1, x8, lsl #2]
+; CHECK-NEXT: ret
+;
+; CHECK-128-LABEL: nxv2f32:
+; CHECK-128: // %bb.0:
+; CHECK-128-NEXT: ptrue p0.d
+; CHECK-128-NEXT: mov x8, #16 // =0x10
+; CHECK-128-NEXT: ld1w { z0.d }, p0/z, [x0, x8, lsl #2]
+; CHECK-128-NEXT: st1w { z0.d }, p0, [x1, x8, lsl #2]
+; CHECK-128-NEXT: ret
+;
+; CHECK-256-LABEL: nxv2f32:
+; CHECK-256: // %bb.0:
+; CHECK-256-NEXT: ptrue p0.d
+; CHECK-256-NEXT: ld1w { z0.d }, p0/z, [x0, #4, mul vl]
+; CHECK-256-NEXT: st1w { z0.d }, p0, [x1, #4, mul vl]
+; CHECK-256-NEXT: ret
+;
+; CHECK-512-LABEL: nxv2f32:
+; CHECK-512: // %bb.0:
+; CHECK-512-NEXT: ptrue p0.d
+; CHECK-512-NEXT: ld1w { z0.d }, p0/z, [x0, #2, mul vl]
+; CHECK-512-NEXT: st1w { z0.d }, p0, [x1, #2, mul vl]
+; CHECK-512-NEXT: ret
+;
+; CHECK-1024-LABEL: nxv2f32:
+; CHECK-1024: // %bb.0:
+; CHECK-1024-NEXT: ptrue p0.d
+; CHECK-1024-NEXT: ld1w { z0.d }, p0/z, [x0, #1, mul vl]
+; CHECK-1024-NEXT: st1w { z0.d }, p0, [x1, #1, mul vl]
+; CHECK-1024-NEXT: ret
+;
+; CHECK-2048-LABEL: nxv2f32:
+; CHECK-2048: // %bb.0:
+; CHECK-2048-NEXT: ptrue p0.d
+; CHECK-2048-NEXT: mov x8, #16 // =0x10
+; CHECK-2048-NEXT: ld1w { z0.d }, p0/z, [x0, x8, lsl #2]
+; CHECK-2048-NEXT: st1w { z0.d }, p0, [x1, x8, lsl #2]
+; CHECK-2048-NEXT: ret
+ %ldoff = getelementptr inbounds nuw i8, ptr %ldptr, i64 64
+ %stoff = getelementptr inbounds nuw i8, ptr %stptr, i64 64
+ %x = load <vscale x 2 x float>, ptr %ldoff, align 4
+ store <vscale x 2 x float> %x, ptr %stoff, align 4
+ ret void
+}
+
+define void @nxv4f64(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv4f64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.d
+; CHECK-NEXT: mov x8, #16 // =0x10
+; CHECK-NEXT: add x9, x0, #128
+; CHECK-NEXT: ldr z1, [x9, #1, mul vl]
+; CHECK-NEXT: add x9, x1, #128
+; CHECK-NEXT: ld1d { z0.d }, p0/z, [x0, x8, lsl #3]
+; CHECK-NEXT: st1d { z0.d }, p0, [x1, x8, lsl #3]
+; CHECK-NEXT: str z1, [x9, #1, mul vl]
+; CHECK-NEXT: ret
+;
+; CHECK-128-LABEL: nxv4f64:
+; CHECK-128: // %bb.0:
+; CHECK-128-NEXT: add x8, x0, #128
+; CHECK-128-NEXT: ldr z1, [x0, #8, mul vl]
+; CHECK-128-NEXT: ldr z0, [x8, #1, mul vl]
+; CHECK-128-NEXT: add x8, x1, #128
+; CHECK-128-NEXT: str z0, [x8, #1, mul vl]
+; CHECK-128-NEXT: str z1, [x1, #8, mul vl]
+; CHECK-128-NEXT: ret
+;
+; CHECK-256-LABEL: nxv4f64:
+; CHECK-256: // %bb.0:
+; CHECK-256-NEXT: add x8, x0, #128
+; CHECK-256-NEXT: ldr z1, [x0, #4, mul vl]
+; CHECK-256-NEXT: ldr z0, [x8, #1, mul vl]
+; CHECK-256-NEXT: add x8, x1, #128
+; CHECK-256-NEXT: str z0, [x8, #1, mul vl]
+; CHECK-256-NEXT: str z1, [x1, #4, mul vl]
+; CHECK-256-NEXT: ret
+;
+; CHECK-512-LABEL: nxv4f64:
+; CHECK-512: // %bb.0:
+; CHECK-512-NEXT: add x8, x0, #128
+; CHECK-512-NEXT: ldr z1, [x0, #2, mul vl]
+; CHECK-512-NEXT: ldr z0, [x8, #1, mul vl]
+; CHECK-512-NEXT: add x8, x1, #128
+; CHECK-512-NEXT: str z0, [x8, #1, mul vl]
+; CHECK-512-NEXT: str z1, [x1, #2, mul vl]
+; CHECK-512-NEXT: ret
+;
+; CHECK-1024-LABEL: nxv4f64:
+; CHECK-1024: // %bb.0:
+; CHECK-1024-NEXT: add x8, x0, #128
+; CHECK-1024-NEXT: ldr z1, [x0, #1, mul vl]
+; CHECK-1024-NEXT: ldr z0, [x8, #1, mul vl]
+; CHECK-1024-NEXT: add x8, x1, #128
+; CHECK-1024-NEXT: str z0, [x8, #1, mul vl]
+; CHECK-1024-NEXT: str z1, [x1, #1, mul vl]
+; CHECK-1024-NEXT: ret
+;
+; CHECK-2048-LABEL: nxv4f64:
+; CHECK-2048: // %bb.0:
+; CHECK-2048-NEXT: ptrue p0.d
+; CHECK-2048-NEXT: mov x8, #16 // =0x10
+; CHECK-2048-NEXT: add x9, x0, #128
+; CHECK-2048-NEXT: ldr z1, [x9, #1, mul vl]
+; CHECK-2048-NEXT: add x9, x1, #128
+; CHECK-2048-NEXT: ld1d { z0.d }, p0/z, [x0, x8, lsl #3]
+; CHECK-2048-NEXT: st1d { z0.d }, p0, [x1, x8, lsl #3]
+; CHECK-2048-NEXT: str z1, [x9, #1, mul vl]
+; CHECK-2048-NEXT: ret
+ %ldoff = getelementptr inbounds nuw i8, ptr %ldptr, i64 128
+ %stoff = getelementptr inbounds nuw i8, ptr %stptr, i64 128
+ %x = load <vscale x 4 x double>, ptr %ldoff, align 8
+ store <vscale x 4 x double> %x, ptr %stoff, align 8
+ ret void
+}
+
+define void @v8i32(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: v8i32:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ldp q0, q1, [x0, #64]
+; CHECK-NEXT: ldp q3, q2, [x0, #32]
+; CHECK-NEXT: stp q0, q1, [x1, #64]
+; CHECK-NEXT: stp q3, q2, [x1, #32]
+; CHECK-NEXT: ret
+;
+; CHECK-128-LABEL: v8i32:
+; CHECK-128: // %bb.0:
+; CHECK-128-NEXT: ldp q0, q1, [x0, #64]
+; CHECK-128-NEXT: ldp q3, q2, [x0, #32]
+; CHECK-128-NEXT: stp q0, q1, [x1, #64]
+; CHECK-128-NEXT: stp q3, q2, [x1, #32]
+; CHECK-128-NEXT: ret
+;
+; CHECK-256-LABEL: v8i32:
+; CHECK-256: // %bb.0:
+; CHECK-256-NEXT: ptrue p0.s
+; CHECK-256-NEXT: mov x8, #16 // =0x10
+; CHECK-256-NEXT: mov x9, #8 // =0x8
+; CHECK-256-NEXT: ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
+; CHECK-256-NEXT: ld1w { z1.s }, p0/z, [x0, x9, lsl #2]
+; CHECK-256-NEXT: st1w { z0.s }, p0, [x1, x8, lsl #2]
+; CHECK-256-NEXT: st1w { z1.s }, p0, [x1, x9, lsl #2]
+; CHECK-256-NEXT: ret
+;
+; CHECK-512-LABEL: v8i32:
+; CHECK-512: // %bb.0:
+; CHECK-512-NEXT: ptrue p0.s
+; CHECK-512-NEXT: mov x8, #8 // =0x8
+; CHECK-512-NEXT: ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
+; CHECK-512-NEXT: st1w { z0.s }, p0, [x1, x8, lsl #2]
+; CHECK-512-NEXT: ret
+;
+; CHECK-1024-LABEL: v8i32:
+; CHECK-1024: // %bb.0:
+; CHECK-1024-NEXT: ptrue p0.s, vl16
+; CHECK-1024-NEXT: mov x8, #8 // =0x8
+; CHECK-1024-NEXT: ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
+; CHECK-1024-NEXT: st1w { z0.s }, p0, [x1, x8, lsl #2]
+; CHECK-1024-NEXT: ret
+;
+; CHECK-2048-LABEL: v8i32:
+; CHECK-2048: // %bb.0:
+; CHECK-2048-NEXT: ptrue p0.s, vl16
+; CHECK-2048-NEXT: mov x8, #8 // =0x8
+; CHECK-2048-NEXT: ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
+; CHECK-2048-NEXT: st1w { z0.s }, p0, [x1, x8, lsl #2]
+; CHECK-2048-NEXT: ret
+ %ldoff = getelementptr inbounds nuw i8, ptr %ldptr, i64 32
+ %stoff = getelementptr inbounds nuw i8, ptr %stptr, i64 32
+ %x = load <16 x i32>, ptr %ldoff, align 4
+ store <16 x i32> %x, ptr %stoff, align 4
+ ret void
+}
+
+; FIXME: This is wrong for VLS.
+define void @v8i32_vscale(ptr %0) {
+; CHECK-LABEL: v8i32_vscale:
+; CHECK: // %bb.0:
+; CHECK-NEXT: movi v0.4s, #1
+; CHECK-NEXT: rdvl x8, #2
+; CHECK-NEXT: add x8, x0, x8
+; CHECK-NEXT: stp q0, q0, [x8]
+; CHECK-NEXT: ret
+;
+; CHECK-128-LABEL: v8i32_vscale:
+; CHECK-128: // %bb.0:
+; CHECK-128-NEXT: movi v0.4s, #1
+; CHECK-128-NEXT: rdvl x8, #2
+; CHECK-128-NEXT: add x8, x0, x8
+; CHECK-128-NEXT: stp q0, q0, [x8]
+; CHECK-128-NEXT: ret
+;
+; CHECK-256-LABEL: v8i32_vscale:
+; CHECK-256: // %bb.0:
+; CHECK-256-NEXT: mov z0.s, #1 // =0x1
+; CHECK-256-NEXT: ptrue p0.s
+; CHECK-256-NEXT: st1w { z0.s }, p0, [x0, #1, mul vl]
+; CHECK-256-NEXT: ret
+;
+; CHECK-512-LABEL: v8i32_vscale:
+; CHECK-512: // %bb.0:
+; CHECK-512-NEXT: mov z0.s, #1 // =0x1
+; CHECK-512-NEXT: ptrue p0.s, vl8
+; CHECK-512-NEXT: st1w { z0.s }, p0, [x0, #1, mul vl]
+; CHECK-512-NEXT: ret
+;
+; CHECK-1024-LABEL: v8i32_vscale:
+; CHECK-1024: // %bb.0:
+; CHECK-1024-NEXT: mov z0.s, #1 // =0x1
+; CHECK-1024-NEXT: ptrue p0.s, vl8
+; CHECK-1024-NEXT: st1w { z0.s }, p0, [x0, #1, mul vl]
+; CHECK-1024-NEXT: ret
+;
+; CHECK-2048-LABEL: v8i32_vscale:
+; CHECK-2048: // %bb.0:
+; CHECK-2048-NEXT: mov z0.s, #1 // =0x1
+; CHECK-2048-NEXT: ptrue p0.s, vl8
+; CHECK-2048-NEXT: st1w { z0.s }, p0, [x0, #1, mul vl]
+; CHECK-2048-NEXT: ret
+ %vl = call i64 @llvm.vscale()
+ %vlx = shl i64 %vl, 5
+ %2 = getelementptr inbounds nuw i8, ptr %0, i64 %vlx
+ store <8 x i32> splat (i32 1), ptr %2, align 4
+ ret void
+}
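As a concrete reading of the new Constant-based path above, here is a minimal standalone sketch of the folding it performs. This is illustrative only: `vlScaledImm` and its simplified signature are invented for the example, and the real selector additionally range-checks the resulting immediate.

```cpp
#include <cstdint>
#include <optional>

constexpr int64_t SVEBitsPerBlock = 128; // one VL granule is 128 bits

// SVEVectorSizeInBits is 0 when the exact length is unknown, mirroring the
// new AArch64Subtarget::getSVEVectorSizeInBits helper.
std::optional<int64_t> vlScaledImm(int64_t ByteOffset,
                                   unsigned SVEVectorSizeInBits,
                                   int64_t MemWidthBytes) {
  int64_t KnownVScale = SVEVectorSizeInBits / SVEBitsPerBlock;
  if (KnownVScale == 0 || ByteOffset % KnownVScale != 0)
    return std::nullopt;         // length unknown, or offset not divisible
  int64_t MulImm = ByteOffset / KnownVScale; // byte offset per vscale unit
  if (MulImm % MemWidthBytes != 0)
    return std::nullopt;         // not a whole number of vector accesses
  return MulImm / MemWidthBytes; // the "#N, mul vl" immediate
}
```

For example, a 256-byte offset to an `nxv16i8` access (`MemWidthBytes = 16`) on a 256-bit machine (`KnownVScale = 2`) gives 256 / 2 / 16 = 8, matching the `ldr z0, [x0, #8, mul vl]` in the CHECK-256 lines above.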
Sorry for the delay and thanks for the investigation @rj-jesus. This is sooooo not intentional behaviour. VLS-based auto-vectorisation was implemented before the VLS ACLE extensions, and by that time it's likely that fixed-length calls to `SelectAddrModeIndexedSVE` simply didn't exist. I've quickly tested a change that corrects the behaviour for your test case, but I've not investigated what other nodes need to be handled so as not to hit the unreachable.
Do you mind running with such an approach for your PR? If not, I'm happy to finish it off and push a PR for yours to build upon.
Thank you very much for the explanation, @paulwalker-arm - that makes a lot of sense! I'll try your suggestion tomorrow. I'll let you know how it goes. :)
Hi @paulwalker-arm, thanks again for your suggestion. I think it works well; as far as I could tell, only one node was missing from the handling. Please let me know if you have any other comments!
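To make the suspected bug concrete, here is a minimal sketch of the offset arithmetic questioned in the description (illustrative names only, not the in-tree code), showing why dividing by a fixed vector's width skews the result:

```cpp
#include <cstdint>
#include <cstdio>

// Current behaviour: divide the VL-based byte offset (MulImm) by the
// minimum width in bytes of the accessed type (MemWidthBytes).
int64_t vlUnits(int64_t MulImm, int64_t MemWidthBytes) {
  return MulImm / MemWidthBytes;
}

int main() {
  // An offset of vscale * 32 bytes is 2 * VL, since one VL granule is
  // 16 bytes (128 bits).
  int64_t MulImm = 32;
  // Scalable access <vscale x 4 x i32>: min width 16 bytes -> 2 * VL. OK.
  printf("scalable: %lld * VL\n", (long long)vlUnits(MulImm, 16));
  // Fixed access <8 x i32>: width 32 bytes -> 1 * VL, but a fixed width
  // does not scale with vscale, so this is half the intended offset.
  // Using MemWidthBytes = 16 for fixed vectors, as proposed in the
  // description, would give the expected 2 * VL.
  printf("fixed:    %lld * VL\n", (long long)vlUnits(MulImm, 32));
}
```

This is exactly the discrepancy exercised by the FIXME'd `v8i32_vscale` test in the diff above.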