[LoopVectorize] Refine runtime memory check costs when there is an outer loop #76034
Conversation
@llvm/pr-subscribers-llvm-transforms

Author: David Sherwood (david-arm)

Changes: When we generate runtime memory checks for an inner loop it's possible that these checks are invariant in the outer loop and so will get hoisted out. In such cases, the effective cost of the checks should reduce to reflect the outer loop trip count. This fixes a 25% performance regression introduced by commit 49b0e6d when building the SPEC2017 x264 benchmark with PGO, where we decided the inner loop trip count wasn't high enough to warrant the (incorrect) high cost of the runtime checks. Also, when runtime memory checks consist entirely of diff checks these are likely to be outer loop invariant.

Full diff: https://github.com/llvm/llvm-project/pull/76034.diff

2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index f82e161fb846d1..8287090800dd37 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -2070,7 +2070,7 @@ class GeneratedRTChecks {
}
}
- InstructionCost getCost() {
+ InstructionCost getCost(Loop *OuterLoop) {
if (SCEVCheckBlock || MemCheckBlock)
LLVM_DEBUG(dbgs() << "Calculating cost of runtime checks:\n");
@@ -2091,16 +2091,45 @@ class GeneratedRTChecks {
LLVM_DEBUG(dbgs() << " " << C << " for " << I << "\n");
RTCheckCost += C;
}
- if (MemCheckBlock)
+ if (MemCheckBlock) {
+ InstructionCost MemCheckCost = 0;
for (Instruction &I : *MemCheckBlock) {
if (MemCheckBlock->getTerminator() == &I)
continue;
InstructionCost C =
TTI->getInstructionCost(&I, TTI::TCK_RecipThroughput);
LLVM_DEBUG(dbgs() << " " << C << " for " << I << "\n");
- RTCheckCost += C;
+ MemCheckCost += C;
+ }
+
+ // If the runtime memory checks are being created inside an outer loop
+ // we should find out if these checks are outer loop invariant. If so,
+ // the checks will be hoisted out and so the effective cost will reduce
+ // according to the outer loop trip count.
+ if (OuterLoop) {
+ ScalarEvolution *SE = MemCheckExp.getSE();
+ const SCEV *Cond = SE->getSCEV(MemRuntimeCheckCond);
+ if (SE->isLoopInvariant(Cond, OuterLoop)) {
+ if (std::optional<unsigned> OuterTC =
+ getSmallBestKnownTC(*SE, OuterLoop))
+ MemCheckCost /= *OuterTC;
+ else {
+ // It seems reasonable to assume that we can reduce the effective
+ // cost of the checks even when we know nothing about the trip
+ // count. Here I've assumed that the outer loop executes at least
+ // twice.
+ MemCheckCost /= 2;
+ }
+
+ // Let's ensure the cost is always at least 1.
+ if (MemCheckCost == 0)
+ MemCheckCost = 1;
+ }
}
+ RTCheckCost += MemCheckCost;
+ }
+
if (SCEVCheckBlock || MemCheckBlock)
LLVM_DEBUG(dbgs() << "Total cost of runtime checks: " << RTCheckCost
<< "\n");
@@ -9754,7 +9783,7 @@ static bool areRuntimeChecksProfitable(GeneratedRTChecks &Checks,
std::optional<unsigned> VScale, Loop *L,
ScalarEvolution &SE,
ScalarEpilogueLowering SEL) {
- InstructionCost CheckCost = Checks.getCost();
+ InstructionCost CheckCost = Checks.getCost(L->getParentLoop());
if (!CheckCost.isValid())
return false;
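The refinement in `getCost` above can be modelled as a small standalone function. This is an illustrative sketch, not LLVM API: the name `refineMemCheckCost` and its parameters are assumptions, standing in for the already-summed memory-check cost and for the invariance/trip-count queries the patch makes through ScalarEvolution.

```cpp
#include <algorithm>
#include <optional>

// Standalone model of the new cost refinement in GeneratedRTChecks::getCost().
// If the checks are invariant in the outer loop they will be hoisted out, so
// their cost is amortised over the outer loop's trip count.
unsigned refineMemCheckCost(unsigned MemCheckCost, bool OuterLoopInvariant,
                            std::optional<unsigned> OuterTripCount) {
  if (!OuterLoopInvariant)
    return MemCheckCost; // Checks re-execute on every outer iteration.
  if (OuterTripCount)
    MemCheckCost /= *OuterTripCount; // Amortise over the known trip count.
  else
    MemCheckCost /= 2; // Unknown trip count: assume at least two iterations.
  return std::max(MemCheckCost, 1u); // Ensure the cost is always at least 1.
}
```

If the raw memory-check cost were 6, this would reproduce the pattern of totals checked in the new test below: 2 for a known outer trip count of 3, 1 (clamped) for a trip count of 64, and 3 when the trip count is unknown.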
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/low_trip_memcheck_cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/low_trip_memcheck_cost.ll
new file mode 100644
index 00000000000000..b740b055822991
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/low_trip_memcheck_cost.ll
@@ -0,0 +1,193 @@
+; RUN: opt -p loop-vectorize -mattr=+sve -debug-only=loop-vectorize -S -disable-output < %s 2>&1 | FileCheck %s
+
+target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+target triple = "aarch64-unknown-linux-gnu"
+
+
+define void @outer_no_tc(ptr nocapture noundef %a, ptr nocapture noundef readonly %b, i64 noundef %m, i64 noundef %n) vscale_range(1,16) {
+; CHECK-LABEL: LV: Checking a loop in 'outer_no_tc'
+; CHECK: Calculating cost of runtime checks:
+; CHECK: Total cost of runtime checks: 3
+; CHECK-NEXT: LV: Minimum required TC for runtime checks to be profitable:32
+entry:
+ %cmp24 = icmp sgt i64 %m, 0
+ %cmp222 = icmp sgt i64 %n, 0
+ %or.cond = and i1 %cmp24, %cmp222
+ br i1 %or.cond, label %for.cond1.preheader.us, label %for.cond.cleanup
+
+for.cond1.preheader.us:
+ %i.025.us = phi i64 [ %inc12.us, %for.cond1.for.cond.cleanup3_crit_edge.us ], [ 0, %entry ]
+ %mul.us = mul nsw i64 %i.025.us, %n
+ br label %for.body4.us
+
+for.body4.us:
+ %j.023.us = phi i64 [ 0, %for.cond1.preheader.us ], [ %inc.us, %for.body4.us ]
+ %add.us = add nuw nsw i64 %j.023.us, %mul.us
+ %arrayidx.us = getelementptr inbounds i8, ptr %b, i64 %add.us
+ %0 = load i8, ptr %arrayidx.us, align 1
+ %arrayidx7.us = getelementptr inbounds i8, ptr %a, i64 %add.us
+ %1 = load i8, ptr %arrayidx7.us, align 1
+ %add9.us = add i8 %1, %0
+ store i8 %add9.us, ptr %arrayidx7.us, align 1
+ %inc.us = add nuw nsw i64 %j.023.us, 1
+ %exitcond.not = icmp eq i64 %inc.us, %n
+ br i1 %exitcond.not, label %for.cond1.for.cond.cleanup3_crit_edge.us, label %for.body4.us
+
+for.cond1.for.cond.cleanup3_crit_edge.us:
+ %inc12.us = add nuw nsw i64 %i.025.us, 1
+ %exitcond27.not = icmp eq i64 %inc12.us, %m
+ br i1 %exitcond27.not, label %for.cond.cleanup, label %for.cond1.preheader.us
+
+for.cond.cleanup:
+ ret void
+}
+
+
+define void @outer_known_tc3(ptr nocapture noundef %a, ptr nocapture noundef readonly %b, i64 noundef %n) vscale_range(1,16) {
+; CHECK-LABEL: LV: Checking a loop in 'outer_known_tc3'
+; CHECK: Calculating cost of runtime checks:
+; CHECK: Total cost of runtime checks: 2
+; CHECK-NEXT: LV: Minimum required TC for runtime checks to be profitable:32
+entry:
+ %cmp222 = icmp sgt i64 %n, 0
+ br i1 %cmp222, label %for.cond1.preheader.us, label %for.cond.cleanup
+
+for.cond1.preheader.us:
+ %i.024.us = phi i64 [ %inc12.us, %for.cond1.for.cond.cleanup3_crit_edge.us ], [ 0, %entry ]
+ %mul.us = mul nsw i64 %i.024.us, %n
+ br label %for.body4.us
+
+for.body4.us:
+ %j.023.us = phi i64 [ 0, %for.cond1.preheader.us ], [ %inc.us, %for.body4.us ]
+ %add.us = add nuw nsw i64 %j.023.us, %mul.us
+ %arrayidx.us = getelementptr inbounds i8, ptr %b, i64 %add.us
+ %0 = load i8, ptr %arrayidx.us, align 1
+ %arrayidx7.us = getelementptr inbounds i8, ptr %a, i64 %add.us
+ %1 = load i8, ptr %arrayidx7.us, align 1
+ %add9.us = add i8 %1, %0
+ store i8 %add9.us, ptr %arrayidx7.us, align 1
+ %inc.us = add nuw nsw i64 %j.023.us, 1
+ %exitcond.not = icmp eq i64 %inc.us, %n
+ br i1 %exitcond.not, label %for.cond1.for.cond.cleanup3_crit_edge.us, label %for.body4.us
+
+for.cond1.for.cond.cleanup3_crit_edge.us:
+ %inc12.us = add nuw nsw i64 %i.024.us, 1
+ %exitcond26.not = icmp eq i64 %inc12.us, 3
+ br i1 %exitcond26.not, label %for.cond.cleanup, label %for.cond1.preheader.us
+
+for.cond.cleanup:
+ ret void
+}
+
+
+define void @outer_known_tc64(ptr nocapture noundef %a, ptr nocapture noundef readonly %b, i64 noundef %n) vscale_range(1,16) {
+; CHECK-LABEL: LV: Checking a loop in 'outer_known_tc64'
+; CHECK: Calculating cost of runtime checks:
+; CHECK: Total cost of runtime checks: 1
+; CHECK-NEXT: LV: Minimum required TC for runtime checks to be profitable:32
+entry:
+ %cmp222 = icmp sgt i64 %n, 0
+ br i1 %cmp222, label %for.cond1.preheader.us, label %for.cond.cleanup
+
+for.cond1.preheader.us:
+ %i.024.us = phi i64 [ %inc12.us, %for.cond1.for.cond.cleanup3_crit_edge.us ], [ 0, %entry ]
+ %mul.us = mul nsw i64 %i.024.us, %n
+ br label %for.body4.us
+
+for.body4.us:
+ %j.023.us = phi i64 [ 0, %for.cond1.preheader.us ], [ %inc.us, %for.body4.us ]
+ %add.us = add nuw nsw i64 %j.023.us, %mul.us
+ %arrayidx.us = getelementptr inbounds i8, ptr %b, i64 %add.us
+ %0 = load i8, ptr %arrayidx.us, align 1
+ %arrayidx7.us = getelementptr inbounds i8, ptr %a, i64 %add.us
+ %1 = load i8, ptr %arrayidx7.us, align 1
+ %add9.us = add i8 %1, %0
+ store i8 %add9.us, ptr %arrayidx7.us, align 1
+ %inc.us = add nuw nsw i64 %j.023.us, 1
+ %exitcond.not = icmp eq i64 %inc.us, %n
+ br i1 %exitcond.not, label %for.cond1.for.cond.cleanup3_crit_edge.us, label %for.body4.us
+
+for.cond1.for.cond.cleanup3_crit_edge.us:
+ %inc12.us = add nuw nsw i64 %i.024.us, 1
+ %exitcond26.not = icmp eq i64 %inc12.us, 64
+ br i1 %exitcond26.not, label %for.cond.cleanup, label %for.cond1.preheader.us
+
+for.cond.cleanup:
+ ret void
+}
+
+
+define void @outer_pgo_3(ptr nocapture noundef %a, ptr nocapture noundef readonly %b, i64 noundef %m, i64 noundef %n) vscale_range(1,16) {
+; CHECK-LABEL: LV: Checking a loop in 'outer_pgo_3'
+; CHECK: Calculating cost of runtime checks:
+; CHECK: Total cost of runtime checks: 2
+; CHECK-NEXT: LV: Minimum required TC for runtime checks to be profitable:32
+entry:
+ %cmp222 = icmp sgt i64 %n, 0
+ br i1 %cmp222, label %for.cond1.preheader.us, label %for.cond.cleanup
+
+for.cond1.preheader.us:
+ %i.024.us = phi i64 [ %inc12.us, %for.cond1.for.cond.cleanup3_crit_edge.us ], [ 0, %entry ]
+ %mul.us = mul nsw i64 %i.024.us, %n
+ br label %for.body4.us
+
+for.body4.us:
+ %j.023.us = phi i64 [ 0, %for.cond1.preheader.us ], [ %inc.us, %for.body4.us ]
+ %add.us = add nuw nsw i64 %j.023.us, %mul.us
+ %arrayidx.us = getelementptr inbounds i8, ptr %b, i64 %add.us
+ %0 = load i8, ptr %arrayidx.us, align 1
+ %arrayidx7.us = getelementptr inbounds i8, ptr %a, i64 %add.us
+ %1 = load i8, ptr %arrayidx7.us, align 1
+ %add9.us = add i8 %1, %0
+ store i8 %add9.us, ptr %arrayidx7.us, align 1
+ %inc.us = add nuw nsw i64 %j.023.us, 1
+ %exitcond.not = icmp eq i64 %inc.us, %n
+ br i1 %exitcond.not, label %for.cond1.for.cond.cleanup3_crit_edge.us, label %for.body4.us
+
+for.cond1.for.cond.cleanup3_crit_edge.us:
+ %inc12.us = add nuw nsw i64 %i.024.us, 1
+ %exitcond26.not = icmp eq i64 %inc12.us, %m
+ br i1 %exitcond26.not, label %for.cond.cleanup, label %for.cond1.preheader.us, !prof !0
+
+for.cond.cleanup:
+ ret void
+}
+
+
+define void @outer_known_tc3_full_range_checks(ptr nocapture noundef %dst, ptr nocapture noundef readonly %src, i64 noundef %n) {
+; CHECK-LABEL: LV: Checking a loop in 'outer_known_tc3_full_range_checks'
+; CHECK: Calculating cost of runtime checks:
+; CHECK: Total cost of runtime checks: 2
+; CHECK-NEXT: LV: Minimum required TC for runtime checks to be profitable:8
+entry:
+ br label %outer.loop
+
+outer.loop:
+ %outer.iv = phi i64 [ 0, %entry ], [ %outer.iv.next, %inner.exit ]
+ %0 = mul nsw i64 %outer.iv, %n
+ br label %inner.loop
+
+inner.loop:
+ %iv.inner = phi i64 [ 0, %outer.loop ], [ %iv.inner.next, %inner.loop ]
+ %1 = add nuw nsw i64 %iv.inner, %0
+ %arrayidx.us = getelementptr inbounds i32, ptr %src, i64 %1
+ %2 = load i32, ptr %arrayidx.us, align 4
+ %arrayidx8.us = getelementptr inbounds i32, ptr %dst, i64 %1
+ %3 = load i32, ptr %arrayidx8.us, align 4
+ %add9.us = add nsw i32 %3, %2
+ store i32 %add9.us, ptr %arrayidx8.us, align 4
+ %iv.inner.next = add nuw nsw i64 %iv.inner, 1
+ %inner.exit.cond = icmp eq i64 %iv.inner.next, %n
+ br i1 %inner.exit.cond, label %inner.exit, label %inner.loop
+
+inner.exit:
+ %outer.iv.next = add nuw nsw i64 %outer.iv, 1
+ %outer.exit.cond = icmp eq i64 %outer.iv.next, 3
+ br i1 %outer.exit.cond, label %outer.exit, label %outer.loop
+
+outer.exit:
+ ret void
+}
+
+
+!0 = !{!"branch_weights", i32 10, i32 20}
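The `outer_pgo_3` function above has no constant outer trip count, but its latch branch carries the `!prof !0` branch weights. LLVM derives an estimated trip count from such weights (via a helper like `getLoopEstimatedTripCount`); the function below is a rough illustrative approximation of that idea, not the LLVM implementation.

```cpp
// Rough approximation (illustrative, not LLVM's implementation) of turning
// latch branch weights into an estimated trip count: the loop body runs once
// per taken backedge, plus once for the iteration that finally exits.
unsigned estimatedTripCount(unsigned ExitWeight, unsigned BackedgeWeight) {
  return BackedgeWeight / ExitWeight + 1;
}
```

For `!{!"branch_weights", i32 10, i32 20}` (exit weight 10, backedge weight 20) this gives an estimate of 3, which is why `outer_pgo_3` checks the same total runtime-check cost (2) as `outer_known_tc3`.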
This seems OK from what I can tell. LGTM
@@ -2070,7 +2070,7 @@ class GeneratedRTChecks {
     }
   }

-  InstructionCost getCost() {
+  InstructionCost getCost(Loop *OuterLoop) {
It might be better to store the inner or outer loop directly in the struct; the inner loop is passed in to `Create`, and there are 2 places that check for an outer loop already (via `if (auto *PL = LI->getLoopFor(LoopVectorPreHeader))`).
Could you update the places that look up the outer loop to use the new variable as well?
Gentle ping!
One small comment, but otherwise LGTM! I'll leave someone else more familiar with the code to approve the change.
…ter loop When we generate runtime memory checks for an inner loop it's possible that these checks are invariant in the outer loop and so will get hoisted out. In such cases, the effective cost of the checks should reduce to reflect the outer loop trip count. This fixes a 25% performance regression introduced by commit 49b0e6d when building the SPEC2017 x264 benchmark with PGO, where we decided the inner loop trip count wasn't high enough to warrant the (incorrect) high cost of the runtime checks. Also, when runtime memory checks consist entirely of diff checks these are likely to be outer loop invariant.
…is an outer loop" This reverts commit a152314.
This reverts commit a4caa47.
Trying a rebase to see if that fixes the failing RISCV test that appears unrelated to my change.
Thanks for the updates!
Trying a rebase to see if that fixes the failing RISCV test that appears unrelated to my change
Alternatively you could use the `Update Branch` button to merge in the changes from current main, which I think is preferred to rebases, as the GH UI may lose comments after rebases.
@@ -2070,7 +2070,7 @@ class GeneratedRTChecks {
     }
   }

-  InstructionCost getCost() {
+  InstructionCost getCost(Loop *OuterLoop) {
Could you update the places that look up the outer loop to use the new variable as well?
Yeah I used to use the "Update Branch" button, but I find it makes life very difficult for reviewers (and authors) and I'd like to stop using it in future. The problem is that in Phabricator we could do stacked patches quite easily, i.e. an NFC test patch followed by the code change. The purpose of this is to allow people to see the impact of the code changes. In this new world we're supposed to always add a new commit every time we address comments, which completely defeats the whole purpose of splitting the test patch up from the code change. Quite often reviewers ask authors to rewrite tests or add new ones, which means that when you add a new commit you can no longer see the impact of your code change. It feels like we should either:

To be honest either approach is unsatisfactory, but I feel option 2) gives the reviewer a more fighting chance of seeing cause and effect!
IMO including a separate commit for precommitting tests in a feature PR isn't ideal, as the default view in GH shows the diff for all changes compared to the main branch, and when merging the PR all commits get squashed into one. It's now possible to get something close to Phabricator's stacked reviews, discussed here: https://discourse.llvm.org/t/update-on-github-pull-requests/71540/146.
That's true, but to be honest the idea of creating loads of user branches on main fills me with dread. :) I've not tried it myself so I don't know how much extra overhead is involved. If you have examples where you have done this successfully/efficiently I'd be genuinely curious to see how it's done?
Hi @fhahn, I couldn't find any other instances in GeneratedRTChecks where we call …
There are 2 places that do something like …
Hi @fhahn, ok thanks. It wasn't obvious to me that those cases were checking for outer loops, so thanks for pointing them out! I've changed them now.
LGTM, thanks!
…ter loop (llvm#76034) When we generate runtime memory checks for an inner loop it's possible that these checks are invariant in the outer loop and so will get hoisted out. In such cases, the effective cost of the checks should reduce to reflect the outer loop trip count. This fixes a 25% performance regression introduced by commit 49b0e6d when building the SPEC2017 x264 benchmark with PGO, where we decided the inner loop trip count wasn't high enough to warrant the (incorrect) high cost of the runtime checks. Also, when runtime memory checks consist entirely of diff checks these are likely to be outer loop invariant. (cherry picked from commit 962fbaf)