Skip to content

[VPlan] Try to hoist Previous (and operands), if sinking fails for FORs. #108945

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Oct 23, 2024

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Sep 17, 2024

In some cases, Previous (and its operands) can be hoisted. This allows supporting additional cases where sinking of all users of to FOR fails, e.g. due having to sink recipes with side-effects.

This fixes a crash where we fail to create a scalar VPlan for a first-order recurrence, but can create a vector VPlan, because the trunc instruction of an IV which generates the previous value of the recurrence has been optimized to a truncated induction recipe, thus hoisting it to the beginning.

Fixes #106523.

@llvmbot
Copy link
Member

llvmbot commented Sep 17, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

In some cases, Previous (and its operands) can be hoisted. This allows supporting additional cases where sinking of all users of to FOR fails, e.g. due having to sink recipes with side-effects.

This fixes a crash where we fail to create a scalar VPlan for a first-order recurrence, but can create a vector VPlan, because the trunc instruction of an IV which generates the previous value of the recurrence has been optimized to a truncated induction recipe, thus hoisting it to the beginning.

Fixes #106523.


Full diff: https://github.com/llvm/llvm-project/pull/108945.diff

5 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+68-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+7-5)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/fixed-order-recurrence.ll (+65)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-chains-vplan.ll (+47-2)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-multiply-recurrences.ll (+38-6)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 1d84550010017f..6c0923d3d43de2 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -771,6 +771,72 @@ sinkRecurrenceUsersAfterPrevious(VPFirstOrderRecurrencePHIRecipe *FOR,
   return true;
 }
 
+/// Try to hoist \p Previous and its operands to the beginning of the vector
+/// header.
+static bool hoistPreviousBeforeFORUsers(VPFirstOrderRecurrencePHIRecipe *FOR,
+                                        VPRecipeBase *Previous,
+                                        VPDominatorTree &VPDT) {
+  using namespace llvm::VPlanPatternMatch;
+  if (Previous->mayHaveSideEffects() || Previous->mayReadFromMemory())
+    return false;
+  // Collect recipes that need sinking.
+  SmallVector<VPRecipeBase *> WorkList;
+  SmallPtrSet<VPRecipeBase *, 8> Seen;
+  auto TryToPushHoistCandidate = [&](VPRecipeBase *HoistCandidate) {
+    // If we reach FOR, it means the original Previous depends on some other
+    // recurrence that in turn depends on FOR. If that is the case, we would
+    // also need to hoist recipes involving the other FOR, which may break
+    // dependencies.
+    if (HoistCandidate == FOR)
+      return false;
+
+    // Hoist candidate outside any region, no need to hoist.
+    if (!HoistCandidate->getParent()->getParent())
+      return true;
+
+    // Hoist candidate is a header phi or already visited, no need to hoist.
+    if (isa<VPHeaderPHIRecipe>(HoistCandidate) ||
+        !Seen.insert(HoistCandidate).second)
+      return true;
+
+    // Don't move candiates with sideeffects, as we do not yet analyze recipes
+    // between candidate and hoist destination yet.
+    if (HoistCandidate->mayHaveSideEffects())
+      return false;
+
+    WorkList.push_back(HoistCandidate);
+    return true;
+  };
+
+  // Recursively sink users of FOR after Previous.
+  if (!TryToPushHoistCandidate(Previous))
+    return false;
+  for (unsigned I = 0; I != WorkList.size(); ++I) {
+    VPRecipeBase *Current = WorkList[I];
+    assert(Current->getNumDefinedValues() == 1 &&
+           "only recipes with a single defined value expected");
+
+    for (VPValue *Op : Current->operands())
+      if (auto *R = Op->getDefiningRecipe())
+        if (!TryToPushHoistCandidate(R))
+          return false;
+  }
+
+  // Keep recipes to hoist ordered by dominance so earlier instructions are
+  // processed first.
+  sort(WorkList, [&VPDT](const VPRecipeBase *A, const VPRecipeBase *B) {
+    return VPDT.properlyDominates(A, B);
+  });
+
+  auto HoistPoint = FOR->getParent()->getFirstNonPhi();
+  for (VPRecipeBase *HoistCandidate : WorkList) {
+    if (HoistPoint == HoistCandidate->getIterator())
+      continue;
+    HoistCandidate->moveBefore(*FOR->getParent(), HoistPoint);
+  }
+  return true;
+}
+
 bool VPlanTransforms::adjustFixedOrderRecurrences(VPlan &Plan,
                                                   VPBuilder &LoopBuilder) {
   VPDominatorTree VPDT;
@@ -795,7 +861,8 @@ bool VPlanTransforms::adjustFixedOrderRecurrences(VPlan &Plan,
     }
 
     if (!sinkRecurrenceUsersAfterPrevious(FOR, Previous, VPDT))
-      return false;
+      if (!hoistPreviousBeforeFORUsers(FOR, Previous, VPDT))
+        return false;
 
     // Introduce a recipe to combine the incoming and previous values of a
     // fixed-order recurrence.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index 9d852a27a8ef62..9940fb371112f9 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -36,11 +36,13 @@ struct VPlanTransforms {
                                 GetIntOrFpInductionDescriptor,
                             ScalarEvolution &SE, const TargetLibraryInfo &TLI);
 
-  /// Sink users of fixed-order recurrences after the recipe defining their
-  /// previous value. Then introduce FirstOrderRecurrenceSplice VPInstructions
-  /// to combine the value from the recurrence phis and previous values. The
-  /// current implementation assumes all users can be sunk after the previous
-  /// value, which is enforced by earlier legality checks.
+  /// Try to move users of fixed-order recurrences after the recipe defining
+  /// their previous value, either by sinking them or hoisting the recipe
+  /// defining their previous value (and its operands). Then introduce
+  /// FirstOrderRecurrenceSplice VPInstructions to combine the value from the
+  /// recurrence phis and previous values. The current implementation assumes
+  /// all users can be sunk after the previous value, which is enforced by
+  /// earlier legality checks.
   /// \returns true if all users of fixed-order recurrences could be re-arranged
   /// as needed or false if it is not possible. In the latter case, \p Plan is
   /// not valid.
diff --git a/llvm/test/Transforms/LoopVectorize/X86/fixed-order-recurrence.ll b/llvm/test/Transforms/LoopVectorize/X86/fixed-order-recurrence.ll
index a51ef66951f0e8..b1c6cafcd3adca 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/fixed-order-recurrence.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/fixed-order-recurrence.ll
@@ -285,3 +285,68 @@ exit:
   store double %.lcssa, ptr %C
   ret i64 %.in.lcssa
 }
+
+; Test for https://github.com/llvm/llvm-project/issues/106523.
+define void @for_iv_trunc_optimized(ptr %dst) {
+; CHECK-LABEL: @for_iv_trunc_optimized(
+; CHECK-NEXT:  bb:
+; CHECK-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK:       vector.ph:
+; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
+; CHECK:       vector.body:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[VECTOR_RECUR:%.*]] = phi <4 x i32> [ <i32 poison, i32 poison, i32 poison, i32 1>, [[VECTOR_PH]] ], [ [[STEP_ADD:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[VECTOR_RECUR1:%.*]] = phi <4 x i32> [ <i32 poison, i32 poison, i32 poison, i32 0>, [[VECTOR_PH]] ], [ [[TMP3:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[VEC_IND:%.*]] = phi <4 x i32> [ <i32 1, i32 2, i32 3, i32 4>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[STEP_ADD]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4>
+; CHECK-NEXT:    [[TMP0:%.*]] = shufflevector <4 x i32> [[VECTOR_RECUR]], <4 x i32> [[VEC_IND]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
+; CHECK-NEXT:    [[TMP1:%.*]] = shufflevector <4 x i32> [[VEC_IND]], <4 x i32> [[STEP_ADD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
+; CHECK-NEXT:    [[TMP2:%.*]] = or <4 x i32> [[TMP0]], zeroinitializer
+; CHECK-NEXT:    [[TMP3]] = or <4 x i32> [[TMP1]], zeroinitializer
+; CHECK-NEXT:    [[TMP4:%.*]] = shufflevector <4 x i32> [[VECTOR_RECUR1]], <4 x i32> [[TMP2]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
+; CHECK-NEXT:    [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
+; CHECK-NEXT:    [[TMP6:%.*]] = extractelement <4 x i32> [[TMP5]], i32 3
+; CHECK-NEXT:    store i32 [[TMP6]], ptr [[DST:%.*]], align 4
+; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
+; CHECK-NEXT:    [[VEC_IND_NEXT]] = add <4 x i32> [[STEP_ADD]], <i32 4, i32 4, i32 4, i32 4>
+; CHECK-NEXT:    [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], 336
+; CHECK-NEXT:    br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
+; CHECK:       middle.block:
+; CHECK-NEXT:    [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[STEP_ADD]], i32 3
+; CHECK-NEXT:    [[VECTOR_RECUR_EXTRACT3:%.*]] = extractelement <4 x i32> [[TMP3]], i32 3
+; CHECK-NEXT:    br i1 false, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK:       scalar.ph:
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 337, [[MIDDLE_BLOCK]] ], [ 1, [[BB:%.*]] ]
+; CHECK-NEXT:    [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ], [ 1, [[BB]] ]
+; CHECK-NEXT:    [[SCALAR_RECUR_INIT4:%.*]] = phi i32 [ [[VECTOR_RECUR_EXTRACT3]], [[MIDDLE_BLOCK]] ], [ 0, [[BB]] ]
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[ADD:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
+; CHECK-NEXT:    [[FOR_1:%.*]] = phi i32 [ [[TRUNC:%.*]], [[LOOP]] ], [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ]
+; CHECK-NEXT:    [[FOR_2:%.*]] = phi i32 [ [[OR:%.*]], [[LOOP]] ], [ [[SCALAR_RECUR_INIT4]], [[SCALAR_PH]] ]
+; CHECK-NEXT:    [[OR]] = or i32 [[FOR_1]], 0
+; CHECK-NEXT:    [[ADD]] = add i64 [[IV]], 1
+; CHECK-NEXT:    store i32 [[FOR_2]], ptr [[DST]], align 4
+; CHECK-NEXT:    [[ICMP:%.*]] = icmp ult i64 [[IV]], 337
+; CHECK-NEXT:    [[TRUNC]] = trunc i64 [[IV]] to i32
+; CHECK-NEXT:    br i1 [[ICMP]], label [[LOOP]], label [[EXIT]], !llvm.loop [[LOOP9:![0-9]+]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret void
+;
+bb:
+  br label %loop
+
+loop:
+  %iv = phi i64 [ %add, %loop ], [ 1, %bb ]
+  %for.1 = phi i32 [ %trunc, %loop ], [ 1, %bb ]
+  %for.2 = phi i32 [ %or, %loop ], [ 0, %bb ]
+  %or = or i32 %for.1, 0
+  %add = add i64 %iv, 1
+  store i32 %for.2, ptr %dst, align 4
+  %icmp = icmp ult i64 %iv, 337
+  %trunc = trunc i64 %iv to i32
+  br i1 %icmp, label %loop, label %exit
+
+exit:
+  ret void
+}
diff --git a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-chains-vplan.ll b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-chains-vplan.ll
index 6522ed2b52b4fb..6d8b7a2027947e 100644
--- a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-chains-vplan.ll
+++ b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-chains-vplan.ll
@@ -154,8 +154,53 @@ exit:
 ; FOR (for.y) should be moved which is not currently supported.
 define i32 @test_chained_first_order_recurrences_4(ptr %base) {
 ; CHECK-LABEL: 'test_chained_first_order_recurrences_4'
-; CHECK: No VPlans built.
-
+; CHECK:      VPlan 'Initial VPlan for VF={4},UF>=1' {
+; CHECK-NEXT: Live-in vp<[[VFxUF:%.+]]> = VF * UF
+; CHECK-NEXT: Live-in vp<[[VTC:%.+]]> = vector-trip-count
+; CHECK-NEXT: Live-in ir<4098> = original trip-count
+; CHECK-EMPTY:
+; CHECK-NEXT: vector.ph:
+; CHECK-NEXT: Successor(s): vector loop
+; CHECK-EMPTY:
+; CHECK-NEXT: <x1> vector loop: {
+; CHECK-NEXT:   vector.body:
+; CHECK-NEXT:     EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION ir<0>, vp<[[CAN_IV_NEXT:%.+]]>
+; CHECK-NEXT:     FIRST-ORDER-RECURRENCE-PHI ir<%for.x> = phi ir<0>, ir<%for.x.next>
+; CHECK-NEXT:     FIRST-ORDER-RECURRENCE-PHI ir<%for.y> = phi ir<0>, ir<%for.x.prev>
+; CHECK-NEXT:     vp<[[SCALAR_STEPS:%.+]]> = SCALAR-STEPS vp<[[CAN_IV]]>, ir<1>
+; CHECK-NEXT:     WIDEN ir<%for.x.next> = mul ir<0>, ir<0>
+; CHECK-NEXT:     EMIT vp<[[SPLICE_X:%.]]> = first-order splice ir<%for.x>, ir<%for.x.next>
+; CHECK-NEXT:     WIDEN-CAST ir<%for.x.prev> = trunc  vp<[[SPLICE_X]]> to i32
+; CHECK-NEXT:     EMIT vp<[[SPLICE_Y:%.+]]> = first-order splice ir<%for.y>, ir<%for.x.prev>
+; CHECK-NEXT:     CLONE ir<%gep> = getelementptr ir<%base>, vp<[[SCALAR_STEPS]]>
+; CHECK-NEXT:     WIDEN-CAST ir<%for.y.i64> = sext  vp<[[SPLICE_Y]]> to i64
+; CHECK-NEXT:     vp<[[VEC_PTR:%.+]]> = vector-pointer ir<%gep>
+; CHECK-NEXT:     WIDEN store vp<[[VEC_PTR]]>, ir<%for.y.i64>
+; CHECK-NEXT:     EMIT vp<[[CAN_IV_NEXT]]> = add nuw vp<[[CAN_IV]]>, vp<[[VFxUF]]>
+; CHECK-NEXT:     EMIT branch-on-count vp<[[CAN_IV_NEXT]]>, vp<[[VTC]]>
+; CHECK-NEXT:   No successors
+; CHECK-NEXT: }
+; CHECK-NEXT: Successor(s): middle.block
+; CHECK-EMPTY:
+; CHECK-NEXT: middle.block:
+; CHECK-NEXT:   EMIT vp<[[EXT_X:%.+]]> = extract-from-end ir<%for.x.next>, ir<1>
+; CHECK-NEXT:   EMIT vp<[[EXT_Y:%.+]]> = extract-from-end ir<%for.x.prev>, ir<1>
+; CHECK-NEXT:   EMIT vp<[[MIDDLE_C:%.+]]> = icmp eq ir<4098>, vp<[[VTC]]>
+; CHECK-NEXT:   EMIT branch-on-cond vp<[[MIDDLE_C]]>
+; CHECK-NEXT: Successor(s): ir-bb<ret>, scalar.ph
+; CHECK-EMPTY:
+; CHECK-NEXT: ir-bb<ret>:
+; CHECK-NEXT: No successors
+; CHECK-EMPTY:
+; CHECK-NEXT: scalar.ph:
+; CHECK-NEXT:   EMIT vp<[[RESUME_X:%.+]]> = resume-phi vp<[[EXT_X]]>, ir<0>
+; CHECK-NEXT:   EMIT vp<[[RESUME_Y:%.+]]> = resume-phi vp<[[EXT_Y]]>, ir<0>
+; CHECK-NEXT: No successors
+; CHECK-EMPTY:
+; CHECK-NEXT: Live-out i64 %for.x = vp<[[RESUME_X]]>
+; CHECK-NEXT: Live-out i32 %for.y = vp<[[RESUME_Y]]>
+; CHECK-NEXT: }
+;
 entry:
   br label %loop
 
diff --git a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-multiply-recurrences.ll b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-multiply-recurrences.ll
index dbe373b46cce21..77a566424e89b9 100644
--- a/llvm/test/Transforms/LoopVectorize/first-order-recurrence-multiply-recurrences.ll
+++ b/llvm/test/Transforms/LoopVectorize/first-order-recurrence-multiply-recurrences.ll
@@ -385,19 +385,51 @@ exit:
 define void @hoist_previous_value_and_operands(ptr %dst, i64 %mask) {
 ; CHECK-LABEL: @hoist_previous_value_and_operands(
 ; CHECK-NEXT:  bb:
+; CHECK-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK:       vector.ph:
+; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[MASK:%.*]], i64 0
+; CHECK-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
+; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
+; CHECK:       vector.body:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 1, i64 2, i64 3, i64 4>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[VECTOR_RECUR:%.*]] = phi <4 x i32> [ <i32 poison, i32 poison, i32 poison, i32 1>, [[VECTOR_PH]] ], [ [[TMP2:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[VECTOR_RECUR1:%.*]] = phi <4 x i32> [ <i32 poison, i32 poison, i32 poison, i32 0>, [[VECTOR_PH]] ], [ [[TMP4:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[OFFSET_IDX:%.*]] = add i64 1, [[INDEX]]
+; CHECK-NEXT:    [[TMP0:%.*]] = add i64 [[OFFSET_IDX]], 0
+; CHECK-NEXT:    [[TMP1:%.*]] = and <4 x i64> [[VEC_IND]], [[BROADCAST_SPLAT]]
+; CHECK-NEXT:    [[TMP2]] = trunc <4 x i64> [[TMP1]] to <4 x i32>
+; CHECK-NEXT:    [[TMP3:%.*]] = shufflevector <4 x i32> [[VECTOR_RECUR]], <4 x i32> [[TMP2]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
+; CHECK-NEXT:    [[TMP4]] = or <4 x i32> [[TMP3]], zeroinitializer
+; CHECK-NEXT:    [[TMP5:%.*]] = shufflevector <4 x i32> [[VECTOR_RECUR1]], <4 x i32> [[TMP4]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
+; CHECK-NEXT:    [[TMP6:%.*]] = getelementptr inbounds i32, ptr [[DST:%.*]], i64 [[TMP0]]
+; CHECK-NEXT:    [[TMP7:%.*]] = getelementptr inbounds i32, ptr [[TMP6]], i32 0
+; CHECK-NEXT:    store <4 x i32> [[TMP5]], ptr [[TMP7]], align 4
+; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
+; CHECK-NEXT:    [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
+; CHECK-NEXT:    [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 336
+; CHECK-NEXT:    br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
+; CHECK:       middle.block:
+; CHECK-NEXT:    [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[TMP2]], i32 3
+; CHECK-NEXT:    [[VECTOR_RECUR_EXTRACT2:%.*]] = extractelement <4 x i32> [[TMP4]], i32 3
+; CHECK-NEXT:    br i1 false, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK:       scalar.ph:
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 337, [[MIDDLE_BLOCK]] ], [ 1, [[BB:%.*]] ]
+; CHECK-NEXT:    [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ], [ 1, [[BB]] ]
+; CHECK-NEXT:    [[SCALAR_RECUR_INIT3:%.*]] = phi i32 [ [[VECTOR_RECUR_EXTRACT2]], [[MIDDLE_BLOCK]] ], [ 0, [[BB]] ]
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
-; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[ADD:%.*]], [[LOOP]] ], [ 1, [[BB:%.*]] ]
-; CHECK-NEXT:    [[FOR_1:%.*]] = phi i32 [ [[TRUNC:%.*]], [[LOOP]] ], [ 1, [[BB]] ]
-; CHECK-NEXT:    [[FOR_2:%.*]] = phi i32 [ [[OR:%.*]], [[LOOP]] ], [ 0, [[BB]] ]
+; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[ADD:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
+; CHECK-NEXT:    [[FOR_1:%.*]] = phi i32 [ [[TRUNC:%.*]], [[LOOP]] ], [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ]
+; CHECK-NEXT:    [[FOR_2:%.*]] = phi i32 [ [[OR:%.*]], [[LOOP]] ], [ [[SCALAR_RECUR_INIT3]], [[SCALAR_PH]] ]
 ; CHECK-NEXT:    [[OR]] = or i32 [[FOR_1]], 0
 ; CHECK-NEXT:    [[ADD]] = add i64 [[IV]], 1
-; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds i32, ptr [[DST:%.*]], i64 [[IV]]
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds i32, ptr [[DST]], i64 [[IV]]
 ; CHECK-NEXT:    store i32 [[FOR_2]], ptr [[GEP]], align 4
 ; CHECK-NEXT:    [[ICMP:%.*]] = icmp ult i64 [[IV]], 337
-; CHECK-NEXT:    [[A:%.*]] = and i64 [[IV]], [[MASK:%.*]]
+; CHECK-NEXT:    [[A:%.*]] = and i64 [[IV]], [[MASK]]
 ; CHECK-NEXT:    [[TRUNC]] = trunc i64 [[A]] to i32
-; CHECK-NEXT:    br i1 [[ICMP]], label [[LOOP]], label [[EXIT:%.*]]
+; CHECK-NEXT:    br i1 [[ICMP]], label [[LOOP]], label [[EXIT]], !llvm.loop [[LOOP7:![0-9]+]]
 ; CHECK:       exit:
 ; CHECK-NEXT:    ret void
 ;

@fhahn
Copy link
Contributor Author

fhahn commented Sep 25, 2024

ping :)

fhahn added a commit that referenced this pull request Oct 6, 2024
@fhahn
Copy link
Contributor Author

fhahn commented Oct 7, 2024

ping :)

Comment on lines 809 to 822
// Hoist candidate outside any region, no need to hoist.
if (!HoistCandidate->getParent()->getParent())
return true;

// Hoist candidate is a header phi or already visited, no need to hoist.
if (isa<VPHeaderPHIRecipe>(HoistCandidate) ||
!Seen.insert(HoistCandidate).second)
return true;

// Don't move candiates with sideeffects, as we do not yet analyze recipes
// between candidate and hoist destination yet.
if (HoistCandidate->mayHaveSideEffects())
return false;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these checks be preformed earlier, before the if in line 797?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, moved the cheaper checks above, thanks!

if (HoistCandidate->mayHaveSideEffects(

must stay at the current position, as we only need to check for instructions that need hoisting, the earlier exists are for cases not needing hoisting.

if (!TryToPushHoistCandidate(Previous))
return false;

for (unsigned I = 0; I != WorkList.size(); ++I) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for (unsigned I : seq<unsigned>(WorkList.size())) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WorkList gets expanded in the loop, so I left it as is for now.

// Keep recipes to hoist ordered by dominance so earlier instructions are
// processed first.
sort(WorkList, [&VPDT](const VPRecipeBase *A, const VPRecipeBase *B) {
return VPDT.properlyDominates(A, B);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it provide strict weak ordering?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it should, as the CFG in the loop is flattened.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the CFG in the loop is assumed to be flattened,

  • verify CFG in the loop is flattened, expansion of replicate regions must be applied later?
  • fix hoist point to right before first FOR user?
  • check dominance of first FOR user only?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verify CFG in the loop is flattened, expansion of replicate regions must be applied later?
adde assert, thanks
fix hoist point to right before first FOR user?
check dominance of first FOR user only?
both done, thanks!

In some cases, Previous (and its operands) can be hoisted. This allows
supporting additional cases where sinking of all users of to FOR fails,
e.g. due having to sink recipes with side-effects.

This fixes a crash where we fail to create a scalar VPlan for a
first-order recurrence, but can create a vector VPlan, because the trunc
instruction of an IV which generates the previous value of the recurrence has
been optimized to a truncated induction recipe, thus hoisting it to the
beginning.

Fixes llvm#106523.
Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice extension!

Would be good to also check recurrences of order greater than 1 - finding first non-phi 'previous' should work for hoisting as it does for sinking.

Curious if there are cases where sinking all users or hoisting previous along with all relevant operands fail, but some code motion in both directions succeeds, possibly with multiple FORs. A general approach would be to add dependences between users and previous of each FOR (possibly also with splices introduced), and then topologically sort the data-dependence graph having all dependences - included memory based ones. Handling each FOR separately by either only sinking its users or only hoisting its previous might be too restricted.

@@ -795,7 +881,8 @@ bool VPlanTransforms::adjustFixedOrderRecurrences(VPlan &Plan,
}

if (!sinkRecurrenceUsersAfterPrevious(FOR, Previous, VPDT))
return false;
if (!hoistPreviousBeforeFORUsers(FOR, Previous, VPDT))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!hoistPreviousBeforeFORUsers(FOR, Previous, VPDT))
if (!sinkRecurrenceUsersAfterPrevious(FOR, Previous, VPDT) &&
!hoistPreviousBeforeFORUsers(FOR, Previous, VPDT))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

return false;

// Collect recipes that need hoisting.
SmallVector<VPRecipeBase *> WorkList;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SmallVector<VPRecipeBase *> WorkList;
SmallVector<VPRecipeBase *> HoistCandidates;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

SmallVector<VPRecipeBase *> WorkList;
SmallPtrSet<VPRecipeBase *, 8> Seen;
VPBasicBlock *HoistBlock = FOR->getParent();
auto HoistPoint = HoistBlock->getFirstNonPhi();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simpler to set HoistPoint to FOR, given the flat loop? See more below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to initialize to nullptr and then look for the user dominating all others. FOR as initializer isn't suitable I think, as it already dominates all users and is in the phi section of the block, so we can't hoist recipes before it

Comment on lines 788 to 791
// If we reach FOR, it means the original Previous depends on some other
// recurrence that in turn depends on FOR. If that is the case, we would
// also need to hoist recipes involving the other FOR, which may break
// dependencies.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Previous depends on FOR (directly or indirectly), they form a dependence cycle, which should have been classified as an induction or reduction rather than FOR? I.e., assert this does not occur. (Also holds for sinking.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can occur at the moment I think.

There can be cases where 'previous' of one FOR uses another FOR (e.g. as in https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/LoopVectorize/first-order-recurrence-multiply-recurrences.ll#L32). They get classified as FORs but not vectorized because we cannot rearrange the users.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, but here "If we reach FOR" refers to the header phi of the original recurrence, rather than 'previous' of one FOR using header phi of another FOR. Posted inline as this appears outdated.

if (HoistCandidate == FOR)
return false;

// Hoist candidate outside any region, no need to hoist.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Hoist candidate outside any region, no need to hoist.
// Candidate is outside loop region, dominates FOR users w/o hoisting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[VEC_IND]], <4 x i32> [[STEP_ADD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
; CHECK-NEXT: [[TMP2:%.*]] = or <4 x i32> [[TMP0]], zeroinitializer
; CHECK-NEXT: [[TMP3]] = or <4 x i32> [[TMP1]], zeroinitializer
; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first splice of %for.2 combines the last lane of VECTOR_RECUR1 with first 3 lanes of TMP2, but being dead is eliminated.
TMP5 is the second splice for %for.2, combining last lane of TMP2 with first 3 lanes of TMP3, the last of which feeds the store to invariant address.

; FOR (for.y) should be moved which is not currently supported.
; iteration (for.x.prev) of one FOR (for.y) depends on another FOR (for.x).
; Sinking would require moving a recipe with side effects (store). Instead,
; for.x.prev can be hoisted.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Instead, for.x.prev can be hoisted" - is it important to hoist for.x.prev? (swapping places with the gep)

for.x requires code motion, as its first user for.x.prev precedes its previous for.x.next.

for.y does not require code motion, as its first user for.y.i64 succeeds its previous for.x.prev.

To handle for.x, we avoid sinking its user for.x.prev, because of its role as previous for for.y. Instead, we hoist its previous for.x.next above its user (and all the way out of the loop).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, should be for.x.next. updated, thanks!

; CHECK-NEXT: CLONE ir<%gep> = getelementptr ir<%base>, vp<[[SCALAR_STEPS]]>
; CHECK-NEXT: vp<[[VEC_PTR:%.+]]> = vector-pointer ir<%gep>
; CHECK-NEXT: WIDEN ir<%l> = load vp<[[VEC_PTR]]>
; CHECK-NEXT: WIDEN ir<%for.x.next> = mul ir<%l>, ir<2>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar case to the one above, except the end - we hoist its previous for.x.next above its user for.x.prev (until reaching a dependence on %l, rather than all the way out of the loop).

; CHECK: scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 337, [[MIDDLE_BLOCK]] ], [ 1, [[BB:%.*]] ]
; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ], [ 1, [[BB]] ]
; CHECK-NEXT: [[SCALAR_RECUR_INIT3:%.*]] = phi i32 [ [[VECTOR_RECUR_EXTRACT2]], [[MIDDLE_BLOCK]] ], [ 0, [[BB]] ]
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should have probably been check not vectorized.

; CHECK-NEXT: [[OFFSET_IDX:%.*]] = add i64 1, [[INDEX]]
; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[OFFSET_IDX]], 0
; CHECK-NEXT: [[TMP1:%.*]] = and <4 x i64> [[VEC_IND]], [[BROADCAST_SPLAT]]
; CHECK-NEXT: [[TMP2]] = trunc <4 x i64> [[TMP1]] to <4 x i32>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar case to PR106523 above, except here the loop is vectorized w/o being unrolled, the previous truncation gets vectorized (rather than folded into an IV) and hoisted along with its AND operand above the user 'or', the store is consecutive (rather than to an invariant address). Name should use plural hoist_previous_value_and_operands or singular hoist_previous_value_and_operand?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this is a target-independent version. Added a comment, thanks!

And dropped the s from operands

@fhahn
Copy link
Contributor Author

fhahn commented Oct 22, 2024

(forgot to respond to the comments here when responding to the inline comments/updating the patch)

Nice extension!

Would be good to also check recurrences of order greater than 1 - finding first non-phi 'previous' should work for hoisting as it does for sinking.

I think that should already be done, the loop to look for the first non-phi previous is just above the call to sinkRecurrenceUsersAfterPrevious/hoistPreviousBeforeFORUsers.

Curious if there are cases where sinking all users or hoisting previous along with all relevant operands fail, but some code motion in both directions succeeds, possibly with multiple FORs. A general approach would be to add dependences between users and previous of each FOR (possibly also with splices introduced), and then topologically sort the data-dependence graph having all dependences - included memory based ones. Handling each FOR separately by either only sinking its users or only hoisting its previous might be too restricted.

There probably are such cases, which would have the potential for another follow-up?

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, adding several minor comments and suggestions.

if (!HoistPoint || VPDT.properlyDominates(R, HoistPoint))
HoistPoint = R;
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: may be worth asserting HoistPoint (is non-null and) dominates all FOR->users().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

if (!R)
continue;
if (!HoistPoint || VPDT.properlyDominates(R, HoistPoint))
HoistPoint = R;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: else can assert HoistPoint dominates R, if not assert so collectively below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an assert after the loop, thanks!

Comment on lines 802 to 804
assert((!HoistCandidate->getParent()->getParent() ||
HoistCandidate->getParent()->getParent() ==
HoistCandidate->getParent()->getEnclosingLoopRegion()) &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert((!HoistCandidate->getParent()->getParent() ||
HoistCandidate->getParent()->getParent() ==
HoistCandidate->getParent()->getEnclosingLoopRegion()) &&
auto EnclosingLoopRegion = HoistCandidate->getParent()->getEnclosingLoopRegion();
assert((!HoistCandidate->getParent()->getParent() ||
HoistCandidate->getParent()->getParent() == EnclosingLoopRegion) &&

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks!

assert((!HoistCandidate->getParent()->getParent() ||
HoistCandidate->getParent()->getParent() ==
HoistCandidate->getParent()->getEnclosingLoopRegion()) &&
"CFG in VPlan should still be flattened, without replicate regions");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"CFG in VPlan should still be flattened, without replicate regions");
"CFG in VPlan should still be flat, without replicate regions");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

HoistCandidate->getParent()->getParent() ==
HoistCandidate->getParent()->getEnclosingLoopRegion()) &&
"CFG in VPlan should still be flattened, without replicate regions");
// Hoist candidate has already beend visited, no need to hoist.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Hoist candidate has already beend visited, no need to hoist.
// Hoist candidate was already visited, no need to hoist.

better rename Seen to Visited?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!


for (VPValue *Op : Current->operands()) {
// If we reach FOR, it means the original Previous depends on some other
// recurrence that in turn depends on FOR. If that is the case, we would
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(continuing outdated discussion): having the Previous of one FOR depend on the header phi of another FOR is clear (including the case where Previous is simply an or-with-zero copy leading to a second order recurrence), implying that FORs better be handled in some order if not (topologically sorting) altogether. But this check circles back to the original FOR phi, i.e., a cyclic dependence rather than a FOR one.

Comment on lines 843 to 844
// Keep recipes to hoist ordered by dominance so earlier instructions are
// processed first.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine. But then "Keep recipes to hoist ordered by ..." should read "Order recipes to hoist by ..."

static bool hoistPreviousBeforeFORUsers(VPFirstOrderRecurrencePHIRecipe *FOR,
VPRecipeBase *Previous,
VPDominatorTree &VPDT) {
using namespace llvm::VPlanPatternMatch;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this namespace (still) needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, thanks!

Comment on lines 40 to 41
/// defining their previous value, by either sinking or hoisting the recipe
/// defining their previous value (and its operands). Then introduce
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// defining their previous value, by either sinking or hoisting the recipe
/// defining their previous value (and its operands). Then introduce
/// defining their previous value, by either sinking users or hoisting
/// recipes defining their previous value. Then introduce

(sinking users may require additional sinkings, and hoisting previous may require additional hoistings.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

/// FirstOrderRecurrenceSplice VPInstructions to combine the value from the
/// recurrence phis and previous values. The current implementation assumes
/// all users can be sunk after the previous value, or the previous value can
/// be hoisted before all users, which is enforced by earlier legality checks.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do earlier legality checks enforce both?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not any more, removed the stale comment, thanks!

@ayalz
Copy link
Collaborator

ayalz commented Oct 22, 2024

(forgot to respond to the comments here when responding to the inline comments/updating the patch)

Nice extension!
Would be good to also check recurrences of order greater than 1 - finding first non-phi 'previous' should work for hoisting as it does for sinking.

I think that should already be done, the loop to look for the first non-phi previous is just above the call to sinkRecurrenceUsersAfterPrevious/hoistPreviousBeforeFORUsers.

Agreed, hence "should work", suggestion is to "also check", i.e., add tests if needed.

Curious if there are cases where sinking all users or hoisting previous along with all relevant operands fail, but some code motion in both directions succeeds, possibly with multiple FORs. A general approach would be to add dependences between users and previous of each FOR (possibly also with splices introduced), and then topologically sort the data-dependence graph having all dependences - included memory based ones. Handling each FOR separately by either only sinking its users or only hoisting its previous might be too restricted.

There probably are such cases, which would have the potential for another follow-up?

Sure, by all means, potentially starting with tests demonstrating such cases.

@fhahn fhahn merged commit 2dfb1c6 into llvm:main Oct 23, 2024
6 of 8 checks passed
@fhahn fhahn deleted the vplan-for-hoist branch October 23, 2024 20:12
@llvm-ci
Copy link
Collaborator

llvm-ci commented Oct 23, 2024

LLVM Buildbot has detected a new failure on builder clang-aarch64-sve-vla running on linaro-g3-03 while building llvm at step 7 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/17/builds/3472

Here is the relevant piece of the build log for the reference
Step 7 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'AddressSanitizer-aarch64-linux-dynamic :: TestCases/Posix/halt_on_error-signals.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 3: /home/tcwg-buildbot/worker/clang-aarch64-sve-vla/stage1/./bin/clang  -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only   -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta   -shared-libasan -fsanitize-recover=address -pthread /home/tcwg-buildbot/worker/clang-aarch64-sve-vla/llvm/compiler-rt/test/asan/TestCases/Posix/halt_on_error-signals.c -o /home/tcwg-buildbot/worker/clang-aarch64-sve-vla/stage1/runtimes/runtimes-bins/compiler-rt/test/asan/AARCH64LinuxDynamicConfig/TestCases/Posix/Output/halt_on_error-signals.c.tmp
+ /home/tcwg-buildbot/worker/clang-aarch64-sve-vla/stage1/./bin/clang -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta -shared-libasan -fsanitize-recover=address -pthread /home/tcwg-buildbot/worker/clang-aarch64-sve-vla/llvm/compiler-rt/test/asan/TestCases/Posix/halt_on_error-signals.c -o /home/tcwg-buildbot/worker/clang-aarch64-sve-vla/stage1/runtimes/runtimes-bins/compiler-rt/test/asan/AARCH64LinuxDynamicConfig/TestCases/Posix/Output/halt_on_error-signals.c.tmp
In file included from /home/tcwg-buildbot/worker/clang-aarch64-sve-vla/llvm/compiler-rt/test/asan/TestCases/Posix/halt_on_error-signals.c:11:
In file included from /usr/include/stdio.h:27:
In file included from /usr/include/aarch64-linux-gnu/bits/libc-header-start.h:33:
/usr/include/features.h:194:3: warning: "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE" [-W#warnings]
  194 | # warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE"
      |   ^
1 warning generated.
RUN: at line 5: env ASAN_OPTIONS=halt_on_error=false:suppress_equal_pcs=false  /home/tcwg-buildbot/worker/clang-aarch64-sve-vla/stage1/runtimes/runtimes-bins/compiler-rt/test/asan/AARCH64LinuxDynamicConfig/TestCases/Posix/Output/halt_on_error-signals.c.tmp 100 >/home/tcwg-buildbot/worker/clang-aarch64-sve-vla/stage1/runtimes/runtimes-bins/compiler-rt/test/asan/AARCH64LinuxDynamicConfig/TestCases/Posix/Output/halt_on_error-signals.c.tmp.log 2>&1 || true
+ env ASAN_OPTIONS=halt_on_error=false:suppress_equal_pcs=false /home/tcwg-buildbot/worker/clang-aarch64-sve-vla/stage1/runtimes/runtimes-bins/compiler-rt/test/asan/AARCH64LinuxDynamicConfig/TestCases/Posix/Output/halt_on_error-signals.c.tmp 100
+ true
RUN: at line 7: FileCheck --check-prefix=CHECK-COLLISION /home/tcwg-buildbot/worker/clang-aarch64-sve-vla/llvm/compiler-rt/test/asan/TestCases/Posix/halt_on_error-signals.c </home/tcwg-buildbot/worker/clang-aarch64-sve-vla/stage1/runtimes/runtimes-bins/compiler-rt/test/asan/AARCH64LinuxDynamicConfig/TestCases/Posix/Output/halt_on_error-signals.c.tmp.log || FileCheck --check-prefix=CHECK-NO-COLLISION /home/tcwg-buildbot/worker/clang-aarch64-sve-vla/llvm/compiler-rt/test/asan/TestCases/Posix/halt_on_error-signals.c </home/tcwg-buildbot/worker/clang-aarch64-sve-vla/stage1/runtimes/runtimes-bins/compiler-rt/test/asan/AARCH64LinuxDynamicConfig/TestCases/Posix/Output/halt_on_error-signals.c.tmp.log
+ FileCheck --check-prefix=CHECK-COLLISION /home/tcwg-buildbot/worker/clang-aarch64-sve-vla/llvm/compiler-rt/test/asan/TestCases/Posix/halt_on_error-signals.c
/home/tcwg-buildbot/worker/clang-aarch64-sve-vla/llvm/compiler-rt/test/asan/TestCases/Posix/halt_on_error-signals.c:29:22: error: CHECK-COLLISION: expected string not found in input
 // CHECK-COLLISION: AddressSanitizer: nested bug in the same thread, aborting
                     ^
<stdin>:1:1: note: scanning from here
=================================================================
^
<stdin>:55:78: note: possible intended match here
AddressSanitizer/home/tcwg-buildbot/worker/clang-aarch64-sve-vla/stage1/./bin/llvm-symbolizer: nested bug in the same thread, aborting.
                                                                             ^

Input file: <stdin>
Check file: /home/tcwg-buildbot/worker/clang-aarch64-sve-vla/llvm/compiler-rt/test/asan/TestCases/Posix/halt_on_error-signals.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            1: ================================================================= 
check:29'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
            2: ==3528958==ERROR: AddressSanitizer: use-after-poison on address 0xaaaab3164160 at pc 0xaaaab3151c24 bp 0xfbffa81fe620 sp 0xfbffa81fe618 
check:29'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            3: WRITE of size 1 at 0xaaaab3164160 thread T1 
check:29'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            4:  #0 0xaaaab3151c20 in error /home/tcwg-buildbot/worker/clang-aarch64-sve-vla/llvm/compiler-rt/test/asan/TestCases/Posix/halt_on_error-signals.c:32:12 
check:29'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            5:  #1 0xaaaab3151fd0 in receiver /home/tcwg-buildbot/worker/clang-aarch64-sve-vla/llvm/compiler-rt/test/asan/TestCases/Posix/halt_on_error-signals.c:66:5 
check:29'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            6:  #2 0xffffabb63200 in asan_thread_start(void*) /home/tcwg-buildbot/worker/clang-aarch64-sve-vla/llvm/compiler-rt/lib/asan/asan_interceptors.cpp:239:28 
check:29'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            .
            .
...

@frobtech frobtech mentioned this pull request Oct 25, 2024
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
…Rs. (llvm#108945)

In some cases, Previous (and its operands) can be hoisted. This allows
supporting additional cases where sinking of all users of to FOR fails,
e.g. due having to sink recipes with side-effects.

This fixes a crash where we fail to create a scalar VPlan for a
first-order recurrence, but can create a vector VPlan, because the trunc
instruction of an IV which generates the previous value of the
recurrence has been optimized to a truncated induction recipe, thus
hoisting it to the beginning.

Fixes llvm#106523.

PR: llvm#108945
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Assertion `hasPlanWithVF(ScalarVF) && "More than a single plan/VF w/o any plan having scalar VF"' failed.
5 participants