
Conversation

lukel97
Contributor

@lukel97 lukel97 commented Aug 26, 2025

Currently in optimizeMaskToEVL we convert every widened load, store, or reduction to a VP predicated recipe with EVL, regardless of whether it uses the header mask.

So we currently have to be careful when working on other parts of VPlan to make sure the EVL transform doesn't break or transform something incorrectly, because it isn't a semantics-preserving transform. Forgetting to do so has caused miscompiles before, like the case that was fixed in #113667.

This PR rewrites it in terms of pattern matching, so it now only converts a recipe to a VP predicated recipe if it is masked by exactly the header mask.

After this the transform should be a true optimisation that doesn't change any semantics, so it shouldn't miscompile things if other parts of VPlan change.

This fixes #152541 and allows us to move addExplicitVectorLength into tryToBuildVPlanWithVPRecipes in #153144.

It also splits the load/store transforms out into separate patterns for reversed and non-reversed accesses, which should make #146525 easier to implement and reason about.
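
To give a flavour of the new approach, the non-reversed load case in the patch below looks roughly like this (simplified; the reversed load and the store cases follow the same shape):

  VPValue *Addr, *Mask;
  // m_RemoveMask only matches if the load's mask is the header mask itself
  // (binding Mask to nullptr) or (header-mask && Mask); any other mask leaves
  // the recipe untouched.
  if (match(&CurRecipe,
            m_Load(m_VPValue(Addr), m_RemoveMask(HeaderMask, Mask))))
    return new VPWidenLoadEVLRecipe(cast<VPWidenLoadRecipe>(CurRecipe), Addr,
                                    EVL, Mask);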

@llvmbot
Member

llvmbot commented Aug 26, 2025

@llvm/pr-subscribers-vectorizers
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

Stacked on #155383

Currently in optimizeMaskToEVL we convert every widened load, store, or reduction to a VP predicated recipe with EVL, regardless of whether it uses the header mask.

So we currently have to be careful when working on other parts of VPlan to make sure the EVL transform doesn't break or transform something incorrectly, because it isn't a semantics-preserving transform. Forgetting to do so has caused miscompiles before, like the case that was fixed in #113667.

This PR rewrites it in terms of pattern matching, so it now only converts a recipe to a VP predicated recipe if it uses the header mask.

It also splits the load/store transforms out into separate patterns for reversed and non-reversed accesses, which should make #146525 easier to implement and reason about.

After this the transform should be a true optimisation that doesn't change any semantics, so it shouldn't miscompile things if other parts of VPlan change.

This fixes #152541 and allows us to move addExplicitVectorLength into tryToBuildVPlanWithVPRecipes in #153144.


Patch is 35.22 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155394.diff

6 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h (+75-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+163-111)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/blocks-with-dead-instructions.ll (+3-9)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/dead-ops-cost.ll (+1-3)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/pr87378-vpinstruction-or-drop-poison-generating-flags.ll (+5-6)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/constant-fold.ll (+16-16)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
index 1ec6ae677374c..f9d8d2f0856e1 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
@@ -242,7 +242,8 @@ struct Recipe_match {
     if constexpr (std::is_same<RecipeTy, VPScalarIVStepsRecipe>::value ||
                   std::is_same<RecipeTy, VPCanonicalIVPHIRecipe>::value ||
                   std::is_same<RecipeTy, VPDerivedIVRecipe>::value ||
-                  std::is_same<RecipeTy, VPWidenGEPRecipe>::value)
+                  std::is_same<RecipeTy, VPWidenGEPRecipe>::value ||
+                  std::is_same<RecipeTy, VPVectorEndPointerRecipe>::value)
       return DefR;
     else
       return DefR && DefR->getOpcode() == Opcode;
@@ -554,6 +555,79 @@ m_DerivedIV(const Op0_t &Op0, const Op1_t &Op1, const Op2_t &Op2) {
   return VPDerivedIV_match<Op0_t, Op1_t, Op2_t>({Op0, Op1, Op2});
 }
 
+template <typename Addr_t, typename Mask_t, bool Reverse> struct Load_match {
+  Addr_t Addr;
+  Mask_t Mask;
+
+  Load_match(Addr_t Addr, Mask_t Mask) : Addr(Addr), Mask(Mask) {}
+
+  template <typename OpTy> bool match(const OpTy *V) const {
+    auto *Load = dyn_cast<VPWidenLoadRecipe>(V);
+    if (!Load || Load->isReverse() != Reverse || !Addr.match(Load->getAddr()) ||
+        !Load->isMasked() || !Mask.match(Load->getMask()))
+      return false;
+    return true;
+  }
+};
+
+/// Match a non-reversed masked load.
+template <typename Addr_t, typename Mask_t>
+inline Load_match<Addr_t, Mask_t, false> m_Load(const Addr_t &Addr,
+                                                const Mask_t &Mask) {
+  return Load_match<Addr_t, Mask_t, false>(Addr, Mask);
+}
+
+/// Match a reversed masked load.
+template <typename Addr_t, typename Mask_t>
+inline Load_match<Addr_t, Mask_t, true> m_ReverseLoad(const Addr_t &Addr,
+                                                      const Mask_t &Mask) {
+  return Load_match<Addr_t, Mask_t, true>(Addr, Mask);
+}
+
+template <typename Addr_t, typename Val_t, typename Mask_t, bool Reverse>
+struct Store_match {
+  Addr_t Addr;
+  Val_t Val;
+  Mask_t Mask;
+
+  Store_match(Addr_t Addr, Val_t Val, Mask_t Mask)
+      : Addr(Addr), Val(Val), Mask(Mask) {}
+
+  template <typename OpTy> bool match(const OpTy *V) const {
+    auto *Store = dyn_cast<VPWidenStoreRecipe>(V);
+    if (!Store || Store->isReverse() != Reverse ||
+        !Addr.match(Store->getAddr()) || !Val.match(Store->getStoredValue()) ||
+        !Store->isMasked() || !Mask.match(Store->getMask()))
+      return false;
+    return true;
+  }
+};
+
+/// Match a non-reversed masked store.
+template <typename Addr_t, typename Val_t, typename Mask_t>
+inline Store_match<Addr_t, Val_t, Mask_t, false>
+m_Store(const Addr_t &Addr, const Val_t &Val, const Mask_t &Mask) {
+  return Store_match<Addr_t, Val_t, Mask_t, false>(Addr, Val, Mask);
+}
+
+/// Match a reversed masked store.
+template <typename Addr_t, typename Val_t, typename Mask_t>
+inline Store_match<Addr_t, Val_t, Mask_t, true>
+m_ReverseStore(const Addr_t &Addr, const Val_t &Val, const Mask_t &Mask) {
+  return Store_match<Addr_t, Val_t, Mask_t, true>(Addr, Val, Mask);
+}
+
+template <typename Op0_t, typename Op1_t>
+using VectorEndPointerRecipe_match =
+    Recipe_match<std::tuple<Op0_t, Op1_t>, 0,
+                 /*Commutative*/ false, VPVectorEndPointerRecipe>;
+
+template <typename Op0_t, typename Op1_t>
+VectorEndPointerRecipe_match<Op0_t, Op1_t> m_VecEndPtr(const Op0_t &Op0,
+                                                       const Op1_t &Op1) {
+  return VectorEndPointerRecipe_match<Op0_t, Op1_t>(Op0, Op1);
+}
+
 /// Match a call argument at a given argument index.
 template <typename Opnd_t> struct Argument_match {
   /// Call argument index to match.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 56175e7f18145..52bbcf3adbec7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -998,6 +998,8 @@ static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) {
   if (!Def)
     return;
 
+  VPBuilder Builder(Def);
+
   // Simplification of live-in IR values for SingleDef recipes using
   // InstSimplifyFolder.
   if (TypeSwitch<VPRecipeBase *, bool>(&R)
@@ -1067,7 +1069,7 @@ static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) {
   // TODO: Split up into simpler, modular combines: (X && Y) || (X && Z) into X
   // && (Y || Z) and (X || !X) into true. This requires queuing newly created
   // recipes to be visited during simplification.
-  VPValue *X, *Y;
+  VPValue *X, *Y, *Z;
   if (match(Def,
             m_c_BinaryOr(m_LogicalAnd(m_VPValue(X), m_VPValue(Y)),
                          m_LogicalAnd(m_Deferred(X), m_Not(m_Deferred(Y)))))) {
@@ -1084,6 +1086,15 @@ static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) {
     return;
   }
 
+  // (x && y) || (x && z) -> x && (y || z)
+  if (match(Def, m_c_BinaryOr(m_LogicalAnd(m_VPValue(X), m_VPValue(Y)),
+                              m_LogicalAnd(m_Deferred(X), m_VPValue(Z)))) &&
+      // Creating an extra recipe, so at least one arm needs to have one use.
+      (!Def->getOperand(0)->hasMoreThanOneUniqueUser() ||
+       !Def->getOperand(1)->hasMoreThanOneUniqueUser()))
+    return Def->replaceAllUsesWith(
+        Builder.createLogicalAnd(X, Builder.createOr(Y, Z)));
+
   if (match(Def, m_Select(m_VPValue(), m_VPValue(X), m_Deferred(X))))
     return Def->replaceAllUsesWith(X);
 
@@ -1150,7 +1161,7 @@ static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) {
                      m_VPValue(X), m_SpecificInt(1)))) {
     Type *WideStepTy = TypeInfo.inferScalarType(Def);
     if (TypeInfo.inferScalarType(X) != WideStepTy)
-      X = VPBuilder(Def).createWidenCast(Instruction::Trunc, X, WideStepTy);
+      X = Builder.createWidenCast(Instruction::Trunc, X, WideStepTy);
     Def->replaceAllUsesWith(X);
     return;
   }
@@ -1240,7 +1251,86 @@ static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) {
   }
 }
 
+/// Collect the header mask with the pattern:
+///   (ICMP_ULE, WideCanonicalIV, backedge-taken-count)
+/// TODO: Introduce explicit recipe for header-mask instead of searching
+/// for the header-mask pattern manually.
+static VPSingleDefRecipe *findHeaderMask(VPlan &Plan) {
+  SmallVector<VPValue *> WideCanonicalIVs;
+  auto *FoundWidenCanonicalIVUser =
+      find_if(Plan.getCanonicalIV()->users(),
+              [](VPUser *U) { return isa<VPWidenCanonicalIVRecipe>(U); });
+  assert(count_if(Plan.getCanonicalIV()->users(),
+                  [](VPUser *U) { return isa<VPWidenCanonicalIVRecipe>(U); }) <=
+             1 &&
+         "Must have at most one VPWideCanonicalIVRecipe");
+  if (FoundWidenCanonicalIVUser != Plan.getCanonicalIV()->users().end()) {
+    auto *WideCanonicalIV =
+        cast<VPWidenCanonicalIVRecipe>(*FoundWidenCanonicalIVUser);
+    WideCanonicalIVs.push_back(WideCanonicalIV);
+  }
+
+  // Also include VPWidenIntOrFpInductionRecipes that represent a widened
+  // version of the canonical induction.
+  VPBasicBlock *HeaderVPBB = Plan.getVectorLoopRegion()->getEntryBasicBlock();
+  for (VPRecipeBase &Phi : HeaderVPBB->phis()) {
+    auto *WidenOriginalIV = dyn_cast<VPWidenIntOrFpInductionRecipe>(&Phi);
+    if (WidenOriginalIV && WidenOriginalIV->isCanonical())
+      WideCanonicalIVs.push_back(WidenOriginalIV);
+  }
+
+  // Walk users of wide canonical IVs and find the single compare of the form
+  // (ICMP_ULE, WideCanonicalIV, backedge-taken-count).
+  VPSingleDefRecipe *HeaderMask = nullptr;
+  for (auto *Wide : WideCanonicalIVs) {
+    for (VPUser *U : SmallVector<VPUser *>(Wide->users())) {
+      auto *VPI = dyn_cast<VPInstruction>(U);
+      if (!VPI || !vputils::isHeaderMask(VPI, Plan))
+        continue;
+
+      assert(VPI->getOperand(0) == Wide &&
+             "WidenCanonicalIV must be the first operand of the compare");
+      assert(!HeaderMask && "Multiple header masks found?");
+      HeaderMask = VPI;
+    }
+  }
+  return HeaderMask;
+}
+
+/// Canonicalize uses of the header mask by pulling out of logical ands to
+/// enable more simplifications.
+static void reassociateHeaderMask(VPlan &Plan) {
+  // Only do it before unrolling, otherwise there can be multiple header masks.
+  if (Plan.isUnrolled())
+    return;
+
+  VPValue *HeaderMask = findHeaderMask(Plan);
+  if (!HeaderMask)
+    return;
+
+  ReversePostOrderTraversal<VPBlockDeepTraversalWrapper<VPBlockBase *>> RPOT(
+      Plan.getEntry());
+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {
+    for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
+      auto *V = dyn_cast<VPSingleDefRecipe>(&R);
+      if (!V)
+        continue;
+      VPBuilder Builder(V);
+      VPValue *X, *Y;
+      /// (headermask && x) && y -> headermask && (x && y)
+      if (!match(V, m_LogicalAnd(
+                        m_LogicalAnd(m_Specific(HeaderMask), m_VPValue(X)),
+                        m_VPValue(Y))))
+        continue;
+      V->replaceAllUsesWith(
+          Builder.createLogicalAnd(HeaderMask, Builder.createLogicalAnd(X, Y)));
+      V->eraseFromParent();
+    }
+  }
+}
+
 void VPlanTransforms::simplifyRecipes(VPlan &Plan) {
+  reassociateHeaderMask(Plan);
   ReversePostOrderTraversal<VPBlockDeepTraversalWrapper<VPBlockBase *>> RPOT(
       Plan.getEntry());
   VPTypeAnalysis TypeInfo(Plan);
@@ -2071,52 +2161,6 @@ static VPActiveLaneMaskPHIRecipe *addVPLaneMaskPhiAndUpdateExitBranch(
   return LaneMaskPhi;
 }
 
-/// Collect the header mask with the pattern:
-///   (ICMP_ULE, WideCanonicalIV, backedge-taken-count)
-/// TODO: Introduce explicit recipe for header-mask instead of searching
-/// for the header-mask pattern manually.
-static VPSingleDefRecipe *findHeaderMask(VPlan &Plan) {
-  SmallVector<VPValue *> WideCanonicalIVs;
-  auto *FoundWidenCanonicalIVUser =
-      find_if(Plan.getCanonicalIV()->users(),
-              [](VPUser *U) { return isa<VPWidenCanonicalIVRecipe>(U); });
-  assert(count_if(Plan.getCanonicalIV()->users(),
-                  [](VPUser *U) { return isa<VPWidenCanonicalIVRecipe>(U); }) <=
-             1 &&
-         "Must have at most one VPWideCanonicalIVRecipe");
-  if (FoundWidenCanonicalIVUser != Plan.getCanonicalIV()->users().end()) {
-    auto *WideCanonicalIV =
-        cast<VPWidenCanonicalIVRecipe>(*FoundWidenCanonicalIVUser);
-    WideCanonicalIVs.push_back(WideCanonicalIV);
-  }
-
-  // Also include VPWidenIntOrFpInductionRecipes that represent a widened
-  // version of the canonical induction.
-  VPBasicBlock *HeaderVPBB = Plan.getVectorLoopRegion()->getEntryBasicBlock();
-  for (VPRecipeBase &Phi : HeaderVPBB->phis()) {
-    auto *WidenOriginalIV = dyn_cast<VPWidenIntOrFpInductionRecipe>(&Phi);
-    if (WidenOriginalIV && WidenOriginalIV->isCanonical())
-      WideCanonicalIVs.push_back(WidenOriginalIV);
-  }
-
-  // Walk users of wide canonical IVs and find the single compare of the form
-  // (ICMP_ULE, WideCanonicalIV, backedge-taken-count).
-  VPSingleDefRecipe *HeaderMask = nullptr;
-  for (auto *Wide : WideCanonicalIVs) {
-    for (VPUser *U : SmallVector<VPUser *>(Wide->users())) {
-      auto *VPI = dyn_cast<VPInstruction>(U);
-      if (!VPI || !vputils::isHeaderMask(VPI, Plan))
-        continue;
-
-      assert(VPI->getOperand(0) == Wide &&
-             "WidenCanonicalIV must be the first operand of the compare");
-      assert(!HeaderMask && "Multiple header masks found?");
-      HeaderMask = VPI;
-    }
-  }
-  return HeaderMask;
-}
-
 void VPlanTransforms::addActiveLaneMask(
     VPlan &Plan, bool UseActiveLaneMaskForControlFlow,
     bool DataAndControlFlowWithoutRuntimeCheck) {
@@ -2151,86 +2195,94 @@ void VPlanTransforms::addActiveLaneMask(
   HeaderMask->eraseFromParent();
 }
 
+template <typename Op0_t, typename Op1_t> struct RemoveMask_match {
+  Op0_t In;
+  Op1_t &Out;
+
+  RemoveMask_match(const Op0_t &In, Op1_t &Out) : In(In), Out(Out) {}
+
+  template <typename OpTy> bool match(OpTy *V) const {
+    if (m_Specific(In).match(V)) {
+      Out = nullptr;
+      return true;
+    }
+    if (m_LogicalAnd(m_Specific(In), m_VPValue(Out)).match(V))
+      return true;
+    return false;
+  }
+};
+
+/// Match a specific mask \p in, or a combination of it (logical-and in, out).
+/// Returns the remaining part \p out if so, or nullptr otherwise.
+template <typename Op0_t, typename Op1_t>
+static inline RemoveMask_match<Op0_t, Op1_t> m_RemoveMask(const Op0_t &In,
+                                                          Op1_t &Out) {
+  return RemoveMask_match<Op0_t, Op1_t>(In, Out);
+}
+
 /// Try to optimize a \p CurRecipe masked by \p HeaderMask to a corresponding
 /// EVL-based recipe without the header mask. Returns nullptr if no EVL-based
 /// recipe could be created.
 /// \p HeaderMask  Header Mask.
 /// \p CurRecipe   Recipe to be transform.
 /// \p TypeInfo    VPlan-based type analysis.
-/// \p AllOneMask  The vector mask parameter of vector-predication intrinsics.
 /// \p EVL         The explicit vector length parameter of vector-predication
 /// intrinsics.
 static VPRecipeBase *optimizeMaskToEVL(VPValue *HeaderMask,
                                        VPRecipeBase &CurRecipe,
-                                       VPTypeAnalysis &TypeInfo,
-                                       VPValue &AllOneMask, VPValue &EVL) {
-  // FIXME: Don't transform recipes to EVL recipes if they're not masked by the
-  // header mask.
-  auto GetNewMask = [&](VPValue *OrigMask) -> VPValue * {
-    assert(OrigMask && "Unmasked recipe when folding tail");
-    // HeaderMask will be handled using EVL.
-    VPValue *Mask;
-    if (match(OrigMask, m_LogicalAnd(m_Specific(HeaderMask), m_VPValue(Mask))))
-      return Mask;
-    return HeaderMask == OrigMask ? nullptr : OrigMask;
-  };
+                                       VPTypeAnalysis &TypeInfo, VPValue &EVL) {
+  VPlan *Plan = CurRecipe.getParent()->getPlan();
+  VPValue *Addr, *Mask;
 
   /// Adjust any end pointers so that they point to the end of EVL lanes not VF.
-  auto GetNewAddr = [&CurRecipe, &EVL](VPValue *Addr) -> VPValue * {
-    auto *EndPtr = dyn_cast<VPVectorEndPointerRecipe>(Addr);
-    if (!EndPtr)
-      return Addr;
-    assert(EndPtr->getOperand(1) == &EndPtr->getParent()->getPlan()->getVF() &&
-           "VPVectorEndPointerRecipe with non-VF VF operand?");
-    assert(
-        all_of(EndPtr->users(),
-               [](VPUser *U) {
-                 return cast<VPWidenMemoryRecipe>(U)->isReverse();
-               }) &&
-        "VPVectorEndPointRecipe not used by reversed widened memory recipe?");
-    VPVectorEndPointerRecipe *EVLAddr = EndPtr->clone();
-    EVLAddr->insertBefore(&CurRecipe);
-    EVLAddr->setOperand(1, &EVL);
-    return EVLAddr;
+  auto AdjustEndPtr = [&CurRecipe, &EVL](VPValue *EndPtr) {
+    auto *EVLEndPtr = cast<VPVectorEndPointerRecipe>(EndPtr)->clone();
+    EVLEndPtr->insertBefore(&CurRecipe);
+    EVLEndPtr->setOperand(1, &EVL);
+    return EVLEndPtr;
   };
 
-  return TypeSwitch<VPRecipeBase *, VPRecipeBase *>(&CurRecipe)
-      .Case<VPWidenLoadRecipe>([&](VPWidenLoadRecipe *L) {
-        VPValue *NewMask = GetNewMask(L->getMask());
-        VPValue *NewAddr = GetNewAddr(L->getAddr());
-        return new VPWidenLoadEVLRecipe(*L, NewAddr, EVL, NewMask);
-      })
-      .Case<VPWidenStoreRecipe>([&](VPWidenStoreRecipe *S) {
-        VPValue *NewMask = GetNewMask(S->getMask());
-        VPValue *NewAddr = GetNewAddr(S->getAddr());
-        return new VPWidenStoreEVLRecipe(*S, NewAddr, EVL, NewMask);
-      })
-      .Case<VPReductionRecipe>([&](VPReductionRecipe *Red) {
-        VPValue *NewMask = GetNewMask(Red->getCondOp());
-        return new VPReductionEVLRecipe(*Red, EVL, NewMask);
-      })
-      .Case<VPInstruction>([&](VPInstruction *VPI) -> VPRecipeBase * {
-        VPValue *LHS, *RHS;
-        // Transform select with a header mask condition
-        //   select(header_mask, LHS, RHS)
-        // into vector predication merge.
-        //   vp.merge(all-true, LHS, RHS, EVL)
-        if (!match(VPI, m_Select(m_Specific(HeaderMask), m_VPValue(LHS),
-                                 m_VPValue(RHS))))
-          return nullptr;
-        // Use all true as the condition because this transformation is
-        // limited to selects whose condition is a header mask.
-        return new VPWidenIntrinsicRecipe(
-            Intrinsic::vp_merge, {&AllOneMask, LHS, RHS, &EVL},
-            TypeInfo.inferScalarType(LHS), VPI->getDebugLoc());
-      })
-      .Default([&](VPRecipeBase *R) { return nullptr; });
+  if (match(&CurRecipe,
+            m_Load(m_VPValue(Addr), m_RemoveMask(HeaderMask, Mask))))
+    return new VPWidenLoadEVLRecipe(cast<VPWidenLoadRecipe>(CurRecipe), Addr,
+                                    EVL, Mask);
+
+  if (match(&CurRecipe, m_ReverseLoad(m_VecEndPtr(m_VPValue(Addr),
+                                                  m_Specific(&Plan->getVF())),
+                                      m_RemoveMask(HeaderMask, Mask))))
+    return new VPWidenLoadEVLRecipe(cast<VPWidenLoadRecipe>(CurRecipe),
+                                    AdjustEndPtr(Addr), EVL, Mask);
+
+  if (match(&CurRecipe, m_Store(m_VPValue(Addr), m_VPValue(),
+                                m_RemoveMask(HeaderMask, Mask))))
+    return new VPWidenStoreEVLRecipe(cast<VPWidenStoreRecipe>(CurRecipe), Addr,
+                                     EVL, Mask);
+
+  if (match(&CurRecipe,
+            m_ReverseStore(
+                m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF())),
+                m_VPValue(), m_RemoveMask(HeaderMask, Mask))))
+    return new VPWidenStoreEVLRecipe(cast<VPWidenStoreRecipe>(CurRecipe),
+                                     AdjustEndPtr(Addr), EVL, Mask);
+
+  if (auto *Rdx = dyn_cast<VPReductionRecipe>(&CurRecipe))
+    if (Rdx->isConditional() &&
+        match(Rdx->getCondOp(), m_RemoveMask(HeaderMask, Mask)))
+      return new VPReductionEVLRecipe(*Rdx, EVL, Mask);
+
+  VPValue *LHS, *RHS;
+  if (match(&CurRecipe,
+            m_Select(m_Specific(HeaderMask), m_VPValue(LHS), m_VPValue(RHS))))
+    return new VPWidenIntrinsicRecipe(
+        Intrinsic::vp_merge, {Plan->getTrue(), LHS, RHS, &EVL},
+        TypeInfo.inferScalarType(LHS), CurRecipe.getDebugLoc());
+
+  return nullptr;
 }
 
 /// Replace recipes with their EVL variants.
 static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
   VPTypeAnalysis TypeInfo(Plan);
-  VPValue *AllOneMask = Plan.getTrue();
   VPRegionBlock *LoopRegion = Plan.getVectorLoopRegion();
   VPBasicBlock *Header = LoopRegion->getEntryBasicBlock();
 
@@ -2289,7 +2341,7 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
             ConstantInt::getSigned(Type::getInt32Ty(Plan.getContext()), -1));
         VPWidenIntrinsicRecipe *VPSplice = new VPWidenIntrinsicRecipe(
             Intrinsic::experimental_vp_splice,
-            {V1, V2, Imm, AllOneMask, PrevEVL, &EVL},
+            {V1, V2, Imm, Plan.getTrue(), PrevEVL, &EVL},
             TypeInfo.inferScalarType(R.getVPSingleValue()), R.getDebugLoc());
         VPSplice->insertBefore(&R);
         R.getVPSingleValue()->replaceAllUsesWith(VPSplice);
@@ -2323,7 +2375,7 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
   for (VPUser *U : collectUsersRecursively(EVLMask)) {
     auto *CurRecipe = cast<VPRecipeBase>(U);
     VPRecipeBase *EVLRecipe =
-        optimizeMaskToEVL(EVLMask, *CurRecipe, TypeInfo, *AllOneMask, EVL);
+        optimizeMaskToEVL(EVLMask, *CurRecipe, TypeInfo, EVL);
     if (!EVLRecipe)
       continue;
 
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/blocks-with-dead-instructions.ll b/llvm/test/Transforms/LoopVectorize/RISCV/blocks-with-dead-instructions.ll
index 757c77fef98c8..d5029bfc47ee8 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/blocks-with-dead-instructions.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/blocks-with-dead-instructions.ll
@@ -447,23 +447,17 @@ define void @multiple_blocks_with_dead_inst_multiple_successors_6(ptr %src, i1 %
 ; CHECK-NEXT:    [[VEC_IND:%.*]] = phi <vscale x 8 x i64> [ [[INDUCTION]], %[[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], %[[VECTOR_BODY]] ]
 ; CHECK-NEXT:    [[AVL:%.*]] = phi i64 [ [[TMP...
[truncated]

Comment on lines +616 to +651
Contributor Author

These reverse patterns should go away in #146525, because you should be able to express them as (store (vecendptr addr), (reverse x), (reverse mask)).
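
For illustration, once that lands the reversed-store case could be written with the ordinary m_Store matcher, along the lines of this hypothetical sketch (m_Reverse and StoredVal are assumed names from a future #146525, not part of this patch):

  // Hypothetical sketch, not in this patch: assumes #146525 adds a reverse
  // recipe and a corresponding m_Reverse matcher.
  if (match(&CurRecipe,
            m_Store(m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF())),
                    m_Reverse(m_VPValue(StoredVal)),
                    m_Reverse(m_RemoveMask(HeaderMask, Mask))))) {
    // ... create the EVL store here, adjusting the end pointer as AdjustEndPtr
    // does for the existing reverse patterns ...
  }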

@lukel97 lukel97 force-pushed the loop-vectorize/optimizeMaskToEVL-pattern-matching branch from 5fa09a3 to 71e24f0 on September 1, 2025
@lukel97 lukel97 changed the title from "[VPlan] Perform optimizeMaskToEVL in terms of pattern matching. NFC" to "[VPlan] Perform optimizeMaskToEVL in terms of pattern matching" on Sep 1, 2025
@lukel97
Contributor Author

lukel97 commented Sep 1, 2025

I've unstacked this PR because I don't think the regressions are that serious, and they can be fixed afterwards by #155383.

I think there's value in getting this in earlier as it should make both #146525 and #155579 much simpler to implement. cc @Mel-Chen

Contributor

@artagnon artagnon left a comment

Looks good, and the regressions are very minor. I will wait for others to comment on the big picture.

@lukel97
Contributor Author

lukel97 commented Sep 3, 2025

The regressions are gone now that #155383 has landed.

@lukel97
Contributor Author

lukel97 commented Sep 8, 2025

Ping


  template <typename OpTy> bool match(const OpTy *V) const {
    auto *Store = dyn_cast<VPWidenStoreRecipe>(V);
    if (!Store || Store->isReverse() != Reverse ||
Contributor

Don’t rely on isReverse to determine reverse access, since we plan to separate the reverse mask from load/store recipes and remove Reverse from VPWidenMemoryRecipe.

Contributor Author

My plan was that we would land this first and rebase #146525 on top of it.
#146525 would then remove m_ReverseLoad/m_ReverseStore.

This way you wouldn't need to separately handle the reverse addresses here https://github.com/llvm/llvm-project/pull/146525/files#diff-53267225b83e943ceae51c326c9941e323fd7aaf74a08b5e6998d6456f88d1ddR2628-R2659

Instead you would only need to adjust the reverse pattern in optimizeMaskToEVL:

  if (match(&CurRecipe,
            m_Reverse(m_Load(m_VPValue(EndPtr), m_RemoveMask(HeaderMask, Mask)))) &&
      match(EndPtr, m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF())))) {
    auto *Load = new VPWidenLoadEVLRecipe(cast<VPWidenLoadRecipe>(CurRecipe),
                                    AdjustEndPtr(EndPtr), EVL, Mask);
    return Builder.createVPReverse(Load, EVL);
   }

Contributor

This approach seems to work for now, but it’s not a long-term solution. In the future, a reverse recipe might be sunk, meaning the reverse recipe would no longer directly use the loaded result. For example, consider this simplification rule: BinOp(reverse(V1), reverse(V2)) --> reverse(BinOp(V1, V2)).

But you gave me a really good idea! This won’t happen with reverse stores. The reverse of a stored value will only be eliminated, not sunk. So this method should work for reverse stores, but I don’t think it will work for reverse loads.
ef2d027

Contributor Author

I see what you mean about the use-def chain for loads. I don't think that will be a problem though if we only do the BinOp(reverse(V1), reverse(V2)) --> reverse(BinOp(V1, V2)) transform after optimizeMaskToEVL. So we'd have something like

simplifyRecipes(...)
optimizeMaskToEVL(...)
simplifyReverses(...)

Contributor

Hmm, my plan is a bit different. I think the order of transformations should be:

  1. simplifyReverse
  2. convertToStridedAccess
  3. set VFs for the plan
  4. EVL lowering

This order is based on the idea that reverse access might later be converted into strided access. If the reverse can be removed entirely, then we don’t need to convert it into strided access.

Contributor Author

Oh I see, I think your plan makes sense. But surely convertToStridedAccess has the same issue with the BinOp(reverse(V1), reverse(V2)) --> reverse(BinOp(V1, V2)) simplification? Won't it also have to peek through the binary op to find the loads?

If we do have to peek through the use-def chain then we can probably share the logic between convertToStridedAccess and optimizeMaskToEVL.

Contributor

@Mel-Chen Mel-Chen left a comment

Is this patch NFC?

@lukel97
Contributor Author

lukel97 commented Sep 11, 2025

Is this patch NFC?

Not really, because we no longer convert recipes that don't use the header mask, which might be fixing a correctness issue. As you mention though, I'm not sure if we hit any edge cases like that today. Hopefully not!
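
For context, the old GetNewMask helper (removed in this PR, see the diff above) fell back to returning the original mask when it was unrelated to the header mask, so such recipes were still converted to EVL recipes:

  // Pre-patch optimizeMaskToEVL (quoted from the removed lines in the diff):
  auto GetNewMask = [&](VPValue *OrigMask) -> VPValue * {
    assert(OrigMask && "Unmasked recipe when folding tail");
    // HeaderMask will be handled using EVL.
    VPValue *Mask;
    if (match(OrigMask, m_LogicalAnd(m_Specific(HeaderMask), m_VPValue(Mask))))
      return Mask;
    // A mask unrelated to HeaderMask was returned as-is, so the recipe was
    // still rewritten into an EVL recipe.
    return HeaderMask == OrigMask ? nullptr : OrigMask;
  };

With m_RemoveMask the match simply fails for masks unrelated to the header mask, so those recipes are now left alone.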

Comment on lines +2460 to +2469
  if (match(&CurRecipe,
            m_Load(m_VPValue(Addr), m_RemoveMask(HeaderMask, Mask))))
    return new VPWidenLoadEVLRecipe(cast<VPWidenLoadRecipe>(CurRecipe), Addr,
                                    EVL, Mask);

  if (match(&CurRecipe,
            m_ReverseLoad(m_VPValue(EndPtr), m_RemoveMask(HeaderMask, Mask))) &&
      match(EndPtr, m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF()))))
    return new VPWidenLoadEVLRecipe(cast<VPWidenLoadRecipe>(CurRecipe),
                                    AdjustEndPtr(EndPtr), EVL, Mask);
Contributor

Could you merge this into one if-block?
Like:

  if (match(&CurRecipe,
            m_MaskedLoad(m_VPValue(Addr), m_RemoveMask(HeaderMask, Mask)))) {
    if (cast<VPWidenLoadRecipe>(CurRecipe)->isReverse()) {
      // Transform VPVectorEndPointer(ptr, VF) to VPVectorEndPointer(ptr, EVL)
       return new VPWidenLoadEVLRecipe(cast<VPWidenLoadRecipe>(CurRecipe),
                            AdjustEndPtr(EndPtr), EVL, Mask);
    }
    return new VPWidenLoadEVLRecipe(cast<VPWidenLoadRecipe>(CurRecipe), Addr,
                                    EVL, Mask);
  }

After we have a reverse recipe, I can match it with m_MaskedLoad(m_VPValue(Addr), m_Reverse(m_RemoveMask(HeaderMask, Mask))).

Contributor Author

I tried this but it ends up not being correct to merge the two if blocks, because we can only match the reverse load when the pointer address is a VPVectorEndPointer, e.g.:

 if (match(&CurRecipe,
            m_MaskedLoad(m_VPValue(Addr), m_RemoveMask(HeaderMask, Mask)))) {
    if (cast<VPWidenLoadRecipe>(CurRecipe).isReverse() &&
        match(Addr, m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF())))) {
      // Transform VPVectorEndPointer(ptr, VF) to VPVectorEndPointer(ptr, EVL)
      return new VPWidenLoadEVLRecipe(cast<VPWidenLoadRecipe>(CurRecipe),
                                      AdjustEndPtr(EndPtr), EVL, Mask);
    }
    // Incorrect if reverse recipe falls through w/ non VPVectorEndPointer address
    return new VPWidenLoadEVLRecipe(cast<VPWidenLoadRecipe>(CurRecipe), Addr,
                                    EVL, Mask);
  }

Contributor Author

After #155579 I think the pattern should look like:

  if (match(&CurRecipe,
            m_Reverse(m_Load(m_VPValue(EndPtr), m_Reverse(m_RemoveMask(HeaderMask, Mask))))) &&
      match(EndPtr, m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF())))) {
    auto *Load = new VPWidenLoadEVLRecipe(cast<VPWidenLoadRecipe>(CurRecipe),
                                    AdjustEndPtr(EndPtr), EVL, Mask);
    return Builder.createVPReverse(Load, EVL);
   }

Because we also need to convert the reverse to a VP reverse

Contributor

I tried this but it ends up not being correct to merge the two if blocks, because we can only match the reverse load when the pointer address is a VPVectorEndPointer, e.g.:

     if (cast<VPWidenLoadRecipe>(CurRecipe).isReverse() &&
         match(Addr, m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF())))) {

My first question is: why do we need to additionally check whether the address is a VPVectorEndPointer for reverse accesses? Isn’t the whole reason for transforming VPVectorEndPointer(ptr, VF) into VPVectorEndPointer(ptr, EVL) that, once the header mask is reversed, non-active lanes are shifted to the beginning, while the VP intrinsic with EVL can only mask off the tail lanes and not the head lanes? If so, I think we only need:

if (match(&CurRecipe, m_MaskedLoad(m_VPValue(Addr), m_RemoveMask(HeaderMask, Mask)))) {
  if (cast<VPWidenLoadRecipe>(CurRecipe).isReverse()) {
     bool IsVecEndPtr = match(Addr, m_VecEndPtr(m_VPValue(), m_Specific(&Plan->getVF())));
     assert(IsVecEndPtr);
      // Transform VPVectorEndPointer(ptr, VF) to VPVectorEndPointer(ptr, EVL)
      return new VPWidenLoadEVLRecipe(cast<VPWidenLoadRecipe>(CurRecipe), AdjustEndPtr(Addr), EVL, Mask); 
    } 
    return new VPWidenLoadEVLRecipe(cast<VPWidenLoadRecipe>(CurRecipe), Addr, EVL, Mask); 
}

My second question is: if we keep the check on both isReverse and m_VPVectorEndPointer, do we actually have any case that is a reverse access with a non-VPVectorEndPointer address?

Contributor Author

My first question is: why do we need to additionally check whether the address is a VPVectorEndPointer for reverse accesses? Isn’t the whole reason for transforming VPVectorEndPointer(ptr, VF) into VPVectorEndPointer(ptr, EVL) that, once the header mask is reversed, non-active lanes are shifted to the beginning, while the VP intrinsic with EVL can only mask off the tail lanes and not the head lanes?

Exactly, that's why it's not correct to transform a reverse VPWidenLoadRecipe w/ a non-VPVectorEndPointer address to a reverse VPWidenLoadEVLRecipe. Consider the case with an arbitrary address that we don't call AdjustEndPtr on:

%headermask = [1, 1, 0, 0]
%load = VPWidenLoadRecipe(%ptr, %headermask, reverse=true)
      = reverse([poison, poison, *(ptr + 2), *(ptr + 3)])
      = [*(ptr + 3), *(ptr + 2), poison, poison]

--->

VPWidenLoadEVLRecipe(%ptr, null, reverse=true, evl=2)
      = vp.reverse([*ptr, *(ptr + 1), poison, poison], evl=2)
      = [*(ptr + 1), *ptr, poison, poison]

But with a VPVectorEndPointer that we adjust it's correct:

%headermask = [1, 1, 0, 0]
%load = VPWidenLoadRecipe(VPVectorEndPointer(%ptr, VF), %headermask, reverse=true)
      = reverse([poison, poison, *(ptr - (VF - 1) + 2), *(ptr - (VF - 1) + 3)])
      = reverse([poison, poison, *(ptr - 1), *ptr])
      = [*ptr, *(ptr - 1), poison, poison]

--->

VPWidenLoadEVLRecipe(VPVectorEndPointer(%ptr, 2), null, reverse=true, evl=2)
      = vp.reverse([*(ptr - (2 - 1)), *(ptr - (2 - 1) + 1), poison, poison], evl=2)
      = vp.reverse([*(ptr - 1), *ptr, poison, poison], evl=2)
      = [*ptr, *(ptr - 1), poison, poison]

My second question is: if we keep the check on both isReverse and m_VPVectorEndPointer, do we actually have any case that is a reverse access with a non-VPVectorEndPointer address?

Hopefully not, I think this is an invariant as you suggest. I'll try out the assert that you wrote above and see if it triggers.

Enjoy your vacation by the way, see you in October :)

Contributor

@Mel-Chen Mel-Chen left a comment

Just to let you know, I will be out of office until the end of September.

Successfully merging this pull request may close these issues.

Split VPlanTransforms::addExplicitVectorLength into variable step transformation and header mask optimisation