Skip to content

Conversation

Mel-Chen
Copy link
Contributor

Since div/rem operations don’t support a mask operand, the lanes of the divisor that are masked out are currently replaced with 1 using VPInstruction::Select before the predicated div/rem operation.
This patch replaces

  VPInstruction::Select(logical_and(header_mask, conditional_mask), LHS, RHS)

with

  vp.merge(conditional_mask, LHS, RHS, EVL)

so that the header mask can be replaced by EVL in this usage scenario when tail folding with EVL.

@llvmbot
Copy link
Member

llvmbot commented Aug 18, 2025

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-backend-risc-v

Author: Mel Chen (Mel-Chen)

Changes

Since div/rem operations don’t support a mask operand, the lanes of the divisor that are masked out are currently replaced with 1 using VPInstruction::Select before the predicated div/rem operation.
This patch replaces

  VPInstruction::Select(logical_and(header_mask, conditional_mask), LHS, RHS)

with

  vp.merge(conditional_mask, LHS, RHS, EVL)

so that the header mask can be replaced by EVL in this usage scenario when tail folding with EVL.


Full diff: https://github.com/llvm/llvm-project/pull/154072.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+10-8)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll (+3-3)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 05c12b7a1adcc..d015a1ccf9c2a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -2160,18 +2160,20 @@ static VPRecipeBase *optimizeMaskToEVL(VPValue *HeaderMask,
         return new VPReductionEVLRecipe(*Red, EVL, NewMask);
       })
       .Case<VPInstruction>([&](VPInstruction *VPI) -> VPRecipeBase * {
-        VPValue *LHS, *RHS;
+        VPValue *Cond, *LHS, *RHS;
         // Transform select with a header mask condition
-        //   select(header_mask, LHS, RHS)
+        //   select(mask_w/_header_mask, LHS, RHS)
         // into vector predication merge.
-        //   vp.merge(all-true, LHS, RHS, EVL)
-        if (!match(VPI, m_Select(m_Specific(HeaderMask), m_VPValue(LHS),
-                                 m_VPValue(RHS))))
+        //   vp.merge(mask_w/o_header_mask, LHS, RHS, EVL)
+        if (!match(VPI,
+                   m_Select(m_VPValue(Cond), m_VPValue(LHS), m_VPValue(RHS))))
           return nullptr;
-        // Use all true as the condition because this transformation is
-        // limited to selects whose condition is a header mask.
+
+	VPValue *NewMask = GetNewMask(Cond);
+	if (!NewMask)
+	  NewMask = &AllOneMask;
         return new VPWidenIntrinsicRecipe(
-            Intrinsic::vp_merge, {&AllOneMask, LHS, RHS, &EVL},
+            Intrinsic::vp_merge, {NewMask, LHS, RHS, &EVL},
             TypeInfo.inferScalarType(LHS), VPI->getDebugLoc());
       })
       .Default([&](VPRecipeBase *R) { return nullptr; });
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll b/llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll
index 3af328fb6568e..7efaf2080810b 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll
@@ -371,7 +371,7 @@ define void @predicated_udiv(ptr noalias nocapture %a, i64 %v, i64 %n) {
 ; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
 ; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = call <vscale x 2 x i64> @llvm.vp.load.nxv2i64.p0(ptr align 8 [[TMP8]], <vscale x 2 x i1> splat (i1 true), i32 [[TMP12]])
 ; CHECK-NEXT:    [[TMP16:%.*]] = select <vscale x 2 x i1> [[TMP15]], <vscale x 2 x i1> [[TMP6]], <vscale x 2 x i1> zeroinitializer
-; CHECK-NEXT:    [[TMP10:%.*]] = select <vscale x 2 x i1> [[TMP16]], <vscale x 2 x i64> [[BROADCAST_SPLAT]], <vscale x 2 x i64> splat (i64 1)
+; CHECK-NEXT:    [[TMP10:%.*]] = call <vscale x 2 x i64> @llvm.vp.merge.nxv2i64(<vscale x 2 x i1> [[TMP6]], <vscale x 2 x i64> [[BROADCAST_SPLAT]], <vscale x 2 x i64> splat (i64 1), i32 [[TMP12]])
 ; CHECK-NEXT:    [[TMP11:%.*]] = udiv <vscale x 2 x i64> [[WIDE_LOAD]], [[TMP10]]
 ; CHECK-NEXT:    [[PREDPHI:%.*]] = select <vscale x 2 x i1> [[TMP16]], <vscale x 2 x i64> [[TMP11]], <vscale x 2 x i64> [[WIDE_LOAD]]
 ; CHECK-NEXT:    call void @llvm.vp.store.nxv2i64.p0(<vscale x 2 x i64> [[PREDPHI]], ptr align 8 [[TMP8]], <vscale x 2 x i1> splat (i1 true), i32 [[TMP12]])
@@ -486,7 +486,7 @@ define void @predicated_sdiv(ptr noalias nocapture %a, i64 %v, i64 %n) {
 ; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
 ; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = call <vscale x 2 x i64> @llvm.vp.load.nxv2i64.p0(ptr align 8 [[TMP8]], <vscale x 2 x i1> splat (i1 true), i32 [[TMP12]])
 ; CHECK-NEXT:    [[TMP16:%.*]] = select <vscale x 2 x i1> [[TMP15]], <vscale x 2 x i1> [[TMP6]], <vscale x 2 x i1> zeroinitializer
-; CHECK-NEXT:    [[TMP10:%.*]] = select <vscale x 2 x i1> [[TMP16]], <vscale x 2 x i64> [[BROADCAST_SPLAT]], <vscale x 2 x i64> splat (i64 1)
+; CHECK-NEXT:    [[TMP10:%.*]] = call <vscale x 2 x i64> @llvm.vp.merge.nxv2i64(<vscale x 2 x i1> [[TMP6]], <vscale x 2 x i64> [[BROADCAST_SPLAT]], <vscale x 2 x i64> splat (i64 1), i32 [[TMP12]])
 ; CHECK-NEXT:    [[TMP11:%.*]] = sdiv <vscale x 2 x i64> [[WIDE_LOAD]], [[TMP10]]
 ; CHECK-NEXT:    [[PREDPHI:%.*]] = select <vscale x 2 x i1> [[TMP16]], <vscale x 2 x i64> [[TMP11]], <vscale x 2 x i64> [[WIDE_LOAD]]
 ; CHECK-NEXT:    call void @llvm.vp.store.nxv2i64.p0(<vscale x 2 x i64> [[PREDPHI]], ptr align 8 [[TMP8]], <vscale x 2 x i1> splat (i1 true), i32 [[TMP12]])
@@ -817,7 +817,7 @@ define void @predicated_sdiv_by_minus_one(ptr noalias nocapture %a, i64 %n) {
 ; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = call <vscale x 16 x i8> @llvm.vp.load.nxv16i8.p0(ptr align 1 [[TMP7]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP12]])
 ; CHECK-NEXT:    [[TMP9:%.*]] = icmp ne <vscale x 16 x i8> [[WIDE_LOAD]], splat (i8 -128)
 ; CHECK-NEXT:    [[TMP16:%.*]] = select <vscale x 16 x i1> [[TMP15]], <vscale x 16 x i1> [[TMP9]], <vscale x 16 x i1> zeroinitializer
-; CHECK-NEXT:    [[TMP10:%.*]] = select <vscale x 16 x i1> [[TMP16]], <vscale x 16 x i8> splat (i8 -1), <vscale x 16 x i8> splat (i8 1)
+; CHECK-NEXT:    [[TMP10:%.*]] = call <vscale x 16 x i8> @llvm.vp.merge.nxv16i8(<vscale x 16 x i1> [[TMP9]], <vscale x 16 x i8> splat (i8 -1), <vscale x 16 x i8> splat (i8 1), i32 [[TMP12]])
 ; CHECK-NEXT:    [[TMP11:%.*]] = sdiv <vscale x 16 x i8> [[WIDE_LOAD]], [[TMP10]]
 ; CHECK-NEXT:    [[PREDPHI:%.*]] = select <vscale x 16 x i1> [[TMP16]], <vscale x 16 x i8> [[TMP11]], <vscale x 16 x i8> [[WIDE_LOAD]]
 ; CHECK-NEXT:    call void @llvm.vp.store.nxv16i8.p0(<vscale x 16 x i8> [[PREDPHI]], ptr align 1 [[TMP7]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP12]])

Copy link

github-actions bot commented Aug 18, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@Mel-Chen Mel-Chen force-pushed the evl-get-new-select-cond branch from ebcdecc to 96d4b98 Compare August 18, 2025 08:35
@lukel97
Copy link
Contributor

lukel97 commented Aug 18, 2025

I'm not sure if this is the right approach, since it still leaves around a vmv.v.i to mask the divisor. What I originally tried in #148828 was to fold the div/rem into a VP div/rem, but in that PR it was relying on nothing in the VPlan ever reading past EVL lanes.

What I think is safer is to emit the VP intrinsic when the recipe is initially being widened using a mask, that way we know the lanes are defined as poison, and then optimising the mask to EVL inoptimizeMaskToEVL. I've opened up #154076 for this, what do you think?

@Mel-Chen
Copy link
Contributor Author

I'm not sure if this is the right approach, since it still leaves around a vmv.v.i to mask the divisor. What I originally tried in #148828 was to fold the div/rem into a VP div/rem, but in that PR it was relying on nothing in the VPlan ever reading past EVL lanes.

What I think is safer is to emit the VP intrinsic when the recipe is initially being widened using a mask, that way we know the lanes are defined as poison, and then optimising the mask to EVL inoptimizeMaskToEVL. I've opened up #154076 for this, what do you think?

This patch wasn’t intended to address the vp.div issue, actually. :D
I just notice that we never performed this transformation, and in fact it should affect correctness before we replace the header mask with the EVL mask. It just hadn’t been caught until now.

@Mel-Chen
Copy link
Contributor Author

Mel-Chen commented Sep 9, 2025

ping. Can we use this approach first to allow the header mask to be removed?

Copy link
Contributor

@lukel97 lukel97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to have this as an incremental improvement, but this change won't be correct without #155394 landing first. The other VP recipe transforms also have the same incorrect behaviour but I'd like to avoid making the problem worse


VPValue *NewMask = GetNewMask(Cond);
if (!NewMask)
NewMask = &AllOneMask;
Copy link
Contributor

@lukel97 lukel97 Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not correct to transform a recipe with no header mask to a VP intrinsic with EVL, e.g. this will now transform

select <all ones>, foo, bar -> vp.select <all ones>, foo, bar, EVL

Which doesn't have the same semantics.

This is something the VPWidenLoadRecipe/VPWidenStoreRecipe/VPReductionRecipe recipes do today too, but I think we really need to fix this in #155394 first.

If we land #155394, then we can rewrite the pattern in this PR as

        if (!match(VPI,
                   m_Select(m_RemoveMask(HeaderMask, Mask), m_VPValue(LHS), m_VPValue(RHS))))
          return nullptr;

Which will only transform selects that use the header mask, and should be correct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is indeed a bit problematic, though we probably won’t find such an example in practice (it might never occur, or it could just get replaced directly by the LHS). We do need to first ensure that the mask is formed from a logical-and tree including the header mask before deciding what the new mask should be. #155394 looks good, but I’d like to adjusting GetNewMask in this PR to see if we can achieve the same effect. 3060d05

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there much point in changing GetNewMask if #155394 is trying to remove it?

@Mel-Chen Mel-Chen force-pushed the evl-get-new-select-cond branch from 96d4b98 to 3060d05 Compare September 11, 2025 08:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants