[LV][EVL] Replace VPInstruction::Select with vp.merge for predicated div/rem #154072

Mel-Chen · 2025-08-18T08:07:06Z

Since div/rem operations don’t support a mask operand, the lanes of the divisor that are masked out are currently replaced with 1 using VPInstruction::Select before the predicated div/rem operation.
This patch replaces

  VPInstruction::Select(logical_and(header_mask, conditional_mask), LHS, RHS)

with

  vp.merge(conditional_mask, LHS, RHS, EVL)

so that the header mask can be replaced by EVL in this usage scenario when tail folding with EVL.
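The equivalence this rewrite relies on can be checked with a small lane-by-lane model. This is only a sketch: the helper names (`headerMask`, `logicalAnd`, `vecSelect`, `vpMerge`) are hypothetical and not part of LLVM, and it assumes the header mask is exactly "lane index < EVL" and that `vp.merge` takes its on-false operand for lanes at or above EVL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Mask = std::vector<bool>;
using Vec = std::vector<std::int64_t>;

// Header mask under tail folding (assumption): lane i is active iff i < evl.
Mask headerMask(std::size_t vf, std::size_t evl) {
  Mask m(vf);
  for (std::size_t i = 0; i < vf; ++i)
    m[i] = i < evl;
  return m;
}

// Lane-wise logical and of two masks of equal length.
Mask logicalAnd(const Mask &a, const Mask &b) {
  assert(a.size() == b.size());
  Mask r(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    r[i] = a[i] && b[i];
  return r;
}

// Plain vector select: cond[i] ? onTrue[i] : onFalse[i].
Vec vecSelect(const Mask &cond, const Vec &onTrue, const Vec &onFalse) {
  assert(cond.size() == onTrue.size() && cond.size() == onFalse.size());
  Vec r(cond.size());
  for (std::size_t i = 0; i < cond.size(); ++i)
    r[i] = cond[i] ? onTrue[i] : onFalse[i];
  return r;
}

// Toy model of vp.merge: lanes below evl select on cond,
// lanes at or above evl always take onFalse.
Vec vpMerge(const Mask &cond, const Vec &onTrue, const Vec &onFalse,
            std::size_t evl) {
  assert(cond.size() == onTrue.size() && cond.size() == onFalse.size());
  Vec r(cond.size());
  for (std::size_t i = 0; i < cond.size(); ++i)
    r[i] = (i < evl && cond[i]) ? onTrue[i] : onFalse[i];
  return r;
}
```

Under these assumptions, `vecSelect(logicalAnd(headerMask(VF, EVL), cond), LHS, RHS)` and `vpMerge(cond, LHS, RHS, EVL)` produce the same lanes for every EVL, so the divisor still gets the safe value 1 in every masked-out or tail lane.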

llvmbot · 2025-08-18T08:07:41Z

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-backend-risc-v

Author: Mel Chen (Mel-Chen)

Full diff: https://github.com/llvm/llvm-project/pull/154072.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+10-8)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll (+3-3)

diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 05c12b7a1adcc..d015a1ccf9c2a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -2160,18 +2160,20 @@ static VPRecipeBase *optimizeMaskToEVL(VPValue *HeaderMask,
         return new VPReductionEVLRecipe(*Red, EVL, NewMask);
       })
       .Case<VPInstruction>([&](VPInstruction *VPI) -> VPRecipeBase * {
-        VPValue *LHS, *RHS;
+        VPValue *Cond, *LHS, *RHS;
         // Transform select with a header mask condition
-        //   select(header_mask, LHS, RHS)
+        //   select(mask_w/_header_mask, LHS, RHS)
         // into vector predication merge.
-        //   vp.merge(all-true, LHS, RHS, EVL)
-        if (!match(VPI, m_Select(m_Specific(HeaderMask), m_VPValue(LHS),
-                                 m_VPValue(RHS))))
+        //   vp.merge(mask_w/o_header_mask, LHS, RHS, EVL)
+        if (!match(VPI,
+                   m_Select(m_VPValue(Cond), m_VPValue(LHS), m_VPValue(RHS))))
           return nullptr;
-        // Use all true as the condition because this transformation is
-        // limited to selects whose condition is a header mask.
+
+	VPValue *NewMask = GetNewMask(Cond);
+	if (!NewMask)
+	  NewMask = &AllOneMask;
         return new VPWidenIntrinsicRecipe(
-            Intrinsic::vp_merge, {&AllOneMask, LHS, RHS, &EVL},
+            Intrinsic::vp_merge, {NewMask, LHS, RHS, &EVL},
             TypeInfo.inferScalarType(LHS), VPI->getDebugLoc());
       })
       .Default([&](VPRecipeBase *R) { return nullptr; });
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll b/llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll
index 3af328fb6568e..7efaf2080810b 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll
@@ -371,7 +371,7 @@ define void @predicated_udiv(ptr noalias nocapture %a, i64 %v, i64 %n) {
 ; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
 ; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = call <vscale x 2 x i64> @llvm.vp.load.nxv2i64.p0(ptr align 8 [[TMP8]], <vscale x 2 x i1> splat (i1 true), i32 [[TMP12]])
 ; CHECK-NEXT:    [[TMP16:%.*]] = select <vscale x 2 x i1> [[TMP15]], <vscale x 2 x i1> [[TMP6]], <vscale x 2 x i1> zeroinitializer
-; CHECK-NEXT:    [[TMP10:%.*]] = select <vscale x 2 x i1> [[TMP16]], <vscale x 2 x i64> [[BROADCAST_SPLAT]], <vscale x 2 x i64> splat (i64 1)
+; CHECK-NEXT:    [[TMP10:%.*]] = call <vscale x 2 x i64> @llvm.vp.merge.nxv2i64(<vscale x 2 x i1> [[TMP6]], <vscale x 2 x i64> [[BROADCAST_SPLAT]], <vscale x 2 x i64> splat (i64 1), i32 [[TMP12]])
 ; CHECK-NEXT:    [[TMP11:%.*]] = udiv <vscale x 2 x i64> [[WIDE_LOAD]], [[TMP10]]
 ; CHECK-NEXT:    [[PREDPHI:%.*]] = select <vscale x 2 x i1> [[TMP16]], <vscale x 2 x i64> [[TMP11]], <vscale x 2 x i64> [[WIDE_LOAD]]
 ; CHECK-NEXT:    call void @llvm.vp.store.nxv2i64.p0(<vscale x 2 x i64> [[PREDPHI]], ptr align 8 [[TMP8]], <vscale x 2 x i1> splat (i1 true), i32 [[TMP12]])
@@ -486,7 +486,7 @@ define void @predicated_sdiv(ptr noalias nocapture %a, i64 %v, i64 %n) {
 ; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
 ; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = call <vscale x 2 x i64> @llvm.vp.load.nxv2i64.p0(ptr align 8 [[TMP8]], <vscale x 2 x i1> splat (i1 true), i32 [[TMP12]])
 ; CHECK-NEXT:    [[TMP16:%.*]] = select <vscale x 2 x i1> [[TMP15]], <vscale x 2 x i1> [[TMP6]], <vscale x 2 x i1> zeroinitializer
-; CHECK-NEXT:    [[TMP10:%.*]] = select <vscale x 2 x i1> [[TMP16]], <vscale x 2 x i64> [[BROADCAST_SPLAT]], <vscale x 2 x i64> splat (i64 1)
+; CHECK-NEXT:    [[TMP10:%.*]] = call <vscale x 2 x i64> @llvm.vp.merge.nxv2i64(<vscale x 2 x i1> [[TMP6]], <vscale x 2 x i64> [[BROADCAST_SPLAT]], <vscale x 2 x i64> splat (i64 1), i32 [[TMP12]])
 ; CHECK-NEXT:    [[TMP11:%.*]] = sdiv <vscale x 2 x i64> [[WIDE_LOAD]], [[TMP10]]
 ; CHECK-NEXT:    [[PREDPHI:%.*]] = select <vscale x 2 x i1> [[TMP16]], <vscale x 2 x i64> [[TMP11]], <vscale x 2 x i64> [[WIDE_LOAD]]
 ; CHECK-NEXT:    call void @llvm.vp.store.nxv2i64.p0(<vscale x 2 x i64> [[PREDPHI]], ptr align 8 [[TMP8]], <vscale x 2 x i1> splat (i1 true), i32 [[TMP12]])
@@ -817,7 +817,7 @@ define void @predicated_sdiv_by_minus_one(ptr noalias nocapture %a, i64 %n) {
 ; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = call <vscale x 16 x i8> @llvm.vp.load.nxv16i8.p0(ptr align 1 [[TMP7]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP12]])
 ; CHECK-NEXT:    [[TMP9:%.*]] = icmp ne <vscale x 16 x i8> [[WIDE_LOAD]], splat (i8 -128)
 ; CHECK-NEXT:    [[TMP16:%.*]] = select <vscale x 16 x i1> [[TMP15]], <vscale x 16 x i1> [[TMP9]], <vscale x 16 x i1> zeroinitializer
-; CHECK-NEXT:    [[TMP10:%.*]] = select <vscale x 16 x i1> [[TMP16]], <vscale x 16 x i8> splat (i8 -1), <vscale x 16 x i8> splat (i8 1)
+; CHECK-NEXT:    [[TMP10:%.*]] = call <vscale x 16 x i8> @llvm.vp.merge.nxv16i8(<vscale x 16 x i1> [[TMP9]], <vscale x 16 x i8> splat (i8 -1), <vscale x 16 x i8> splat (i8 1), i32 [[TMP12]])
 ; CHECK-NEXT:    [[TMP11:%.*]] = sdiv <vscale x 16 x i8> [[WIDE_LOAD]], [[TMP10]]
 ; CHECK-NEXT:    [[PREDPHI:%.*]] = select <vscale x 16 x i1> [[TMP16]], <vscale x 16 x i8> [[TMP11]], <vscale x 16 x i8> [[WIDE_LOAD]]
 ; CHECK-NEXT:    call void @llvm.vp.store.nxv16i8.p0(<vscale x 16 x i8> [[PREDPHI]], ptr align 1 [[TMP7]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP12]])

github-actions · 2025-08-18T08:09:34Z

✅ With the latest revision this PR passed the C/C++ code formatter.

lukel97 · 2025-08-18T08:38:23Z

I'm not sure if this is the right approach, since it still leaves around a vmv.v.i to mask the divisor. What I originally tried in #148828 was to fold the div/rem into a VP div/rem, but in that PR it was relying on nothing in the VPlan ever reading past EVL lanes.

What I think is safer is to emit the VP intrinsic when the recipe is initially being widened using a mask, so that we know the lanes are defined as poison, and then optimise the mask to EVL in optimizeMaskToEVL. I've opened up #154076 for this, what do you think?

Mel-Chen · 2025-08-18T09:39:30Z

> I'm not sure if this is the right approach, since it still leaves around a vmv.v.i to mask the divisor. What I originally tried in #148828 was to fold the div/rem into a VP div/rem, but in that PR it was relying on nothing in the VPlan ever reading past EVL lanes.
>
> What I think is safer is to emit the VP intrinsic when the recipe is initially being widened using a mask, that way we know the lanes are defined as poison, and then optimising the mask to EVL in optimizeMaskToEVL. I've opened up #154076 for this, what do you think?

This patch wasn't intended to address the vp.div issue, actually. :D
I just noticed that we never performed this transformation, and in fact it should affect correctness before we replace the header mask with the EVL mask. It just hadn't been caught until now.

Mel-Chen · 2025-09-09T08:05:01Z

ping. Can we use this approach first to allow the header mask to be removed?

lukel97

I'm happy to have this as an incremental improvement, but this change won't be correct without #155394 landing first. The other VP recipe transforms also have the same incorrect behaviour, but I'd like to avoid making the problem worse.

lukel97 · 2025-09-09T08:23:02Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+
+        VPValue *NewMask = GetNewMask(Cond);
+        if (!NewMask)
+          NewMask = &AllOneMask;


It's not correct to transform a recipe with no header mask to a VP intrinsic with EVL, e.g. this will now transform

select <all ones>, foo, bar -> vp.select <all ones>, foo, bar, EVL

Which doesn't have the same semantics.
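To make the mismatch concrete, here is a minimal lane model of the all-ones case. It is a sketch with hypothetical helper names, and it models the `vp.merge` that the patch emits, assuming lanes at or above EVL take the on-false operand.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Lanes = std::vector<int>;

// select with an all-true condition: every lane takes onTrue.
Lanes selectAllTrue(const Lanes &onTrue, const Lanes &onFalse) {
  assert(onTrue.size() == onFalse.size());
  return onTrue;
}

// Toy vp.merge with an all-true mask: lanes below evl take onTrue,
// lanes at or above evl take onFalse.
Lanes vpMergeAllTrue(const Lanes &onTrue, const Lanes &onFalse,
                     std::size_t evl) {
  assert(onTrue.size() == onFalse.size());
  Lanes r(onTrue.size());
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = i < evl ? onTrue[i] : onFalse[i];
  return r;
}
```

With four lanes and EVL = 2, the plain select yields `foo` in all four lanes, while the EVL form yields `foo` only in lanes 0 and 1. The two agree only when EVL covers the whole vector.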

This is something the VPWidenLoadRecipe/VPWidenStoreRecipe/VPReductionRecipe recipes do today too, but I think we really need to fix this in #155394 first.

If we land #155394, then we can rewrite the pattern in this PR as

  if (!match(VPI, m_Select(m_RemoveMask(HeaderMask, Mask), m_VPValue(LHS),
                           m_VPValue(RHS))))
    return nullptr;

Which will only transform selects that use the header mask, and should be correct
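The idea behind such a matcher can be sketched on a toy mask expression. Everything here is hypothetical (`MaskExpr`, `RemoveResult`, `removeHeaderMask`) and is not the API from #155394. It assumes the matcher accepts either the header mask itself or a logical-and whose first operand is the header mask, and rejects everything else.

```cpp
#include <cassert>

// Toy mask expression: a leaf mask or a logical-and of two masks.
struct MaskExpr {
  enum class Kind { Leaf, LogicalAnd };
  Kind kind;
  const MaskExpr *lhs;
  const MaskExpr *rhs;
};

// Result of trying to strip the header mask from a condition.
struct RemoveResult {
  bool matched;              // true only if the header mask was found
  const MaskExpr *remaining; // nullptr means "all-true"
};

// Hypothetical helper: matches `header` or `logical_and(header, X)`.
RemoveResult removeHeaderMask(const MaskExpr *cond, const MaskExpr *header) {
  assert(cond && header);
  if (cond == header)
    return {true, nullptr};
  if (cond->kind == MaskExpr::Kind::LogicalAnd && cond->lhs == header)
    return {true, cond->rhs};
  // The condition does not involve the header mask: leave the select alone.
  return {false, nullptr};
}
```

In this model, a condition that never mentions the header mask reports `matched == false`, so the caller would bail out instead of silently falling back to an all-true mask with EVL.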

Mel-Chen · Sep 11, 2025

Ah, this is indeed a bit problematic, though we probably won't find such an example in practice (it might never occur, or it could just get replaced directly by the LHS). We do need to first ensure that the mask is formed from a logical-and tree that includes the header mask before deciding what the new mask should be. #155394 looks good, but I'd like to try adjusting GetNewMask in this PR to see if we can achieve the same effect: 3060d05

lukel97 · Sep 11, 2025

Is there much point in changing GetNewMask if #155394 is trying to remove it?

Mel-Chen requested review from fhahn, lukel97, alexey-bataev and LiqinWeng August 18, 2025 08:07

llvmbot added backend:RISC-V vectorizers llvm:transforms labels Aug 18, 2025

Mel-Chen force-pushed the evl-get-new-select-cond branch from ebcdecc to 96d4b98 Compare August 18, 2025 08:35

lukel97 mentioned this pull request Sep 2, 2025

RISC-V EVL tail folding #123069

Open

17 tasks

Mel-Chen mentioned this pull request Sep 9, 2025

[LV][EVL] Reimplement method for extracting new mask. nfc #156827

Open

lukel97 reviewed Sep 9, 2025

Mel-Chen added 2 commits September 11, 2025 01:34

EVL, transform VPInstruction::Select to vp.merge for div

06c6cff

Resolve select <all ones>, foo, bar -> vp.select <all ones>, foo, bar…

3060d05

…, EVL

Mel-Chen force-pushed the evl-get-new-select-cond branch from 96d4b98 to 3060d05 Compare September 11, 2025 08:34
