-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[LV][EVL] Replace VPInstruction::Select with vp.merge for predicated div/rem #154072
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-risc-v Author: Mel Chen (Mel-Chen) ChangesSince div/rem operations don’t support a mask operand, the lanes of the divisor that are masked out are currently replaced with 1 using VPInstruction::Select before the predicated div/rem operation.
with
so that the header mask can be replaced by EVL in this usage scenario when tail folding with EVL. Full diff: https://github.com/llvm/llvm-project/pull/154072.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 05c12b7a1adcc..d015a1ccf9c2a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -2160,18 +2160,20 @@ static VPRecipeBase *optimizeMaskToEVL(VPValue *HeaderMask,
return new VPReductionEVLRecipe(*Red, EVL, NewMask);
})
.Case<VPInstruction>([&](VPInstruction *VPI) -> VPRecipeBase * {
- VPValue *LHS, *RHS;
+ VPValue *Cond, *LHS, *RHS;
// Transform select with a header mask condition
- // select(header_mask, LHS, RHS)
+ // select(mask_w/_header_mask, LHS, RHS)
// into vector predication merge.
- // vp.merge(all-true, LHS, RHS, EVL)
- if (!match(VPI, m_Select(m_Specific(HeaderMask), m_VPValue(LHS),
- m_VPValue(RHS))))
+ // vp.merge(mask_w/o_header_mask, LHS, RHS, EVL)
+ if (!match(VPI,
+ m_Select(m_VPValue(Cond), m_VPValue(LHS), m_VPValue(RHS))))
return nullptr;
- // Use all true as the condition because this transformation is
- // limited to selects whose condition is a header mask.
+
+ VPValue *NewMask = GetNewMask(Cond);
+ if (!NewMask)
+ NewMask = &AllOneMask;
return new VPWidenIntrinsicRecipe(
- Intrinsic::vp_merge, {&AllOneMask, LHS, RHS, &EVL},
+ Intrinsic::vp_merge, {NewMask, LHS, RHS, &EVL},
TypeInfo.inferScalarType(LHS), VPI->getDebugLoc());
})
.Default([&](VPRecipeBase *R) { return nullptr; });
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll b/llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll
index 3af328fb6568e..7efaf2080810b 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll
@@ -371,7 +371,7 @@ define void @predicated_udiv(ptr noalias nocapture %a, i64 %v, i64 %n) {
; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = call <vscale x 2 x i64> @llvm.vp.load.nxv2i64.p0(ptr align 8 [[TMP8]], <vscale x 2 x i1> splat (i1 true), i32 [[TMP12]])
; CHECK-NEXT: [[TMP16:%.*]] = select <vscale x 2 x i1> [[TMP15]], <vscale x 2 x i1> [[TMP6]], <vscale x 2 x i1> zeroinitializer
-; CHECK-NEXT: [[TMP10:%.*]] = select <vscale x 2 x i1> [[TMP16]], <vscale x 2 x i64> [[BROADCAST_SPLAT]], <vscale x 2 x i64> splat (i64 1)
+; CHECK-NEXT: [[TMP10:%.*]] = call <vscale x 2 x i64> @llvm.vp.merge.nxv2i64(<vscale x 2 x i1> [[TMP6]], <vscale x 2 x i64> [[BROADCAST_SPLAT]], <vscale x 2 x i64> splat (i64 1), i32 [[TMP12]])
; CHECK-NEXT: [[TMP11:%.*]] = udiv <vscale x 2 x i64> [[WIDE_LOAD]], [[TMP10]]
; CHECK-NEXT: [[PREDPHI:%.*]] = select <vscale x 2 x i1> [[TMP16]], <vscale x 2 x i64> [[TMP11]], <vscale x 2 x i64> [[WIDE_LOAD]]
; CHECK-NEXT: call void @llvm.vp.store.nxv2i64.p0(<vscale x 2 x i64> [[PREDPHI]], ptr align 8 [[TMP8]], <vscale x 2 x i1> splat (i1 true), i32 [[TMP12]])
@@ -486,7 +486,7 @@ define void @predicated_sdiv(ptr noalias nocapture %a, i64 %v, i64 %n) {
; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = call <vscale x 2 x i64> @llvm.vp.load.nxv2i64.p0(ptr align 8 [[TMP8]], <vscale x 2 x i1> splat (i1 true), i32 [[TMP12]])
; CHECK-NEXT: [[TMP16:%.*]] = select <vscale x 2 x i1> [[TMP15]], <vscale x 2 x i1> [[TMP6]], <vscale x 2 x i1> zeroinitializer
-; CHECK-NEXT: [[TMP10:%.*]] = select <vscale x 2 x i1> [[TMP16]], <vscale x 2 x i64> [[BROADCAST_SPLAT]], <vscale x 2 x i64> splat (i64 1)
+; CHECK-NEXT: [[TMP10:%.*]] = call <vscale x 2 x i64> @llvm.vp.merge.nxv2i64(<vscale x 2 x i1> [[TMP6]], <vscale x 2 x i64> [[BROADCAST_SPLAT]], <vscale x 2 x i64> splat (i64 1), i32 [[TMP12]])
; CHECK-NEXT: [[TMP11:%.*]] = sdiv <vscale x 2 x i64> [[WIDE_LOAD]], [[TMP10]]
; CHECK-NEXT: [[PREDPHI:%.*]] = select <vscale x 2 x i1> [[TMP16]], <vscale x 2 x i64> [[TMP11]], <vscale x 2 x i64> [[WIDE_LOAD]]
; CHECK-NEXT: call void @llvm.vp.store.nxv2i64.p0(<vscale x 2 x i64> [[PREDPHI]], ptr align 8 [[TMP8]], <vscale x 2 x i1> splat (i1 true), i32 [[TMP12]])
@@ -817,7 +817,7 @@ define void @predicated_sdiv_by_minus_one(ptr noalias nocapture %a, i64 %n) {
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = call <vscale x 16 x i8> @llvm.vp.load.nxv16i8.p0(ptr align 1 [[TMP7]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP12]])
; CHECK-NEXT: [[TMP9:%.*]] = icmp ne <vscale x 16 x i8> [[WIDE_LOAD]], splat (i8 -128)
; CHECK-NEXT: [[TMP16:%.*]] = select <vscale x 16 x i1> [[TMP15]], <vscale x 16 x i1> [[TMP9]], <vscale x 16 x i1> zeroinitializer
-; CHECK-NEXT: [[TMP10:%.*]] = select <vscale x 16 x i1> [[TMP16]], <vscale x 16 x i8> splat (i8 -1), <vscale x 16 x i8> splat (i8 1)
+; CHECK-NEXT: [[TMP10:%.*]] = call <vscale x 16 x i8> @llvm.vp.merge.nxv16i8(<vscale x 16 x i1> [[TMP9]], <vscale x 16 x i8> splat (i8 -1), <vscale x 16 x i8> splat (i8 1), i32 [[TMP12]])
; CHECK-NEXT: [[TMP11:%.*]] = sdiv <vscale x 16 x i8> [[WIDE_LOAD]], [[TMP10]]
; CHECK-NEXT: [[PREDPHI:%.*]] = select <vscale x 16 x i1> [[TMP16]], <vscale x 16 x i8> [[TMP11]], <vscale x 16 x i8> [[WIDE_LOAD]]
; CHECK-NEXT: call void @llvm.vp.store.nxv16i8.p0(<vscale x 16 x i8> [[PREDPHI]], ptr align 1 [[TMP7]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP12]])
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
ebcdecc
to
96d4b98
Compare
I'm not sure if this is the right approach, since it still leaves around a vmv.v.i to mask the divisor. What I originally tried in #148828 was to fold the div/rem into a VP div/rem, but in that PR it was relying on nothing in the VPlan ever reading past EVL lanes. What I think is safer is to emit the VP intrinsic when the recipe is initially being widened using a mask, that way we know the lanes are defined as poison, and then optimising the mask to EVL in |
This patch wasn’t intended to address the vp.div issue, actually. :D |
ping. Can we use this approach first to allow the header mask to be removed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to have this as an incremental improvement, but this change won't be correct without #155394 landing first. The other VP recipe transforms also have the same incorrect behaviour but I'd like to avoid making the problem worse
|
||
VPValue *NewMask = GetNewMask(Cond); | ||
if (!NewMask) | ||
NewMask = &AllOneMask; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not correct to transform a recipe with no header mask to a VP intrinsic with EVL, e.g. this will now transform
select <all ones>, foo, bar -> vp.select <all ones>, foo, bar, EVL
Which doesn't have the same semantics.
This is something the VPWidenLoadRecipe/VPWidenStoreRecipe/VPReductionRecipe recipes do today too, but I think we really need to fix this in #155394 first.
If we land #155394, then we can rewrite the pattern in this PR as
if (!match(VPI,
m_Select(m_RemoveMask(HeaderMask, Mask), m_VPValue(LHS), m_VPValue(RHS))))
return nullptr;
Which will only transform selects that use the header mask, and should be correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is indeed a bit problematic, though we probably won’t find such an example in practice (it might never occur, or it could just get replaced directly by the LHS). We do need to first ensure that the mask is formed from a logical-and tree including the header mask before deciding what the new mask should be. #155394 looks good, but I’d like to adjusting GetNewMask in this PR to see if we can achieve the same effect. 3060d05
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there much point in changing GetNewMask if #155394 is trying to remove it?
96d4b98
to
3060d05
Compare
Since div/rem operations don’t support a mask operand, the lanes of the divisor that are masked out are currently replaced with 1 using VPInstruction::Select before the predicated div/rem operation.
This patch replaces
with
so that the header mask can be replaced by EVL in this usage scenario when tail folding with EVL.