[InstCombine] Detect different vscales in div by shift combine. #126411
Conversation
This attempts to fix a regression in code that performs `svcntb() / svcntw()` (which is just a constant): https://godbolt.org/z/4o3a67s6n. We would previously expand the svcnt into two different vscale intrinsics, CSE them in a later pass, and then fold the udiv of shifts into a constant in a second instcombine run. After llvm#121386 we now introduce a cttz. This patch just adds an additional check for vscale to the div-of-shift fold, allowing it to happen earlier and avoiding the need to look through the awkward (but probably not impossible) cttz that was introduced.
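For context, a minimal sketch (mine, not taken from the patch) of the IR shape involved: svcntb() expands to vscale * 16 and svcntw() to vscale * 4, so before CSE the udiv sees shifts of two separate (but equal-valued) llvm.vscale calls:

  %vs1 = call i64 @llvm.vscale.i64()
  %cntb = shl nuw i64 %vs1, 4     ; svcntb() == vscale * 16
  %vs2 = call i64 @llvm.vscale.i64()
  %cntw = shl nuw i64 %vs2, 2     ; svcntw() == vscale * 4
  %div = udiv i64 %cntb, %cntw    ; folds to (1 << 4) >> 2 == 4

The existing fold requires both shl operands to be the same SSA value, so it could only fire after CSE merged the two vscale calls; matching m_VScale() on both sides lets it fire in the first instcombine run.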
@llvm/pr-subscribers-llvm-transforms

Author: David Green (davemgreen)

Changes

This attempts to fix a regression in code that performs `svcntb() / svcntw()` (which is just a constant): https://godbolt.org/z/4o3a67s6n. We would previously expand the svcnt into two different vscale intrinsics, CSE them in a later pass, and then fold the udiv of shifts into a constant in a second instcombine run. After #121386 we now introduce a cttz. This patch just adds an additional check for vscale to the div-of-shift fold, allowing it to happen earlier and avoiding the need to look through the awkward (but probably not impossible) cttz that was introduced.

Full diff: https://github.com/llvm/llvm-project/pull/126411.diff

3 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
index c8bdf029dd71c37..b2382eb7aed3196 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
@@ -1205,14 +1205,16 @@ static Value *foldIDivShl(BinaryOperator &I, InstCombiner::BuilderTy &Builder) {
// If X << Y and X << Z does not overflow, then:
// (X << Y) / (X << Z) -> (1 << Y) / (1 << Z) -> 1 << Y >> Z
- if (match(Op0, m_Shl(m_Value(X), m_Value(Y))) &&
- match(Op1, m_Shl(m_Specific(X), m_Value(Z)))) {
+ if ((match(Op0, m_Shl(m_Value(X), m_Value(Y))) &&
+ match(Op1, m_Shl(m_Specific(X), m_Value(Z)))) ||
+ (match(Op0, m_Shl(m_VScale(), m_Value(Y))) &&
+ match(Op1, m_Shl(m_VScale(), m_Value(Z))))) {
auto *Shl0 = cast<OverflowingBinaryOperator>(Op0);
auto *Shl1 = cast<OverflowingBinaryOperator>(Op1);
if (IsSigned ? (Shl0->hasNoSignedWrap() && Shl1->hasNoSignedWrap())
: (Shl0->hasNoUnsignedWrap() && Shl1->hasNoUnsignedWrap())) {
- Constant *One = ConstantInt::get(X->getType(), 1);
+ Constant *One = ConstantInt::get(Op0->getType(), 1);
// Only preserve the nsw flag if dividend has nsw
// or divisor has nsw and operator is sdiv.
Value *Dividend = Builder.CreateShl(
diff --git a/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-opts-counting-elems.ll b/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-opts-counting-elems.ll
index 4e7e9eeb7250bcd..a398997a3453d2f 100644
--- a/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-opts-counting-elems.ll
+++ b/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-opts-counting-elems.ll
@@ -240,6 +240,17 @@ define i64 @cntd_all() {
}
+define i64 @udiv() vscale_range(1, 16) {
+; CHECK-LABEL: @udiv(
+; CHECK-NEXT: ret i64 4
+;
+  %a = call i64 @llvm.aarch64.sve.cntb(i32 31)
+  %b = call i64 @llvm.aarch64.sve.cntw(i32 31)
+  %c = udiv i64 %a, %b
+  ret i64 %c
+}
+
+
declare i64 @llvm.aarch64.sve.cntb(i32 %pattern)
declare i64 @llvm.aarch64.sve.cnth(i32 %pattern)
declare i64 @llvm.aarch64.sve.cntw(i32 %pattern)
diff --git a/llvm/test/Transforms/InstCombine/div-shift.ll b/llvm/test/Transforms/InstCombine/div-shift.ll
index af83f37011ba014..f42aaa1c60750e4 100644
--- a/llvm/test/Transforms/InstCombine/div-shift.ll
+++ b/llvm/test/Transforms/InstCombine/div-shift.ll
@@ -1399,3 +1399,17 @@ start:
%div = udiv i8 %x, %y
ret i8 %div
}
+
+define i32 @udiv_shl_pair_const_vscale() vscale_range(1, 16) {
+; CHECK-LABEL: @udiv_shl_pair_const_vscale(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: ret i32 2
+;
+entry:
+  %a = call i32 @llvm.vscale()
+  %b = call i32 @llvm.vscale()
+  %lhs = shl nuw i32 %a, 2
+  %rhs = shl nuw i32 %b, 1
+  %div = udiv i32 %lhs, %rhs
+  ret i32 %div
+}
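As a sanity check on the expected results via the (X << Y) / (X << Z) -> 1 << Y >> Z identity: in @udiv above, cntb is vscale << 4 and cntw is vscale << 2, so the quotient is (1 << 4) >> 2 = 4; in @udiv_shl_pair_const_vscale the operands are vscale << 2 and vscale << 1, giving (1 << 2) >> 1 = 2.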
Would it make sense to add a fold like this instead? https://alive2.llvm.org/ce/z/aCJLn9
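The Alive2 proof isn't reproduced here, but going by the commit messages below, the suggestion appears to be folding a division of two equal vscale calls directly. A hedged sketch of the shape that would handle (the IR below is my illustration, not the linked proof):

  %a = call i64 @llvm.vscale.i64()
  %b = call i64 @llvm.vscale.i64()
  %div = udiv i64 %a, %b
  ; => %div == 1, since vscale is the same non-zero runtime constant in both calls

That would constant-fold the vscale/vscale division without needing to match the shift pattern at all.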
llvm#121386 introduced cttz intrinsics, which caused a regression where vscale/vscale divisions could no longer be constant folded. This fold was suggested as a fix in llvm#126411.
Matt had a go at that here: #127055
#121386 introduced cttz intrinsics, which caused a regression where vscale/vscale divisions could no longer be constant folded. This fold was suggested as a fix in #126411. https://alive2.llvm.org/ce/z/gWbtPw
See llvm#126411 / llvm#127055; the test isn't expected to fold in a single instcombine iteration, as it needs instcombine->cse->instcombine.