Skip to content

[InstCombine] Detect different vscales in div by shift combine. #126411

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

davemgreen
Copy link
Collaborator

This attempts to fix a regression in code that performs svcntb() / svcntw()
(which is just a constant). https://godbolt.org/z/4o3a67s6n. We would previous
expand the svcnt into two different vscale intrinsics, CSE them in a later pass
and then fold udiv of shifts into a constant in a second instcombine.

After #121386 we now introduce a cttz. This patch just adds an additional check
for vscale to the div of shift fold, allowing it to happen earlier and avoiding
the need to look through the awkward (but probably not impossible) cttz that
was introduced.

This attempts to fix a regression in code that performs `svcntb() / svcntw()`
(which is just a constant). https://godbolt.org/z/4o3a67s6n. We would previous
expand the svcnt into two different vscale intrinsics, CSE them in a later pass
and then fold udiv of shifts into a constant in a second instcombine.

After llvm#121386 we now introduce a cttz. This patch just adds an additional check
for vscale to the div of shift fold, allowing it to happen earlier and avoiding
the need to look through the awkward (but probably not impossible) cttz that
was introduced.
@llvmbot
Copy link
Member

llvmbot commented Feb 9, 2025

@llvm/pr-subscribers-llvm-transforms

Author: David Green (davemgreen)

Changes

This attempts to fix a regression in code that performs svcntb() / svcntw()
(which is just a constant). https://godbolt.org/z/4o3a67s6n. We would previous
expand the svcnt into two different vscale intrinsics, CSE them in a later pass
and then fold udiv of shifts into a constant in a second instcombine.

After #121386 we now introduce a cttz. This patch just adds an additional check
for vscale to the div of shift fold, allowing it to happen earlier and avoiding
the need to look through the awkward (but probably not impossible) cttz that
was introduced.


Full diff: https://github.com/llvm/llvm-project/pull/126411.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp (+5-3)
  • (modified) llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-opts-counting-elems.ll (+11)
  • (modified) llvm/test/Transforms/InstCombine/div-shift.ll (+14)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
index c8bdf029dd71c37..b2382eb7aed3196 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
@@ -1205,14 +1205,16 @@ static Value *foldIDivShl(BinaryOperator &I, InstCombiner::BuilderTy &Builder) {
 
   // If X << Y and X << Z does not overflow, then:
   // (X << Y) / (X << Z) -> (1 << Y) / (1 << Z) -> 1 << Y >> Z
-  if (match(Op0, m_Shl(m_Value(X), m_Value(Y))) &&
-      match(Op1, m_Shl(m_Specific(X), m_Value(Z)))) {
+  if ((match(Op0, m_Shl(m_Value(X), m_Value(Y))) &&
+       match(Op1, m_Shl(m_Specific(X), m_Value(Z)))) ||
+      (match(Op0, m_Shl(m_VScale(), m_Value(Y))) &&
+       match(Op1, m_Shl(m_VScale(), m_Value(Z))))) {
     auto *Shl0 = cast<OverflowingBinaryOperator>(Op0);
     auto *Shl1 = cast<OverflowingBinaryOperator>(Op1);
 
     if (IsSigned ? (Shl0->hasNoSignedWrap() && Shl1->hasNoSignedWrap())
                  : (Shl0->hasNoUnsignedWrap() && Shl1->hasNoUnsignedWrap())) {
-      Constant *One = ConstantInt::get(X->getType(), 1);
+      Constant *One = ConstantInt::get(Op0->getType(), 1);
       // Only preserve the nsw flag if dividend has nsw
       // or divisor has nsw and operator is sdiv.
       Value *Dividend = Builder.CreateShl(
diff --git a/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-opts-counting-elems.ll b/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-opts-counting-elems.ll
index 4e7e9eeb7250bcd..a398997a3453d2f 100644
--- a/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-opts-counting-elems.ll
+++ b/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-opts-counting-elems.ll
@@ -240,6 +240,17 @@ define i64 @cntd_all() {
 }
 
 
+define i64 @udiv() vscale_range(1, 16) {
+; CHECK-LABEL: @udiv(
+; CHECK-NEXT:    ret i64 4
+;
+  %a = call i64 @llvm.aarch64.sve.cntb(i32 31)
+  %b = call i64 @llvm.aarch64.sve.cntw(i32 31)
+  %c = udiv i64 %a, %b
+  ret i64 %c
+}
+
+
 declare i64 @llvm.aarch64.sve.cntb(i32 %pattern)
 declare i64 @llvm.aarch64.sve.cnth(i32 %pattern)
 declare i64 @llvm.aarch64.sve.cntw(i32 %pattern)
diff --git a/llvm/test/Transforms/InstCombine/div-shift.ll b/llvm/test/Transforms/InstCombine/div-shift.ll
index af83f37011ba014..f42aaa1c60750e4 100644
--- a/llvm/test/Transforms/InstCombine/div-shift.ll
+++ b/llvm/test/Transforms/InstCombine/div-shift.ll
@@ -1399,3 +1399,17 @@ start:
   %div = udiv i8 %x, %y
   ret i8 %div
 }
+
+define i32 @udiv_shl_pair_const_vscale() vscale_range(1, 16) {
+; CHECK-LABEL: @udiv_shl_pair_const_vscale(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    ret i32 2
+;
+entry:
+  %a = call i32 @llvm.vscale()
+  %b = call i32 @llvm.vscale()
+  %lhs = shl nuw i32 %a, 2
+  %rhs = shl nuw i32 %b, 1
+  %div = udiv i32 %lhs, %rhs
+  ret i32 %div
+}

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add a fold like this instead? https://alive2.llvm.org/ce/z/aCJLn9

MDevereau added a commit to MDevereau/llvm-project that referenced this pull request Feb 13, 2025
(llvm#121386) Introduced cttz intrinsics which caused a regression
where vscale/vscale divisions could no longer be constant folded.

This fold was suggested as a fix in (llvm#126411)
@davemgreen
Copy link
Collaborator Author

Matt had a go at that here: #127055

@davemgreen davemgreen closed this Feb 16, 2025
@davemgreen davemgreen deleted the gh-ic-vscaledivshift branch February 16, 2025 15:39
MDevereau added a commit that referenced this pull request Feb 18, 2025
#121386 Introduced cttz intrinsics which caused a regression where
vscale/vscale divisions could no longer be constant folded.

This fold was suggested as a fix in #126411.
https://alive2.llvm.org/ce/z/gWbtPw
davemgreen added a commit that referenced this pull request Feb 18, 2025
See #126411 / #127055, the test isn't expected to fold in a single instcombine
iteration, needing instcombine->cse->instcombine.
wldfngrs pushed a commit to wldfngrs/llvm-project that referenced this pull request Feb 19, 2025
llvm#121386 Introduced cttz intrinsics which caused a regression where
vscale/vscale divisions could no longer be constant folded.

This fold was suggested as a fix in llvm#126411.
https://alive2.llvm.org/ce/z/gWbtPw
wldfngrs pushed a commit to wldfngrs/llvm-project that referenced this pull request Feb 19, 2025
See llvm#126411 / llvm#127055, the test isn't expected to fold in a single instcombine
iteration, needing instcombine->cse->instcombine.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants