[InstCombine] Detect different vscales in div by shift combine. #126411

davemgreen · 2025-02-09T09:56:11Z

This attempts to fix a regression in code that performs svcntb() / svcntw()
(which is just a constant). https://godbolt.org/z/4o3a67s6n. We would previous
expand the svcnt into two different vscale intrinsics, CSE them in a later pass
and then fold udiv of shifts into a constant in a second instcombine.

After #121386 we now introduce a cttz. This patch just adds an additional check
for vscale to the div of shift fold, allowing it to happen earlier and avoiding
the need to look through the awkward (but probably not impossible) cttz that
was introduced.

This attempts to fix a regression in code that performs `svcntb() / svcntw()` (which is just a constant). https://godbolt.org/z/4o3a67s6n. We would previous expand the svcnt into two different vscale intrinsics, CSE them in a later pass and then fold udiv of shifts into a constant in a second instcombine. After llvm#121386 we now introduce a cttz. This patch just adds an additional check for vscale to the div of shift fold, allowing it to happen earlier and avoiding the need to look through the awkward (but probably not impossible) cttz that was introduced.

llvmbot · 2025-02-09T09:56:44Z

@llvm/pr-subscribers-llvm-transforms

Author: David Green (davemgreen)

Changes

This attempts to fix a regression in code that performs svcntb() / svcntw()
(which is just a constant). https://godbolt.org/z/4o3a67s6n. We would previous
expand the svcnt into two different vscale intrinsics, CSE them in a later pass
and then fold udiv of shifts into a constant in a second instcombine.

After #121386 we now introduce a cttz. This patch just adds an additional check
for vscale to the div of shift fold, allowing it to happen earlier and avoiding
the need to look through the awkward (but probably not impossible) cttz that
was introduced.

Full diff: https://github.com/llvm/llvm-project/pull/126411.diff

3 Files Affected:

(modified) llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp (+5-3)
(modified) llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-opts-counting-elems.ll (+11)
(modified) llvm/test/Transforms/InstCombine/div-shift.ll (+14)

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
index c8bdf029dd71c37..b2382eb7aed3196 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
@@ -1205,14 +1205,16 @@ static Value *foldIDivShl(BinaryOperator &I, InstCombiner::BuilderTy &Builder) {
 
   // If X << Y and X << Z does not overflow, then:
   // (X << Y) / (X << Z) -> (1 << Y) / (1 << Z) -> 1 << Y >> Z
-  if (match(Op0, m_Shl(m_Value(X), m_Value(Y))) &&
-      match(Op1, m_Shl(m_Specific(X), m_Value(Z)))) {
+  if ((match(Op0, m_Shl(m_Value(X), m_Value(Y))) &&
+       match(Op1, m_Shl(m_Specific(X), m_Value(Z)))) ||
+      (match(Op0, m_Shl(m_VScale(), m_Value(Y))) &&
+       match(Op1, m_Shl(m_VScale(), m_Value(Z))))) {
     auto *Shl0 = cast<OverflowingBinaryOperator>(Op0);
     auto *Shl1 = cast<OverflowingBinaryOperator>(Op1);
 
     if (IsSigned ? (Shl0->hasNoSignedWrap() && Shl1->hasNoSignedWrap())
                  : (Shl0->hasNoUnsignedWrap() && Shl1->hasNoUnsignedWrap())) {
-      Constant *One = ConstantInt::get(X->getType(), 1);
+      Constant *One = ConstantInt::get(Op0->getType(), 1);
       // Only preserve the nsw flag if dividend has nsw
       // or divisor has nsw and operator is sdiv.
       Value *Dividend = Builder.CreateShl(
diff --git a/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-opts-counting-elems.ll b/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-opts-counting-elems.ll
index 4e7e9eeb7250bcd..a398997a3453d2f 100644
--- a/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-opts-counting-elems.ll
+++ b/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-opts-counting-elems.ll
@@ -240,6 +240,17 @@ define i64 @cntd_all() {
 }
 
 
+define i64 @udiv() vscale_range(1, 16) {
+; CHECK-LABEL: @udiv(
+; CHECK-NEXT:    ret i64 4
+;
+  %a = call i64 @llvm.aarch64.sve.cntb(i32 31)
+  %b = call i64 @llvm.aarch64.sve.cntw(i32 31)
+  %c = udiv i64 %a, %b
+  ret i64 %c
+}
+
+
 declare i64 @llvm.aarch64.sve.cntb(i32 %pattern)
 declare i64 @llvm.aarch64.sve.cnth(i32 %pattern)
 declare i64 @llvm.aarch64.sve.cntw(i32 %pattern)
diff --git a/llvm/test/Transforms/InstCombine/div-shift.ll b/llvm/test/Transforms/InstCombine/div-shift.ll
index af83f37011ba014..f42aaa1c60750e4 100644
--- a/llvm/test/Transforms/InstCombine/div-shift.ll
+++ b/llvm/test/Transforms/InstCombine/div-shift.ll
@@ -1399,3 +1399,17 @@ start:
   %div = udiv i8 %x, %y
   ret i8 %div
 }
+
+define i32 @udiv_shl_pair_const_vscale() vscale_range(1, 16) {
+; CHECK-LABEL: @udiv_shl_pair_const_vscale(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    ret i32 2
+;
+entry:
+  %a = call i32 @llvm.vscale()
+  %b = call i32 @llvm.vscale()
+  %lhs = shl nuw i32 %a, 2
+  %rhs = shl nuw i32 %b, 1
+  %div = udiv i32 %lhs, %rhs
+  ret i32 %div
+}

nikic

Would it make sense to add a fold like this instead? https://alive2.llvm.org/ce/z/aCJLn9

(llvm#121386) Introduced cttz intrinsics which caused a regression where vscale/vscale divisions could no longer be constant folded. This fold was suggested as a fix in (llvm#126411)

davemgreen · 2025-02-16T15:38:58Z

Matt had a go at that here: #127055

#121386 Introduced cttz intrinsics which caused a regression where vscale/vscale divisions could no longer be constant folded. This fold was suggested as a fix in #126411. https://alive2.llvm.org/ce/z/gWbtPw

See #126411 / #127055, the test isn't expected to fold in a single instcombine iteration, needing instcombine->cse->instcombine.

llvm#121386 Introduced cttz intrinsics which caused a regression where vscale/vscale divisions could no longer be constant folded. This fold was suggested as a fix in llvm#126411. https://alive2.llvm.org/ce/z/gWbtPw

See llvm#126411 / llvm#127055, the test isn't expected to fold in a single instcombine iteration, needing instcombine->cse->instcombine.

davemgreen added 2 commits February 9, 2025 06:02

[InstCombine] Add an extra test for udiv_shl_pair_const with vscale

4b0a7e6

davemgreen requested review from nikic, dtcxzyw and veera-sivarajan February 9, 2025 09:56

llvmbot added llvm:instcombine llvm:transforms labels Feb 9, 2025

nikic reviewed Feb 9, 2025

View reviewed changes

MDevereau mentioned this pull request Feb 13, 2025

[InstCombine] Fold shift+cttz with power of 2 operands #127055

Merged

davemgreen closed this Feb 16, 2025

davemgreen deleted the gh-ic-vscaledivshift branch February 16, 2025 15:39

davemgreen added a commit that referenced this pull request Feb 18, 2025

[AArch64] Add a phase-ordering test for dividing vscale. NFC

c71f914

See #126411 / #127055, the test isn't expected to fold in a single instcombine iteration, needing instcombine->cse->instcombine.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[InstCombine] Detect different vscales in div by shift combine. #126411

[InstCombine] Detect different vscales in div by shift combine. #126411

davemgreen commented Feb 9, 2025

llvmbot commented Feb 9, 2025

nikic left a comment

davemgreen commented Feb 16, 2025

[InstCombine] Detect different vscales in div by shift combine. #126411

[InstCombine] Detect different vscales in div by shift combine. #126411

Conversation

davemgreen commented Feb 9, 2025

llvmbot commented Feb 9, 2025

nikic left a comment

Choose a reason for hiding this comment

davemgreen commented Feb 16, 2025