[ValueTracking] Handle non-canonical operand order in isImpliedCondICmps #85575


Closed

Conversation

goldsteinn
Contributor

We don't always have canonical order here, so do it manually.

@llvmbot
Member

llvmbot commented Mar 17, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: None (goldsteinn)

Changes

We don't always have canonical order here, so do it manually.


Full diff: https://github.com/llvm/llvm-project/pull/85575.diff

4 Files Affected:

  • (modified) llvm/lib/Analysis/ValueTracking.cpp (+9)
  • (modified) llvm/test/Transforms/InstCombine/assume.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/select.ll (+2-4)
  • (modified) llvm/test/Transforms/InstCombine/shift.ll (+3-4)
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index edbeede910d7f7..4d85e423982fc0 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -8532,6 +8532,15 @@ static std::optional<bool> isImpliedCondICmps(const ICmpInst *LHS,
   CmpInst::Predicate LPred =
       LHSIsTrue ? LHS->getPredicate() : LHS->getInversePredicate();
 
+  // We can have non-canonical operands here so canonicalize constant to L1/R1.
+  if (match(L0, m_ImmConstant())) {
+    std::swap(L0, L1);
+    LPred = ICmpInst::getSwappedPredicate(LPred);
+  }
+  if (match(R0, m_ImmConstant())) {
+    std::swap(R0, R1);
+    RPred = ICmpInst::getSwappedPredicate(RPred);
+  }
   // Can we infer anything when the 0-operands match and the 1-operands are
   // constants (not necessarily matching)?
   const APInt *LC, *RC;
diff --git a/llvm/test/Transforms/InstCombine/assume.ll b/llvm/test/Transforms/InstCombine/assume.ll
index 927f0a86b0a252..87c75fb2b55592 100644
--- a/llvm/test/Transforms/InstCombine/assume.ll
+++ b/llvm/test/Transforms/InstCombine/assume.ll
@@ -386,7 +386,7 @@ define i1 @nonnull5(ptr %a) {
 define i32 @assumption_conflicts_with_known_bits(i32 %a, i32 %b) {
 ; CHECK-LABEL: @assumption_conflicts_with_known_bits(
 ; CHECK-NEXT:    store i1 true, ptr poison, align 1
-; CHECK-NEXT:    ret i32 1
+; CHECK-NEXT:    ret i32 poison
 ;
   %and1 = and i32 %b, 3
   %B1 = lshr i32 %and1, %and1
diff --git a/llvm/test/Transforms/InstCombine/select.ll b/llvm/test/Transforms/InstCombine/select.ll
index a84904106eced4..d9734242a86891 100644
--- a/llvm/test/Transforms/InstCombine/select.ll
+++ b/llvm/test/Transforms/InstCombine/select.ll
@@ -2925,10 +2925,8 @@ define i8 @select_replacement_loop3(i32 noundef %x) {
 
 define i16 @select_replacement_loop4(i16 noundef %p_12) {
 ; CHECK-LABEL: @select_replacement_loop4(
-; CHECK-NEXT:    [[CMP1:%.*]] = icmp ult i16 [[P_12:%.*]], 2
-; CHECK-NEXT:    [[AND1:%.*]] = and i16 [[P_12]], 1
-; CHECK-NEXT:    [[AND2:%.*]] = select i1 [[CMP1]], i16 [[AND1]], i16 0
-; CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i16 [[AND2]], [[P_12]]
+; CHECK-NEXT:    [[AND1:%.*]] = and i16 [[P_12:%.*]], 1
+; CHECK-NEXT:    [[CMP2:%.*]] = icmp ult i16 [[P_12]], 2
 ; CHECK-NEXT:    [[AND3:%.*]] = select i1 [[CMP2]], i16 [[AND1]], i16 0
 ; CHECK-NEXT:    ret i16 [[AND3]]
 ;
diff --git a/llvm/test/Transforms/InstCombine/shift.ll b/llvm/test/Transforms/InstCombine/shift.ll
index 62f32c28683711..bef7fc81a7d1f9 100644
--- a/llvm/test/Transforms/InstCombine/shift.ll
+++ b/llvm/test/Transforms/InstCombine/shift.ll
@@ -1751,12 +1751,11 @@ define void @ashr_out_of_range_1(ptr %A) {
 ; CHECK-NEXT:    [[L:%.*]] = load i177, ptr [[A:%.*]], align 4
 ; CHECK-NEXT:    [[L_FROZEN:%.*]] = freeze i177 [[L]]
 ; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq i177 [[L_FROZEN]], -1
-; CHECK-NEXT:    [[B:%.*]] = select i1 [[TMP1]], i177 0, i177 [[L_FROZEN]]
-; CHECK-NEXT:    [[TMP2:%.*]] = trunc i177 [[B]] to i64
+; CHECK-NEXT:    [[TMP6:%.*]] = trunc i177 [[L_FROZEN]] to i64
+; CHECK-NEXT:    [[TMP2:%.*]] = select i1 [[TMP1]], i64 0, i64 [[TMP6]]
 ; CHECK-NEXT:    [[TMP3:%.*]] = getelementptr i177, ptr [[A]], i64 [[TMP2]]
 ; CHECK-NEXT:    [[G11:%.*]] = getelementptr i8, ptr [[TMP3]], i64 -24
-; CHECK-NEXT:    [[C17:%.*]] = icmp sgt i177 [[B]], [[L_FROZEN]]
-; CHECK-NEXT:    [[TMP4:%.*]] = sext i1 [[C17]] to i64
+; CHECK-NEXT:    [[TMP4:%.*]] = sext i1 [[TMP1]] to i64
 ; CHECK-NEXT:    [[G62:%.*]] = getelementptr i177, ptr [[G11]], i64 [[TMP4]]
 ; CHECK-NEXT:    [[TMP5:%.*]] = icmp eq i177 [[L_FROZEN]], -1
 ; CHECK-NEXT:    [[B28:%.*]] = select i1 [[TMP5]], i177 0, i177 [[L_FROZEN]]

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Mar 17, 2024
if (match(R0, m_ImmConstant())) {
std::swap(R0, R1);
RPred = ICmpInst::getSwappedPredicate(RPred);
}
Contributor
Can we handle this by checking operand equality instead, as you did in the other patch? I'm thinking there is some cleanup to be done here: we have areMatchingOperands checking for swaps, then we also have it in the isUnsigned() code path below, and now this adds another case.

Or does that run afoul of the case where the constants are equal?

Contributor Author
Checking matching does work (didn't see any difference between the two). I'll update.

Contributor
Can we drop the swapping code below now?

Contributor Author
Checking.

Contributor Author
Yup, with a little bit more work in the initial swaps.

@nikic
Copy link
Contributor

nikic commented Mar 17, 2024

Could you please also add one explicit test for this? The other ones all look like fuzzer-generated test cases that might break randomly again in the future...

@goldsteinn goldsteinn force-pushed the goldsteinn/non-canonical-implied branch from 9e43e77 to e6d6d0a Compare March 17, 2024 20:41
…Cmps`

We don't always have canonical order here, so do it manually.
@goldsteinn goldsteinn force-pushed the goldsteinn/non-canonical-implied branch from e6d6d0a to beb6e1a Compare March 17, 2024 21:36
@nikic nikic (Contributor) left a comment
LGTM

std::swap(R0, R1);
RPred = ICmpInst::getSwappedPredicate(RPred);
}
}
Contributor
nit: Add a newline here?

@dtcxzyw
Member

dtcxzyw commented Mar 18, 2024

This patch seems to break minmax idiom: dtcxzyw/llvm-opt-benchmark#421 (comment)

@goldsteinn
Contributor Author

This patch seems to break minmax idiom: dtcxzyw/llvm-opt-benchmark#421 (comment)

It's not really this change that's breaking it; this patch is providing information that causes a fold to break it, no?
Either way, I'll look into it tomorrow.

@dtcxzyw
Member

dtcxzyw commented Mar 18, 2024

this patch is providing information that causes a fold to break it

Yeah.

@goldsteinn
Contributor Author

This patch seems to break minmax idiom: dtcxzyw/llvm-opt-benchmark#421 (comment)

Investigated the regression a bit. A reduced case is:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

declare void @use(i64)

define i32 @ParseCipherList(i64 %sub.ptr.lhs.cast) {
entry:
  %currLen = alloca i32, i32 0, align 4
  br label %do.body24

do.body24:  ; preds = %if.end34, %entry
  %conv29 = trunc i64 %sub.ptr.lhs.cast to i32
  store i32 %conv29, ptr %currLen, align 4
  %0 = load i32, ptr %currLen, align 4
  %cmp30 = icmp ugt i32 48, %0
  br i1 %cmp30, label %if.then32, label %if.end34

if.then32:  ; preds = %do.body24
  %1 = load i32, ptr %currLen, align 4
  br label %if.end34

if.end34:  ; preds = %if.then32, %do.body24
  %length.0 = phi i32 [ %1, %if.then32 ], [ 48, %do.body24 ]
  %conv35 = zext i32 %length.0 to i64
  call void @use(i64 %conv35)
  %cmp38 = icmp eq i64 %conv35, 49
  %cond = select i1 %cmp38, i32 0, i32 %length.0
  %idxprom = zext i32 %cond to i64
  %arrayidx40 = getelementptr [49 x i8], ptr null, i64 0, i64 %idxprom
  store i8 0, ptr %arrayidx40, align 1
  br label %do.body24
}

What ends up happening is, with the change, earlycse is able to simplify %cond so we get:

if.end34:                                         ; preds = %if.then32, %do.body24
  %length.0 = phi i32 [ %conv29, %if.then32 ], [ 48, %do.body24 ]
  %conv35 = zext i32 %length.0 to i64
  call void @use(i64 %conv35)
  %arrayidx40 = getelementptr [49 x i8], ptr null, i64 0, i64 %conv35
  store i8 0, ptr %arrayidx40, align 1
  br label %do.body24

Without the change earlycse produces:

  %length.0 = phi i32 [ %conv29, %if.then32 ], [ 48, %do.body24 ]
  %conv35 = zext i32 %length.0 to i64
  call void @use(i64 %conv35)
  %cmp38 = icmp eq i64 %conv35, 49
  %cond = select i1 %cmp38, i32 0, i32 %length.0
  %idxprom = zext i32 %cond to i64
  %arrayidx40 = getelementptr [49 x i8], ptr null, i64 0, i64 %idxprom
  store i8 0, ptr %arrayidx40, align 1
  br label %do.body24

The big difference then comes up when we reach instcombine. Without the earlycse cleanup, we do essentially the same simplification in instcombine/instsimplify:

  %length.0 = phi i32 [ %conv29, %if.then32 ], [ 48, %do.body24 ]
  %conv35 = zext nneg i32 %length.0 to i64
  call void @use(i64 %conv35)
  %idxprom = zext nneg i32 %length.0 to i64
  %arrayidx40 = getelementptr [49 x i8], ptr null, i64 0, i64 %idxprom
  store i8 poison, ptr %arrayidx40, align 1
  br label %do.body24

But we end up duplicating the zext nneg i32 %length.0 to i64 instruction instead of folding it all into one instruction.
This makes the phi multi-use, which disables canEvaluateZExtd, the path that preserves the minmax pattern in the phi for when it is later reduced to a select by simplifycfg.

I don't think any part of how this regression got introduced is really fixable.

Ultimately the fix boils down to matching:

define i32 @ParseCipherList(i64 %sub.ptr.lhs.cast) local_unnamed_addr {
  %conv29 = trunc i64 %sub.ptr.lhs.cast to i32
  %cmp30 = icmp ult i32 %conv29, 48
  %spec.select = select i1 %cmp30, i64 %sub.ptr.lhs.cast, i64 48
  %conv35 = and i64 %spec.select, 63
  ...

As a min/max pattern, which is a bit tough. It's not really possible without
the context of the and instruction (with the and, we can simplify the
true arm of the select back to %conv29 to recreate the pattern).

This is doable in foldSelectInstWithICmp before canonicalizeSPF,
but I don't really see a way to make this anything but extremely brittle,
as I don't see a reasonable way to introduce the demanded-bits aspect
(so we can match a trunc of an arm) into matchDecomposedSelectPattern.

My feeling is this is basically won't-fix. Do you see any clean way to
address this?

@dtcxzyw
Member

dtcxzyw commented Mar 19, 2024

I agree that we can just leave it as is until we find similar patterns in the future :)

@goldsteinn
Contributor Author

I agree that we can just leave it as is until we find similar patterns in the future :)

Yeah, I might try my luck at a patch integrating this into simplifyWithDemandedBits, but we'll see.

4 participants