-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[InstCombine] optimize powi(X,Y) * X with Ofast #69998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-llvm-transforms Author: Allen (vfdff) ChangesTry to transform the powi(X, Y) * X into powi(X, Y+1) with Ofast For this case, when the Y is 3, then powi(X, 4) is replaced by X2 = X * X; X2 * X2 in the further step. Fixes #69862. Full diff: https://github.com/llvm/llvm-project/pull/69998.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
index bc784390c23be49..d3b07113ed7a183 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
@@ -716,6 +716,18 @@ Instruction *InstCombinerImpl::visitFMul(BinaryOperator &I) {
return replaceInstUsesWith(I, Pow);
}
+ // powi(X, Y) * X --> powi(X, Y+1)
+ // X * powi(X, Y) --> powi(X, Y+1)
+ if (match(&I, m_c_FMul(m_OneUse(m_Intrinsic<Intrinsic::powi>(m_Value(X),
+ m_Value(Y))),
+ m_Deferred(X))) &&
+ willNotOverflowSignedAdd(Y, ConstantInt::get(Y->getType(), 1), I)) {
+ auto *Y1 = Builder.CreateAdd(Y, ConstantInt::get(Y->getType(), 1));
+ auto *NewPow = Builder.CreateIntrinsic(
+ Intrinsic::powi, {X->getType(), Y1->getType()}, {X, Y1}, &I);
+ return replaceInstUsesWith(I, NewPow);
+ }
+
if (I.isOnlyUserOfAnyOperand()) {
// pow(X, Y) * pow(X, Z) -> pow(X, Y + Z)
if (match(Op0, m_Intrinsic<Intrinsic::pow>(m_Value(X), m_Value(Y))) &&
diff --git a/llvm/test/Transforms/InstCombine/powi.ll b/llvm/test/Transforms/InstCombine/powi.ll
index 89efbb6f4536113..95722d09a17ad32 100644
--- a/llvm/test/Transforms/InstCombine/powi.ll
+++ b/llvm/test/Transforms/InstCombine/powi.ll
@@ -341,3 +341,52 @@ define double @fdiv_pow_powi_negative_variable(double %x, i32 %y) {
%div = fdiv reassoc nnan double %p1, %x
ret double %div
}
+
+; powi(X, Y) * X --> powi(X, Y+1)
+define double @powi_fmul_powi_x(double noundef %x) {
+; CHECK-LABEL: @powi_fmul_powi_x(
+; CHECK-NEXT: [[MUL:%.*]] = call reassoc double @llvm.powi.f64.i32(double [[X:%.*]], i32 4)
+; CHECK-NEXT: ret double [[MUL]]
+;
+ %p1 = tail call double @llvm.powi.f64.i32(double %x, i32 3)
+ %mul = fmul reassoc double %p1, %x
+ ret double %mul
+}
+
+; Negative test: Multi-use
+define double @powi_fmul_powi_x_multi_use(double noundef %x) {
+; CHECK-LABEL: @powi_fmul_powi_x_multi_use(
+; CHECK-NEXT: [[P1:%.*]] = tail call double @llvm.powi.f64.i32(double [[X:%.*]], i32 3)
+; CHECK-NEXT: tail call void @use(double [[P1]])
+; CHECK-NEXT: [[MUL:%.*]] = fmul reassoc double [[P1]], [[X]]
+; CHECK-NEXT: ret double [[MUL]]
+;
+ %p1 = tail call double @llvm.powi.f64.i32(double %x, i32 3)
+ tail call void @use(double %p1)
+ %mul = fmul reassoc double %p1, %x
+ ret double %mul
+}
+
+; Negative test: Miss fmf flag
+define double @powi_fmul_powi_x_missing_reassoc(double noundef %x) {
+; CHECK-LABEL: @powi_fmul_powi_x_missing_reassoc(
+; CHECK-NEXT: [[P1:%.*]] = tail call double @llvm.powi.f64.i32(double [[X:%.*]], i32 3)
+; CHECK-NEXT: [[MUL:%.*]] = fmul double [[P1]], [[X]]
+; CHECK-NEXT: ret double [[MUL]]
+;
+ %p1 = tail call double @llvm.powi.f64.i32(double %x, i32 3)
+ %mul = fmul double %p1, %x
+ ret double %mul
+}
+
+; Negative test: overflow
+define double @powi_fmul_powi_x_overflow(double noundef %x) {
+; CHECK-LABEL: @powi_fmul_powi_x_overflow(
+; CHECK-NEXT: [[P1:%.*]] = tail call double @llvm.powi.f64.i32(double [[X:%.*]], i32 2147483647)
+; CHECK-NEXT: [[MUL:%.*]] = fmul reassoc double [[P1]], [[X]]
+; CHECK-NEXT: ret double [[MUL]]
+;
+ %p1 = tail call double @llvm.powi.f64.i32(double %x, i32 2147483647) ; INT_MAX
+ %mul = fmul reassoc double %p1, %x
+ ret double %mul
+}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have three variants of this optimization now: powi(x,a)*powi(x,b) => powi(x,a+b)
, powi(x,a)/x => powi(x,a-1)
and powi(x,a)*x => powi(x,a+1)
.
I'd suggest to move most of the transform into a helper function, that will accept x, a, b
, x, a, -1
and x, a, 1
in the above cases and then do the common checks and transform.
This will also fix the bug in the current powi * powi
transform, which fails to perform the necessary overflow check.
The ARM case already does end up with this optimization? How did that happen? |
Three variants of this optimization have many differences except the pattern match form, and it seems the only common check is |
hi @arsenm, @nikic : I have another idea, not sure if it's appropriate?
|
Fast-math flags tend to be applied on a per-project basis via command-line (i.e., per-IR module) flags, so it tends to be very rare for the flags to differ within a function, unless you're doing LTO, which is why it's unlikely to crop up very frequently. |
322d458
to
4423e80
Compare
Thanks, I add the restrict for the operands of fmul. |
✅ With the latest revision this PR passed the C/C++ code formatter. |
auto *Powi = dyn_cast<IntrinsicInst>(I.getOperand(0)); | ||
if (!Powi) | ||
Powi = cast<IntrinsicInst>(I.getOperand(1)); | ||
if (Powi->hasAllowReassoc()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should really have a way to put this flag check inside the matcher
@@ -571,6 +571,50 @@ Instruction *InstCombinerImpl::foldFPSignBitOps(BinaryOperator &I) { | |||
return nullptr; | |||
} | |||
|
|||
Instruction *InstCombinerImpl::foldPowiReassoc(BinaryOperator &I) { | |||
Value *X, *Y, *Z; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sink these down where they are used, looks like shadowing now
tail call void @use(double %p1) | ||
%p2 = tail call double @llvm.powi.f64.i32(double %x, i32 %y) | ||
%p2 = tail call reassoc double @llvm.powi.f64.i32(double %x, i32 %y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't have a test where the 2 powi declarations have different integer types, should add one
m_Deferred(X))) && | ||
willNotOverflowSignedAdd(Y, ConstantInt::get(Y->getType(), 1), I)) | ||
return createPowiExpr(I, *this, X, Y, ConstantInt::get(Y->getType(), 1)); | ||
if (match(&I, m_c_FMul(m_AllowReassoc(m_OneUse(m_Intrinsic<Intrinsic::powi>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the m_AllowReassoc should be paired with the m_Intrinsic part, not after the oneUse
}; | ||
|
||
template <typename T> | ||
inline AllowReassoc_match<T> m_AllowReassoc(const T &SubPattern) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't quite the API I had in my mind. I envisioned the required flags as a template parameter to the existing matchers, so you could have something like:
m_FMul<Reassoc>(), m_Intrinsic<powi, Reassoc>
However I wasn't expecting you to do anything for this in this patch. This is fine for now
Try to transform the powi(X, Y) * X into powi(X, Y+1) with Ofast For this case, when the Y is 3, then powi(X, 4) is replaced by X2 = X * X; X2 * X2 in the further step. Similar to D109954, who requires reassoc. Fixes llvm#69862.
According the discussion, except the fmul itself, all its operands should also have reassoc flag. Add new API m_AllowReassoc to check reassoc flag
Try to transform the powi(X, Y) * X into powi(X, Y+1) with Ofast
For this case, when the Y is 3, then powi(X, 4) is replaced by X2 = X * X; X2 * X2 in the further step.
Similar to D109954, who requires reassoc.
Fixes #69862.