[LV] Decompose WidenIntOrFPInduction into phi and update recipes #82021

Open: wants to merge 3 commits into main

nikolaypanchenko
Contributor

The Loop Vectorizer still has two recipes, VPWidenIntOrFpInductionRecipe and VPWidenPointerInductionRecipe, that behave as phi-like in a VPlan because they derive from VPHeaderPHIRecipe, yet their generate functions construct both the vector phi and the vector self-update in the vectorized loop.

This not only hurts the readability of a VPlan, but also requires extra code to maintain that behavior. For instance, there is already ad-hoc code motion that moves the generated updates of these recipes closer to the loop latch.

The changeset:

  • Adds WidenVFxUF to represent the broadcast({1...UF} x VFxUF) value
  • Decomposes the existing VPWidenIntOrFpInductionRecipe into
  WIDEN-INDUCTION vp<%iv> = phi ir<0>, vp<%be-value>
  ...
  EMIT vp<%widen-step> = mul ir<%step>, vp<WidenVFxUF>
  EMIT vp<%be-value> = add vp<%iv>, vp<%widen-step>
  • Moves the trunc optimization of the widened IV into a VPlan transform
  • Adds trivial cyclic dependency removal and marks some binops as non-side-effecting
  • Adds an element type to VPValue so that it can be queried for artificially added VPValues that have no underlying instruction
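
To make the decomposition above concrete, here is a minimal scalar simulation of the phi and update pair for an integer induction. It is only a sketch under stated assumptions: a fixed VF of 4, UF of 1 (so WidenVFxUF is simply a broadcast of VF), and a phi whose incoming start value is already the per-lane vector. The name `runWidenedInduction` is hypothetical and does not appear in the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed vectorization factor for this sketch (UF is taken as 1).
constexpr std::size_t VF = 4;
using Vec = std::array<std::int64_t, VF>;

// Models the decomposed recipes for an integer induction:
//   %iv         = phi [Start + {0..VF-1} * Step], %be-value
//   %widen-step = mul Step, broadcast(VF)
//   %be-value   = add %iv, %widen-step
// Returns every lane value the widened induction takes, in iteration order.
std::vector<std::int64_t> runWidenedInduction(std::int64_t Start,
                                              std::int64_t Step,
                                              unsigned NumVectorIters) {
  assert(NumVectorIters > 0 && "expected at least one vector iteration");

  // Incoming value of the phi from the preheader.
  Vec IV;
  for (std::size_t Lane = 0; Lane < VF; ++Lane)
    IV[Lane] = Start + static_cast<std::int64_t>(Lane) * Step;

  // Stand-in for the WidenVFxUF live-in when UF == 1: broadcast(VF).
  Vec WidenVF;
  WidenVF.fill(static_cast<std::int64_t>(VF));

  std::vector<std::int64_t> Seen;
  for (unsigned Iter = 0; Iter < NumVectorIters; ++Iter) {
    Seen.insert(Seen.end(), IV.begin(), IV.end());

    // The update is an ordinary mul + add placed at the loop latch.
    Vec BEValue;
    for (std::size_t Lane = 0; Lane < VF; ++Lane) {
      const std::int64_t WidenStep = Step * WidenVF[Lane];
      BEValue[Lane] = IV[Lane] + WidenStep;
    }
    IV = BEValue; // backedge value feeds the phi of the next iteration
  }
  return Seen;
}
```

With a start of 0 and a step of 3, two vector iterations visit the lanes 0, 3, 6, 9 and then 12, 15, 18, 21, which matches what the scalar loop would produce over eight iterations.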

@llvmbot
Member

llvmbot commented Feb 16, 2024

@llvm/pr-subscribers-llvm-analysis
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-risc-v

Author: Kolya Panchenko (nikolaypanchenko)

Patch is 3.06 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/82021.diff

171 Files Affected:

  • (modified) llvm/include/llvm/Analysis/IVDescriptors.h (+5)
  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+88-36)
  • (modified) llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h (+13-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+54-16)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+57-32)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+13-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+26-60)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+81-2)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (+19-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll (+42-44)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll (+120-120)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence-fold-tail.ll (+5-5)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/gather-do-not-vectorize-addressing.ll (+64-12)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/induction-trunc.ll (+62-12)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/interleave-allocsize-not-equal-typesize.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/interleaved-store-of-first-order-recurrence.ll (+49-14)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/outer_loop_prefer_scalable.ll (+31-31)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/outer_loop_test1_no_explicit_vect_width.ll (+123-57)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/pr60831-sve-inv-store-crash.ll (+11-11)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll (+19-20)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions-tf.ll (+78-16)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll (+844-844)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/streaming-compatible-sve-no-maximize-bandwidth.ll (+36-36)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll (+2903-778)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-inloop-reductions.ll (+24-24)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-reductions.ll (+22-22)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll (+18-18)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-gather-scatter.ll (+14-15)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll (+149-54)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions.ll (+11-11)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll (+173-168)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll (+170-170)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-low-trip-count.ll (+63-19)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-runtime-check-size-based-threshold.ll (+43-43)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll (+11-11)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-reductions.ll (+109-109)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-unroll.ll (+158-158)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll (+149-149)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll (+131-32)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/tail-folding-styles.ll (+56-51)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/vector-call-linear-args.ll (+56-69)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/wider-VF-for-callinst.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll (+81-83)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/lmul.ll (+35-35)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/mask-index-type.ll (+21-22)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/masked_gather_scatter.ll (+66-66)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/ordered-reduction.ll (+39-39)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-interleaved.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/scalable-basics.ll (+106-106)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll (+580-214)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/strided-accesses.ll (+123-125)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll (+238-243)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/zvl32b.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/consecutive-ptr-uniforms.ll (+202-41)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/constant-fold.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/conversion-cost.ll (+30-30)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/cost-model.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/drop-poison-generating-flags.ll (+496-119)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/epilog-vectorization-inductions.ll (+167-104)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/fixed-order-recurrence.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/float-induction-x86.ll (+31-40)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll (+54-54)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/interleaved-accesses-sink-store-across-load.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll (+27-27)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll (+364-364)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/optsize.ll (+42-46)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/outer_loop_test1_no_explicit_vect_width.ll (+118-57)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr34438.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr36524.ll (+24-27)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr51366-sunk-instruction-used-outside-of-loop.ll (+39-10)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr54634.ll (+19-25)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/scatter_crash.ll (+245-15)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/small-size.ll (+60-61)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/tail_loop_folding.ll (+29-32)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll (+47-58)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll (+8-9)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/vectorize-interleaved-accesses-gap.ll (+6-7)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll (+189-191)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-store-accesses-with-gaps.ll (+23-24)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/x86-predication.ll (+88-98)
  • (modified) llvm/test/Transforms/LoopVectorize/branch-weights.ll (+101-52)
  • (modified) llvm/test/Transforms/LoopVectorize/bsd_regex.ll (+6-7)
  • (modified) llvm/test/Transforms/LoopVectorize/cast-induction.ll (+363-58)
  • (modified) llvm/test/Transforms/LoopVectorize/consecutive-ptr-uniforms.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/create-induction-resume.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/dbg-outer-loop-vect.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-divisible-TC.ll (+5-5)
  • (modified) llvm/test/Transforms/LoopVectorize/epilog-vectorization-reductions.ll (+31-31)
  • (modified) llvm/test/Transforms/LoopVectorize/epilog-vectorization-trunc-induction-steps.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-chains-vplan.ll (+3-72)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-chains.ll (+648-198)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll (+3-352)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll (+174-176)
  • (modified) llvm/test/Transforms/LoopVectorize/float-induction.ll (+138-149)
  • (modified) llvm/test/Transforms/LoopVectorize/float-minmax-instruction-flag.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/fpsat.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/i8-induction.ll (+98-4)
  • (modified) llvm/test/Transforms/LoopVectorize/icmp-uniforms.ll (+3-1)
  • (modified) llvm/test/Transforms/LoopVectorize/if-pred-non-void.ll (+104-112)
  • (modified) llvm/test/Transforms/LoopVectorize/induction-multiple-uses-in-same-instruction.ll (+8-7)
  • (modified) llvm/test/Transforms/LoopVectorize/induction-ptrcasts.ll (+83-17)
  • (modified) llvm/test/Transforms/LoopVectorize/induction-step.ll (+226-75)
  • (modified) llvm/test/Transforms/LoopVectorize/induction-unroll-novec.ll (+59-20)
  • (modified) llvm/test/Transforms/LoopVectorize/induction.ll (+839-880)
  • (modified) llvm/test/Transforms/LoopVectorize/instruction-only-used-outside-of-loop.ll (+15-17)
  • (modified) llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll (+10-13)
  • (modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll (+35-35)
  • (modified) llvm/test/Transforms/LoopVectorize/load-of-struct-deref-pred.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/loop-form.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/loop-scalars.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/memdep-fold-tail.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/multiple-strides-vectorization.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/no_outside_user.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization-liveout.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization.ll (+114-114)
  • (modified) llvm/test/Transforms/LoopVectorize/outer-loop-vec-phi-predecessor-order.ll (+5-5)
  • (modified) llvm/test/Transforms/LoopVectorize/outer_loop_hcfg_construction.ll (+27-18)
  • (modified) llvm/test/Transforms/LoopVectorize/outer_loop_scalable.ll (+32-31)
  • (modified) llvm/test/Transforms/LoopVectorize/outer_loop_test1.ll (+62-29)
  • (modified) llvm/test/Transforms/LoopVectorize/outer_loop_test2.ll (+94-40)
  • (modified) llvm/test/Transforms/LoopVectorize/pointer-induction-unroll.ll (+28-28)
  • (modified) llvm/test/Transforms/LoopVectorize/pointer-select-runtime-checks.ll (+99-99)
  • (modified) llvm/test/Transforms/LoopVectorize/pr30654-phiscev-sext-trunc.ll (+33-33)
  • (modified) llvm/test/Transforms/LoopVectorize/pr35773.ll (+57-16)
  • (modified) llvm/test/Transforms/LoopVectorize/pr37248.ll (+10-10)
  • (modified) llvm/test/Transforms/LoopVectorize/pr44488-predication.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/pr45259.ll (+5-5)
  • (modified) llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll (+90-102)
  • (modified) llvm/test/Transforms/LoopVectorize/pr47343-expander-lcssa-after-cfg-update.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/pr50686.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/pr51614-fold-tail-by-masking.ll (+45-45)
  • (modified) llvm/test/Transforms/LoopVectorize/pr55100-expand-scev-predicate-used.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/pr55167-fold-tail-live-out.ll (+27-27)
  • (modified) llvm/test/Transforms/LoopVectorize/pr58811-scev-expansion.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/pr59319-loop-access-info-invalidation.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-align.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-pred.ll (+154-154)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-uf4.ll (+192-198)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop.ll (+21-21)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-odd-interleave-counts.ll (+136-70)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-predselect.ll (+61-61)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-small-size.ll (+14-14)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction.ll (+87-87)
  • (modified) llvm/test/Transforms/LoopVectorize/runtime-check-needed-but-empty.ll (+16-17)
  • (modified) llvm/test/Transforms/LoopVectorize/runtime-check-small-clamped-bounds.ll (+11-11)
  • (modified) llvm/test/Transforms/LoopVectorize/scalable-first-order-recurrence.ll (+1026-88)
  • (modified) llvm/test/Transforms/LoopVectorize/scalable-inductions.ll (+66-65)
  • (modified) llvm/test/Transforms/LoopVectorize/scalable-reduction-inloop.ll (+69-26)
  • (modified) llvm/test/Transforms/LoopVectorize/scalable-trunc-min-bitwidth.ll (+15-15)
  • (modified) llvm/test/Transforms/LoopVectorize/scalarize-masked-call.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/scev-exit-phi-invalidation.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/scev-predicate-reasoning.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/single-value-blend-phis.ll (+86-87)
  • (modified) llvm/test/Transforms/LoopVectorize/skeleton-lcssa-crash.ll (+10-10)
  • (modified) llvm/test/Transforms/LoopVectorize/strict-fadd-interleave-only.ll (+33-35)
  • (modified) llvm/test/Transforms/LoopVectorize/trunc-shifts.ll (+14-30)
  • (modified) llvm/test/Transforms/LoopVectorize/uniform-blend.ll (+135-49)
  • (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1.ll (+64-63)
  • (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_and.ll (+32-32)
  • (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_div_urem.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_lshr.ll (+52-52)
  • (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll (+210-210)
  • (modified) llvm/test/Transforms/LoopVectorize/use-scalar-epilogue-if-tp-fails.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/vector-geps.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-iv-transforms.ll (+3-1)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-printing.ll (+14-4)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll (+30-16)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-vectorize-inner-loop-reduction.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-widen-call-instruction.ll (+1-1)
  • (modified) llvm/unittests/Transforms/Vectorize/VPlanTest.cpp (-10)
diff --git a/llvm/include/llvm/Analysis/IVDescriptors.h b/llvm/include/llvm/Analysis/IVDescriptors.h
index 5c7b613ac48c40..7ca13adae87f6a 100644
--- a/llvm/include/llvm/Analysis/IVDescriptors.h
+++ b/llvm/include/llvm/Analysis/IVDescriptors.h
@@ -363,6 +363,11 @@ class InductionDescriptor {
     return nullptr;
   }
 
+  const Instruction *getExactFPMathInst() const {
+    return const_cast<const Instruction *>(
+        const_cast<InductionDescriptor *>(this)->getExactFPMathInst());
+  }
+
   /// Returns binary opcode of the induction operator.
   Instruction::BinaryOps getInductionOpcode() const {
     return InductionBinOp ? InductionBinOp->getOpcode()
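
The IVDescriptors.h hunk above adds a const overload that forwards to the existing non-const accessor through `const_cast`. A minimal sketch of that pattern in isolation follows. `Inst`, `Descriptor`, and `opcodeOrDefault` are hypothetical stand-ins, not the real LLVM classes, and the sketch assumes the non-const accessor never modifies the object, which is what makes the cast safe.

```cpp
// Hypothetical stand-ins; not the real llvm::Instruction / InductionDescriptor.
struct Inst {
  int Opcode;
};

class Descriptor {
  Inst *ExactFPMathInst;

public:
  explicit Descriptor(Inst *I) : ExactFPMathInst(I) {}

  // Existing non-const accessor: hands out a mutable pointer.
  Inst *getExactFPMathInst() { return ExactFPMathInst; }

  // Added const overload: forwards to the non-const accessor. This is only
  // safe because the non-const accessor does not modify *this.
  const Inst *getExactFPMathInst() const {
    return const_cast<Descriptor *>(this)->getExactFPMathInst();
  }
};

// A caller that only holds a const reference can now query the descriptor.
int opcodeOrDefault(const Descriptor &D, int Default) {
  const Inst *I = D.getExactFPMathInst();
  return I ? I->Opcode : Default;
}
```

The same patch uses the opposite direction for `VPValue::getElementType`, where the non-const overload forwards to the const one; either direction avoids duplicating the accessor body.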
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 98b177cf5d2d0e..92b783d3badeae 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8114,34 +8114,6 @@ VPHeaderPHIRecipe *VPRecipeBuilder::tryToOptimizeInductionPHI(
   return nullptr;
 }
 
-VPWidenIntOrFpInductionRecipe *VPRecipeBuilder::tryToOptimizeInductionTruncate(
-    TruncInst *I, ArrayRef<VPValue *> Operands, VFRange &Range, VPlan &Plan) {
-  // Optimize the special case where the source is a constant integer
-  // induction variable. Notice that we can only optimize the 'trunc' case
-  // because (a) FP conversions lose precision, (b) sext/zext may wrap, and
-  // (c) other casts depend on pointer size.
-
-  // Determine whether \p K is a truncation based on an induction variable that
-  // can be optimized.
-  auto isOptimizableIVTruncate =
-      [&](Instruction *K) -> std::function<bool(ElementCount)> {
-    return [=](ElementCount VF) -> bool {
-      return CM.isOptimizableIVTruncate(K, VF);
-    };
-  };
-
-  if (LoopVectorizationPlanner::getDecisionAndClampRange(
-          isOptimizableIVTruncate(I), Range)) {
-
-    auto *Phi = cast<PHINode>(I->getOperand(0));
-    const InductionDescriptor &II = *Legal->getIntOrFpInductionDescriptor(Phi);
-    VPValue *Start = Plan.getVPValueOrAddLiveIn(II.getStartValue());
-    return createWidenInductionRecipes(Phi, I, Start, II, Plan, *PSE.getSE(),
-                                       *OrigLoop, Range);
-  }
-  return nullptr;
-}
-
 VPBlendRecipe *VPRecipeBuilder::tryToBlend(PHINode *Phi,
                                            ArrayRef<VPValue *> Operands,
                                            VPlanPtr &Plan) {
@@ -8275,6 +8247,70 @@ bool VPRecipeBuilder::shouldWiden(Instruction *I, VFRange &Range) const {
                                                              Range);
 }
 
+VPWidenCastRecipe *VPRecipeBuilder::createCast(VPValue *V, Type *From,
+                                               Type *To) {
+  if (From == To)
+    return nullptr;
+  Instruction::CastOps CastOpcode;
+  if (To->isIntegerTy() && From->isIntegerTy())
+    CastOpcode = To->getPrimitiveSizeInBits() < From->getPrimitiveSizeInBits()
+                     ? Instruction::Trunc
+                     : Instruction::ZExt;
+  else if (To->isIntegerTy())
+    CastOpcode = Instruction::FPToUI;
+  else
+    CastOpcode = Instruction::UIToFP;
+
+  return new VPWidenCastRecipe(CastOpcode, V, To);
+}
+
+VPRecipeBase *
+VPRecipeBuilder::createWidenStep(VPWidenIntOrFpInductionRecipe &WIV,
+                                 ScalarEvolution &SE, VPlan &Plan,
+                                 DenseSet<VPRecipeBase *> *CreatedRecipes) {
+  PHINode *PN = WIV.getPHINode();
+  const InductionDescriptor &IndDesc = WIV.getInductionDescriptor();
+  VPValue *ScalarStep =
+      vputils::getOrCreateVPValueForSCEVExpr(Plan, IndDesc.getStep(), SE);
+  Type *VFxUFTy = Plan.getVFxUF().getElementType();
+  Type *StepTy = IndDesc.getStep()->getType();
+  VPValue *WidenVFxUF = &Plan.getWidenVFxUF();
+  VPBasicBlock *LatchVPBB = Plan.getVectorLoopRegion()->getExitingBasicBlock();
+  if (VPWidenCastRecipe *WidenVFxUFCast =
+          createCast(&Plan.getWidenVFxUF(), VFxUFTy, StepTy)) {
+    WidenVFxUFCast->insertBefore(LatchVPBB->getTerminator());
+    if (CreatedRecipes)
+      CreatedRecipes->insert(WidenVFxUFCast);
+    WidenVFxUF = WidenVFxUFCast->getVPSingleValue();
+  }
+  const Instruction::BinaryOps UpdateOp =
+      IndDesc.getInductionOpcode() != Instruction::BinaryOpsEnd
+          ? IndDesc.getInductionOpcode()
+          : Instruction::Add;
+  VPInstruction *Update;
+  if (StepTy->isIntegerTy()) {
+    VPInstruction *Mul = new VPInstruction(
+        Instruction::Mul, {WidenVFxUF, ScalarStep}, PN->getDebugLoc());
+    Mul->insertBefore(LatchVPBB->getTerminator());
+    if (CreatedRecipes)
+      CreatedRecipes->insert(Mul);
+    Update = new VPInstruction(UpdateOp, {&WIV, Mul}, PN->getDebugLoc());
+    Update->insertBefore(LatchVPBB->getTerminator());
+  } else {
+    FastMathFlags FMF = IndDesc.getExactFPMathInst()
+                            ? IndDesc.getExactFPMathInst()->getFastMathFlags()
+                            : FastMathFlags();
+    VPInstruction *Mul = new VPInstruction(
+        Instruction::FMul, {WidenVFxUF, ScalarStep}, FMF, PN->getDebugLoc());
+    Mul->insertBefore(LatchVPBB->getTerminator());
+    Update = new VPInstruction(UpdateOp, {&WIV, Mul}, FMF, PN->getDebugLoc());
+    Update->insertBefore(LatchVPBB->getTerminator());
+  }
+  if (CreatedRecipes)
+    CreatedRecipes->insert(Update);
+  return Update;
+}
+
 VPWidenRecipe *VPRecipeBuilder::tryToWiden(Instruction *I,
                                            ArrayRef<VPValue *> Operands,
                                            VPBasicBlock *VPBB, VPlanPtr &Plan) {
@@ -8324,10 +8360,15 @@ VPWidenRecipe *VPRecipeBuilder::tryToWiden(Instruction *I,
   };
 }
 
-void VPRecipeBuilder::fixHeaderPhis() {
+void VPRecipeBuilder::fixHeaderPhis(VPlan &Plan) {
   BasicBlock *OrigLatch = OrigLoop->getLoopLatch();
   for (VPHeaderPHIRecipe *R : PhisToFix) {
-    auto *PN = cast<PHINode>(R->getUnderlyingValue());
+    if (auto *VPWIFR = dyn_cast<VPWidenIntOrFpInductionRecipe>(R)) {
+      VPWIFR->addOperand(
+          createWidenStep(*VPWIFR, *PSE.getSE(), Plan)->getVPSingleValue());
+      continue;
+    }
+    PHINode *PN = cast<PHINode>(R->getUnderlyingValue());
     VPRecipeBase *IncR =
         getRecipe(cast<Instruction>(PN->getIncomingValueForBlock(OrigLatch)));
     R->addOperand(IncR->getVPSingleValue());
@@ -8405,8 +8446,12 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe(
     // can have earlier phis as incoming values.
     recordRecipeOf(Phi);
 
-    if ((Recipe = tryToOptimizeInductionPHI(Phi, Operands, *Plan, Range)))
+    if ((Recipe = tryToOptimizeInductionPHI(Phi, Operands, *Plan, Range))) {
+      if (isa<VPWidenPointerInductionRecipe>(Recipe))
+        return Recipe;
+      PhisToFix.push_back(cast<VPWidenIntOrFpInductionRecipe>(Recipe));
       return Recipe;
+    }
 
     VPHeaderPHIRecipe *PhiRecipe = nullptr;
     assert((Legal->isReductionVariable(Phi) ||
@@ -8441,10 +8486,17 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe(
     return PhiRecipe;
   }
 
-  if (isa<TruncInst>(Instr) &&
-      (Recipe = tryToOptimizeInductionTruncate(cast<TruncInst>(Instr), Operands,
-                                               Range, *Plan)))
-    return Recipe;
+  if (isa<TruncInst>(Instr)) {
+    auto IsOptimizableIVTruncate =
+        [&](Instruction *K) -> std::function<bool(ElementCount)> {
+      return [=](ElementCount VF) -> bool {
+        return CM.isOptimizableIVTruncate(K, VF);
+      };
+    };
+
+    LoopVectorizationPlanner::getDecisionAndClampRange(
+        IsOptimizableIVTruncate(Instr), Range);
+  }
 
   // All widen recipes below deal only with VF > 1.
   if (LoopVectorizationPlanner::getDecisionAndClampRange(
@@ -8707,7 +8759,7 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
          !Plan->getVectorLoopRegion()->getEntryBasicBlock()->empty() &&
          "entry block must be set to a VPRegionBlock having a non-empty entry "
          "VPBasicBlock");
-  RecipeBuilder.fixHeaderPhis();
+  RecipeBuilder.fixHeaderPhis(*Plan);
 
   // ---------------------------------------------------------------------------
   // Transform initial VPlan: Apply previously taken decisions, in order, to
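
The opcode selection in the new `createCast` helper above can be read as a small decision table. The sketch below reproduces that table on a simplified type model so it can be checked in isolation. `SimpleTy`, `CastOp`, and `pickCastOpcode` are hypothetical names introduced for illustration; the real helper works on `llvm::Type` and returns a `VPWidenCastRecipe`.

```cpp
#include <optional>

// Hypothetical, simplified model of a scalar type: integer or floating point,
// plus a bit width.
struct SimpleTy {
  bool IsInteger;
  unsigned Bits;
  bool operator==(const SimpleTy &O) const {
    return IsInteger == O.IsInteger && Bits == O.Bits;
  }
};

enum class CastOp { Trunc, ZExt, FPToUI, UIToFP };

// Mirrors the branch structure of createCast in the hunk above:
//   same type          -> no cast
//   integer -> integer -> Trunc if narrowing, ZExt otherwise
//   other   -> integer -> FPToUI
//   anything else      -> UIToFP
std::optional<CastOp> pickCastOpcode(SimpleTy From, SimpleTy To) {
  if (From == To)
    return std::nullopt;
  if (To.IsInteger && From.IsInteger)
    return To.Bits < From.Bits ? CastOp::Trunc : CastOp::ZExt;
  if (To.IsInteger)
    return CastOp::FPToUI;
  return CastOp::UIToFP;
}
```

Note that, as the branches are written, a pair of two different floating-point types would also reach the final `UIToFP` branch. The sketch reproduces that behavior as-is rather than guessing at the intended handling.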
diff --git a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
index b1498026adadfe..126a6b1c061265 100644
--- a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
+++ b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
@@ -146,6 +146,18 @@ class VPRecipeBuilder {
   /// between SRC and DST.
   VPValue *getEdgeMask(BasicBlock *Src, BasicBlock *Dst) const;
 
+  /// A helper function to create VPWidenCastRecipe of a \p V VPValue to a \p To
+  /// type.
+  /// FIXME: Remove \p From argument and take it from a \p V value
+  static VPWidenCastRecipe *createCast(VPValue *V, Type *From, Type *To);
+
+  /// A helper function which widens \p WIV step, multiplies it by WidenVFxUF
+  /// and attaches it to the loop latch of \p Plan. Returns the update recipe.
+  static VPRecipeBase *
+  createWidenStep(VPWidenIntOrFpInductionRecipe &WIV, ScalarEvolution &SE,
+                  VPlan &Plan,
+                  DenseSet<VPRecipeBase *> *CreatedRecipes = nullptr);
+
   /// Mark given ingredient for recording its recipe once one is created for
   /// it.
   void recordRecipeOf(Instruction *I) {
@@ -171,7 +183,7 @@ class VPRecipeBuilder {
 
   /// Add the incoming values from the backedge to reduction & first-order
   /// recurrence cross-iteration phis.
-  void fixHeaderPhis();
+  void fixHeaderPhis(VPlan &Plan);
 };
 } // end namespace llvm
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 2c0daa82afa59f..96732b77a9db3d 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -76,12 +76,25 @@ Value *VPLane::getAsRuntimeExpr(IRBuilderBase &Builder,
   llvm_unreachable("Unknown lane kind");
 }
 
-VPValue::VPValue(const unsigned char SC, Value *UV, VPDef *Def)
-    : SubclassID(SC), UnderlyingVal(UV), Def(Def) {
+VPValue::VPValue(const unsigned char SC, Value *UV, VPDef *Def, Type *Ty)
+    : SubclassID(SC), UnderlyingVal(UV), UnderlyingTy(Ty), Def(Def) {
+  if (UnderlyingTy)
+    assert((!UnderlyingVal || UnderlyingVal->getType() == UnderlyingTy) &&
+           "VPValue with set type should either be created without underlying "
+           "value or type should match the given type");
   if (Def)
     Def->addDefinedValue(this);
 }
 
+Type *VPValue::getElementType() {
+  return const_cast<Type *>(
+      const_cast<const VPValue *>(this)->getElementType());
+}
+
+const Type *VPValue::getElementType() const {
+  return UnderlyingVal ? UnderlyingVal->getType() : UnderlyingTy;
+}
+
 VPValue::~VPValue() {
   assert(Users.empty() && "trying to delete a VPValue with remaining users");
   if (Def)
@@ -763,6 +776,10 @@ VPlanPtr VPlan::createInitialVPlan(const SCEV *TripCount, ScalarEvolution &SE) {
   auto Plan = std::make_unique<VPlan>(Preheader, VecPreheader);
   Plan->TripCount =
       vputils::getOrCreateVPValueForSCEVExpr(*Plan, TripCount, SE);
+  Type *TCType = TripCount->getType();
+  Plan->getVectorTripCount().setElementType(TCType);
+  Plan->getVFxUF().setElementType(TCType);
+  Plan->getWidenVFxUF().setElementType(TCType);
   // Create empty VPRegionBlock, to be filled during processing later.
   auto *TopRegion = new VPRegionBlock("vector loop", false /*isReplicator*/);
   VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
@@ -796,6 +813,18 @@ void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV,
             createStepForVF(Builder, TripCountV->getType(), State.VF, State.UF),
             0);
 
+  if (WidenVFxUF.getNumUsers() > 0)
+    for (unsigned Part = 0, UF = State.UF; Part < UF; ++Part) {
+      Value *Step =
+          createStepForVF(Builder, TripCountV->getType(), State.VF, Part+1);
+      if (State.VF.isScalar())
+        State.set(&WidenVFxUF, Step, Part);
+      else
+        State.set(&WidenVFxUF,
+                  Builder.CreateVectorSplat(State.VF, Step, "widen.vfxuf"),
+                  Part);
+    }
+
   // When vectorizing the epilogue loop, the canonical induction start value
   // needs to be changed from zero to the value after the main vector loop.
   // FIXME: Improve modeling for canonical IV start values in the epilogue loop.
@@ -845,21 +874,16 @@ void VPlan::execute(VPTransformState *State) {
     if (isa<VPWidenPHIRecipe>(&R))
       continue;
 
-    if (isa<VPWidenPointerInductionRecipe>(&R) ||
-        isa<VPWidenIntOrFpInductionRecipe>(&R)) {
+    if (isa<VPWidenPointerInductionRecipe>(&R)) {
       PHINode *Phi = nullptr;
-      if (isa<VPWidenIntOrFpInductionRecipe>(&R)) {
-        Phi = cast<PHINode>(State->get(R.getVPSingleValue(), 0));
-      } else {
-        auto *WidenPhi = cast<VPWidenPointerInductionRecipe>(&R);
-        // TODO: Split off the case that all users of a pointer phi are scalar
-        // from the VPWidenPointerInductionRecipe.
-        if (WidenPhi->onlyScalarsGenerated(State->VF.isScalable()))
-          continue;
-
-        auto *GEP = cast<GetElementPtrInst>(State->get(WidenPhi, 0));
-        Phi = cast<PHINode>(GEP->getPointerOperand());
-      }
+      auto *WidenPhi = cast<VPWidenPointerInductionRecipe>(&R);
+      // TODO: Split off the case that all users of a pointer phi are scalar
+      // from the VPWidenPointerInductionRecipe.
+      if (WidenPhi->onlyScalarsGenerated(State->VF.isScalable()))
+        continue;
+
+      auto *GEP = cast<GetElementPtrInst>(State->get(WidenPhi, 0));
+      Phi = cast<PHINode>(GEP->getPointerOperand());
 
       Phi->setIncomingBlock(1, VectorLatchBB);
 
@@ -877,6 +901,7 @@ void VPlan::execute(VPTransformState *State) {
     // generated.
     bool SinglePartNeeded = isa<VPCanonicalIVPHIRecipe>(PhiR) ||
                             isa<VPFirstOrderRecurrencePHIRecipe>(PhiR) ||
+                            isa<VPWidenIntOrFpInductionRecipe>(PhiR) ||
                             (isa<VPReductionPHIRecipe>(PhiR) &&
                              cast<VPReductionPHIRecipe>(PhiR)->isOrdered());
     unsigned LastPartForNewPhi = SinglePartNeeded ? 1 : State->UF;
@@ -908,6 +933,12 @@ void VPlan::printLiveIns(raw_ostream &O) const {
     O << " = VF * UF";
   }
 
+  if (WidenVFxUF.getNumUsers() > 0) {
+    O << "\nLive-in ";
+    WidenVFxUF.printAsOperand(O, SlotTracker);
+    O << " = WIDEN VF * UF";
+  }
+
   if (VectorTripCount.getNumUsers() > 0) {
     O << "\nLive-in ";
     VectorTripCount.printAsOperand(O, SlotTracker);
@@ -1083,6 +1114,11 @@ VPlan *VPlan::duplicate() {
   }
   Old2NewVPValues[&VectorTripCount] = &NewPlan->VectorTripCount;
   Old2NewVPValues[&VFxUF] = &NewPlan->VFxUF;
+  Old2NewVPValues[&WidenVFxUF] = &NewPlan->WidenVFxUF;
+  NewPlan->getVectorTripCount().setElementType(
+      getVectorTripCount().getElementType());
+  NewPlan->getVFxUF().setElementType(getVFxUF().getElementType());
+  NewPlan->getWidenVFxUF().setElementType(getWidenVFxUF().getElementType());
   if (BackedgeTakenCount) {
     NewPlan->BackedgeTakenCount = new VPValue();
     Old2NewVPValues[BackedgeTakenCount] = NewPlan->BackedgeTakenCount;
@@ -1379,6 +1415,8 @@ void VPSlotTracker::assignSlot(const VPValue *V) {
 void VPSlotTracker::assignSlots(const VPlan &Plan) {
   if (Plan.VFxUF.getNumUsers() > 0)
     assignSlot(&Plan.VFxUF);
+  if (Plan.WidenVFxUF.getNumUsers() > 0)
+    assignSlot(&Plan.WidenVFxUF);
   assignSlot(&Plan.VectorTripCount);
   if (Plan.BackedgeTakenCount)
     assignSlot(Plan.BackedgeTakenCount);
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 13e1859ad6b250..306c2200ca34c9 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -1618,38 +1618,65 @@ class VPHeaderPHIRecipe : public VPSingleDefRecipe {
   }
 };
 
-/// A recipe for handling phi nodes of integer and floating-point inductions,
-/// producing their vector values.
-class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
-  PHINode *IV;
-  TruncInst *Trunc;
+/// A base class for all widen induction-like recipes
+class VPWidenInductionBasePHIRecipe : public VPHeaderPHIRecipe {
+protected:
   const InductionDescriptor &IndDesc;
 
 public:
-  VPWidenIntOrFpInductionRecipe(PHINode *IV, VPValue *Start, VPValue *Step,
+  VPWidenInductionBasePHIRecipe(unsigned char VPDefID, Instruction *Instr,
+                                VPValue *Start, VPValue *Step,
                                 const InductionDescriptor &IndDesc)
-      : VPHeaderPHIRecipe(VPDef::VPWidenIntOrFpInductionSC, IV, Start), IV(IV),
-        Trunc(nullptr), IndDesc(IndDesc) {
+      : VPHeaderPHIRecipe(VPDefID, Instr, Start), IndDesc(IndDesc) {
     addOperand(Step);
   }
 
+  ~VPWidenInductionBasePHIRecipe() override = default;
+
+  /// Returns the step value of the induction.
+  VPValue *getStepValue() { return getOperand(1); }
+  const VPValue *getStepValue() const { return getOperand(1); }
+
+  /// Returns the induction descriptor for the recipe.
+  const InductionDescriptor &getInductionDescriptor() const { return IndDesc; }
+};
+
+/// A recipe for handling phi nodes of integer and floating-point inductions,
+/// producing their vector values.
+class VPWidenIntOrFpInductionRecipe : public VPWidenInductionBasePHIRecipe {
+  PHINode *IV = nullptr;
+  TruncInst *Trunc = nullptr;
+
+public:
+  VPWidenIntOrFpInductionRecipe(PHINode *IV, VPValue *Start, VPValue *Step,
+                                const InductionDescriptor &IndDesc)
+      : VPWidenInductionBasePHIRecipe(VPDef::VPWidenIntOrFpInductionSC, IV,
+                                      Start, Step, IndDesc),
+        IV(IV), Trunc(nullptr) {}
+
   VPWidenIntOrFpInductionRecipe(PHINode *IV, VPValue *Start, VPValue *Step,
                                 const InductionDescriptor &IndDesc,
                                 TruncInst *Trunc)
-      : VPHeaderPHIRecipe(VPDef::VPWidenIntOrFpInductionSC, Trunc, Start),
-        IV(IV), Trunc(Trunc), IndDesc(IndDesc) {
-    addOperand(Step);
-  }
+      : VPWidenInductionBasePHIRecipe(VPDef::VPWidenIntOrFpInductionSC, Trunc,
+                                      Start, Step, IndDesc),
+        IV(IV), Trunc(Trunc) {}
 
   ~VPWidenIntOrFpInductionRecipe() override = default;
 
   VPRecipeBase *clone() override {
-    return new VPWidenIntOrFpInductionRecipe(IV, getStartValue(),
-                                             getStepValue(), IndDesc, Trunc);
+    VPRecipeBase *Cloned = new VPWidenIntOrFpInductionRecipe(
+        getPHINode(), getStartValue(), getStepValue(), IndDesc, Trunc);
+    if (getNumOperands() == 3)
+      Cloned->addOperand(getOperand(2));
+    return Cloned;
   }
 
   VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionSC)
 
+  static inline bool classof(const VPHeaderPHIRecipe *R) {
+    return R->getVPDefID() == VPDef::VPWidenIntOrFpInductionSC;
+  }
+
   /// Generate the vectorized and scalarized versions of the phi node as
   /// needed by their users.
   void execute(VPTransformState &State) override;
@@ -1660,33 +1687,24 @@ class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
              VPSlotTracker &SlotTracker) const override;
 #endif
 
-  VPValue *getBackedgeValue() override {
-    // TODO: All operands of base recipe must exist and be at same index in
-    // derived recipe.
-    llvm_unreachable(
-        "VPWidenIntOrFpInductionRecipe generates its own backedge value");
+  VPValue *getBackedgeValue() override final {
+    if (getNumOperands() != 3)
+      llvm_unreachable(
+          "VPWidenIntOrFpInductionRecipe::getBackedgeValue is not yet valid");
+    return getOperand(2);
   }
 
-  VPRecipeBase &getBackedgeRecipe() override {
-    // TODO: All operands of base recipe must exist and be at same index in
-    // derived recipe.
-    llvm_unreachable(
-        "VPWidenIntOrFpInductionRecipe generates its own backedge value");
+  VPRecipeBase &getBackedgeRecipe() override final {
+    return *getBackedgeValue()->getDefiningRecipe();
   }
 
-  /// Returns the step value of the induction.
-  VPValue *getStepValue() { return getOperand(1); }
-  const VPValue *getStepValue() const { return getOperand(1); }
-
   /// Returns the first defined value as TruncInst, if it is one or nullptr
   /// otherwise.
   TruncInst *getTruncInst() { return Trunc; }
   const TruncInst *getTruncInst() const { retu...
[truncated]

github-actions bot commented Feb 16, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@@ -48,7 +49,7 @@ for.end:
; CHECK-NEXT: <x1> vector loop: {
; CHECK-NEXT: vector.body:
; CHECK-NEXT: EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION
; CHECK-NEXT: WIDEN-INDUCTION %iv = phi 0, %iv.next, ir<1>
; CHECK-NEXT: WIDEN-INDUCTION ir<%iv> = phi ir<0>, vp<[[NEXT_WIV:%.+]]>, ir<1>
Contributor

What is the meaning of ir<1>?
Is it still necessary now that this patch decomposes WidenIntOrFPInduction?

Contributor Author

In the dump, ir<> represents a VPValue that has an underlying LLVM IR value.
Prior to this changeset, the WidenIntOrFPInduction::print method printed the LLVM IR PHINode. I've changed it to print the VPValues used by the WidenIntOrFPInduction recipe.
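
The printing convention described in this reply can be illustrated with a small toy model. This is only a sketch: `ToyVPValue`, `printAsOperand`, and `printWidenInduction` below are hypothetical stand-ins, not LLVM's classes or methods, and the operand order (start, backedge value, then a trailing operand assumed to be the step) is inferred from the CHECK line under review rather than confirmed by the patch.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// Toy stand-in for a VPValue: it may or may not wrap an underlying IR value.
// Hypothetical type for illustration only; this is not LLVM's VPValue class.
struct ToyVPValue {
  std::optional<std::string> UnderlyingIRName; // set when backed by LLVM IR
  unsigned Slot = 0;                           // slot number used otherwise
};

// Mimics the dump convention: ir<...> for IR-backed values, vp<...> otherwise.
std::string printAsOperand(const ToyVPValue &V) {
  std::ostringstream OS;
  if (V.UnderlyingIRName)
    OS << "ir<" << *V.UnderlyingIRName << ">";
  else
    OS << "vp<%" << V.Slot << ">";
  return OS.str();
}

// Prints a phi-like recipe from its operands instead of from an IR PHINode.
// Assumption: the trailing operand is the induction step.
std::string printWidenInduction(const ToyVPValue &Def, const ToyVPValue &Start,
                                const ToyVPValue &Backedge,
                                const ToyVPValue &Step) {
  return "WIDEN-INDUCTION " + printAsOperand(Def) + " = phi " +
         printAsOperand(Start) + ", " + printAsOperand(Backedge) + ", " +
         printAsOperand(Step);
}
```

In this model, a backedge value created by the VPlan itself has no IR name and therefore prints as `vp<...>`, while the start constant and the trailing operand print as `ir<...>`.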

The Loop Vectorizer still has two recipes, `VPWidenIntOrFpInductionRecipe`
and `VPWidenPointerInductionRecipe`, that appear phi-like in a VPlan because
they derive from `VPHeaderPHIRecipe`, but whose code generation constructs
both a vector phi and its vector self-update in the vectorized loop.

This not only hurts the readability of a VPlan, but also requires more code to
maintain. For instance, there is already ad-hoc code motion that moves the
generated updates of these recipes closer to the loop latch.

The changeset:
* Adds `WidenVFxUF` to represent the `broadcast({1...UF} x VFxUF)` value
* Decomposes existing `VPWidenIntOrFpInductionRecipe` into
```
  WIDEN-INDUCTION vp<%iv> = phi ir<0>, vp<%be-value>
  ...
  EMIT vp<%widen-step> = mul ir<%step>, vp<WidenVFxUF>
  EMIT vp<%be-value> = add vp<%iv>,vp<%widen-step>
```
* Moves the trunc optimization of the widened IV into a VPlan transform
* Adds trivial cyclic dependency removal and marks some binops as
  non-side-effecting
* Adds an element type to `VPValue` so that it can be queried for artificially
  added `VPValue`s without an underlying instruction
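
To make the decomposition concrete, here is a minimal scalar simulation of the phi, `mul`, and `add` sequence listed above. It is a sketch under stated assumptions, not code from the patch: `simulateWidenIV` and `scalarIV` are hypothetical helper names, the model uses a fixed vector width with UF = 1, and it assumes the phi's start vector holds `Start + Lane * Step` in each lane.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simulates the decomposed widened induction for a fixed VF with UF = 1:
//   %iv         = phi [Start + Lane * Step], %be-value
//   %widen-step = mul Step, broadcast(VF * UF)
//   %be-value   = add %iv, %widen-step
// Returns every lane value observed over `Iterations` vector iterations.
std::vector<int64_t> simulateWidenIV(int64_t Start, int64_t Step, unsigned VF,
                                     unsigned Iterations) {
  const unsigned UF = 1; // assumption: a single unrolled part
  std::vector<int64_t> IV(VF);
  std::vector<int64_t> WidenVFxUF(VF, int64_t(VF) * UF); // broadcast value
  std::vector<int64_t> Result;
  for (unsigned Lane = 0; Lane < VF; ++Lane)
    IV[Lane] = Start + int64_t(Lane) * Step;
  for (unsigned I = 0; I < Iterations; ++I) {
    Result.insert(Result.end(), IV.begin(), IV.end());
    for (unsigned Lane = 0; Lane < VF; ++Lane) {
      int64_t WidenStep = Step * WidenVFxUF[Lane]; // the mul
      IV[Lane] += WidenStep;                       // the add (backedge value)
    }
  }
  return Result;
}

// Reference: the scalar induction sequence the vector form should reproduce.
std::vector<int64_t> scalarIV(int64_t Start, int64_t Step, unsigned Count) {
  std::vector<int64_t> Result;
  for (unsigned I = 0; I < Count; ++I)
    Result.push_back(Start + int64_t(I) * Step);
  return Result;
}
```

For example, `simulateWidenIV(0, 1, 4, 3)` yields the same twelve values as `scalarIV(0, 1, 12)`, which is the property the separate update recipes are meant to preserve.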
@nikolaypanchenko
Contributor Author

@fhahn ping

@nikolaypanchenko
Contributor Author

@fhahn ping

@nikolaypanchenko
Contributor Author

@fhahn ping

@fhahn fhahn requested a review from ayalz June 4, 2024 11:56
@nikolaypanchenko
Contributor Author

@fhahn ping

@nikolaypanchenko
Contributor Author

@fhahn ping

arcbbb added a commit to arcbbb/llvm-project that referenced this pull request Nov 7, 2024
…ataWithEVL vectorization mode.

As an alternative approach to llvm#82021, this patch lowers
VPWidenIntOrFpInductionRecipe into a widen phi recipe and step recipes,
computed using EVL in the EVL transformation phase.
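
As a rough illustration of what "step recipes computed using EVL" could mean, the toy model below advances the widened IV by `Step * EVL` on each iteration instead of by `Step * VF * UF`. It is a sketch only: `simulateEVLWidenIV` is a hypothetical name, and choosing EVL as `min(VF, remaining)` is an assumption made for this example, not something the referenced commit states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of an EVL-driven widened induction. Each vector iteration handles
// EVL = min(VF, remaining) elements (an assumption about how EVL is chosen),
// and the widened IV advances by Step * EVL rather than Step * VF * UF.
std::vector<int64_t> simulateEVLWidenIV(int64_t Start, int64_t Step,
                                        unsigned VF, unsigned TripCount) {
  assert(VF > 0 && "vector width must be non-zero");
  std::vector<int64_t> IV(VF);
  std::vector<int64_t> Result;
  for (unsigned Lane = 0; Lane < VF; ++Lane)
    IV[Lane] = Start + int64_t(Lane) * Step;
  for (unsigned Done = 0; Done < TripCount;) {
    unsigned EVL = std::min(VF, TripCount - Done);
    // Only the first EVL lanes are active in this iteration.
    Result.insert(Result.end(), IV.begin(), IV.begin() + EVL);
    for (int64_t &LaneValue : IV)
      LaneValue += Step * int64_t(EVL); // step derived from EVL
    Done += EVL;
  }
  return Result;
}
```

With a trip count of 10 and VF = 4, this model runs two full iterations and one with EVL = 2, and the active lanes still enumerate the scalar induction values in order.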