-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[LV] Add initial support for vectorizing literal struct return values #109833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
59b3fa9
to
5518580
Compare
8ce936e
to
0f2cdbb
Compare
Marking the PR as "ready to review". I think this PR likely still needs further changes, but I don't plan on iterating on this much more without review 🙂 (to be sure this is the right direction). |
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-llvm-analysis Author: Benjamin Maxwell (MacDue) ChangesThis patch adds initial support for vectorizing literal struct return values. Currently, this is limited to the case where the struct is homogeneous (all elements have the same type) and not packed. The intended use case for this is vectorizing intrinsics such as:
Mapping them to structure-returning library calls such as:
It could also be possible to vectorize the intrinsic (without a libcall) and then later lower the intrinsic to a library call. This may be desired if the only library calls available take output pointers rather than return multiple values. Implementing this required two main changes:
The first change is relatively straightforward, the second is larger as it requires changing assumptions that types are always scalars or vectors. In this patch, a "wide" type is defined as a vector, or a struct literal where all elements are vectors (of the same element count). To help with the second change some helpers for wide types have been added (that work similarly to existing vector helpers). These have been used along the paths needed to support vectorizing calls, however, I expect there are places that still only expect vector types. Patch is 49.61 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/109833.diff 15 Files Affected:
diff --git a/llvm/include/llvm/Analysis/VectorUtils.h b/llvm/include/llvm/Analysis/VectorUtils.h
index e2dd4976f39065..2a419560be3030 100644
--- a/llvm/include/llvm/Analysis/VectorUtils.h
+++ b/llvm/include/llvm/Analysis/VectorUtils.h
@@ -18,6 +18,7 @@
#include "llvm/Analysis/LoopAccessAnalysis.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/VFABIDemangler.h"
+#include "llvm/IR/VectorUtils.h"
#include "llvm/Support/CheckedArithmetic.h"
namespace llvm {
@@ -127,18 +128,8 @@ namespace Intrinsic {
typedef unsigned ID;
}
-/// A helper function for converting Scalar types to vector types. If
-/// the incoming type is void, we return void. If the EC represents a
-/// scalar, we return the scalar type.
-inline Type *ToVectorTy(Type *Scalar, ElementCount EC) {
- if (Scalar->isVoidTy() || Scalar->isMetadataTy() || EC.isScalar())
- return Scalar;
- return VectorType::get(Scalar, EC);
-}
-
-inline Type *ToVectorTy(Type *Scalar, unsigned VF) {
- return ToVectorTy(Scalar, ElementCount::getFixed(VF));
-}
+/// Returns true if `Ty` can be widened by the loop vectorizer.
+bool canWidenType(Type *Ty);
/// Identify if the intrinsic is trivially vectorizable.
/// This method returns true if the intrinsic's argument types are all scalars
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index c36a346c1b2e05..710a9a107e1eba 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -1564,8 +1564,8 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
Type *RetTy = ICA.getReturnType();
ElementCount RetVF =
- (RetTy->isVectorTy() ? cast<VectorType>(RetTy)->getElementCount()
- : ElementCount::getFixed(1));
+ isWideTy(RetTy) ? getWideTypeVF(RetTy) : ElementCount::getFixed(1);
+
const IntrinsicInst *I = ICA.getInst();
const SmallVectorImpl<const Value *> &Args = ICA.getArgs();
FastMathFlags FMF = ICA.getFlags();
@@ -1886,10 +1886,13 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
InstructionCost ScalarizationCost = InstructionCost::getInvalid();
if (RetVF.isVector() && !RetVF.isScalable()) {
ScalarizationCost = 0;
- if (!RetTy->isVoidTy())
- ScalarizationCost += getScalarizationOverhead(
- cast<VectorType>(RetTy),
- /*Insert*/ true, /*Extract*/ false, CostKind);
+ if (!RetTy->isVoidTy()) {
+ for (Type *VectorTy : getContainedTypes(RetTy)) {
+ ScalarizationCost += getScalarizationOverhead(
+ cast<VectorType>(VectorTy),
+ /*Insert*/ true, /*Extract*/ false, CostKind);
+ }
+ }
ScalarizationCost +=
getOperandsScalarizationOverhead(Args, ICA.getArgTypes(), CostKind);
}
@@ -2480,27 +2483,32 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
// Else, assume that we need to scalarize this intrinsic. For math builtins
// this will emit a costly libcall, adding call overhead and spills. Make it
// very expensive.
- if (auto *RetVTy = dyn_cast<VectorType>(RetTy)) {
+ if (isWideTy(RetTy)) {
+ const SmallVector<Type *, 2> RetVTys = getContainedTypes(RetTy);
+
// Scalable vectors cannot be scalarized, so return Invalid.
- if (isa<ScalableVectorType>(RetTy) || any_of(Tys, [](const Type *Ty) {
- return isa<ScalableVectorType>(Ty);
- }))
+ if (any_of(concat<Type *const>(RetVTys, Tys),
+ [](Type *Ty) { return isa<ScalableVectorType>(Ty); }))
return InstructionCost::getInvalid();
- InstructionCost ScalarizationCost =
- SkipScalarizationCost
- ? ScalarizationCostPassed
- : getScalarizationOverhead(RetVTy, /*Insert*/ true,
- /*Extract*/ false, CostKind);
+ InstructionCost ScalarizationCost = ScalarizationCostPassed;
+ if (!SkipScalarizationCost) {
+ ScalarizationCost = 0;
+ for (Type *RetVTy : RetVTys) {
+ ScalarizationCost += getScalarizationOverhead(
+ cast<VectorType>(RetVTy), /*Insert*/ true,
+ /*Extract*/ false, CostKind);
+ }
+ }
- unsigned ScalarCalls = cast<FixedVectorType>(RetVTy)->getNumElements();
+ unsigned ScalarCalls = getWideTypeVF(RetTy).getFixedValue();
SmallVector<Type *, 4> ScalarTys;
for (Type *Ty : Tys) {
if (Ty->isVectorTy())
Ty = Ty->getScalarType();
ScalarTys.push_back(Ty);
}
- IntrinsicCostAttributes Attrs(IID, RetTy->getScalarType(), ScalarTys, FMF);
+ IntrinsicCostAttributes Attrs(IID, ToNarrowTy(RetTy), ScalarTys, FMF);
InstructionCost ScalarCost =
thisT()->getIntrinsicInstrCost(Attrs, CostKind);
for (Type *Ty : Tys) {
diff --git a/llvm/include/llvm/IR/VectorUtils.h b/llvm/include/llvm/IR/VectorUtils.h
new file mode 100644
index 00000000000000..e8e838d8287c42
--- /dev/null
+++ b/llvm/include/llvm/IR/VectorUtils.h
@@ -0,0 +1,53 @@
+//===----------- VectorUtils.h - Vector type utility functions -*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/IR/DerivedTypes.h"
+
+namespace llvm {
+
+/// A helper function for converting Scalar types to vector types. If
+/// the incoming type is void, we return void. If the EC represents a
+/// scalar, we return the scalar type.
+inline Type *ToVectorTy(Type *Scalar, ElementCount EC) {
+ if (Scalar->isVoidTy() || Scalar->isMetadataTy() || EC.isScalar())
+ return Scalar;
+ return VectorType::get(Scalar, EC);
+}
+
+inline Type *ToVectorTy(Type *Scalar, unsigned VF) {
+ return ToVectorTy(Scalar, ElementCount::getFixed(VF));
+}
+
+/// A helper for converting to wider (vector) types. For scalar types, this is
+/// equivalent to calling `ToVectorTy`. For struct types, this returns a new
+/// struct where each element type has been widened to a vector type. Note: Only
+/// unpacked literal struct types are supported.
+Type *ToWideTy(Type *Ty, ElementCount EC);
+
+/// A helper for converting wide types to narrow (non-vector) types. For vector
+/// types, this is equivalent to calling .getScalarType(). For struct types,
+/// this returns a new struct where each element type has been converted to a
+/// scalar type. Note: Only unpacked literal struct types are supported.
+Type *ToNarrowTy(Type *Ty);
+
+/// Returns the types contained in `Ty`. For struct types, it returns the
+/// elements, all other types are returned directly.
+SmallVector<Type *, 2> getContainedTypes(Type *Ty);
+
+/// Returns true if `Ty` is a vector type or a struct of vector types where all
+/// vector types share the same VF.
+bool isWideTy(Type *Ty);
+
+/// Returns the vectorization factor for a widened type.
+inline ElementCount getWideTypeVF(Type *Ty) {
+ assert(isWideTy(Ty) && "expected widened type!");
+ return cast<VectorType>(getContainedTypes(Ty).front())->getElementCount();
+}
+
+} // namespace llvm
diff --git a/llvm/lib/Analysis/VectorUtils.cpp b/llvm/lib/Analysis/VectorUtils.cpp
index dbffbb8a5f81d9..38b9da69ae2b76 100644
--- a/llvm/lib/Analysis/VectorUtils.cpp
+++ b/llvm/lib/Analysis/VectorUtils.cpp
@@ -39,6 +39,20 @@ static cl::opt<unsigned> MaxInterleaveGroupFactor(
cl::desc("Maximum factor for an interleaved access group (default = 8)"),
cl::init(8));
+/// Returns true if `Ty` can be widened by the loop vectorizer.
+bool llvm::canWidenType(Type *Ty) {
+ Type *ElTy = Ty;
+ // For now, only allow widening non-packed literal structs where all
+ // element types are the same. This simplifies the cost model and
+ // conversion between scalar and wide types.
+ if (auto *StructTy = dyn_cast<StructType>(Ty);
+ StructTy && !StructTy->isPacked() && StructTy->isLiteral() &&
+ StructTy->containsHomogeneousTypes()) {
+ ElTy = StructTy->elements().front();
+ }
+ return VectorType::isValidElementType(ElTy);
+}
+
/// Return true if all of the intrinsic's arguments and return type are scalars
/// for the scalar form of the intrinsic, and vectors for the vector form of the
/// intrinsic (except operands that are marked as always being scalar by
diff --git a/llvm/lib/IR/CMakeLists.txt b/llvm/lib/IR/CMakeLists.txt
index 544f4ea9223d0e..7eaf35e10ebc67 100644
--- a/llvm/lib/IR/CMakeLists.txt
+++ b/llvm/lib/IR/CMakeLists.txt
@@ -73,6 +73,7 @@ add_llvm_component_library(LLVMCore
Value.cpp
ValueSymbolTable.cpp
VectorBuilder.cpp
+ VectorUtils.cpp
Verifier.cpp
VFABIDemangler.cpp
RuntimeLibcalls.cpp
diff --git a/llvm/lib/IR/VFABIDemangler.cpp b/llvm/lib/IR/VFABIDemangler.cpp
index cdfb9fbfaa084d..6ccd77fd23793a 100644
--- a/llvm/lib/IR/VFABIDemangler.cpp
+++ b/llvm/lib/IR/VFABIDemangler.cpp
@@ -11,6 +11,7 @@
#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/StringSwitch.h"
#include "llvm/IR/Module.h"
+#include "llvm/IR/VectorUtils.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"
#include <limits>
@@ -346,12 +347,15 @@ getScalableECFromSignature(const FunctionType *Signature, const VFISAKind ISA,
// Also check the return type if not void.
Type *RetTy = Signature->getReturnType();
if (!RetTy->isVoidTy()) {
- std::optional<ElementCount> ReturnEC = getElementCountForTy(ISA, RetTy);
- // If we have an unknown scalar element type we can't find a reasonable VF.
- if (!ReturnEC)
- return std::nullopt;
- if (ElementCount::isKnownLT(*ReturnEC, MinEC))
- MinEC = *ReturnEC;
+ for (Type *RetTy : getContainedTypes(RetTy)) {
+ std::optional<ElementCount> ReturnEC = getElementCountForTy(ISA, RetTy);
+ // If we have an unknown scalar element type we can't find a reasonable
+ // VF.
+ if (!ReturnEC)
+ return std::nullopt;
+ if (ElementCount::isKnownLT(*ReturnEC, MinEC))
+ MinEC = *ReturnEC;
+ }
}
// The SVE Vector function call ABI bases the VF on the widest element types
@@ -566,7 +570,7 @@ FunctionType *VFABI::createFunctionType(const VFInfo &Info,
auto *RetTy = ScalarFTy->getReturnType();
if (!RetTy->isVoidTy())
- RetTy = VectorType::get(RetTy, VF);
+ RetTy = ToWideTy(RetTy, VF);
return FunctionType::get(RetTy, VecTypes, false);
}
diff --git a/llvm/lib/IR/VectorUtils.cpp b/llvm/lib/IR/VectorUtils.cpp
new file mode 100644
index 00000000000000..c89a8eaf2ad1e0
--- /dev/null
+++ b/llvm/lib/IR/VectorUtils.cpp
@@ -0,0 +1,69 @@
+//===----------- VectorUtils.cpp - Vector type utility functions ----------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/IR/VectorUtils.h"
+#include "llvm/ADT/SmallVectorExtras.h"
+
+using namespace llvm;
+
+/// A helper for converting to wider (vector) types. For scalar types, this is
+/// equivalent to calling `ToVectorTy`. For struct types, this returns a new
+/// struct where each element type has been widened to a vector type. Note: Only
+/// unpacked literal struct types are supported.
+Type *llvm::ToWideTy(Type *Ty, ElementCount EC) {
+ if (EC.isScalar())
+ return Ty;
+ auto *StructTy = dyn_cast<StructType>(Ty);
+ if (!StructTy)
+ return ToVectorTy(Ty, EC);
+ assert(StructTy->isLiteral() && !StructTy->isPacked() &&
+ "expected unpacked struct literal");
+ return StructType::get(
+ Ty->getContext(),
+ map_to_vector(StructTy->elements(), [&](Type *ElTy) -> Type * {
+ return VectorType::get(ElTy, EC);
+ }));
+}
+
+/// A helper for converting wide types to narrow (non-vector) types. For vector
+/// types, this is equivalent to calling .getScalarType(). For struct types,
+/// this returns a new struct where each element type has been converted to a
+/// scalar type. Note: Only unpacked literal struct types are supported.
+Type *llvm::ToNarrowTy(Type *Ty) {
+ auto *StructTy = dyn_cast<StructType>(Ty);
+ if (!StructTy)
+ return Ty->getScalarType();
+ assert(StructTy->isLiteral() && !StructTy->isPacked() &&
+ "expected unpacked struct literal");
+ return StructType::get(
+ Ty->getContext(),
+ map_to_vector(StructTy->elements(), [](Type *ElTy) -> Type * {
+ return ElTy->getScalarType();
+ }));
+}
+
+/// Returns the types contained in `Ty`. For struct types, it returns the
+/// elements, all other types are returned directly.
+SmallVector<Type *, 2> llvm::getContainedTypes(Type *Ty) {
+ auto *StructTy = dyn_cast<StructType>(Ty);
+ if (StructTy)
+ return to_vector<2>(StructTy->elements());
+ return {Ty};
+}
+
+/// Returns true if `Ty` is a vector type or a struct of vector types where all
+/// vector types share the same VF.
+bool llvm::isWideTy(Type *Ty) {
+ auto ContainedTys = getContainedTypes(Ty);
+ if (ContainedTys.empty() || !ContainedTys.front()->isVectorTy())
+ return false;
+ ElementCount VF = cast<VectorType>(ContainedTys.front())->getElementCount();
+ return all_of(ContainedTys, [&](Type *Ty) {
+ return Ty->isVectorTy() && cast<VectorType>(Ty)->getElementCount() == VF;
+ });
+}
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
index 43be72f0f34d45..cb6327640dbdbb 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -949,8 +949,8 @@ bool LoopVectorizationLegality::canVectorizeInstrs() {
// Check that the instruction return type is vectorizable.
// We can't vectorize casts from vector type to scalar type.
// Also, we can't vectorize extractelement instructions.
- if ((!VectorType::isValidElementType(I.getType()) &&
- !I.getType()->isVoidTy()) ||
+ Type *InstTy = I.getType();
+ if (!(InstTy->isVoidTy() || canWidenType(InstTy)) ||
(isa<CastInst>(I) &&
!VectorType::isValidElementType(I.getOperand(0)->getType())) ||
isa<ExtractElementInst>(I)) {
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 6e082b1c134dee..7283bf8d323971 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -2302,7 +2302,9 @@ void InnerLoopVectorizer::scalarizeInstruction(const Instruction *Instr,
VPReplicateRecipe *RepRecipe,
const VPLane &Lane,
VPTransformState &State) {
- assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");
+ assert((!Instr->getType()->isAggregateType() ||
+ canWidenType(Instr->getType())) &&
+ "expected widenable or non-aggregate type!");
// Does this instruction return a value ?
bool IsVoidRetTy = Instr->getType()->isVoidTy();
@@ -2864,10 +2866,10 @@ LoopVectorizationCostModel::getVectorCallCost(CallInst *CI,
return ScalarCallCost;
}
-static Type *maybeVectorizeType(Type *Elt, ElementCount VF) {
- if (VF.isScalar() || (!Elt->isIntOrPtrTy() && !Elt->isFloatingPointTy()))
- return Elt;
- return VectorType::get(Elt, VF);
+static Type *maybeVectorizeType(Type *Ty, ElementCount VF) {
+ if (VF.isScalar() || !canWidenType(Ty))
+ return Ty;
+ return ToWideTy(Ty, VF);
}
InstructionCost
@@ -3633,9 +3635,8 @@ void LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) {
// ExtractValue instructions must be uniform, because the operands are
// known to be loop-invariant.
- if (auto *EVI = dyn_cast<ExtractValueInst>(&I)) {
- assert(IsOutOfScope(EVI->getAggregateOperand()) &&
- "Expected aggregate value to be loop invariant");
+ if (auto *EVI = dyn_cast<ExtractValueInst>(&I);
+ EVI && IsOutOfScope(EVI->getAggregateOperand())) {
AddToWorklistIfAllowed(EVI);
continue;
}
@@ -4471,8 +4472,7 @@ static bool willGenerateVectors(VPlan &Plan, ElementCount VF,
llvm_unreachable("unhandled recipe");
}
- auto WillWiden = [&TTI, VF](Type *ScalarTy) {
- Type *VectorTy = ToVectorTy(ScalarTy, VF);
+ auto WillWiden = [&TTI, VF](Type *VectorTy) {
unsigned NumLegalParts = TTI.getNumberOfParts(VectorTy);
if (!NumLegalParts)
return false;
@@ -4503,7 +4503,8 @@ static bool willGenerateVectors(VPlan &Plan, ElementCount VF,
Type *ScalarTy = TypeInfo.inferScalarType(ToCheck);
if (!Visited.insert({ScalarTy}).second)
continue;
- if (WillWiden(ScalarTy))
+ Type *WideTy = ToWideTy(ScalarTy, VF);
+ if (any_of(getContainedTypes(WideTy), WillWiden))
return true;
}
}
@@ -5452,10 +5453,13 @@ InstructionCost LoopVectorizationCostModel::computePredInstDiscount(
// and phi nodes.
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
if (isScalarWithPredication(I, VF) && !I->getType()->isVoidTy()) {
- ScalarCost += TTI.getScalarizationOverhead(
- cast<VectorType>(ToVectorTy(I->getType(), VF)),
- APInt::getAllOnes(VF.getFixedValue()), /*Insert*/ true,
- /*Extract*/ false, CostKind);
+ Type *WideTy = ToWideTy(I->getType(), VF);
+ for (Type *VectorTy : getContainedTypes(WideTy)) {
+ ScalarCost += TTI.getScalarizationOverhead(
+ cast<VectorType>(VectorTy), APInt::getAllOnes(VF.getFixedValue()),
+ /*Insert*/ true,
+ /*Extract*/ false, CostKind);
+ }
ScalarCost +=
VF.getFixedValue() * TTI.getCFInstrCost(Instruction::PHI, CostKind);
}
@@ -5944,13 +5948,17 @@ InstructionCost LoopVectorizationCostModel::getScalarizationOverhead(
return 0;
InstructionCost Cost = 0;
- Type *RetTy = ToVectorTy(I->getType(), VF);
+ Type *RetTy = ToWideTy(I->getType(), VF);
if (!RetTy->isVoidTy() &&
- (!isa<LoadInst>(I) || !TTI.supportsEfficientVectorElementLoadStore()))
- Cost += TTI.getScalarizationOverhead(
- cast<VectorType>(RetTy), APInt::getAllOnes(VF.getKnownMinValue()),
- /*Insert*/ true,
- /*Extract*/ false, CostKind);
+ (!isa<LoadInst>(I) || !TTI.supportsEfficientVectorElementLoadStore())) {
+
+ for (Type *VectorTy : getContainedTypes(RetTy)) {
+ Cost += TTI.getScalarizationOverhead(
+ cast<VectorType>(VectorTy), APInt::getAllOnes(VF.getKnownMinValue()),
+ /*Insert*/ true,
+ /*Extract*/ false, CostKind);
+ }
+ }
// Some targets keep addresses scalar.
if (isa<LoadInst>(I) && !TTI.prefersVectorizedAddressing())
@@ -6209,9 +6217,9 @@ void LoopVectorizationCostModel::setVectorizedCallDecision(ElementCount VF) {
bool MaskRequired = Legal->isMaskRequired(CI);
// Compute corresponding vector type for return value and arguments.
- Type *RetTy = ToVectorTy(ScalarRetTy, VF);
+ Type *RetTy = ToWideTy(ScalarRetTy, VF);
for (Type *ScalarTy : ScalarTys)
- Tys.push_back(ToVectorTy(ScalarTy, VF));
+ Tys.push_back(ToWideTy(ScalarTy, VF));
// An in-loop reduction using an fmuladd intrinsic is a special case;
// we don't want the normal cost for that intrinsic.
@@ -6388,7 +6396,7 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I,
HasSingleCopyAfterVectorization(I, VF));
VectorTy = RetTy;
} else
- VectorTy = ToVectorTy(RetTy, VF);
+ VectorTy = ToWideTy(RetTy, VF);
if (VF.isVector() && VectorTy->isVectorTy() &&
!TTI.getNumberOfParts(VectorTy))
@@ -8423,6 +8431,7 @@ VPWidenRecipe *VPRecipeBuilder::tryToWiden(Instruction *I,
case Instruction::Sub:
case Instruction::Xor:
case Instruction::Freeze:
+ case Instruction::ExtractValue:
SmallVector<VPValue *> NewOps(Operands);
if (Instruction::isBinaryOp(I->getOpcode())) {
// The legacy cost model uses SCEV to check if some of the operands are
@@ -9466,7 +9475,7 @...
[truncated]
|
61f8736
to
671bc7c
Compare
Kind ping 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me with a couple of comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VPlan supports recipes that produce multiple results, e.g. as VPInterleaveRecipe
. The recipe then would take care of de-interleaving the result, this would help avoid the need to modify various places in VPTransformState.
I might have missed it, but could you make sure that there are tests with struct return values that cannot be widened?
I'm a little unclear on how this would work. Do you mean making
There's a few tests in |
Hm, yes, One question is how to deal with VPReplicateRecipe, I suppose we would need a version as well that produces multiple results.
When lowering
Thanks! |
1b44fdc
to
5d79394
Compare
Ping :) |
case Instruction::ExtractValue: | ||
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this call something like getVectorInstrCost
?
Would be good to make sure we estimate the cost correctly when the extracts might not be completely free, e.g. if the results are returned via the stack (e.g. something like https://llvm.godbolt.org/z/zsv3G3h7n)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, I did look around (including getVectorInstrCost()
), but it does not look like there's a hook for this right now. getVectorInstrCost()
is currently only for insertelement
and extractelement
. I guess I could modify this hook to handle it, or add a new one though?
llvm-project/llvm/lib/Analysis/TargetTransformInfo.cpp
Lines 1079 to 1084 in 455b3d6
InstructionCost TargetTransformInfo::getVectorInstrCost( | |
unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, unsigned Index, | |
Value *Op0, Value *Op1) const { | |
assert((Opcode == Instruction::InsertElement || | |
Opcode == Instruction::ExtractElement) && | |
"Expecting Opcode to be insertelement/extractelement."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a new hook getInsertExtractValueCost
rather than shoehorn it into the vector insert/extract cost functions. By default this just returns 0, but there should be enough information passed to it to implement target-specific costs in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have tests to make sure we do the right thing when results are not returned in vector registers (as in https://llvm.godbolt.org/z/zsv3G3h7n)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If by the "right thing" you mean "Am I costing the return values based on the target-specific call ABI?" then the answer is no. I think the answer to that is also "no" for non-struct-return functions too (I don't think there's any costing based on how the target passes arguments/returns values).
I don't really think that needs doing in this patch (which just adds basic support). Since nothing is using struct-return vectorization right now. Currently, it needs things to be manually annotated with vector-function-abi-variant
, and I don't really think your example is a very realistic use case.
a8e68a8
to
026e9d3
Compare
/// ArrayRef to indicate that there is no information on the indices. This is | ||
/// used when the instruction is not available; a typical use case is to | ||
/// provision the cost of vectorization/scalarization in vectorizer passes. | ||
InstructionCost getInsertExtractValueCost(unsigned Opcode, Type *AggType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to get input from someone else on the new hook (maybe @davemgreen / @RKSimon ?)
Not sure about returning 0 by default, might be better to use a more conservative value? Can this by tested by a cost model test ? (Analysis/CostModel)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose 0 to match the default value used in getInstructionCost()
:
llvm-project/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
Lines 1299 to 1301 in 80ab237
case Instruction::ExtractValue: | |
case Instruction::Freeze: | |
return TTI::TCC_Free; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes some sense (since normally sroa will remove any extractvalues).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this suggests you're going to add target overrides at some point - what will they need to do? Also, how will you use the AggDef arg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was based on @fhahn's comment here: #109833 (comment). The idea of passing AggDef
came from that, so you could cost the extractvalue
differently based on the source of the aggregate. If the aggregate comes from a call return value that is passed on the stack, then you would want to cost the extractvalue
like a load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we ever expect it to be non-zero from the vectorizer? I think I would expect the cost of say an add.with.overflow to include the extract cost in it's own cost, if one was needed. Perhaps for calls it would depend on the calling convention, with multiple values often being returned in multiple registers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe to start with it would be fine to assume the main cost is accounted for in the producer? We should probably add a comment explaining why we use 0 to avoid confusion in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a comment 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Not sure what others think, but IMO it would probably be better to drop the extra arguments, instead of passing them without being used. They can always be added once they are needed.
/// ArrayRef to indicate that there is no information on the indices. This is | ||
/// used when the instruction is not available; a typical use case is to | ||
/// provision the cost of vectorization/scalarization in vectorizer passes. | ||
InstructionCost getInsertExtractValueCost(unsigned Opcode, Type *AggType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Not sure what others think, but IMO it would probably be better to drop the extra arguments, instead of passing them without being used. They can always be added once they are needed.
This patch adds initial support for vectorizing literal struct return values. Currently, this is limited to the case where the struct is homogeneous (all elements have the same type) and not packed. The users of the call also must all be `extractvalue` instructions. The intended use case for this is vectorizing intrinsics such as: ``` declare { float, float } @llvm.sincos.f32(float %x) ``` Mapping them to structure-returning library calls such as: ``` declare { <4 x float>, <4 x i32> } @Sleef_sincosf4_u10advsimd(<4 x float>) ``` Or their widened form (such as `@llvm.sincos.v4f32` in this case). Implementing this required two main changes: 1. Supporting widening `extractvalue` 2. Adding support for vectorized struct types in LV * This is mostly limited to parts of the cost model and scalarization Since the supported use case is narrow, the required changes are relatively small.
13e9865
to
af8febd
Compare
Ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the latest updates, a few more small comments inline.
What would be the best way to test this on some larger code? Is there a vector library that already defines some functions with multiple return values (or some follow-up patches)?
cast<VectorType>(toVectorTy(I->getType(), VF)), | ||
APInt::getAllOnes(VF.getFixedValue()), /*Insert*/ true, | ||
/*Extract*/ false, CostKind); | ||
Type *WideTy = toVectorizedTy(I->getType(), VF); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking if this has been addressed in the latest version?
There's a follow-up patch #123210 to enable vectorizing the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
…llvm#109833) This patch adds initial support for vectorizing literal struct return values. Currently, this is limited to the case where the struct is homogeneous (all elements have the same type) and not packed. The users of the call also must all be `extractvalue` instructions. The intended use case for this is vectorizing intrinsics such as: ``` declare { float, float } @llvm.sincos.f32(float %x) ``` Mapping them to structure-returning library calls such as: ``` declare { <4 x float>, <4 x float> } @Sleef_sincosf4_u10advsimd(<4 x float>) ``` Or their widened form (such as `@llvm.sincos.v4f32` in this case). Implementing this required two main changes: 1. Supporting widening `extractvalue` 2. Adding support for vectorized struct types in LV * This is mostly limited to parts of the cost model and scalarization Since the supported use case is narrow, the required changes are relatively small.
This patch adds initial support for vectorizing literal struct return values. Currently, this is limited to the case where the struct is homogeneous (all elements have the same type) and not packed. The users of the call also must all be
extractvalue
instructions.The intended use case for this is vectorizing intrinsics such as:
Mapping them to structure-returning library calls such as:
Or their widened form (such as
@llvm.sincos.v4f32
in this case).Implementing this required two main changes:
extractvalue
Since the supported use case is narrow, the required changes are relatively small.