[LV] Add initial support for vectorizing literal struct return values #109833

MacDue · 2024-09-24T17:19:46Z

This patch adds initial support for vectorizing literal struct return values. Currently, this is limited to the case where the struct is homogeneous (all elements have the same type) and not packed. The users of the call also must all be extractvalue instructions.

The intended use case for this is vectorizing intrinsics such as:

declare { float, float } @llvm.sincos.f32(float %x)

Mapping them to structure-returning library calls such as:

declare { <4 x float>, <4 x float> } @Sleef_sincosf4_u10advsimd(<4 x float>)

Or their widened form (such as @llvm.sincos.v4f32 in this case).

Implementing this required two main changes:

Supporting widening extractvalue
Adding support for vectorized struct types in LV

This is mostly limited to parts of the cost model and scalarization

Since the supported use case is narrow, the required changes are relatively small.

github-actions · 2024-09-24T17:23:25Z

✅ With the latest revision this PR passed the C/C++ code formatter.

MacDue · 2024-10-03T16:34:00Z

Marking the PR as "ready to review". I think this PR likely still needs further changes, but I don't plan on iterating on this much more without review 🙂 (to be sure this is the right direction).

llvmbot · 2024-10-03T16:34:42Z

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-llvm-analysis

Author: Benjamin Maxwell (MacDue)

Changes

This patch adds initial support for vectorizing literal struct return values. Currently, this is limited to the case where the struct is homogeneous (all elements have the same type) and not packed.

The intended use case for this is vectorizing intrinsics such as:

declare { float, float } @<!-- -->llvm.sincos.f32(float %x)

Mapping them to structure-returning library calls such as:

declare { &lt;4 x float&gt;, &lt;4 x i32&gt; } @<!-- -->Sleef_sincosf4_u10advsimd(&lt;4 x float&gt;)

It could also be possible to vectorize the intrinsic (without a libcall) and then later lower the intrinsic to a library call. This may be desired if the only library calls available take output pointers rather than return multiple values.

Implementing this required two main changes:

Supporting widening extractvalue
Adding support for "wide" types (in LV and parts of the cost model)

The first change is relatively straightforward, the second is larger as it requires changing assumptions that types are always scalars or vectors.

In this patch, a "wide" type is defined as a vector, or a struct literal where all elements are vectors (of the same element count).

To help with the second change some helpers for wide types have been added (that work similarly to existing vector helpers). These have been used along the paths needed to support vectorizing calls, however, I expect there are places that still only expect vector types.

Patch is 49.61 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/109833.diff

15 Files Affected:

(modified) llvm/include/llvm/Analysis/VectorUtils.h (+3-12)
(modified) llvm/include/llvm/CodeGen/BasicTTIImpl.h (+25-17)
(added) llvm/include/llvm/IR/VectorUtils.h (+53)
(modified) llvm/lib/Analysis/VectorUtils.cpp (+14)
(modified) llvm/lib/IR/CMakeLists.txt (+1)
(modified) llvm/lib/IR/VFABIDemangler.cpp (+11-7)
(added) llvm/lib/IR/VectorUtils.cpp (+69)
(modified) llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp (+2-2)
(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+34-25)
(modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+20-8)
(modified) llvm/lib/Transforms/Vectorize/VPlan.h (+3-3)
(modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+2)
(modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+9-1)
(modified) llvm/test/Analysis/CostModel/AMDGPU/frexp.ll (+28-28)
(added) llvm/test/Transforms/LoopVectorize/AArch64/struct-return.ll (+251)

diff --git a/llvm/include/llvm/Analysis/VectorUtils.h b/llvm/include/llvm/Analysis/VectorUtils.h
index e2dd4976f39065..2a419560be3030 100644
--- a/llvm/include/llvm/Analysis/VectorUtils.h
+++ b/llvm/include/llvm/Analysis/VectorUtils.h
@@ -18,6 +18,7 @@
 #include "llvm/Analysis/LoopAccessAnalysis.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/VFABIDemangler.h"
+#include "llvm/IR/VectorUtils.h"
 #include "llvm/Support/CheckedArithmetic.h"
 
 namespace llvm {
@@ -127,18 +128,8 @@ namespace Intrinsic {
 typedef unsigned ID;
 }
 
-/// A helper function for converting Scalar types to vector types. If
-/// the incoming type is void, we return void. If the EC represents a
-/// scalar, we return the scalar type.
-inline Type *ToVectorTy(Type *Scalar, ElementCount EC) {
-  if (Scalar->isVoidTy() || Scalar->isMetadataTy() || EC.isScalar())
-    return Scalar;
-  return VectorType::get(Scalar, EC);
-}
-
-inline Type *ToVectorTy(Type *Scalar, unsigned VF) {
-  return ToVectorTy(Scalar, ElementCount::getFixed(VF));
-}
+/// Returns true if `Ty` can be widened by the loop vectorizer.
+bool canWidenType(Type *Ty);
 
 /// Identify if the intrinsic is trivially vectorizable.
 /// This method returns true if the intrinsic's argument types are all scalars
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index c36a346c1b2e05..710a9a107e1eba 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -1564,8 +1564,8 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
     Type *RetTy = ICA.getReturnType();
 
     ElementCount RetVF =
-        (RetTy->isVectorTy() ? cast<VectorType>(RetTy)->getElementCount()
-                             : ElementCount::getFixed(1));
+        isWideTy(RetTy) ? getWideTypeVF(RetTy) : ElementCount::getFixed(1);
+
     const IntrinsicInst *I = ICA.getInst();
     const SmallVectorImpl<const Value *> &Args = ICA.getArgs();
     FastMathFlags FMF = ICA.getFlags();
@@ -1886,10 +1886,13 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
     InstructionCost ScalarizationCost = InstructionCost::getInvalid();
     if (RetVF.isVector() && !RetVF.isScalable()) {
       ScalarizationCost = 0;
-      if (!RetTy->isVoidTy())
-        ScalarizationCost += getScalarizationOverhead(
-            cast<VectorType>(RetTy),
-            /*Insert*/ true, /*Extract*/ false, CostKind);
+      if (!RetTy->isVoidTy()) {
+        for (Type *VectorTy : getContainedTypes(RetTy)) {
+          ScalarizationCost += getScalarizationOverhead(
+              cast<VectorType>(VectorTy),
+              /*Insert*/ true, /*Extract*/ false, CostKind);
+        }
+      }
       ScalarizationCost +=
           getOperandsScalarizationOverhead(Args, ICA.getArgTypes(), CostKind);
     }
@@ -2480,27 +2483,32 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
     // Else, assume that we need to scalarize this intrinsic. For math builtins
     // this will emit a costly libcall, adding call overhead and spills. Make it
     // very expensive.
-    if (auto *RetVTy = dyn_cast<VectorType>(RetTy)) {
+    if (isWideTy(RetTy)) {
+      const SmallVector<Type *, 2> RetVTys = getContainedTypes(RetTy);
+
       // Scalable vectors cannot be scalarized, so return Invalid.
-      if (isa<ScalableVectorType>(RetTy) || any_of(Tys, [](const Type *Ty) {
-            return isa<ScalableVectorType>(Ty);
-          }))
+      if (any_of(concat<Type *const>(RetVTys, Tys),
+                 [](Type *Ty) { return isa<ScalableVectorType>(Ty); }))
         return InstructionCost::getInvalid();
 
-      InstructionCost ScalarizationCost =
-          SkipScalarizationCost
-              ? ScalarizationCostPassed
-              : getScalarizationOverhead(RetVTy, /*Insert*/ true,
-                                         /*Extract*/ false, CostKind);
+      InstructionCost ScalarizationCost = ScalarizationCostPassed;
+      if (!SkipScalarizationCost) {
+        ScalarizationCost = 0;
+        for (Type *RetVTy : RetVTys) {
+          ScalarizationCost += getScalarizationOverhead(
+              cast<VectorType>(RetVTy), /*Insert*/ true,
+              /*Extract*/ false, CostKind);
+        }
+      }
 
-      unsigned ScalarCalls = cast<FixedVectorType>(RetVTy)->getNumElements();
+      unsigned ScalarCalls = getWideTypeVF(RetTy).getFixedValue();
       SmallVector<Type *, 4> ScalarTys;
       for (Type *Ty : Tys) {
         if (Ty->isVectorTy())
           Ty = Ty->getScalarType();
         ScalarTys.push_back(Ty);
       }
-      IntrinsicCostAttributes Attrs(IID, RetTy->getScalarType(), ScalarTys, FMF);
+      IntrinsicCostAttributes Attrs(IID, ToNarrowTy(RetTy), ScalarTys, FMF);
       InstructionCost ScalarCost =
           thisT()->getIntrinsicInstrCost(Attrs, CostKind);
       for (Type *Ty : Tys) {
diff --git a/llvm/include/llvm/IR/VectorUtils.h b/llvm/include/llvm/IR/VectorUtils.h
new file mode 100644
index 00000000000000..e8e838d8287c42
--- /dev/null
+++ b/llvm/include/llvm/IR/VectorUtils.h
@@ -0,0 +1,53 @@
+//===----------- VectorUtils.h -  Vector type utility functions -*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/IR/DerivedTypes.h"
+
+namespace llvm {
+
+/// A helper function for converting Scalar types to vector types. If
+/// the incoming type is void, we return void. If the EC represents a
+/// scalar, we return the scalar type.
+inline Type *ToVectorTy(Type *Scalar, ElementCount EC) {
+  if (Scalar->isVoidTy() || Scalar->isMetadataTy() || EC.isScalar())
+    return Scalar;
+  return VectorType::get(Scalar, EC);
+}
+
+inline Type *ToVectorTy(Type *Scalar, unsigned VF) {
+  return ToVectorTy(Scalar, ElementCount::getFixed(VF));
+}
+
+/// A helper for converting to wider (vector) types. For scalar types, this is
+/// equivalent to calling `ToVectorTy`. For struct types, this returns a new
+/// struct where each element type has been widened to a vector type. Note: Only
+/// unpacked literal struct types are supported.
+Type *ToWideTy(Type *Ty, ElementCount EC);
+
+/// A helper for converting wide types to narrow (non-vector) types. For vector
+/// types, this is equivalent to calling .getScalarType(). For struct types,
+/// this returns a new struct where each element type has been converted to a
+/// scalar type. Note: Only unpacked literal struct types are supported.
+Type *ToNarrowTy(Type *Ty);
+
+/// Returns the types contained in `Ty`. For struct types, it returns the
+/// elements, all other types are returned directly.
+SmallVector<Type *, 2> getContainedTypes(Type *Ty);
+
+/// Returns true if `Ty` is a vector type or a struct of vector types where all
+/// vector types share the same VF.
+bool isWideTy(Type *Ty);
+
+/// Returns the vectorization factor for a widened type.
+inline ElementCount getWideTypeVF(Type *Ty) {
+  assert(isWideTy(Ty) && "expected widened type!");
+  return cast<VectorType>(getContainedTypes(Ty).front())->getElementCount();
+}
+
+} // namespace llvm
diff --git a/llvm/lib/Analysis/VectorUtils.cpp b/llvm/lib/Analysis/VectorUtils.cpp
index dbffbb8a5f81d9..38b9da69ae2b76 100644
--- a/llvm/lib/Analysis/VectorUtils.cpp
+++ b/llvm/lib/Analysis/VectorUtils.cpp
@@ -39,6 +39,20 @@ static cl::opt<unsigned> MaxInterleaveGroupFactor(
     cl::desc("Maximum factor for an interleaved access group (default = 8)"),
     cl::init(8));
 
+/// Returns true if `Ty` can be widened by the loop vectorizer.
+bool llvm::canWidenType(Type *Ty) {
+  Type *ElTy = Ty;
+  // For now, only allow widening non-packed literal structs where all
+  // element types are the same. This simplifies the cost model and
+  // conversion between scalar and wide types.
+  if (auto *StructTy = dyn_cast<StructType>(Ty);
+      StructTy && !StructTy->isPacked() && StructTy->isLiteral() &&
+      StructTy->containsHomogeneousTypes()) {
+    ElTy = StructTy->elements().front();
+  }
+  return VectorType::isValidElementType(ElTy);
+}
+
 /// Return true if all of the intrinsic's arguments and return type are scalars
 /// for the scalar form of the intrinsic, and vectors for the vector form of the
 /// intrinsic (except operands that are marked as always being scalar by
diff --git a/llvm/lib/IR/CMakeLists.txt b/llvm/lib/IR/CMakeLists.txt
index 544f4ea9223d0e..7eaf35e10ebc67 100644
--- a/llvm/lib/IR/CMakeLists.txt
+++ b/llvm/lib/IR/CMakeLists.txt
@@ -73,6 +73,7 @@ add_llvm_component_library(LLVMCore
   Value.cpp
   ValueSymbolTable.cpp
   VectorBuilder.cpp
+  VectorUtils.cpp
   Verifier.cpp
   VFABIDemangler.cpp
   RuntimeLibcalls.cpp
diff --git a/llvm/lib/IR/VFABIDemangler.cpp b/llvm/lib/IR/VFABIDemangler.cpp
index cdfb9fbfaa084d..6ccd77fd23793a 100644
--- a/llvm/lib/IR/VFABIDemangler.cpp
+++ b/llvm/lib/IR/VFABIDemangler.cpp
@@ -11,6 +11,7 @@
 #include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/StringSwitch.h"
 #include "llvm/IR/Module.h"
+#include "llvm/IR/VectorUtils.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"
 #include <limits>
@@ -346,12 +347,15 @@ getScalableECFromSignature(const FunctionType *Signature, const VFISAKind ISA,
   // Also check the return type if not void.
   Type *RetTy = Signature->getReturnType();
   if (!RetTy->isVoidTy()) {
-    std::optional<ElementCount> ReturnEC = getElementCountForTy(ISA, RetTy);
-    // If we have an unknown scalar element type we can't find a reasonable VF.
-    if (!ReturnEC)
-      return std::nullopt;
-    if (ElementCount::isKnownLT(*ReturnEC, MinEC))
-      MinEC = *ReturnEC;
+    for (Type *RetTy : getContainedTypes(RetTy)) {
+      std::optional<ElementCount> ReturnEC = getElementCountForTy(ISA, RetTy);
+      // If we have an unknown scalar element type we can't find a reasonable
+      // VF.
+      if (!ReturnEC)
+        return std::nullopt;
+      if (ElementCount::isKnownLT(*ReturnEC, MinEC))
+        MinEC = *ReturnEC;
+    }
   }
 
   // The SVE Vector function call ABI bases the VF on the widest element types
@@ -566,7 +570,7 @@ FunctionType *VFABI::createFunctionType(const VFInfo &Info,
 
   auto *RetTy = ScalarFTy->getReturnType();
   if (!RetTy->isVoidTy())
-    RetTy = VectorType::get(RetTy, VF);
+    RetTy = ToWideTy(RetTy, VF);
   return FunctionType::get(RetTy, VecTypes, false);
 }
 
diff --git a/llvm/lib/IR/VectorUtils.cpp b/llvm/lib/IR/VectorUtils.cpp
new file mode 100644
index 00000000000000..c89a8eaf2ad1e0
--- /dev/null
+++ b/llvm/lib/IR/VectorUtils.cpp
@@ -0,0 +1,69 @@
+//===----------- VectorUtils.cpp - Vector type utility functions ----------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/IR/VectorUtils.h"
+#include "llvm/ADT/SmallVectorExtras.h"
+
+using namespace llvm;
+
+/// A helper for converting to wider (vector) types. For scalar types, this is
+/// equivalent to calling `ToVectorTy`. For struct types, this returns a new
+/// struct where each element type has been widened to a vector type. Note: Only
+/// unpacked literal struct types are supported.
+Type *llvm::ToWideTy(Type *Ty, ElementCount EC) {
+  if (EC.isScalar())
+    return Ty;
+  auto *StructTy = dyn_cast<StructType>(Ty);
+  if (!StructTy)
+    return ToVectorTy(Ty, EC);
+  assert(StructTy->isLiteral() && !StructTy->isPacked() &&
+         "expected unpacked struct literal");
+  return StructType::get(
+      Ty->getContext(),
+      map_to_vector(StructTy->elements(), [&](Type *ElTy) -> Type * {
+        return VectorType::get(ElTy, EC);
+      }));
+}
+
+/// A helper for converting wide types to narrow (non-vector) types. For vector
+/// types, this is equivalent to calling .getScalarType(). For struct types,
+/// this returns a new struct where each element type has been converted to a
+/// scalar type. Note: Only unpacked literal struct types are supported.
+Type *llvm::ToNarrowTy(Type *Ty) {
+  auto *StructTy = dyn_cast<StructType>(Ty);
+  if (!StructTy)
+    return Ty->getScalarType();
+  assert(StructTy->isLiteral() && !StructTy->isPacked() &&
+         "expected unpacked struct literal");
+  return StructType::get(
+      Ty->getContext(),
+      map_to_vector(StructTy->elements(), [](Type *ElTy) -> Type * {
+        return ElTy->getScalarType();
+      }));
+}
+
+/// Returns the types contained in `Ty`. For struct types, it returns the
+/// elements, all other types are returned directly.
+SmallVector<Type *, 2> llvm::getContainedTypes(Type *Ty) {
+  auto *StructTy = dyn_cast<StructType>(Ty);
+  if (StructTy)
+    return to_vector<2>(StructTy->elements());
+  return {Ty};
+}
+
+/// Returns true if `Ty` is a vector type or a struct of vector types where all
+/// vector types share the same VF.
+bool llvm::isWideTy(Type *Ty) {
+  auto ContainedTys = getContainedTypes(Ty);
+  if (ContainedTys.empty() || !ContainedTys.front()->isVectorTy())
+    return false;
+  ElementCount VF = cast<VectorType>(ContainedTys.front())->getElementCount();
+  return all_of(ContainedTys, [&](Type *Ty) {
+    return Ty->isVectorTy() && cast<VectorType>(Ty)->getElementCount() == VF;
+  });
+}
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
index 43be72f0f34d45..cb6327640dbdbb 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -949,8 +949,8 @@ bool LoopVectorizationLegality::canVectorizeInstrs() {
       // Check that the instruction return type is vectorizable.
       // We can't vectorize casts from vector type to scalar type.
       // Also, we can't vectorize extractelement instructions.
-      if ((!VectorType::isValidElementType(I.getType()) &&
-           !I.getType()->isVoidTy()) ||
+      Type *InstTy = I.getType();
+      if (!(InstTy->isVoidTy() || canWidenType(InstTy)) ||
           (isa<CastInst>(I) &&
            !VectorType::isValidElementType(I.getOperand(0)->getType())) ||
           isa<ExtractElementInst>(I)) {
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 6e082b1c134dee..7283bf8d323971 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -2302,7 +2302,9 @@ void InnerLoopVectorizer::scalarizeInstruction(const Instruction *Instr,
                                                VPReplicateRecipe *RepRecipe,
                                                const VPLane &Lane,
                                                VPTransformState &State) {
-  assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");
+  assert((!Instr->getType()->isAggregateType() ||
+          canWidenType(Instr->getType())) &&
+         "expected widenable or non-aggregate type!");
 
   // Does this instruction return a value ?
   bool IsVoidRetTy = Instr->getType()->isVoidTy();
@@ -2864,10 +2866,10 @@ LoopVectorizationCostModel::getVectorCallCost(CallInst *CI,
   return ScalarCallCost;
 }
 
-static Type *maybeVectorizeType(Type *Elt, ElementCount VF) {
-  if (VF.isScalar() || (!Elt->isIntOrPtrTy() && !Elt->isFloatingPointTy()))
-    return Elt;
-  return VectorType::get(Elt, VF);
+static Type *maybeVectorizeType(Type *Ty, ElementCount VF) {
+  if (VF.isScalar() || !canWidenType(Ty))
+    return Ty;
+  return ToWideTy(Ty, VF);
 }
 
 InstructionCost
@@ -3633,9 +3635,8 @@ void LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) {
 
       // ExtractValue instructions must be uniform, because the operands are
       // known to be loop-invariant.
-      if (auto *EVI = dyn_cast<ExtractValueInst>(&I)) {
-        assert(IsOutOfScope(EVI->getAggregateOperand()) &&
-               "Expected aggregate value to be loop invariant");
+      if (auto *EVI = dyn_cast<ExtractValueInst>(&I);
+          EVI && IsOutOfScope(EVI->getAggregateOperand())) {
         AddToWorklistIfAllowed(EVI);
         continue;
       }
@@ -4471,8 +4472,7 @@ static bool willGenerateVectors(VPlan &Plan, ElementCount VF,
         llvm_unreachable("unhandled recipe");
       }
 
-      auto WillWiden = [&TTI, VF](Type *ScalarTy) {
-        Type *VectorTy = ToVectorTy(ScalarTy, VF);
+      auto WillWiden = [&TTI, VF](Type *VectorTy) {
         unsigned NumLegalParts = TTI.getNumberOfParts(VectorTy);
         if (!NumLegalParts)
           return false;
@@ -4503,7 +4503,8 @@ static bool willGenerateVectors(VPlan &Plan, ElementCount VF,
       Type *ScalarTy = TypeInfo.inferScalarType(ToCheck);
       if (!Visited.insert({ScalarTy}).second)
         continue;
-      if (WillWiden(ScalarTy))
+      Type *WideTy = ToWideTy(ScalarTy, VF);
+      if (any_of(getContainedTypes(WideTy), WillWiden))
         return true;
     }
   }
@@ -5452,10 +5453,13 @@ InstructionCost LoopVectorizationCostModel::computePredInstDiscount(
     // and phi nodes.
     TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
     if (isScalarWithPredication(I, VF) && !I->getType()->isVoidTy()) {
-      ScalarCost += TTI.getScalarizationOverhead(
-          cast<VectorType>(ToVectorTy(I->getType(), VF)),
-          APInt::getAllOnes(VF.getFixedValue()), /*Insert*/ true,
-          /*Extract*/ false, CostKind);
+      Type *WideTy = ToWideTy(I->getType(), VF);
+      for (Type *VectorTy : getContainedTypes(WideTy)) {
+        ScalarCost += TTI.getScalarizationOverhead(
+            cast<VectorType>(VectorTy), APInt::getAllOnes(VF.getFixedValue()),
+            /*Insert*/ true,
+            /*Extract*/ false, CostKind);
+      }
       ScalarCost +=
           VF.getFixedValue() * TTI.getCFInstrCost(Instruction::PHI, CostKind);
     }
@@ -5944,13 +5948,17 @@ InstructionCost LoopVectorizationCostModel::getScalarizationOverhead(
     return 0;
 
   InstructionCost Cost = 0;
-  Type *RetTy = ToVectorTy(I->getType(), VF);
+  Type *RetTy = ToWideTy(I->getType(), VF);
   if (!RetTy->isVoidTy() &&
-      (!isa<LoadInst>(I) || !TTI.supportsEfficientVectorElementLoadStore()))
-    Cost += TTI.getScalarizationOverhead(
-        cast<VectorType>(RetTy), APInt::getAllOnes(VF.getKnownMinValue()),
-        /*Insert*/ true,
-        /*Extract*/ false, CostKind);
+      (!isa<LoadInst>(I) || !TTI.supportsEfficientVectorElementLoadStore())) {
+
+    for (Type *VectorTy : getContainedTypes(RetTy)) {
+      Cost += TTI.getScalarizationOverhead(
+          cast<VectorType>(VectorTy), APInt::getAllOnes(VF.getKnownMinValue()),
+          /*Insert*/ true,
+          /*Extract*/ false, CostKind);
+    }
+  }
 
   // Some targets keep addresses scalar.
   if (isa<LoadInst>(I) && !TTI.prefersVectorizedAddressing())
@@ -6209,9 +6217,9 @@ void LoopVectorizationCostModel::setVectorizedCallDecision(ElementCount VF) {
 
       bool MaskRequired = Legal->isMaskRequired(CI);
       // Compute corresponding vector type for return value and arguments.
-      Type *RetTy = ToVectorTy(ScalarRetTy, VF);
+      Type *RetTy = ToWideTy(ScalarRetTy, VF);
       for (Type *ScalarTy : ScalarTys)
-        Tys.push_back(ToVectorTy(ScalarTy, VF));
+        Tys.push_back(ToWideTy(ScalarTy, VF));
 
       // An in-loop reduction using an fmuladd intrinsic is a special case;
       // we don't want the normal cost for that intrinsic.
@@ -6388,7 +6396,7 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I,
            HasSingleCopyAfterVectorization(I, VF));
     VectorTy = RetTy;
   } else
-    VectorTy = ToVectorTy(RetTy, VF);
+    VectorTy = ToWideTy(RetTy, VF);
 
   if (VF.isVector() && VectorTy->isVectorTy() &&
       !TTI.getNumberOfParts(VectorTy))
@@ -8423,6 +8431,7 @@ VPWidenRecipe *VPRecipeBuilder::tryToWiden(Instruction *I,
   case Instruction::Sub:
   case Instruction::Xor:
   case Instruction::Freeze:
+  case Instruction::ExtractValue:
     SmallVector<VPValue *> NewOps(Operands);
     if (Instruction::isBinaryOp(I->getOpcode())) {
       // The legacy cost model uses SCEV to check if some of the operands are
@@ -9466,7 +9475,7 @...
[truncated]

llvm/test/Transforms/LoopVectorize/AArch64/struct-return.ll

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

llvm/lib/Transforms/Vectorize/VPlan.cpp

llvm/test/Analysis/CostModel/AMDGPU/frexp.ll

MacDue · 2024-10-09T17:03:34Z

Kind ping 🙂

SamTebbs33

Looks good to me with a couple of comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

llvm/test/Transforms/LoopVectorize/AArch64/struct-return.ll

fhahn

VPlan supports recipes that produce multiple results, e.g. as VPInterleaveRecipe. The recipe then would take care of de-interleaving the result, this would help avoid the need to modify various places in VPTransformState.

I might have missed it, but could you make sure that there are tests with struct return values that cannot be widened?

llvm/test/Transforms/LoopVectorize/AArch64/struct-return.ll

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

MacDue · 2024-10-11T15:06:27Z

VPlan supports recipes that produce multiple results, e.g. as VPInterleaveRecipe. The recipe then would take care of de-interleaving the result, this would help avoid the need to modify various places in VPTransformState.

I'm a little unclear on how this would work. Do you mean making VPWidenCallRecipe/VPReplicateRecipe return a value for each element of the struct? I'm not sure what to do around the extractvalues in the original IR/plan, I guess you'd have to replace them with the widen call recipe while constructing the plan?

I might have missed it, but could you make sure that there are tests with struct return values that cannot be widened?

There's a few tests in llvm/test/Transforms/LoopVectorize/struct-return.ll (negative_mixed_element_type_struct_return, test_named_struct_return, test_overflow_intrinsic).

fhahn · 2024-10-14T19:48:58Z

VPlan supports recipes that produce multiple results, e.g. as VPInterleaveRecipe. The recipe then would take care of de-interleaving the result, this would help avoid the need to modify various places in VPTransformState.

I'm a little unclear on how this would work. Do you mean making VPWidenCallRecipe/VPReplicateRecipe return a value for each element of the struct? I'm not sure what to do around the extractvalues in the original IR/plan, I guess you'd have to replace them with the widen call recipe while constructing the plan?

Hm, yes, VPWidenCallRecipe could define one VPValue per element vector and take care of splitting up the return value directly, which may also allow more accurately estimating the cost overhead. It might be worth introducing a separate recipe for calls that return multiple results. Perhaps @ayalz has any additional thoughts?

One question is how to deal with VPReplicateRecipe, I suppose we would need a version as well that produces multiple results.

I'm not sure what to do around the extractvalues in the original IR/plan, I guess you'd have to replace them with the widen call recipe while constructing the plan?

When lowering extractvalue's, they would be replaced by just looking up the VPValue from the first multi-def operand at the index of the extract.

I might have missed it, but could you make sure that there are tests with struct return values that cannot be widened?

There's a few tests in llvm/test/Transforms/LoopVectorize/struct-return.ll (negative_mixed_element_type_struct_return, test_named_struct_return, test_overflow_intrinsic).

Thanks!

MacDue · 2025-01-20T15:32:01Z

Ping :)

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

fhahn · 2025-01-21T08:47:40Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

+  case Instruction::ExtractValue:
+    return 0;


Should this call something like getVectorInstrCost?

Would be good to make sure we estimate the cost correctly when the extracts might not be completely free, e.g. if the results are returned via the stack (e.g. something like https://llvm.godbolt.org/z/zsv3G3h7n)

I don't think so, I did look around (including getVectorInstrCost()), but it does not look like there's a hook for this right now. getVectorInstrCost() is currently only for insertelement and extractelement. I guess I could modify this hook to handle it, or add a new one though?

llvm-project/llvm/lib/Analysis/TargetTransformInfo.cpp

Lines 1079 to 1084 in 455b3d6

InstructionCost TargetTransformInfo::getVectorInstrCost(

unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, unsigned Index,

Value *Op0, Value *Op1) const {

assert((Opcode == Instruction::InsertElement ||

Opcode == Instruction::ExtractElement) &&

"Expecting Opcode to be insertelement/extractelement.");

I've added a new hook getInsertExtractValueCost rather than shoehorn it into the vector insert/extract cost functions. By default this just returns 0, but there should be enough information passed to it to implement target-specific costs in the future.

Do we have tests to make sure we do the right thing when results are not returned in vector registers (as in https://llvm.godbolt.org/z/zsv3G3h7n)?

If by the "right thing" you mean "Am I costing the return values based on the target-specific call ABI?" then the answer is no. I think the answer to that is also "no" for non-struct-return functions too (I don't think there's any costing based on how the target passes arguments/returns values).

I don't really think that needs doing in this patch (which just adds basic support). Since nothing is using struct-return vectorization right now. Currently, it needs things to be manually annotated with vector-function-abi-variant, and I don't really think your example is a very realistic use case.

fhahn · 2025-01-26T21:39:51Z

llvm/include/llvm/Analysis/TargetTransformInfo.h

+  /// ArrayRef to indicate that there is no information on the indices. This is
+  /// used when the instruction is not available; a typical use case is to
+  /// provision the cost of vectorization/scalarization in vectorizer passes.
+  InstructionCost getInsertExtractValueCost(unsigned Opcode, Type *AggType,


Would be good to get input from someone else on the new hook (maybe @davemgreen / @RKSimon ?)

Not sure about returning 0 by default, might be better to use a more conservative value? Can this by tested by a cost model test ? (Analysis/CostModel)?

I chose 0 to match the default value used in getInstructionCost():

llvm-project/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

Lines 1299 to 1301 in 80ab237

case Instruction::ExtractValue:

case Instruction::Freeze:

return TTI::TCC_Free;

I think it makes some sense (since normally sroa will remove any extractvalues).

Adding this suggests you're going to add target overrides at some point - what will they need to do? Also, how will you use the AggDef arg?

It was based on @fhahn's comment here: #109833 (comment). The idea of passing AggDef came from that, so you could cost the extractvalue differently based on the source of the aggregate. If the aggregate comes from a call return value that is passed on the stack, then you would want to cost the extractvalue like a load.

Would we ever expect it to be non-zero from the vectorizer? I think I would expect the cost of say an add.with.overflow to include the extract cost in it's own cost, if one was needed. Perhaps for calls it would depend on the calling convention, with multiple values often being returned in multiple registers?

Maybe to start with it would be fine to assume the main cost is accounted for in the producer? We should probably add a comment explaining why we use 0 to avoid confusion in the future

I've added a comment 👍

Thanks. Not sure what others think, but IMO it would probably be better to drop the extra arguments, instead of passing them without being used. They can always be added once they are needed.

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

fhahn · 2025-01-30T22:27:51Z

llvm/include/llvm/Analysis/TargetTransformInfo.h

+  /// ArrayRef to indicate that there is no information on the indices. This is
+  /// used when the instruction is not available; a typical use case is to
+  /// provision the cost of vectorization/scalarization in vectorizer passes.
+  InstructionCost getInsertExtractValueCost(unsigned Opcode, Type *AggType,


Thanks. Not sure what others think, but IMO it would probably be better to drop the extra arguments, instead of passing them without being used. They can always be added once they are needed.

This patch adds initial support for vectorizing literal struct return values. Currently, this is limited to the case where the struct is homogeneous (all elements have the same type) and not packed. The users of the call also must all be `extractvalue` instructions. The intended use case for this is vectorizing intrinsics such as: ``` declare { float, float } @llvm.sincos.f32(float %x) ``` Mapping them to structure-returning library calls such as: ``` declare { <4 x float>, <4 x i32> } @Sleef_sincosf4_u10advsimd(<4 x float>) ``` Or their widened form (such as `@llvm.sincos.v4f32` in this case). Implementing this required two main changes: 1. Supporting widening `extractvalue` 2. Adding support for vectorized struct types in LV * This is mostly limited to parts of the cost model and scalarization Since the supported use case is narrow, the required changes are relatively small.

Change-Id: I687099076728ddb73466cfd7cf1a8307f064b4f7

MacDue · 2025-02-07T16:01:53Z

Ping

fhahn

Thanks for the latest updates, a few more small comments inline.

What would be the best way to test this on some larger code? Is there a vector library that already defines some functions with multiple return values (or some follow-up patches)?

llvm/include/llvm/Analysis/TargetTransformInfo.h

fhahn · 2025-02-09T21:46:37Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

-          cast<VectorType>(toVectorTy(I->getType(), VF)),
-          APInt::getAllOnes(VF.getFixedValue()), /*Insert*/ true,
-          /*Extract*/ false, CostKind);
+      Type *WideTy = toVectorizedTy(I->getType(), VF);


Just checking if this has been addressed in the latest version?

MacDue · 2025-02-10T00:41:08Z

What would be the best way to test this on some larger code? Is there a vector library that already defines some functions with multiple return values (or some follow-up patches)?

There's a follow-up patch #123210 to enable vectorizing the llvm.sincos intrinsic (which can be lowered to from clang's __builtin_sincos()), and we intend to do similar patches for a few other intrinsics.

fhahn

LGTM, thanks!

…llvm#109833) This patch adds initial support for vectorizing literal struct return values. Currently, this is limited to the case where the struct is homogeneous (all elements have the same type) and not packed. The users of the call also must all be `extractvalue` instructions. The intended use case for this is vectorizing intrinsics such as: ``` declare { float, float } @llvm.sincos.f32(float %x) ``` Mapping them to structure-returning library calls such as: ``` declare { <4 x float>, <4 x float> } @Sleef_sincosf4_u10advsimd(<4 x float>) ``` Or their widened form (such as `@llvm.sincos.v4f32` in this case). Implementing this required two main changes: 1. Supporting widening `extractvalue` 2. Adding support for vectorized struct types in LV * This is mostly limited to parts of the cost model and scalarization Since the supported use case is narrow, the required changes are relatively small.

MacDue mentioned this pull request Sep 24, 2024

[IR] Add llvm.sincos intrinsic #109825

Merged

MacDue force-pushed the loop_vec_struct_experiment branch from 59b3fa9 to 5518580 Compare September 24, 2024 17:25

SamTebbs33 self-requested a review September 25, 2024 10:04

MacDue force-pushed the loop_vec_struct_experiment branch 3 times, most recently from 8ce936e to 0f2cdbb Compare October 3, 2024 15:40

MacDue marked this pull request as ready for review October 3, 2024 16:34

llvmbot added backend:AMDGPU vectorizers llvm:ir llvm:analysis llvm:transforms labels Oct 3, 2024

MacDue requested review from huntergr-arm and fhahn October 3, 2024 16:34

MacDue requested review from RKSimon, NickGuy-Arm and alexey-bataev October 3, 2024 16:34

arsenm reviewed Oct 3, 2024

View reviewed changes

llvm/test/Analysis/CostModel/AMDGPU/frexp.ll Outdated Show resolved Hide resolved

fhahn reviewed Oct 3, 2024

View reviewed changes

llvm/test/Analysis/CostModel/AMDGPU/frexp.ll Outdated Show resolved Hide resolved

MacDue force-pushed the loop_vec_struct_experiment branch 2 times, most recently from 61f8736 to 671bc7c Compare October 9, 2024 16:11

SamTebbs33 approved these changes Oct 10, 2024

View reviewed changes

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp Outdated Show resolved Hide resolved

llvm/test/Transforms/LoopVectorize/AArch64/struct-return.ll Outdated Show resolved Hide resolved

fhahn reviewed Oct 10, 2024

View reviewed changes

MacDue force-pushed the loop_vec_struct_experiment branch from 1b44fdc to 5d79394 Compare January 17, 2025 15:55

fhahn reviewed Jan 21, 2025

View reviewed changes

MacDue force-pushed the loop_vec_struct_experiment branch 2 times, most recently from a8e68a8 to 026e9d3 Compare January 22, 2025 13:26

fhahn reviewed Jan 26, 2025

View reviewed changes

fhahn reviewed Jan 30, 2025

View reviewed changes

MacDue added 9 commits February 4, 2025 12:27

Fixups

178b2c3

Fixups

c0f9813

Fixups

cc7e82a

Change-Id: I687099076728ddb73466cfd7cf1a8307f064b4f7

Fixups

6f1e15c

Fixups

4b1534e

More cost checks

e5af9fc

Add comment

2ee07ab

Fixups

af8febd

MacDue force-pushed the loop_vec_struct_experiment branch from 13e9865 to af8febd Compare February 4, 2025 12:29

fhahn reviewed Feb 9, 2025

View reviewed changes

Fix typo

8ddfd70

This was referenced Feb 10, 2025

(release 20.x non-upstream patch) [LV] Add initial support for vectorizing literal struct return values arm/arm-toolchain#82

Merged

[Downstream Change] sincos() vectorization arm/arm-toolchain#87

Open

fhahn approved these changes Feb 16, 2025

View reviewed changes

MacDue merged commit e0e67a6 into llvm:main Feb 17, 2025
8 checks passed

MacDue deleted the loop_vec_struct_experiment branch February 17, 2025 15:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[LV] Add initial support for vectorizing literal struct return values #109833

[LV] Add initial support for vectorizing literal struct return values #109833

MacDue commented Sep 24, 2024 •

edited

Loading

github-actions bot commented Sep 24, 2024 •

edited

Loading

MacDue commented Oct 3, 2024

llvmbot commented Oct 3, 2024 •

edited

Loading

MacDue commented Oct 9, 2024

SamTebbs33 left a comment

fhahn left a comment

MacDue commented Oct 11, 2024 •

edited

Loading

fhahn commented Oct 14, 2024

MacDue commented Jan 20, 2025

fhahn Jan 21, 2025

MacDue Jan 21, 2025

MacDue Jan 21, 2025

fhahn Jan 26, 2025

MacDue Jan 27, 2025

fhahn Jan 26, 2025

MacDue Jan 26, 2025

MacDue Jan 26, 2025

RKSimon Jan 27, 2025

MacDue Jan 27, 2025

davemgreen Jan 28, 2025

fhahn Jan 28, 2025

MacDue Jan 28, 2025

fhahn Jan 30, 2025

fhahn Jan 30, 2025

MacDue commented Feb 7, 2025

fhahn left a comment

fhahn Feb 9, 2025

MacDue commented Feb 10, 2025

fhahn left a comment

	InstructionCost TargetTransformInfo::getVectorInstrCost(
	unsigned Opcode, Type *Val, TTI::TargetCostKind CostKind, unsigned Index,
	Value Op0, Value Op1) const {
	assert((Opcode == Instruction::InsertElement \|\|
	Opcode == Instruction::ExtractElement) &&
	"Expecting Opcode to be insertelement/extractelement.");

	case Instruction::ExtractValue:
	case Instruction::Freeze:
	return TTI::TCC_Free;

[LV] Add initial support for vectorizing literal struct return values #109833

[LV] Add initial support for vectorizing literal struct return values #109833

Conversation

MacDue commented Sep 24, 2024 • edited Loading

github-actions bot commented Sep 24, 2024 • edited Loading

MacDue commented Oct 3, 2024

llvmbot commented Oct 3, 2024 • edited Loading

MacDue commented Oct 9, 2024

SamTebbs33 left a comment

Choose a reason for hiding this comment

fhahn left a comment

Choose a reason for hiding this comment

MacDue commented Oct 11, 2024 • edited Loading

fhahn commented Oct 14, 2024

MacDue commented Jan 20, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MacDue commented Feb 7, 2025

fhahn left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MacDue commented Feb 10, 2025

fhahn left a comment

Choose a reason for hiding this comment

MacDue commented Sep 24, 2024 •

edited

Loading

github-actions bot commented Sep 24, 2024 •

edited

Loading

llvmbot commented Oct 3, 2024 •

edited

Loading

MacDue commented Oct 11, 2024 •

edited

Loading