[RISCV] Add cost for @llvm.experimental.vp.splat #117313
This is split off from llvm#115274. There doesn't seem to be an easy way to share this with getShuffleCost, since that requires passing in a real insert_element operand for it to recognise a scalar splat. There are no tests for i1 vectors, since we can't lower vp splats of them yet and currently crash.
Co-authored-by: Shih-Po Hung <[email protected]>
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-backend-risc-v
Author: Luke Lau (lukel97)
Full diff: https://github.com/llvm/llvm-project/pull/117313.diff
2 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 2b16dcbcd8695b..ecdecd7edff07c 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1155,6 +1155,11 @@ RISCVTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
return getCmpSelInstrCost(Instruction::Select, ICA.getReturnType(),
ICA.getArgTypes()[0], CmpInst::BAD_ICMP_PREDICATE,
CostKind);
+ case Intrinsic::experimental_vp_splat: {
+ auto LT = getTypeLegalizationCost(RetTy);
+ return LT.first *
+ getRISCVInstructionCost(RISCV::VMV_V_X, LT.second, CostKind);
+ }
case Intrinsic::vp_reduce_add:
case Intrinsic::vp_reduce_fadd:
case Intrinsic::vp_reduce_mul:
diff --git a/llvm/test/Analysis/CostModel/RISCV/vp-intrinsics.ll b/llvm/test/Analysis/CostModel/RISCV/vp-intrinsics.ll
index 800ea223850d31..a7ce6f660ca4af 100644
--- a/llvm/test/Analysis/CostModel/RISCV/vp-intrinsics.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/vp-intrinsics.ll
@@ -2125,6 +2125,112 @@ define void @vp_fdiv(){
ret void
}
+define void @splat() {
+; CHECK-LABEL: 'splat'
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %1 = call <2 x i8> @llvm.experimental.vp.splat.v2i8(i8 undef, <2 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %2 = call <4 x i8> @llvm.experimental.vp.splat.v4i8(i8 undef, <4 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %3 = call <8 x i8> @llvm.experimental.vp.splat.v8i8(i8 undef, <8 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %4 = call <16 x i8> @llvm.experimental.vp.splat.v16i8(i8 undef, <16 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %5 = call <2 x i16> @llvm.experimental.vp.splat.v2i16(i16 undef, <2 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %6 = call <4 x i16> @llvm.experimental.vp.splat.v4i16(i16 undef, <4 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %7 = call <8 x i16> @llvm.experimental.vp.splat.v8i16(i16 undef, <8 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = call <16 x i16> @llvm.experimental.vp.splat.v16i16(i16 undef, <16 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %9 = call <2 x i32> @llvm.experimental.vp.splat.v2i32(i32 undef, <2 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %10 = call <4 x i32> @llvm.experimental.vp.splat.v4i32(i32 undef, <4 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %11 = call <8 x i32> @llvm.experimental.vp.splat.v8i32(i32 undef, <8 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %12 = call <16 x i32> @llvm.experimental.vp.splat.v16i32(i32 undef, <16 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %13 = call <2 x i64> @llvm.experimental.vp.splat.v2i64(i64 undef, <2 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %14 = call <4 x i64> @llvm.experimental.vp.splat.v4i64(i64 undef, <4 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %15 = call <8 x i64> @llvm.experimental.vp.splat.v8i64(i64 undef, <8 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %16 = call <16 x i64> @llvm.experimental.vp.splat.v16i64(i64 undef, <16 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %17 = call <vscale x 2 x i8> @llvm.experimental.vp.splat.nxv2i8(i8 undef, <vscale x 2 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %18 = call <vscale x 4 x i8> @llvm.experimental.vp.splat.nxv4i8(i8 undef, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %19 = call <vscale x 8 x i8> @llvm.experimental.vp.splat.nxv8i8(i8 undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %20 = call <vscale x 16 x i8> @llvm.experimental.vp.splat.nxv16i8(i8 undef, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %21 = call <vscale x 2 x i16> @llvm.experimental.vp.splat.nxv2i16(i16 undef, <vscale x 2 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %22 = call <vscale x 4 x i16> @llvm.experimental.vp.splat.nxv4i16(i16 undef, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %23 = call <vscale x 8 x i16> @llvm.experimental.vp.splat.nxv8i16(i16 undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %24 = call <vscale x 16 x i16> @llvm.experimental.vp.splat.nxv16i16(i16 undef, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %25 = call <vscale x 2 x i32> @llvm.experimental.vp.splat.nxv2i32(i32 undef, <vscale x 2 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %26 = call <vscale x 4 x i32> @llvm.experimental.vp.splat.nxv4i32(i32 undef, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %27 = call <vscale x 8 x i32> @llvm.experimental.vp.splat.nxv8i32(i32 undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %28 = call <vscale x 16 x i32> @llvm.experimental.vp.splat.nxv16i32(i32 undef, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %29 = call <vscale x 2 x i64> @llvm.experimental.vp.splat.nxv2i64(i64 undef, <vscale x 2 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %30 = call <vscale x 4 x i64> @llvm.experimental.vp.splat.nxv4i64(i64 undef, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %31 = call <vscale x 8 x i64> @llvm.experimental.vp.splat.nxv8i64(i64 undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %32 = call <vscale x 16 x i64> @llvm.experimental.vp.splat.nxv16i64(i64 undef, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; TYPEBASED-LABEL: 'splat'
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %1 = call <2 x i8> @llvm.experimental.vp.splat.v2i8(i8 undef, <2 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %2 = call <4 x i8> @llvm.experimental.vp.splat.v4i8(i8 undef, <4 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %3 = call <8 x i8> @llvm.experimental.vp.splat.v8i8(i8 undef, <8 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %4 = call <16 x i8> @llvm.experimental.vp.splat.v16i8(i8 undef, <16 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %5 = call <2 x i16> @llvm.experimental.vp.splat.v2i16(i16 undef, <2 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %6 = call <4 x i16> @llvm.experimental.vp.splat.v4i16(i16 undef, <4 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %7 = call <8 x i16> @llvm.experimental.vp.splat.v8i16(i16 undef, <8 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = call <16 x i16> @llvm.experimental.vp.splat.v16i16(i16 undef, <16 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %9 = call <2 x i32> @llvm.experimental.vp.splat.v2i32(i32 undef, <2 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %10 = call <4 x i32> @llvm.experimental.vp.splat.v4i32(i32 undef, <4 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %11 = call <8 x i32> @llvm.experimental.vp.splat.v8i32(i32 undef, <8 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %12 = call <16 x i32> @llvm.experimental.vp.splat.v16i32(i32 undef, <16 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %13 = call <2 x i64> @llvm.experimental.vp.splat.v2i64(i64 undef, <2 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %14 = call <4 x i64> @llvm.experimental.vp.splat.v4i64(i64 undef, <4 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %15 = call <8 x i64> @llvm.experimental.vp.splat.v8i64(i64 undef, <8 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %16 = call <16 x i64> @llvm.experimental.vp.splat.v16i64(i64 undef, <16 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %17 = call <vscale x 2 x i8> @llvm.experimental.vp.splat.nxv2i8(i8 undef, <vscale x 2 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %18 = call <vscale x 4 x i8> @llvm.experimental.vp.splat.nxv4i8(i8 undef, <vscale x 4 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %19 = call <vscale x 8 x i8> @llvm.experimental.vp.splat.nxv8i8(i8 undef, <vscale x 8 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %20 = call <vscale x 16 x i8> @llvm.experimental.vp.splat.nxv16i8(i8 undef, <vscale x 16 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %21 = call <vscale x 2 x i16> @llvm.experimental.vp.splat.nxv2i16(i16 undef, <vscale x 2 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %22 = call <vscale x 4 x i16> @llvm.experimental.vp.splat.nxv4i16(i16 undef, <vscale x 4 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %23 = call <vscale x 8 x i16> @llvm.experimental.vp.splat.nxv8i16(i16 undef, <vscale x 8 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %24 = call <vscale x 16 x i16> @llvm.experimental.vp.splat.nxv16i16(i16 undef, <vscale x 16 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %25 = call <vscale x 2 x i32> @llvm.experimental.vp.splat.nxv2i32(i32 undef, <vscale x 2 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %26 = call <vscale x 4 x i32> @llvm.experimental.vp.splat.nxv4i32(i32 undef, <vscale x 4 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %27 = call <vscale x 8 x i32> @llvm.experimental.vp.splat.nxv8i32(i32 undef, <vscale x 8 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %28 = call <vscale x 16 x i32> @llvm.experimental.vp.splat.nxv16i32(i32 undef, <vscale x 16 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %29 = call <vscale x 2 x i64> @llvm.experimental.vp.splat.nxv2i64(i64 undef, <vscale x 2 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %30 = call <vscale x 4 x i64> @llvm.experimental.vp.splat.nxv4i64(i64 undef, <vscale x 4 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %31 = call <vscale x 8 x i64> @llvm.experimental.vp.splat.nxv8i64(i64 undef, <vscale x 8 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %32 = call <vscale x 16 x i64> @llvm.experimental.vp.splat.nxv16i64(i64 undef, <vscale x 16 x i1> undef, i32 undef)
+; TYPEBASED-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+ call <2 x i8> @llvm.experimental.vp.splat.v2i8(i8 undef, <2 x i1> undef, i32 undef)
+ call <4 x i8> @llvm.experimental.vp.splat.v4i8(i8 undef, <4 x i1> undef, i32 undef)
+ call <8 x i8> @llvm.experimental.vp.splat.v8i8(i8 undef, <8 x i1> undef, i32 undef)
+ call <16 x i8> @llvm.experimental.vp.splat.v16i8(i8 undef, <16 x i1> undef, i32 undef)
+ call <2 x i16> @llvm.experimental.vp.splat.v2i16(i16 undef, <2 x i1> undef, i32 undef)
+ call <4 x i16> @llvm.experimental.vp.splat.v4i16(i16 undef, <4 x i1> undef, i32 undef)
+ call <8 x i16> @llvm.experimental.vp.splat.v8i16(i16 undef, <8 x i1> undef, i32 undef)
+ call <16 x i16> @llvm.experimental.vp.splat.v16i16(i16 undef, <16 x i1> undef, i32 undef)
+ call <2 x i32> @llvm.experimental.vp.splat.v2i32(i32 undef, <2 x i1> undef, i32 undef)
+ call <4 x i32> @llvm.experimental.vp.splat.v4i32(i32 undef, <4 x i1> undef, i32 undef)
+ call <8 x i32> @llvm.experimental.vp.splat.v8i32(i32 undef, <8 x i1> undef, i32 undef)
+ call <16 x i32> @llvm.experimental.vp.splat.v16i32(i32 undef, <16 x i1> undef, i32 undef)
+ call <2 x i64> @llvm.experimental.vp.splat.v2i64(i64 undef, <2 x i1> undef, i32 undef)
+ call <4 x i64> @llvm.experimental.vp.splat.v4i64(i64 undef, <4 x i1> undef, i32 undef)
+ call <8 x i64> @llvm.experimental.vp.splat.v8i64(i64 undef, <8 x i1> undef, i32 undef)
+ call <16 x i64> @llvm.experimental.vp.splat.v16i64(i64 undef, <16 x i1> undef, i32 undef)
+ call <vscale x 2 x i8> @llvm.experimental.vp.splat.nxv2i8(i8 undef, <vscale x 2 x i1> undef, i32 undef)
+ call <vscale x 4 x i8> @llvm.experimental.vp.splat.nxv4i8(i8 undef, <vscale x 4 x i1> undef, i32 undef)
+ call <vscale x 8 x i8> @llvm.experimental.vp.splat.nxv8i8(i8 undef, <vscale x 8 x i1> undef, i32 undef)
+ call <vscale x 16 x i8> @llvm.experimental.vp.splat.nxv16i8(i8 undef, <vscale x 16 x i1> undef, i32 undef)
+ call <vscale x 2 x i16> @llvm.experimental.vp.splat.nxv2i16(i16 undef, <vscale x 2 x i1> undef, i32 undef)
+ call <vscale x 4 x i16> @llvm.experimental.vp.splat.nxv4i16(i16 undef, <vscale x 4 x i1> undef, i32 undef)
+ call <vscale x 8 x i16> @llvm.experimental.vp.splat.nxv8i16(i16 undef, <vscale x 8 x i1> undef, i32 undef)
+ call <vscale x 16 x i16> @llvm.experimental.vp.splat.nxv16i16(i16 undef, <vscale x 16 x i1> undef, i32 undef)
+ call <vscale x 2 x i32> @llvm.experimental.vp.splat.nxv2i32(i32 undef, <vscale x 2 x i1> undef, i32 undef)
+ call <vscale x 4 x i32> @llvm.experimental.vp.splat.nxv4i32(i32 undef, <vscale x 4 x i1> undef, i32 undef)
+ call <vscale x 8 x i32> @llvm.experimental.vp.splat.nxv8i32(i32 undef, <vscale x 8 x i1> undef, i32 undef)
+ call <vscale x 16 x i32> @llvm.experimental.vp.splat.nxv16i32(i32 undef, <vscale x 16 x i1> undef, i32 undef)
+ call <vscale x 2 x i64> @llvm.experimental.vp.splat.nxv2i64(i64 undef, <vscale x 2 x i1> undef, i32 undef)
+ call <vscale x 4 x i64> @llvm.experimental.vp.splat.nxv4i64(i64 undef, <vscale x 4 x i1> undef, i32 undef)
+ call <vscale x 8 x i64> @llvm.experimental.vp.splat.nxv8i64(i64 undef, <vscale x 8 x i1> undef, i32 undef)
+ call <vscale x 16 x i64> @llvm.experimental.vp.splat.nxv16i64(i64 undef, <vscale x 16 x i1> undef, i32 undef)
+ ret void
+}
+
declare <2 x i8> @llvm.vp.add.v2i8(<2 x i8>, <2 x i8>, <2 x i1>, i32)
declare <4 x i8> @llvm.vp.add.v4i8(<4 x i8>, <4 x i8>, <4 x i1>, i32)
declare <8 x i8> @llvm.vp.add.v8i8(<8 x i8>, <8 x i8>, <8 x i1>, i32)
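The expected costs in the `splat` test above follow a simple pattern: the cost equals the number of vector registers the result occupies. A minimal sketch of that arithmetic is given here, assuming a minimum VLEN of 128 bits, 64 bits per scalable block, and a maximum register group of LMUL=8. These constants and every name below are illustrative and are not taken from the patch or from LLVM's API.

```cpp
#include <cassert>

// Toy model only: none of these names are LLVM APIs.
// Assumptions: minimum VLEN of 128 bits, 64 bits per scalable block
// (so <vscale x N x iM> occupies N*M/64 registers), maximum group LMUL=8.
constexpr unsigned MinVLen = 128;
constexpr unsigned BitsPerBlock = 64;
constexpr unsigned MaxLMul = 8;

// Integer division rounding up.
unsigned divideCeil(unsigned A, unsigned B) {
  assert(B != 0 && "division by zero");
  return (A + B - 1) / B;
}

// Estimated cost of splatting a scalar into NumElts elements of EltBits each.
unsigned splatCost(unsigned NumElts, unsigned EltBits, bool Scalable) {
  unsigned Bits = NumElts * EltBits;
  // Registers occupied; a fractional register group rounds up to one.
  unsigned Regs = divideCeil(Bits, Scalable ? BitsPerBlock : MinVLen);
  // Types wider than LMUL=8 are split into several legal parts
  // (the role played by LT.first in the patch).
  unsigned Parts = divideCeil(Regs, MaxLMul);
  // Each part is modeled as one vmv.v.x whose cost is its register count.
  unsigned RegsPerPart = divideCeil(Regs, Parts);
  return Parts * RegsPerPart;
}
```

Under these assumptions `splatCost(16, 64, true)` gives 16, matching the `nxv16i64` line, where the type is split into two LMUL=8 parts, and `splatCost(16, 16, false)` gives 2, matching `v16i16`.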
@@ -1155,6 +1155,11 @@ RISCVTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
return getCmpSelInstrCost(Instruction::Select, ICA.getReturnType(),
ICA.getArgTypes()[0], CmpInst::BAD_ICMP_PREDICATE,
CostKind);
case Intrinsic::experimental_vp_splat: {
auto LT = getTypeLegalizationCost(RetTy);
You're missing the hasV check, and it looks like this is only handling integer (not FP).
And also the i1 case
If there are no V instructions, then getTypeLegalizationCost will return an invalid type legalization cost for scalable vectors, or a scalar legalized type for fixed vectors, both of which will end up as an invalid cost. We could be more explicit about this though?
The fixed vector case will probably end up passing a scalar type to the getRISCVInstructionCost routine which I don't think is what we want. Please add the same type of guard we see elsewhere in this switch for the moment, we can come back and revisit in batch.
Sounds good. I've made it return invalid rather than falling out of the switch, since we can't yet scalarize either the scalable or the fixed versions of this intrinsic.
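The guard discussed in this thread can be sketched as a toy model. This is not the code that was committed: `Cost`, `LegalType`, and `vpSplatCost` are hypothetical stand-ins for LLVM's cost and legalized-type machinery, and the conditions are only those raised by the reviewers (no vector instructions, a type legalized to a scalar, and i1 elements).

```cpp
#include <cassert>

// Hypothetical stand-in for a cost that can be "invalid".
struct Cost {
  bool Valid;
  unsigned Value;
  static Cost invalid() { return {false, 0}; }
  static Cost of(unsigned V) { return {true, V}; }
};

// Hypothetical stand-in for the result of type legalization.
struct LegalType {
  bool IsVector;  // false when a fixed vector was legalized to a scalar
  bool IsMask;    // true for i1 element types
  unsigned Parts; // plays the role of LT.first
  unsigned LMul;  // register group size of each legal part
};

Cost vpSplatCost(bool HasVector, const LegalType &LT) {
  // Explicit guard: report an invalid cost instead of falling out of the
  // switch, because neither scalable nor fixed splats can be scalarized.
  if (!HasVector || !LT.IsVector || LT.IsMask)
    return Cost::invalid();
  assert(LT.Parts > 0 && LT.LMul > 0);
  // One splat instruction per legal part, each costing its register count.
  return Cost::of(LT.Parts * LT.LMul);
}
```

Returning an explicit invalid value keeps a scalar legalized type from ever reaching the per-instruction cost routine, which is the concern raised about the fixed vector case.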
As a follow up, can you add handling for fixed vectors (only) in BasicTTI via getShuffleCost? We should be modeling the scalarization here.
I took a look into this, but ended up in a bit of a rabbit hole. The default BasicTTIImpl scalarization implementation calls into the InsertElement/ExtractElement cost, but currently we return a cost of 0 for constant indices, which isn't true for, at least, the inserts generated from a scalarized splat shuffle.
However, changing the InsertElement/ExtractElement cost seems somewhat sensitive, see #67334. I think we may want to make the cost "dumber", e.g. just a single scalar load/store for any insert or extract, but we would need to double check this doesn't cause excessive unrolling.
This is more complicated than I first thought it would be, so I'm leaving a note here to maybe return to later.
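The problem described here can be illustrated with a toy calculation. The sketch below assumes a scalarized splat is modeled as one insert per element; the function names, the `DumbModel` switch, and the unit cost of 1 are hypothetical and do not come from BasicTTIImpl.

```cpp
#include <cassert>

// Hypothetical per-insert cost. In the current model described above,
// an insert at a constant index costs 0; the "dumber" alternative charges
// one unit for any insert. The value 1 is an assumed placeholder.
unsigned insertCost(bool ConstantIndex, bool DumbModel) {
  if (DumbModel)
    return 1;
  return ConstantIndex ? 0 : 1;
}

// Cost of a splat scalarized into NumElts inserts at constant indices.
unsigned scalarizedSplatCost(unsigned NumElts, bool DumbModel) {
  unsigned Total = 0;
  for (unsigned I = 0; I < NumElts; ++I)
    Total += insertCost(/*ConstantIndex=*/true, DumbModel);
  return Total;
}
```

With zero-cost constant-index inserts, `scalarizedSplatCost(8, false)` is 0, so the scalarized splat looks free. The "dumber" model gives 8 for the same vector, which is closer to the intent but is the kind of change that could affect unrolling decisions.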
LGTM
LGTM. Thanks!
This is split off from #115274. There doesn't seem to be an easy way to share this with getShuffleCost since that requires passing in a real insert_element operand to get it to recognise it's a scalar splat.
We can't currently lower vp splats of i1 vectors, so the cost model returns an invalid cost for them.