
[RISCV] Use subreg extract for extract_vector_elt when vlen is known #72666


Merged
merged 2 commits into llvm:main from riscv-extract-vector-elt-exact-vlen
Nov 27, 2023

Conversation

preames
Collaborator

@preames preames commented Nov 17, 2023

This is the first in a planned patch series to teach our vector lowering how to exploit register boundaries in LMUL>1 types when VLEN is known to be an exact constant. This corresponds to code compiled by clang with the -mrvv-vector-bits=zvl option.

For extract_vector_elt, if we have a constant index and a known vlen, then we can identify which register out of a register group is being accessed. Given this, we can do a sub-register extract for that register and then use the remaining offset as the index within it.

This results in all constant-index extracts becoming m1 operations, and thus eliminates the complexity concern for explode-vector idioms at high LMUL.
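
The register-group arithmetic involved is small. As a standalone illustration (plain C++, not the patch itself), the sketch below decomposes a constant index into a register within the group and an offset within that register, using the assumed values VLEN=128 and an LMUL=8 vector of i64 so it matches the extractelt_nxv8i64_15_exact_vlen test further down.

// Standalone sketch (plain C++, not LLVM code) of the index decomposition the
// patch performs, under the assumption VLEN=128 and an LMUL=8 vector of i64.
#include <cassert>
#include <cstdio>

int main() {
  const unsigned VLen = 128;                     // assumed exact VLEN in bits
  const unsigned ElemSize = 64;                  // element width in bits (i64)
  const unsigned ElemsPerVReg = VLen / ElemSize; // 2 elements per vector register

  unsigned OrigIdx = 15;                         // constant extract index into the m8 group
  unsigned SubRegIdx = OrigIdx / ElemsPerVReg;   // which register of the group: 7
  unsigned RemIdx = OrigIdx % ElemsPerVReg;      // offset within that register: 1

  assert(SubRegIdx == 7 && RemIdx == 1);
  // The m8 group occupies v8-v15, so register 7 of the group is v15 and the
  // extract becomes an m1 slidedown by 1 followed by vmv.x.s, as in the test.
  std::printf("register v%u, element %u\n", 8 + SubRegIdx, RemIdx);
  return 0;
}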

@llvmbot
Member

llvmbot commented Nov 17, 2023

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/72666.diff

4 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+23)
  • (modified) llvm/test/CodeGen/RISCV/rvv/extractelt-int-rv64.ll (+25-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract.ll (+28)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-explodevector.ll (+130)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index f89f300a4e9e50c..e9e6e92ea06fbac 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -7895,6 +7895,29 @@ SDValue RISCVTargetLowering::lowerEXTRACT_VECTOR_ELT(SDValue Op,
     Vec = convertToScalableVector(ContainerVT, Vec, DAG, Subtarget);
   }
 
+  // If we're compiling for an exact VLEN value and we have a known
+  // constant index, we can always perform the extract in m1 (or
+  // smaller) as we can determine the register corresponding to
+  // the index in the register group.
+  const unsigned MinVLen = Subtarget.getRealMinVLen();
+  const unsigned MaxVLen = Subtarget.getRealMaxVLen();
+  if (auto *IdxC = dyn_cast<ConstantSDNode>(Idx);
+      IdxC && MinVLen == MaxVLen &&
+      VecVT.getSizeInBits().getKnownMinValue() > MinVLen) {
+    unsigned OrigIdx = IdxC->getZExtValue();
+    EVT ElemVT = VecVT.getVectorElementType();
+    unsigned ElemSize = ElemVT.getSizeInBits().getKnownMinValue();
+    unsigned ElemsPerVReg = MinVLen / ElemSize;
+    unsigned RemIdx = OrigIdx % ElemsPerVReg;
+    unsigned SubRegIdx = OrigIdx / ElemsPerVReg;
+    unsigned ExtractIdx =
+      SubRegIdx * ContainerVT.getVectorElementCount().getKnownMinValue();
+    ContainerVT = getLMUL1VT(ContainerVT);
+    Vec = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, ContainerVT, Vec,
+                      DAG.getVectorIdxConstant(ExtractIdx, DL));
+    Idx = DAG.getVectorIdxConstant(RemIdx, DL);
+  }
+
   // Reduce the LMUL of our slidedown and vmv.x.s to the smallest LMUL which
   // contains our index.
   std::optional<uint64_t> MaxIdx;
diff --git a/llvm/test/CodeGen/RISCV/rvv/extractelt-int-rv64.ll b/llvm/test/CodeGen/RISCV/rvv/extractelt-int-rv64.ll
index 34dcce3fe058bc9..9df0871046959ed 100644
--- a/llvm/test/CodeGen/RISCV/rvv/extractelt-int-rv64.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/extractelt-int-rv64.ll
@@ -697,6 +697,27 @@ define i64 @extractelt_nxv8i64_imm(<vscale x 8 x i64> %v) {
   ret i64 %r
 }
 
+define i64 @extractelt_nxv8i64_2_exact_vlen(<vscale x 8 x i64> %v) vscale_range(2,2) {
+; CHECK-LABEL: extractelt_nxv8i64_2_exact_vlen:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 1, e64, m1, ta, ma
+; CHECK-NEXT:    vmv.x.s a0, v9
+; CHECK-NEXT:    ret
+  %r = extractelement <vscale x 8 x i64> %v, i32 2
+  ret i64 %r
+}
+
+define i64 @extractelt_nxv8i64_15_exact_vlen(<vscale x 8 x i64> %v) vscale_range(2,2) {
+; CHECK-LABEL: extractelt_nxv8i64_15_exact_vlen:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 1, e64, m1, ta, ma
+; CHECK-NEXT:    vslidedown.vi v8, v15, 1
+; CHECK-NEXT:    vmv.x.s a0, v8
+; CHECK-NEXT:    ret
+  %r = extractelement <vscale x 8 x i64> %v, i32 15
+  ret i64 %r
+}
+
 define i64 @extractelt_nxv8i64_idx(<vscale x 8 x i64> %v, i32 zeroext %idx) {
 ; CHECK-LABEL: extractelt_nxv8i64_idx:
 ; CHECK:       # %bb.0:
@@ -860,10 +881,10 @@ define i64 @extractelt_nxv16i64_neg1(<vscale x 16 x i64> %v) {
 ; CHECK-NEXT:    slli a2, a2, 1
 ; CHECK-NEXT:    addi a2, a2, -1
 ; CHECK-NEXT:    vs8r.v v16, (a3)
-; CHECK-NEXT:    bltu a2, a1, .LBB72_2
+; CHECK-NEXT:    bltu a2, a1, .LBB74_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    mv a2, a1
-; CHECK-NEXT:  .LBB72_2:
+; CHECK-NEXT:  .LBB74_2:
 ; CHECK-NEXT:    slli a2, a2, 3
 ; CHECK-NEXT:    add a0, a0, a2
 ; CHECK-NEXT:    ld a0, 0(a0)
@@ -893,10 +914,10 @@ define i64 @extractelt_nxv16i64_idx(<vscale x 16 x i64> %v, i32 zeroext %idx) {
 ; CHECK-NEXT:    csrr a1, vlenb
 ; CHECK-NEXT:    slli a2, a1, 1
 ; CHECK-NEXT:    addi a2, a2, -1
-; CHECK-NEXT:    bltu a0, a2, .LBB74_2
+; CHECK-NEXT:    bltu a0, a2, .LBB76_2
 ; CHECK-NEXT:  # %bb.1:
 ; CHECK-NEXT:    mv a0, a2
-; CHECK-NEXT:  .LBB74_2:
+; CHECK-NEXT:  .LBB76_2:
 ; CHECK-NEXT:    addi sp, sp, -80
 ; CHECK-NEXT:    .cfi_def_cfa_offset 80
 ; CHECK-NEXT:    sd ra, 72(sp) # 8-byte Folded Spill
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract.ll
index 95c1beb284c4003..d3c4b0f5cddd127 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract.ll
@@ -1137,3 +1137,31 @@ define float @extractelt_fdiv_v4f32(<4 x float> %x) {
   %ext = extractelement <4 x float> %bo, i32 2
   ret float %ext
 }
+
+define i32 @extractelt_v16i32_idx7_exact_vlen(ptr %x) nounwind vscale_range(2,2) {
+; CHECK-LABEL: extractelt_v16i32_idx7_exact_vlen:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 16, e32, m4, ta, ma
+; CHECK-NEXT:    vle32.v v8, (a0)
+; CHECK-NEXT:    vsetivli zero, 1, e32, m1, ta, ma
+; CHECK-NEXT:    vslidedown.vi v8, v9, 3
+; CHECK-NEXT:    vmv.x.s a0, v8
+; CHECK-NEXT:    ret
+  %a = load <16 x i32>, ptr %x
+  %b = extractelement <16 x i32> %a, i32 7
+  ret i32 %b
+}
+
+define i32 @extractelt_v16i32_idx15_exact_vlen(ptr %x) nounwind vscale_range(2,2) {
+; CHECK-LABEL: extractelt_v16i32_idx15_exact_vlen:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 16, e32, m4, ta, ma
+; CHECK-NEXT:    vle32.v v8, (a0)
+; CHECK-NEXT:    vsetivli zero, 1, e32, m1, ta, ma
+; CHECK-NEXT:    vslidedown.vi v8, v11, 3
+; CHECK-NEXT:    vmv.x.s a0, v8
+; CHECK-NEXT:    ret
+  %a = load <16 x i32>, ptr %x
+  %b = extractelement <16 x i32> %a, i32 15
+  ret i32 %b
+}
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-explodevector.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-explodevector.ll
index f3570495600f3c3..e5bbbd661e6a1df 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-explodevector.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-explodevector.ll
@@ -1084,3 +1084,133 @@ define i64 @explode_16xi64(<16 x i64> %v) {
   %add14 = add i64 %add13, %e15
   ret i64 %add14
 }
+
+define i32 @explode_16xi32_exact_vlen(<16 x i32> %v) vscale_range(2, 2) {
+; RV32-LABEL: explode_16xi32_exact_vlen:
+; RV32:       # %bb.0:
+; RV32-NEXT:    vsetivli zero, 1, e32, m1, ta, ma
+; RV32-NEXT:    vslidedown.vi v12, v8, 2
+; RV32-NEXT:    vmv.x.s a0, v12
+; RV32-NEXT:    vslidedown.vi v12, v8, 3
+; RV32-NEXT:    vmv.x.s a1, v12
+; RV32-NEXT:    vmv.x.s a2, v9
+; RV32-NEXT:    vslidedown.vi v12, v9, 1
+; RV32-NEXT:    vmv.x.s a3, v12
+; RV32-NEXT:    vslidedown.vi v12, v9, 2
+; RV32-NEXT:    vmv.x.s a4, v12
+; RV32-NEXT:    vslidedown.vi v9, v9, 3
+; RV32-NEXT:    vmv.x.s a5, v9
+; RV32-NEXT:    vmv.x.s a6, v10
+; RV32-NEXT:    vslidedown.vi v9, v10, 1
+; RV32-NEXT:    vmv.x.s a7, v9
+; RV32-NEXT:    vslidedown.vi v9, v10, 2
+; RV32-NEXT:    vmv.x.s t0, v9
+; RV32-NEXT:    vslidedown.vi v9, v10, 3
+; RV32-NEXT:    vmv.x.s t1, v9
+; RV32-NEXT:    vmv.x.s t2, v11
+; RV32-NEXT:    vslidedown.vi v9, v11, 1
+; RV32-NEXT:    vmv.x.s t3, v9
+; RV32-NEXT:    vslidedown.vi v9, v11, 2
+; RV32-NEXT:    vmv.x.s t4, v9
+; RV32-NEXT:    vslidedown.vi v9, v11, 3
+; RV32-NEXT:    vmv.x.s t5, v9
+; RV32-NEXT:    vmv.s.x v9, zero
+; RV32-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; RV32-NEXT:    vredxor.vs v8, v8, v9
+; RV32-NEXT:    vmv.x.s t6, v8
+; RV32-NEXT:    add a0, a0, a1
+; RV32-NEXT:    add a0, t6, a0
+; RV32-NEXT:    add a2, a2, a3
+; RV32-NEXT:    add a2, a2, a4
+; RV32-NEXT:    add a0, a0, a2
+; RV32-NEXT:    add a5, a5, a6
+; RV32-NEXT:    add a5, a5, a7
+; RV32-NEXT:    add a5, a5, t0
+; RV32-NEXT:    add a0, a0, a5
+; RV32-NEXT:    add t1, t1, t2
+; RV32-NEXT:    add t1, t1, t3
+; RV32-NEXT:    add t1, t1, t4
+; RV32-NEXT:    add t1, t1, t5
+; RV32-NEXT:    add a0, a0, t1
+; RV32-NEXT:    ret
+;
+; RV64-LABEL: explode_16xi32_exact_vlen:
+; RV64:       # %bb.0:
+; RV64-NEXT:    vsetivli zero, 1, e32, m1, ta, ma
+; RV64-NEXT:    vslidedown.vi v12, v8, 2
+; RV64-NEXT:    vmv.x.s a0, v12
+; RV64-NEXT:    vslidedown.vi v12, v8, 3
+; RV64-NEXT:    vmv.x.s a1, v12
+; RV64-NEXT:    vmv.x.s a2, v9
+; RV64-NEXT:    vslidedown.vi v12, v9, 1
+; RV64-NEXT:    vmv.x.s a3, v12
+; RV64-NEXT:    vslidedown.vi v12, v9, 2
+; RV64-NEXT:    vmv.x.s a4, v12
+; RV64-NEXT:    vslidedown.vi v9, v9, 3
+; RV64-NEXT:    vmv.x.s a5, v9
+; RV64-NEXT:    vmv.x.s a6, v10
+; RV64-NEXT:    vslidedown.vi v9, v10, 1
+; RV64-NEXT:    vmv.x.s a7, v9
+; RV64-NEXT:    vslidedown.vi v9, v10, 2
+; RV64-NEXT:    vmv.x.s t0, v9
+; RV64-NEXT:    vslidedown.vi v9, v10, 3
+; RV64-NEXT:    vmv.x.s t1, v9
+; RV64-NEXT:    vmv.x.s t2, v11
+; RV64-NEXT:    vslidedown.vi v9, v11, 1
+; RV64-NEXT:    vmv.x.s t3, v9
+; RV64-NEXT:    vslidedown.vi v9, v11, 2
+; RV64-NEXT:    vmv.x.s t4, v9
+; RV64-NEXT:    vslidedown.vi v9, v11, 3
+; RV64-NEXT:    vmv.x.s t5, v9
+; RV64-NEXT:    vmv.s.x v9, zero
+; RV64-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; RV64-NEXT:    vredxor.vs v8, v8, v9
+; RV64-NEXT:    vmv.x.s t6, v8
+; RV64-NEXT:    add a0, a0, a1
+; RV64-NEXT:    add a0, t6, a0
+; RV64-NEXT:    add a2, a2, a3
+; RV64-NEXT:    add a2, a2, a4
+; RV64-NEXT:    add a0, a0, a2
+; RV64-NEXT:    add a5, a5, a6
+; RV64-NEXT:    add a5, a5, a7
+; RV64-NEXT:    add a5, a5, t0
+; RV64-NEXT:    add a0, a0, a5
+; RV64-NEXT:    add t1, t1, t2
+; RV64-NEXT:    add t1, t1, t3
+; RV64-NEXT:    add t1, t1, t4
+; RV64-NEXT:    add t1, t1, t5
+; RV64-NEXT:    addw a0, a0, t1
+; RV64-NEXT:    ret
+  %e0 = extractelement <16 x i32> %v, i32 0
+  %e1 = extractelement <16 x i32> %v, i32 1
+  %e2 = extractelement <16 x i32> %v, i32 2
+  %e3 = extractelement <16 x i32> %v, i32 3
+  %e4 = extractelement <16 x i32> %v, i32 4
+  %e5 = extractelement <16 x i32> %v, i32 5
+  %e6 = extractelement <16 x i32> %v, i32 6
+  %e7 = extractelement <16 x i32> %v, i32 7
+  %e8 = extractelement <16 x i32> %v, i32 8
+  %e9 = extractelement <16 x i32> %v, i32 9
+  %e10 = extractelement <16 x i32> %v, i32 10
+  %e11 = extractelement <16 x i32> %v, i32 11
+  %e12 = extractelement <16 x i32> %v, i32 12
+  %e13 = extractelement <16 x i32> %v, i32 13
+  %e14 = extractelement <16 x i32> %v, i32 14
+  %e15 = extractelement <16 x i32> %v, i32 15
+  %add0 = xor i32 %e0, %e1
+  %add1 = add i32 %add0, %e2
+  %add2 = add i32 %add1, %e3
+  %add3 = add i32 %add2, %e4
+  %add4 = add i32 %add3, %e5
+  %add5 = add i32 %add4, %e6
+  %add6 = add i32 %add5, %e7
+  %add7 = add i32 %add6, %e8
+  %add8 = add i32 %add7, %e9
+  %add9 = add i32 %add8, %e10
+  %add10 = add i32 %add9, %e11
+  %add11 = add i32 %add10, %e12
+  %add12 = add i32 %add11, %e13
+  %add13 = add i32 %add12, %e14
+  %add14 = add i32 %add13, %e15
+  ret i32 %add14
+}


github-actions bot commented Nov 17, 2023

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 7d1a9e81b0b59d020a52c789d659acb5ee5fdc41 c0ad734630a13f4b9da1df460db84c7fba5bfe6b -- llvm/lib/Target/RISCV/RISCVISelLowering.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index c5c75ae19d..ced5b6f08d 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -7912,7 +7912,7 @@ SDValue RISCVTargetLowering::lowerEXTRACT_VECTOR_ELT(SDValue Op,
     unsigned RemIdx = OrigIdx % ElemsPerVReg;
     unsigned SubRegIdx = OrigIdx / ElemsPerVReg;
     unsigned ExtractIdx =
-      SubRegIdx * M1VT.getVectorElementCount().getKnownMinValue();
+        SubRegIdx * M1VT.getVectorElementCount().getKnownMinValue();
     Vec = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, M1VT, Vec,
                       DAG.getVectorIdxConstant(ExtractIdx, DL));
     Idx = DAG.getVectorIdxConstant(RemIdx, DL);
@@ -16569,34 +16569,32 @@ RISCVTargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
 
 #define PseudoVFCVT_RM_CASE(RMOpc, Opc)                                        \
   PseudoVFCVT_RM_LMUL_CASE(RMOpc, Opc, M1)                                     \
-  PseudoVFCVT_RM_LMUL_CASE(RMOpc, Opc, M2)                                     \
-  PseudoVFCVT_RM_LMUL_CASE(RMOpc, Opc, M4)                                     \
-  PseudoVFCVT_RM_LMUL_CASE(RMOpc, Opc, MF2)                                    \
-  PseudoVFCVT_RM_LMUL_CASE(RMOpc, Opc, MF4)
+      PseudoVFCVT_RM_LMUL_CASE(RMOpc, Opc, M2)                                 \
+          PseudoVFCVT_RM_LMUL_CASE(RMOpc, Opc, M4)                             \
+              PseudoVFCVT_RM_LMUL_CASE(RMOpc, Opc, MF2)                        \
+                  PseudoVFCVT_RM_LMUL_CASE(RMOpc, Opc, MF4)
 
 #define PseudoVFCVT_RM_CASE_M8(RMOpc, Opc)                                     \
-  PseudoVFCVT_RM_CASE(RMOpc, Opc)                                              \
-  PseudoVFCVT_RM_LMUL_CASE(RMOpc, Opc, M8)
+  PseudoVFCVT_RM_CASE(RMOpc, Opc) PseudoVFCVT_RM_LMUL_CASE(RMOpc, Opc, M8)
 
 #define PseudoVFCVT_RM_CASE_MF8(RMOpc, Opc)                                    \
-  PseudoVFCVT_RM_CASE(RMOpc, Opc)                                              \
-  PseudoVFCVT_RM_LMUL_CASE(RMOpc, Opc, MF8)
-
-  // VFCVT
-  PseudoVFCVT_RM_CASE_M8(PseudoVFCVT_RM_X_F_V, PseudoVFCVT_X_F_V)
-  PseudoVFCVT_RM_CASE_M8(PseudoVFCVT_RM_XU_F_V, PseudoVFCVT_XU_F_V)
-  PseudoVFCVT_RM_CASE_M8(PseudoVFCVT_RM_F_XU_V, PseudoVFCVT_F_XU_V)
-  PseudoVFCVT_RM_CASE_M8(PseudoVFCVT_RM_F_X_V, PseudoVFCVT_F_X_V)
-
-  // VFWCVT
-  PseudoVFCVT_RM_CASE(PseudoVFWCVT_RM_XU_F_V, PseudoVFWCVT_XU_F_V);
-  PseudoVFCVT_RM_CASE(PseudoVFWCVT_RM_X_F_V, PseudoVFWCVT_X_F_V);
-
-  // VFNCVT
-  PseudoVFCVT_RM_CASE_MF8(PseudoVFNCVT_RM_XU_F_W, PseudoVFNCVT_XU_F_W);
-  PseudoVFCVT_RM_CASE_MF8(PseudoVFNCVT_RM_X_F_W, PseudoVFNCVT_X_F_W);
-  PseudoVFCVT_RM_CASE(PseudoVFNCVT_RM_F_XU_W, PseudoVFNCVT_F_XU_W);
-  PseudoVFCVT_RM_CASE(PseudoVFNCVT_RM_F_X_W, PseudoVFNCVT_F_X_W);
+  PseudoVFCVT_RM_CASE(RMOpc, Opc) PseudoVFCVT_RM_LMUL_CASE(RMOpc, Opc, MF8)
+
+    // VFCVT
+    PseudoVFCVT_RM_CASE_M8(PseudoVFCVT_RM_X_F_V, PseudoVFCVT_X_F_V)
+        PseudoVFCVT_RM_CASE_M8(PseudoVFCVT_RM_XU_F_V, PseudoVFCVT_XU_F_V)
+            PseudoVFCVT_RM_CASE_M8(PseudoVFCVT_RM_F_XU_V, PseudoVFCVT_F_XU_V)
+                PseudoVFCVT_RM_CASE_M8(PseudoVFCVT_RM_F_X_V, PseudoVFCVT_F_X_V)
+
+        // VFWCVT
+        PseudoVFCVT_RM_CASE(PseudoVFWCVT_RM_XU_F_V, PseudoVFWCVT_XU_F_V);
+    PseudoVFCVT_RM_CASE(PseudoVFWCVT_RM_X_F_V, PseudoVFWCVT_X_F_V);
+
+    // VFNCVT
+    PseudoVFCVT_RM_CASE_MF8(PseudoVFNCVT_RM_XU_F_W, PseudoVFNCVT_XU_F_W);
+    PseudoVFCVT_RM_CASE_MF8(PseudoVFNCVT_RM_X_F_W, PseudoVFNCVT_X_F_W);
+    PseudoVFCVT_RM_CASE(PseudoVFNCVT_RM_F_XU_W, PseudoVFNCVT_F_XU_W);
+    PseudoVFCVT_RM_CASE(PseudoVFNCVT_RM_F_X_W, PseudoVFNCVT_F_X_W);
 
   case RISCV::PseudoVFROUND_NOEXCEPT_V_M1_MASK:
     return emitVFROUND_NOEXCEPT_MASK(MI, BB, RISCV::PseudoVFCVT_X_F_V_M1_MASK,

@lukel97 lukel97 requested review from lukel97 and removed request for luke957 November 20, 2023 06:12
unsigned SubRegIdx = OrigIdx / ElemsPerVReg;
unsigned ExtractIdx =
SubRegIdx * ContainerVT.getVectorElementCount().getKnownMinValue();
ContainerVT = getLMUL1VT(ContainerVT);
Contributor

Does the ExtractIdx not need to be computed in terms of the LMUL1 ContainerVT?

Collaborator

I think it does. I checked out this patch and all of the tests hit an assert with the current code.

Collaborator Author

This is weird. I know exactly what the problem is here; I moved the redefinition of ContainerVT below the ElementIdx initialization. The weird part is that I swear I saw that bug downstream, and fixed it before pushing this branch for review. And yet, the broken code is both in the remote and local branches. All I can think is that I screwed up a rebase and lost a change.

Sorry for the noise here, will republish correct patch once back from holiday.
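
For context, the fix (visible in the clang-format diff above, which already references M1VT) is to compute the LMUL1 type before ExtractIdx, so the subvector index is scaled by the m1 type's element count rather than the original register group's. A standalone sketch (plain C++, not the patch) of why that scaling matters, assuming VLEN=128 and the nxv8i64 test case above:

#include <cassert>

int main() {
  const unsigned ElemsPerVReg = 128 / 64;       // 2 x i64 per vector register at VLEN=128
  const unsigned GroupMinElems = 8;             // known-min element count of nxv8i64 (m8)
  const unsigned M1MinElems = 1;                // known-min element count of nxv1i64 (m1)

  unsigned OrigIdx = 15;
  unsigned SubRegIdx = OrigIdx / ElemsPerVReg;  // 7 -> register v8+7 = v15
  unsigned RemIdx = OrigIdx % ElemsPerVReg;     // 1 -> slide down by one element
  assert(RemIdx == 1);

  // Scaling by the group's element count yields an EXTRACT_SUBVECTOR index
  // past the end of the nxv8i64 source; scaling by the m1 type stays in range.
  unsigned BuggyExtractIdx = SubRegIdx * GroupMinElems; // 56: out of range
  unsigned FixedExtractIdx = SubRegIdx * M1MinElems;    // 7: the eighth m1 subvector
  assert(BuggyExtractIdx > GroupMinElems - M1MinElems);
  assert(FixedExtractIdx <= GroupMinElems - M1MinElems);
  return 0;
}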

Collaborator Author

Fixed version pushed, sorry again for the noise.

Collaborator

@topperc topperc left a comment

LGTM

@preames preames merged commit cf17a24 into llvm:main Nov 27, 2023
@preames preames deleted the riscv-extract-vector-elt-exact-vlen branch November 27, 2023 22:33
preames added a commit to preames/llvm-project that referenced this pull request Nov 28, 2023
…#72666)

If we have a constant index and a known vlen, then we can identify which
register of a register group is being accessed.  Given this, we can
reuse the (slightly generalized) existing handling for working on
sub-register groups.  This results in all constant-index extracts with
known vlen becoming m1 operations.

One bit of weirdness to highlight and explain: the existing code uses
the VL from the original vector type, not the inner vector type.  This
is correct because the inner register group must be smaller than the
original (possibly fixed length) vector type.  Overall, this seems to
be a reasonable codegen tradeoff as it biases us towards immediate
AVLs, which avoids the vsetvli form that clobbers a GPR for no real
purpose.  The downside is that for large fixed length vectors, we end
up materializing an immediate in a register for little value.  We
should probably generalize this idea and try to optimize the large
fixed length vector case, but that can be done in separate work.
preames added a commit that referenced this pull request Nov 28, 2023
…) (#73680)

If we have a constant index and a known vlen, then we can identify which
register of a register group is being accessed. Given this, we can
reuse the (slightly generalized) existing handling for working on
sub-register groups. This results in all constant-index extracts with
known vlen becoming m1 operations.

One bit of weirdness to highlight and explain: the existing code uses
the VL from the original vector type, not the inner vector type. This is
correct because the inner register group must be smaller than the
original (possibly fixed length) vector type. Overall, this seems to be
a reasonable codegen tradeoff as it biases us towards immediate AVLs,
which avoids the vsetvli form that clobbers a GPR for no real purpose.
The downside is that for large fixed length vectors, we end up
materializing an immediate in a register for little value. We should
probably generalize this idea and try to optimize the large fixed
length vector case, but that can be done in separate work.
Guzhu-AMD pushed a commit to GPUOpen-Drivers/llvm-project that referenced this pull request Nov 30, 2023
Local branch amd-gfx 5118390 Merged main:202dda8e5c3f into amd-gfx:6034dce6d758
Remote branch main 02cbae4 [RISCV] Work on subreg for insert_vector_elt when vlen is known (llvm#72666) (llvm#73680)