
[RISCV] Handle .vx/.vi pseudos in hasAllNBitUsers #67419


Merged
merged 5 commits into llvm:main on Sep 27, 2023

Conversation

lukel97 (Contributor) commented Sep 26, 2023

Vector pseudos with scalar operands only use the lower SEW bits (or fewer, in the case of shifts and clips). This patch accounts for this in hasAllNBitUsers for SDNodes in RISCVISelDAGToDAG. We also need to handle it in RISCVOptWInstrs; otherwise we introduce slliw instructions that are less compressible than their original slli counterparts.

There's a lot of duplication between RISCVISelDAGToDAG and RISCVOptWInstrs here, but the rest of hasAllNBitUsers seems to be duplicated too. Happy to move this somewhere else if preferred.

llvmbot (Member) commented Sep 26, 2023

@llvm/pr-subscribers-backend-risc-v

Changes


This could be extended later to account for the VL operand (and potentially the case where SEW > XLEN?)



Patch is 97.66 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/67419.diff

11 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp (+171)
  • (modified) llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp (+164)
  • (modified) llvm/test/CodeGen/RISCV/rvv/constant-folding.ll (+12-20)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-deinterleave-load.ll (+32-61)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-int-vp.ll (+4-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/named-vector-shuffle-reverse.ll (+614-498)
  • (modified) llvm/test/CodeGen/RISCV/rvv/sshl_sat_vec.ll (+4-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-fixed.ll (+29-55)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmulh-sdnode.ll (+27-60)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmulhu-sdnode.ll (+24-60)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vreductions-int-vp.ll (+7-14)
diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
index 140473c595bbeb7..6925ab557e47776 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
@@ -2752,6 +2752,175 @@ bool RISCVDAGToDAGISel::selectSHXADD_UWOp(SDValue N, unsigned ShAmt,
   return false;
 }
 
+static bool vectorPseudoHasAllNBitUsers(SDNode *User, unsigned UserOpNo,
+                                        unsigned Bits,
+                                        const TargetInstrInfo *TII) {
+  const RISCVVPseudosTable::PseudoInfo *PseudoInfo =
+      RISCVVPseudosTable::getPseudoInfo(User->getMachineOpcode());
+
+  if (!PseudoInfo)
+    return false;
+
+  const MCInstrDesc &MCID = TII->get(User->getMachineOpcode());
+  const uint64_t TSFlags = MCID.TSFlags;
+  if (!RISCVII::hasSEWOp(TSFlags))
+    return false;
+  assert(RISCVII::hasVLOp(TSFlags));
+
+  bool HasGlueOp = User->getGluedNode() != nullptr;
+  unsigned ChainOpIdx = User->getNumOperands() - HasGlueOp - 1;
+  bool HasChainOp = User->getOperand(ChainOpIdx).getValueType() == MVT::Other;
+  bool HasVecPolicyOp = RISCVII::hasVecPolicyOp(TSFlags);
+  unsigned VLIdx =
+      User->getNumOperands() - HasVecPolicyOp - HasChainOp - HasGlueOp - 2;
+  const unsigned Log2SEW = User->getConstantOperandVal(VLIdx + 1);
+
+  // TODO: The Largest VL 65,536 occurs for LMUL=8 and SEW=8 with
+  // VLEN=65,536. We could check if Bits < 16 here.
+  if (UserOpNo == VLIdx)
+    return false;
+
+  // TODO: Handle Zvbb instructions
+  switch (PseudoInfo->BaseInstr) {
+  default:
+    return false;
+
+  // 11.6. Vector Single-Width Shift Instructions
+  case RISCV::VSLL_VX:
+  case RISCV::VSLL_VI:
+  case RISCV::VSRL_VX:
+  case RISCV::VSRL_VI:
+  case RISCV::VSRA_VX:
+  case RISCV::VSRA_VI:
+  // 12.4. Vector Single-Width Scaling Shift Instructions
+  case RISCV::VSSRL_VX:
+  case RISCV::VSSRL_VI:
+  case RISCV::VSSRA_VX:
+  case RISCV::VSSRA_VI:
+    // Only the low lg2(SEW) bits of the shift-amount value are used.
+    if (Bits < Log2SEW)
+      return false;
+    break;
+
+  // 11.7 Vector Narrowing Integer Right Shift Instructions
+  case RISCV::VNSRL_WX:
+  case RISCV::VNSRL_WI:
+  case RISCV::VNSRA_WX:
+  case RISCV::VNSRA_WI:
+  // 12.5. Vector Narrowing Fixed-Point Clip Instructions
+  case RISCV::VNCLIPU_WX:
+  case RISCV::VNCLIPU_WI:
+  case RISCV::VNCLIP_WX:
+  case RISCV::VNCLIP_WI:
+    // Only the low lg2(2*SEW) bits of the shift-amount value are used.
+    if (Bits < Log2SEW + 1)
+      return false;
+    break;
+
+  // 11.1. Vector Single-Width Integer Add and Subtract
+  case RISCV::VADD_VX:
+  case RISCV::VADD_VI:
+  case RISCV::VSUB_VX:
+  case RISCV::VRSUB_VX:
+  case RISCV::VRSUB_VI:
+  // 11.2. Vector Widening Integer Add/Subtract
+  case RISCV::VWADDU_VX:
+  case RISCV::VWSUBU_VX:
+  case RISCV::VWADD_VX:
+  case RISCV::VWSUB_VX:
+  case RISCV::VWADDU_WX:
+  case RISCV::VWSUBU_WX:
+  case RISCV::VWADD_WX:
+  case RISCV::VWSUB_WX:
+  // 11.4. Vector Integer Add-with-Carry / Subtract-with-Borrow Instructions
+  case RISCV::VADC_VXM:
+  case RISCV::VADC_VIM:
+  case RISCV::VMADC_VXM:
+  case RISCV::VMADC_VIM:
+  case RISCV::VMADC_VX:
+  case RISCV::VMADC_VI:
+  case RISCV::VSBC_VXM:
+  case RISCV::VMSBC_VXM:
+  case RISCV::VMSBC_VX:
+  // 11.5 Vector Bitwise Logical Instructions
+  case RISCV::VAND_VX:
+  case RISCV::VAND_VI:
+  case RISCV::VOR_VX:
+  case RISCV::VOR_VI:
+  case RISCV::VXOR_VX:
+  case RISCV::VXOR_VI:
+  // 11.8. Vector Integer Compare Instructions
+  case RISCV::VMSEQ_VX:
+  case RISCV::VMSEQ_VI:
+  case RISCV::VMSNE_VX:
+  case RISCV::VMSNE_VI:
+  case RISCV::VMSLTU_VX:
+  case RISCV::VMSLT_VX:
+  case RISCV::VMSLEU_VX:
+  case RISCV::VMSLEU_VI:
+  case RISCV::VMSLE_VX:
+  case RISCV::VMSLE_VI:
+  case RISCV::VMSGTU_VX:
+  case RISCV::VMSGTU_VI:
+  case RISCV::VMSGT_VX:
+  case RISCV::VMSGT_VI:
+  // 11.9. Vector Integer Min/Max Instructions
+  case RISCV::VMINU_VX:
+  case RISCV::VMIN_VX:
+  case RISCV::VMAXU_VX:
+  case RISCV::VMAX_VX:
+  // 11.10. Vector Single-Width Integer Multiply Instructions
+  case RISCV::VMUL_VX:
+  case RISCV::VMULH_VX:
+  case RISCV::VMULHU_VX:
+  case RISCV::VMULHSU_VX:
+  // 11.11. Vector Integer Divide Instructions
+  case RISCV::VDIVU_VX:
+  case RISCV::VDIV_VX:
+  case RISCV::VREMU_VX:
+  case RISCV::VREM_VX:
+  // 11.12. Vector Widening Integer Multiply Instructions
+  case RISCV::VWMUL_VX:
+  case RISCV::VWMULU_VX:
+  case RISCV::VWMULSU_VX:
+  // 11.13. Vector Single-Width Integer Multiply-Add Instructions
+  case RISCV::VMACC_VX:
+  case RISCV::VNMSAC_VX:
+  case RISCV::VMADD_VX:
+  case RISCV::VNMSUB_VX:
+  // 11.14. Vector Widening Integer Multiply-Add Instructions
+  case RISCV::VWMACCU_VX:
+  case RISCV::VWMACC_VX:
+  case RISCV::VWMACCSU_VX:
+  case RISCV::VWMACCUS_VX:
+  // 11.15. Vector Integer Merge Instructions
+  case RISCV::VMERGE_VXM:
+  case RISCV::VMERGE_VIM:
+  // 11.16. Vector Integer Move Instructions
+  case RISCV::VMV_V_X:
+  case RISCV::VMV_V_I:
+  // 12.1. Vector Single-Width Saturating Add and Subtract
+  case RISCV::VSADDU_VX:
+  case RISCV::VSADDU_VI:
+  case RISCV::VSADD_VX:
+  case RISCV::VSADD_VI:
+  case RISCV::VSSUBU_VX:
+  case RISCV::VSSUB_VX:
+  // 12.2. Vector Single-Width Averaging Add and Subtract
+  case RISCV::VAADDU_VX:
+  case RISCV::VAADD_VX:
+  case RISCV::VASUBU_VX:
+  case RISCV::VASUB_VX:
+  // 12.3. Vector Single-Width Fractional Multiply with Rounding and Saturation
+  case RISCV::VSMUL_VX:
+  // 16.1. Integer Scalar Move Instructions
+  case RISCV::VMV_S_X:
+    if (Bits < (1 << Log2SEW))
+      return false;
+  }
+  return true;
+}
+
 // Return true if all users of this SDNode* only consume the lower \p Bits.
 // This can be used to form W instructions for add/sub/mul/shl even when the
 // root isn't a sext_inreg. This can allow the ADDW/SUBW/MULW/SLLIW to CSE if
@@ -2783,6 +2952,8 @@ bool RISCVDAGToDAGISel::hasAllNBitUsers(SDNode *Node, unsigned Bits,
     // TODO: Add more opcodes?
     switch (User->getMachineOpcode()) {
     default:
+      if (vectorPseudoHasAllNBitUsers(User, UI.getOperandNo(), Bits, TII))
+        break;
       return false;
     case RISCV::ADDW:
     case RISCV::ADDIW:
diff --git a/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp b/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp
index bd294c669735f4f..f6353aa723c2fa7 100644
--- a/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp
+++ b/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp
@@ -77,6 +77,168 @@ FunctionPass *llvm::createRISCVOptWInstrsPass() {
   return new RISCVOptWInstrs();
 }
 
+static bool vectorPseudoHasAllNBitUsers(const MachineOperand &UserOp,
+                                        unsigned Bits) {
+  const MachineInstr &MI = *UserOp.getParent();
+  const RISCVVPseudosTable::PseudoInfo *PseudoInfo =
+      RISCVVPseudosTable::getPseudoInfo(MI.getOpcode());
+
+  if (!PseudoInfo)
+    return false;
+
+  const MCInstrDesc &MCID = MI.getDesc();
+  const uint64_t TSFlags = MI.getDesc().TSFlags;
+  if (!RISCVII::hasSEWOp(TSFlags))
+    return false;
+  assert(RISCVII::hasVLOp(TSFlags));
+  const unsigned Log2SEW = MI.getOperand(RISCVII::getSEWOpNum(MCID)).getImm();
+
+  // TODO: The Largest VL 65,536 occurs for LMUL=8 and SEW=8 with
+  // VLEN=65,536. We could check if Bits < 16 here.
+  if (UserOp.getOperandNo() == RISCVII::getVLOpNum(MCID))
+    return false;
+
+  // TODO: Handle Zvbb instructions
+  switch (PseudoInfo->BaseInstr) {
+  default:
+    return false;
+
+  // 11.6. Vector Single-Width Shift Instructions
+  case RISCV::VSLL_VX:
+  case RISCV::VSLL_VI:
+  case RISCV::VSRL_VX:
+  case RISCV::VSRL_VI:
+  case RISCV::VSRA_VX:
+  case RISCV::VSRA_VI:
+  // 12.4. Vector Single-Width Scaling Shift Instructions
+  case RISCV::VSSRL_VX:
+  case RISCV::VSSRL_VI:
+  case RISCV::VSSRA_VX:
+  case RISCV::VSSRA_VI:
+    // Only the low lg2(SEW) bits of the shift-amount value are used.
+    if (Bits < Log2SEW)
+      return false;
+    break;
+
+  // 11.7 Vector Narrowing Integer Right Shift Instructions
+  case RISCV::VNSRL_WX:
+  case RISCV::VNSRL_WI:
+  case RISCV::VNSRA_WX:
+  case RISCV::VNSRA_WI:
+  // 12.5. Vector Narrowing Fixed-Point Clip Instructions
+  case RISCV::VNCLIPU_WX:
+  case RISCV::VNCLIPU_WI:
+  case RISCV::VNCLIP_WX:
+  case RISCV::VNCLIP_WI:
+    // Only the low lg2(2*SEW) bits of the shift-amount value are used.
+    if (Bits < Log2SEW + 1)
+      return false;
+    break;
+
+  // 11.1. Vector Single-Width Integer Add and Subtract
+  case RISCV::VADD_VX:
+  case RISCV::VADD_VI:
+  case RISCV::VSUB_VX:
+  case RISCV::VRSUB_VX:
+  case RISCV::VRSUB_VI:
+  // 11.2. Vector Widening Integer Add/Subtract
+  case RISCV::VWADDU_VX:
+  case RISCV::VWSUBU_VX:
+  case RISCV::VWADD_VX:
+  case RISCV::VWSUB_VX:
+  case RISCV::VWADDU_WX:
+  case RISCV::VWSUBU_WX:
+  case RISCV::VWADD_WX:
+  case RISCV::VWSUB_WX:
+  // 11.4. Vector Integer Add-with-Carry / Subtract-with-Borrow Instructions
+  case RISCV::VADC_VXM:
+  case RISCV::VADC_VIM:
+  case RISCV::VMADC_VXM:
+  case RISCV::VMADC_VIM:
+  case RISCV::VMADC_VX:
+  case RISCV::VMADC_VI:
+  case RISCV::VSBC_VXM:
+  case RISCV::VMSBC_VXM:
+  case RISCV::VMSBC_VX:
+  // 11.5 Vector Bitwise Logical Instructions
+  case RISCV::VAND_VX:
+  case RISCV::VAND_VI:
+  case RISCV::VOR_VX:
+  case RISCV::VOR_VI:
+  case RISCV::VXOR_VX:
+  case RISCV::VXOR_VI:
+  // 11.8. Vector Integer Compare Instructions
+  case RISCV::VMSEQ_VX:
+  case RISCV::VMSEQ_VI:
+  case RISCV::VMSNE_VX:
+  case RISCV::VMSNE_VI:
+  case RISCV::VMSLTU_VX:
+  case RISCV::VMSLT_VX:
+  case RISCV::VMSLEU_VX:
+  case RISCV::VMSLEU_VI:
+  case RISCV::VMSLE_VX:
+  case RISCV::VMSLE_VI:
+  case RISCV::VMSGTU_VX:
+  case RISCV::VMSGTU_VI:
+  case RISCV::VMSGT_VX:
+  case RISCV::VMSGT_VI:
+  // 11.9. Vector Integer Min/Max Instructions
+  case RISCV::VMINU_VX:
+  case RISCV::VMIN_VX:
+  case RISCV::VMAXU_VX:
+  case RISCV::VMAX_VX:
+  // 11.10. Vector Single-Width Integer Multiply Instructions
+  case RISCV::VMUL_VX:
+  case RISCV::VMULH_VX:
+  case RISCV::VMULHU_VX:
+  case RISCV::VMULHSU_VX:
+  // 11.11. Vector Integer Divide Instructions
+  case RISCV::VDIVU_VX:
+  case RISCV::VDIV_VX:
+  case RISCV::VREMU_VX:
+  case RISCV::VREM_VX:
+  // 11.12. Vector Widening Integer Multiply Instructions
+  case RISCV::VWMUL_VX:
+  case RISCV::VWMULU_VX:
+  case RISCV::VWMULSU_VX:
+  // 11.13. Vector Single-Width Integer Multiply-Add Instructions
+  case RISCV::VMACC_VX:
+  case RISCV::VNMSAC_VX:
+  case RISCV::VMADD_VX:
+  case RISCV::VNMSUB_VX:
+  // 11.14. Vector Widening Integer Multiply-Add Instructions
+  case RISCV::VWMACCU_VX:
+  case RISCV::VWMACC_VX:
+  case RISCV::VWMACCSU_VX:
+  case RISCV::VWMACCUS_VX:
+  // 11.15. Vector Integer Merge Instructions
+  case RISCV::VMERGE_VXM:
+  case RISCV::VMERGE_VIM:
+  // 11.16. Vector Integer Move Instructions
+  case RISCV::VMV_V_X:
+  case RISCV::VMV_V_I:
+  // 12.1. Vector Single-Width Saturating Add and Subtract
+  case RISCV::VSADDU_VX:
+  case RISCV::VSADDU_VI:
+  case RISCV::VSADD_VX:
+  case RISCV::VSADD_VI:
+  case RISCV::VSSUBU_VX:
+  case RISCV::VSSUB_VX:
+  // 12.2. Vector Single-Width Averaging Add and Subtract
+  case RISCV::VAADDU_VX:
+  case RISCV::VAADD_VX:
+  case RISCV::VASUBU_VX:
+  case RISCV::VASUB_VX:
+  // 12.3. Vector Single-Width Fractional Multiply with Rounding and Saturation
+  case RISCV::VSMUL_VX:
+  // 16.1. Integer Scalar Move Instructions
+  case RISCV::VMV_S_X:
+    if (Bits < (1 << Log2SEW))
+      return false;
+  }
+  return true;
+}
+
 // Checks if all users only demand the lower \p OrigBits of the original
 // instruction's result.
 // TODO: handle multiple interdependent transformations
@@ -107,6 +269,8 @@ static bool hasAllNBitUsers(const MachineInstr &OrigMI,
 
       switch (UserMI->getOpcode()) {
       default:
+        if (vectorPseudoHasAllNBitUsers(UserOp, Bits))
+          break;
         return false;
 
       case RISCV::ADDIW:
diff --git a/llvm/test/CodeGen/RISCV/rvv/constant-folding.ll b/llvm/test/CodeGen/RISCV/rvv/constant-folding.ll
index e3a878052ee19b6..98bc4081b3a34e7 100644
--- a/llvm/test/CodeGen/RISCV/rvv/constant-folding.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/constant-folding.ll
@@ -14,26 +14,15 @@
 ; a constant SPLAT_VECTOR didn't follow suit.
 
 define <2 x i16> @fixedlen(<2 x i32> %x) {
-; RV32-LABEL: fixedlen:
-; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
-; RV32-NEXT:    vsrl.vi v8, v8, 16
-; RV32-NEXT:    lui a0, 1048568
-; RV32-NEXT:    vand.vx v8, v8, a0
-; RV32-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
-; RV32-NEXT:    vnsrl.wi v8, v8, 0
-; RV32-NEXT:    ret
-;
-; RV64-LABEL: fixedlen:
-; RV64:       # %bb.0:
-; RV64-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
-; RV64-NEXT:    vsrl.vi v8, v8, 16
-; RV64-NEXT:    lui a0, 131071
-; RV64-NEXT:    slli a0, a0, 3
-; RV64-NEXT:    vand.vx v8, v8, a0
-; RV64-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
-; RV64-NEXT:    vnsrl.wi v8, v8, 0
-; RV64-NEXT:    ret
+; CHECK-LABEL: fixedlen:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-NEXT:    vsrl.vi v8, v8, 16
+; CHECK-NEXT:    lui a0, 1048568
+; CHECK-NEXT:    vand.vx v8, v8, a0
+; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
+; CHECK-NEXT:    vnsrl.wi v8, v8, 0
+; CHECK-NEXT:    ret
   %v41 = insertelement <2 x i32> poison, i32 16, i32 0
   %v42 = shufflevector <2 x i32> %v41, <2 x i32> poison, <2 x i32> zeroinitializer
   %v43 = lshr <2 x i32> %x, %v42
@@ -63,3 +52,6 @@ define <vscale x 2 x i16> @scalable(<vscale x 2 x i32> %x) {
   %v48 = and <vscale x 2 x i16> %v44, %v47
   ret <vscale x 2 x i16> %v48
 }
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; RV32: {{.*}}
+; RV64: {{.*}}
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-deinterleave-load.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-deinterleave-load.ll
index 3167bcf26837b6f..8e298d962edf173 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-deinterleave-load.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-deinterleave-load.ll
@@ -5,67 +5,35 @@
 ; Integers
 
 define {<16 x i1>, <16 x i1>} @vector_deinterleave_load_v16i1_v32i1(ptr %p) {
-; RV32-LABEL: vector_deinterleave_load_v16i1_v32i1:
-; RV32:       # %bb.0:
-; RV32-NEXT:    li a1, 32
-; RV32-NEXT:    vsetvli zero, a1, e8, m2, ta, ma
-; RV32-NEXT:    vlm.v v0, (a0)
-; RV32-NEXT:    vsetivli zero, 16, e8, m1, ta, ma
-; RV32-NEXT:    vmv.v.i v8, 0
-; RV32-NEXT:    vmerge.vim v10, v8, 1, v0
-; RV32-NEXT:    vid.v v9
-; RV32-NEXT:    vadd.vv v11, v9, v9
-; RV32-NEXT:    vrgather.vv v9, v10, v11
-; RV32-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
-; RV32-NEXT:    vslidedown.vi v0, v0, 2
-; RV32-NEXT:    vsetivli zero, 16, e8, m1, ta, ma
-; RV32-NEXT:    vmerge.vim v8, v8, 1, v0
-; RV32-NEXT:    vadd.vi v12, v11, -16
-; RV32-NEXT:    lui a0, 16
-; RV32-NEXT:    addi a0, a0, -256
-; RV32-NEXT:    vsetivli zero, 1, e16, mf4, ta, ma
-; RV32-NEXT:    vmv.s.x v0, a0
-; RV32-NEXT:    vsetivli zero, 16, e8, m1, ta, mu
-; RV32-NEXT:    vrgather.vv v9, v8, v12, v0.t
-; RV32-NEXT:    vmsne.vi v9, v9, 0
-; RV32-NEXT:    vadd.vi v12, v11, 1
-; RV32-NEXT:    vrgather.vv v13, v10, v12
-; RV32-NEXT:    vadd.vi v10, v11, -15
-; RV32-NEXT:    vrgather.vv v13, v8, v10, v0.t
-; RV32-NEXT:    vmsne.vi v8, v13, 0
-; RV32-NEXT:    vmv.v.v v0, v9
-; RV32-NEXT:    ret
-;
-; RV64-LABEL: vector_deinterleave_load_v16i1_v32i1:
-; RV64:       # %bb.0:
-; RV64-NEXT:    li a1, 32
-; RV64-NEXT:    vsetvli zero, a1, e8, m2, ta, ma
-; RV64-NEXT:    vlm.v v0, (a0)
-; RV64-NEXT:    vsetivli zero, 16, e8, m1, ta, ma
-; RV64-NEXT:    vmv.v.i v8, 0
-; RV64-NEXT:    vmerge.vim v10, v8, 1, v0
-; RV64-NEXT:    vid.v v9
-; RV64-NEXT:    vadd.vv v11, v9, v9
-; RV64-NEXT:    vrgather.vv v9, v10, v11
-; RV64-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
-; RV64-NEXT:    vslidedown.vi v0, v0, 2
-; RV64-NEXT:    vsetivli zero, 16, e8, m1, ta, ma
-; RV64-NEXT:    vmerge.vim v8, v8, 1, v0
-; RV64-NEXT:    vadd.vi v12, v11, -16
-; RV64-NEXT:    lui a0, 16
-; RV64-NEXT:    addiw a0, a0, -256
-; RV64-NEXT:    vsetivli zero, 1, e16, mf4, ta, ma
-; RV64-NEXT:    vmv.s.x v0, a0
-; RV64-NEXT:    vsetivli zero, 16, e8, m1, ta, mu
-; RV64-NEXT:    vrgather.vv v9, v8, v12, v0.t
-; RV64-NEXT:    vmsne.vi v9, v9, 0
-; RV64-NEXT:    vadd.vi v12, v11, 1
-; RV64-NEXT:    vrgather.vv v13, v10, v12
-; RV64-NEXT:    vadd.vi v10, v11, -15
-; RV64-NEXT:    vrgather.vv v13, v8, v10, v0.t
-; RV64-NEXT:    vmsne.vi v8, v13, 0
-; RV64-NEXT:    vmv.v.v v0, v9
-; RV64-NEXT:    ret
+; CHECK-LABEL: vector_deinterleave_load_v16i1_v32i1:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    li a1, 32
+; CHECK-NEXT:    vsetvli zero, a1, e8, m2, ta, ma
+; CHECK-NEXT:    vlm.v v0, (a0)
+; CHECK-NEXT:    vsetivli zero, 16, e8, m1, ta, ma
+; CHECK-NEXT:    vmv.v.i v8, 0
+; CHECK-NEXT:    vmerge.vim v10, v8, 1, v0
+; CHECK-NEXT:    vid.v v9
+; CHECK-NEXT:    vadd.vv v11, v9, v9
+; CHECK-NEXT:    vrgather.vv v9, v10, v11
+; CHECK-NEXT:    vsetivli zero, 2, e8, mf4, ta, ma
+; CHECK-NEXT:    vslidedown.vi v0, v0, 2
+; CHECK-NEXT:    vsetivli zero, 16, e8, m1, ta, ma
+; CHECK-NEXT:    vmerge.vim v8, v8, 1, v0
+; CHECK-NEXT:    vadd.vi v12, v11, -16
+; CHECK-NEXT:    li a0, -256
+; CHECK-NEXT:    vsetivli zero, 1, e16, mf4, ta, ma
+; CHECK-NEXT:    vmv.s.x v0, a0
+; CHECK-NEXT:    vsetivli zero, 16, e8, m1, ta, mu
+; CHECK-NEXT:    vrgather.vv v9, v8, v12, v0.t
+; CHECK-NEXT:    vmsne.vi v9, v9, 0
+; CHECK-NEXT:    vadd.vi v12, v11, 1
+; CHECK-NEXT:    vrgather.vv v13, v10, v12
+; CHECK-NEXT:    vadd.vi v10, v11, -15
+; CHECK-NEXT:    vrgather.vv v13, v8, v10, v0.t
+; CHECK-NEXT:    vmsne.vi v8, v13, 0
+; CHECK-NEXT:    vmv.v.v v0, v9
+; CHECK-NEXT:    ret
   %vec = load <32 x i1>, ptr %p
   %retval = call {<16 x i1>, <16 x i1>} @llvm.experimental.vector.deinterleave2.v32i1(<32 x i1> %vec)
   ret {<16 x i1>, <16 x i1>} %retval
@@ -211,3 +179,6 @@ declare {<2 x float>, <2 x float>} @llvm.experimental.vector.deinterleave2.v4f32
 declare {<8 x half>, <8 x half>} @llvm.experimental.vector.deinterleave2.v16f16(<16 x half>)
 declare {<4 x float>, <4 x float>} @llvm.experimental.vector.deinterleave2.v8f32(<8 x float>)
 declare {<2 x double>, <2 x double>} @llvm.experimental.vector.deinterleave2.v4f64(<4 x double>)
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; RV32: {{.*}}
+; RV64: {{.*}}
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-int-vp.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-int-vp.ll
index f0a359c13ce5d3b..742002bda8a9e9c 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-int-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-int-vp.ll
@@ -585,8 +585,7 @@ define signext i32 @vpreduce_umax_v2i32(i32 signext %s, <2 x i32> %v, <2 x i1> %
 ;
 ; RV64-LABEL: vpreduce_umax_v2i32:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    slli a0, a0, 32
-; RV64-NEXT:    srli a0, a0, 32
+; RV64-NEXT:    andi a0, a0, -1
 ; RV64-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
 ; RV64-NEXT:    vmv.s.x v9, a0
 ; RV64-NEXT:    vsetvli zero, a1, e32, mf2, ta, ma
@@ -626,8 +625,7 @@ define signext i32 @vpreduce_umin_v2i32(i32 signext %s, <2 x i32> %v, <2 x i1> %
 ;
 ; RV64-LABEL: vpreduce_umin_v2i32:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    slli a0, a0, 32
-; RV64-NEXT:    srli a0, a0, 32
+; RV64-NEXT:    andi a0, a0, -1
 ; RV64-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
 ; RV64-NEXT:    vmv.s.x v9, a0
 ; RV64-NEXT:    vsetvli zero, a1, e32, mf2, ta, ma
@@ -727,8 +725,7 @@ define signext i32 @vpreduce_umax_v4i32(i32 signext %s, <4 x i32> %v, <4 x i1> %
 ;
 ; RV64-LABEL: vpreduce_umax_v4i32:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    slli a0, a0, 32
-; RV64-NEXT:    srli a0, a0, 32
+; RV64-NEXT:    andi a0, a0, -1
 ; RV64-NEXT:    vsetivli zero, 1, e32, m1, ta, ma
 ; RV64-NEXT:    vmv.s.x v9, a0
 ; RV64-NEXT:    vsetvli zero, a1, e32, m1, ta, ma
@@ -768,8 +765,7 @@ define signext i32 @vpreduce_umin_v4i32(i32 signext %s, <4 x i32> %v, <4 x i1> %
 ;
 ; RV64-LABEL: vpreduce_umin_v4i32:
 ; RV64:       # %bb.0:
-; RV64-NEXT:    slli a0, a0, 32
-; RV64-NEXT:    srli a0, a0, 32
+; RV64-NEXT:    andi a0, a0, -1
 ; RV64-NEXT:    vsetivli zero, 1, e32, m1, ta, ma
 ; RV64-NEXT:    vmv.s.x v9, a0
 ; RV64-NEXT:    vsetvli zero, a1, e32, m1, ta, ma
diff --git a/llvm/test/CodeGen/RISCV/rvv/named-vector-shuffle-reverse.ll b/llvm/test/CodeGen/RISCV/rvv/named-vector-shuffle-...
[truncated]


// TODO: The Largest VL 65,536 occurs for LMUL=8 and SEW=8 with
// VLEN=65,536. We could check if Bits < 16 here.
if (UserOpNo == VLIdx)

topperc (Collaborator):
The operand is really AVL not VL. It will be used as the input to a vsetvli. So it's not constrained.

preames (Collaborator):
Isn't it though? The largest AVL supported by the vsetvli will still be bounded, and we don't split instructions to support larger AVLs. As such, don't we know that either AVL is bounded or we have UB?

topperc (Collaborator):
If the AVL was something large like 0x80000000 and we say that the instruction only demands the lower 17 bits, we could turn the AVL into 0. That seems incorrect.
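A quick sketch of the arithmetic behind that concern (hypothetical values, not code from the patch):

```cpp
#include <cassert>
#include <cstdint>

int main() {
  // Hypothetical: if the VL (really AVL) operand were assumed to demand only
  // 17 bits (enough to represent the largest VL of 65,536), a transform could
  // legally replace the AVL with anything that agrees in those low bits:
  uint64_t AVL = 0x80000000;
  uint64_t Low17 = AVL & ((1u << 17) - 1);
  assert(Low17 == 0); // the instruction would suddenly see an AVL of zero
  return 0;
}
```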

preames (Collaborator):
Isn't such an AVL undefined to start with though? I thought AVL had to be smaller than VLMAX to be well defined - both in our internal representation and the intrinsic definition.

I'm being pedantic here. I don't think it is worth optimizing on right now, just want to make sure I fully have my head wrapped around this.

preames (Collaborator):
I ended up talking with Luke about this offline, and convinced myself that I'm at least mostly wrong here.

We have existing transforms - such as narrowing a vmv.s.x to a smaller LMUL in DAG combine - which assume the AVL value here will be well defined. I couldn't find clear evidence in code that we consistently follow the AVL to VL rules from the vector specification, but we definitely do rely on AVL values larger than VL being defined.

As an aside, looking at the intrinsic docs they say "Intrinsics will at most operate vlmax (derived in 3.4.2 of v-spec) elements. " I'm unclear if this is meant to match the vector specifications rules, or if this is meant to imply a clamp.

So, in terms of this review, Craig's correct, and removing the TODO was definitely the right call here.


// 11.6. Vector Single-Width Shift Instructions
case RISCV::VSLL_VX:
case RISCV::VSLL_VI:
Collaborator:
Do we need VI instructions here? They don't use any scalar register.

case RISCV::VADC_VXM:
case RISCV::VADC_VIM:
case RISCV::VMADC_VXM:
case RISCV::VMADC_VIM:
Collaborator:
I think you can drop VIM?

case RISCV::VWMACCUS_VX:
// 11.15. Vector Integer Merge Instructions
case RISCV::VMERGE_VXM:
case RISCV::VMERGE_VIM:
Collaborator:
You can drop VIM

case RISCV::VMERGE_VIM:
// 11.16. Vector Integer Move Instructions
case RISCV::VMV_V_X:
case RISCV::VMV_V_I:
Collaborator:
You can drop VMV_V_I

topperc (Collaborator) left a comment:
LGTM

preames (Collaborator) left a comment:
(Non blocking comments)



return false;

// TODO: Handle Zvbb instructions
switch (PseudoInfo->BaseInstr) {
preames (Collaborator):
Having a utility which took the base opcode and returned the number of bits used by the scalar operand would seem to common up a lot of the implementation here. That could reasonably live in RISCVBaseInfo.h/cpp.
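A rough sketch of what such a utility might look like (the name, signature, and placement are hypothetical, not existing LLVM API; it assumes the RISCV target headers for the opcode enums, and the case list is abbreviated):

```cpp
#include <optional>

// Hypothetical helper for RISCVBaseInfo.h/cpp: the number of scalar-operand
// bits a .vx/.vi base instruction demands, or std::nullopt if unrecognized.
static std::optional<unsigned> getScalarDemandedBits(unsigned BaseInstr,
                                                     unsigned Log2SEW) {
  switch (BaseInstr) {
  default:
    return std::nullopt;
  // Single-width shifts: the shift amount uses lg2(SEW) bits.
  case RISCV::VSLL_VX:
  case RISCV::VSRL_VX:
  case RISCV::VSRA_VX:
    return Log2SEW;
  // Narrowing shifts and clips: lg2(2*SEW) bits.
  case RISCV::VNSRL_WX:
  case RISCV::VNSRA_WX:
    return Log2SEW + 1;
  // Value operands (vadd.vx, vmv.v.x, ...): SEW bits.
  case RISCV::VADD_VX:
  case RISCV::VMV_V_X:
    return 1u << Log2SEW;
  }
}
```

Both copies of vectorPseudoHasAllNBitUsers would then reduce to comparing Bits against the returned width, keeping the opcode table in one place.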

jacquesguan (Contributor) left a comment:
LGTM, and some tests could remove unused check prefixes.

lukel97 requested a review from preames on September 27, 2023 at 11:25
lukel97 merged commit aff6ffc into llvm:main on Sep 27, 2023
preames added a commit that referenced this pull request Sep 27, 2023
This reverts commit aff6ffc. The version landed differs from the version reviewed in a (stylistic) manner worthy of separate review.
preames (Collaborator) commented Sep 27, 2023

I reverted this change. The stylistic changes made after approval deserve separate discussion. You may reland the approved version. Please move the stylistic change to its own review.

lukel97 added a commit that referenced this pull request Sep 27, 2023
This is a reland of aff6ffc with the refactoring omitted.
legrosbuffle pushed commits to legrosbuffle/llvm-project that referenced this pull request Sep 29, 2023 (the original change, its revert, and the reland).