[Legalizer] Expand fmaximum and fminimum #67301

ecnelises · 2023-09-25T09:10:02Z

According to langref, llvm.maximum/minimum has -0.0 < +0.0 semantics and propagates NaN.

Expand the nodes on targets not supporting the operation, by adding extra check for NaN and using is_fpclass to check zero signs.

Migrated from https://reviews.llvm.org/D158053

Fixes #64209

llvmbot · 2023-09-25T09:13:07Z

@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-llvm-selectiondag
@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-backend-arm

Changes

According to langref, llvm.maximum/minimum has -0.0 < +0.0 semantics and propagates NaN.

Expand the nodes on targets not supporting the operation, by adding extra check for NaN and using is_fpclass to check zero signs.

Migrated from https://reviews.llvm.org/D158053

Patch is 39.98 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/67301.diff

9 Files Affected:

(modified) llvm/include/llvm/CodeGen/TargetLowering.h (+3)
(modified) llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp (+6)
(modified) llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp (+7)
(modified) llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp (+58)
(modified) llvm/lib/Target/ARM/ARMISelLowering.cpp (+5-9)
(modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+6-4)
(modified) llvm/test/CodeGen/ARM/minnum-maxnum-intrinsics.ll (+10-18)
(added) llvm/test/CodeGen/PowerPC/fminimum-fmaximum-f128.ll (+97)
(added) llvm/test/CodeGen/PowerPC/fminimum-fmaximum.ll (+847)

diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index c6a7aa17146dd4f..429cfd72af2e6e0 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -5089,6 +5089,9 @@ class TargetLowering : public TargetLoweringBase {
   /// Expand fminnum/fmaxnum into fminnum_ieee/fmaxnum_ieee with quieted inputs.
   SDValue expandFMINNUM_FMAXNUM(SDNode *N, SelectionDAG &DAG) const;
 
+  /// Expand fminimum/fmaximum into multiple comparison with selects.
+  SDValue expandFMINIMUM_FMAXIMUM(SDNode *N, SelectionDAG &DAG) const;
+
   /// Expand FP_TO_[US]INT_SAT into FP_TO_[US]INT and selects or min/max.
   /// \param N Node to expand
   /// \returns The expansion result
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index f19beea3a3ed8b7..33f6354d5584540 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -3540,6 +3540,12 @@ bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
       Results.push_back(Expanded);
     break;
   }
+  case ISD::FMINIMUM:
+  case ISD::FMAXIMUM: {
+    if (SDValue Expanded = TLI.expandFMINIMUM_FMAXIMUM(Node, DAG))
+      Results.push_back(Expanded);
+    break;
+  }
   case ISD::FSIN:
   case ISD::FCOS: {
     EVT VT = Node->getValueType(0);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
index dec81475f3a88fc..db132035adcf293 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
@@ -949,6 +949,13 @@ void VectorLegalizer::Expand(SDNode *Node, SmallVectorImpl<SDValue> &Results) {
       return;
     }
     break;
+  case ISD::FMINIMUM:
+  case ISD::FMAXIMUM:
+    if (SDValue Expanded = TLI.expandFMINIMUM_FMAXIMUM(Node, DAG)) {
+      Results.push_back(Expanded);
+      return;
+    }
+    break;
   case ISD::SMIN:
   case ISD::SMAX:
   case ISD::UMIN:
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 39489e0bf142eb2..23de9829b5e9ffd 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -8177,6 +8177,64 @@ SDValue TargetLowering::expandFMINNUM_FMAXNUM(SDNode *Node,
   return SDValue();
 }
 
+SDValue TargetLowering::expandFMINIMUM_FMAXIMUM(SDNode *N,
+                                                SelectionDAG &DAG) const {
+  SDLoc DL(N);
+  SDValue LHS = N->getOperand(0);
+  SDValue RHS = N->getOperand(1);
+  unsigned Opc = N->getOpcode();
+  EVT VT = N->getValueType(0);
+  EVT CCVT = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT);
+  bool NoNaN = (N->getFlags().hasNoNaNs() ||
+                (DAG.isKnownNeverNaN(LHS) && DAG.isKnownNeverNaN(RHS)));
+  bool NoZeroSign =
+      (N->getFlags().hasNoSignedZeros() || DAG.isKnownNeverZeroFloat(LHS) ||
+       DAG.isKnownNeverZeroFloat(RHS));
+  bool IsMax = Opc == ISD::FMAXIMUM;
+
+  if (VT.isVector() &&
+      isOperationLegalOrCustomOrPromote(Opc, VT.getScalarType()))
+    return SDValue();
+
+  SDValue MinMax;
+  if (isOperationLegalOrCustom(IsMax ? ISD::FMAXNUM_IEEE : ISD::FMINNUM_IEEE,
+                               VT))
+    MinMax = DAG.getNode(IsMax ? ISD::FMAXNUM_IEEE : ISD::FMINNUM_IEEE, DL, VT,
+                         LHS, RHS);
+  else if (isOperationLegalOrCustom(IsMax ? ISD::FMAXNUM : ISD::FMINNUM, VT))
+    MinMax = DAG.getNode(IsMax ? ISD::FMAXNUM : ISD::FMINNUM, DL, VT, LHS, RHS);
+  else
+    MinMax = DAG.getSelect(
+        DL, VT,
+        DAG.getSetCC(DL, CCVT, LHS, RHS, IsMax ? ISD::SETGT : ISD::SETLT), LHS,
+        RHS);
+
+  // Propagate any NaN of both operands
+  if (!NoNaN) {
+    ConstantFP *FPNaN = ConstantFP::get(
+        *DAG.getContext(), APFloat::getNaN(DAG.EVTToAPFloatSemantics(VT)));
+    MinMax = DAG.getSelect(DL, VT, DAG.getSetCC(DL, CCVT, LHS, RHS, ISD::SETUO),
+                           DAG.getConstantFP(*FPNaN, DL, VT), MinMax);
+  }
+
+  // fminimum/fmaximum requires -0.0 less than +0.0
+  if (!NoZeroSign) {
+    SDValue IsZero = DAG.getSetCC(DL, CCVT, MinMax,
+                                  DAG.getConstantFP(0.0, DL, VT), ISD::SETEQ);
+    SDValue TestZero =
+        DAG.getTargetConstant(IsMax ? fcPosZero : fcNegZero, DL, MVT::i32);
+    SDValue LCmp = DAG.getSelect(
+        DL, VT, DAG.getNode(ISD::IS_FPCLASS, DL, CCVT, LHS, TestZero), LHS,
+        MinMax);
+    SDValue RCmp = DAG.getSelect(
+        DL, VT, DAG.getNode(ISD::IS_FPCLASS, DL, CCVT, RHS, TestZero), RHS,
+        LCmp);
+    MinMax = DAG.getSelect(DL, VT, IsZero, RCmp, MinMax);
+  }
+
+  return MinMax;
+}
+
 /// Returns a true value if if this FPClassTest can be performed with an ordered
 /// fcmp to 0, and a false value if it's an unordered fcmp to 0. Returns
 /// std::nullopt if it cannot be performed as a compare with 0.
diff --git a/llvm/lib/Target/ARM/ARMISelLowering.cpp b/llvm/lib/Target/ARM/ARMISelLowering.cpp
index 69ef942df1f6e78..9eac62175ee5fc4 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.cpp
+++ b/llvm/lib/Target/ARM/ARMISelLowering.cpp
@@ -1549,15 +1549,11 @@ ARMTargetLowering::ARMTargetLowering(const TargetMachine &TM,
 
   if (Subtarget->hasNEON()) {
     // vmin and vmax aren't available in a scalar form, so we can use
-    // a NEON instruction with an undef lane instead.  This has a performance
-    // penalty on some cores, so we don't do this unless we have been
-    // asked to by the core tuning model.
-    if (Subtarget->useNEONForSinglePrecisionFP()) {
-      setOperationAction(ISD::FMINIMUM, MVT::f32, Legal);
-      setOperationAction(ISD::FMAXIMUM, MVT::f32, Legal);
-      setOperationAction(ISD::FMINIMUM, MVT::f16, Legal);
-      setOperationAction(ISD::FMAXIMUM, MVT::f16, Legal);
-    }
+    // a NEON instruction with an undef lane instead.
+    setOperationAction(ISD::FMINIMUM, MVT::f32, Legal);
+    setOperationAction(ISD::FMAXIMUM, MVT::f32, Legal);
+    setOperationAction(ISD::FMINIMUM, MVT::f16, Legal);
+    setOperationAction(ISD::FMAXIMUM, MVT::f16, Legal);
     setOperationAction(ISD::FMINIMUM, MVT::v2f32, Legal);
     setOperationAction(ISD::FMAXIMUM, MVT::v2f32, Legal);
     setOperationAction(ISD::FMINIMUM, MVT::v4f32, Legal);
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 8e6644821031c17..2c2f3926fc93c58 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -440,8 +440,8 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
     if (Subtarget.is64Bit())
       setOperationAction(ISD::FPOWI, MVT::i32, Custom);
 
-    if (!Subtarget.hasStdExtZfa())
-      setOperationAction({ISD::FMAXIMUM, ISD::FMINIMUM}, MVT::f16, Custom);
+    setOperationAction({ISD::FMAXIMUM, ISD::FMINIMUM}, MVT::f16,
+                       Subtarget.hasStdExtZfa() ? Legal : Custom);
   }
 
   if (Subtarget.hasStdExtFOrZfinx()) {
@@ -464,9 +464,10 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
     setOperationAction(ISD::FP_TO_FP16, MVT::f32, Custom);
     setOperationAction(ISD::FP16_TO_FP, MVT::f32, Custom);
 
-    if (Subtarget.hasStdExtZfa())
+    if (Subtarget.hasStdExtZfa()) {
       setOperationAction(ISD::FNEARBYINT, MVT::f32, Legal);
-    else
+      setOperationAction({ISD::FMAXIMUM, ISD::FMINIMUM}, MVT::f32, Legal);
+    } else
       setOperationAction({ISD::FMAXIMUM, ISD::FMINIMUM}, MVT::f32, Custom);
   }
 
@@ -481,6 +482,7 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
       setOperationAction(ISD::FNEARBYINT, MVT::f64, Legal);
       setOperationAction(ISD::BITCAST, MVT::i64, Custom);
       setOperationAction(ISD::BITCAST, MVT::f64, Custom);
+      setOperationAction({ISD::FMAXIMUM, ISD::FMINIMUM}, MVT::f64, Legal);
     } else {
       if (Subtarget.is64Bit())
         setOperationAction(FPRndMode, MVT::f64, Custom);
diff --git a/llvm/test/CodeGen/ARM/minnum-maxnum-intrinsics.ll b/llvm/test/CodeGen/ARM/minnum-maxnum-intrinsics.ll
index be741f536ac757f..528bfe0411730ab 100644
--- a/llvm/test/CodeGen/ARM/minnum-maxnum-intrinsics.ll
+++ b/llvm/test/CodeGen/ARM/minnum-maxnum-intrinsics.ll
@@ -46,12 +46,10 @@ define float @fminnum32_intrinsic(float %x, float %y) {
 define float @fminnum32_nsz_intrinsic(float %x, float %y) {
 ; ARMV7-LABEL: fminnum32_nsz_intrinsic:
 ; ARMV7:       @ %bb.0:
-; ARMV7-NEXT:    vmov s0, r0
-; ARMV7-NEXT:    vmov s2, r1
-; ARMV7-NEXT:    vcmp.f32 s0, s2
-; ARMV7-NEXT:    vmrs APSR_nzcv, fpscr
-; ARMV7-NEXT:    vmovlt.f32 s2, s0
-; ARMV7-NEXT:    vmov r0, s2
+; ARMV7-NEXT:    vmov s0, r1
+; ARMV7-NEXT:    vmov s2, r0
+; ARMV7-NEXT:    vmin.f32 d0, d1, d0
+; ARMV7-NEXT:    vmov r0, s0
 ; ARMV7-NEXT:    bx lr
 ;
 ; ARMV8-LABEL: fminnum32_nsz_intrinsic:
@@ -78,9 +76,7 @@ define float @fminnum32_non_zero_intrinsic(float %x) {
 ; ARMV7:       @ %bb.0:
 ; ARMV7-NEXT:    vmov.f32 s0, #-1.000000e+00
 ; ARMV7-NEXT:    vmov s2, r0
-; ARMV7-NEXT:    vcmp.f32 s2, s0
-; ARMV7-NEXT:    vmrs APSR_nzcv, fpscr
-; ARMV7-NEXT:    vmovlt.f32 s0, s2
+; ARMV7-NEXT:    vmin.f32 d0, d1, d0
 ; ARMV7-NEXT:    vmov r0, s0
 ; ARMV7-NEXT:    bx lr
 ;
@@ -136,12 +132,10 @@ define float @fmaxnum32_intrinsic(float %x, float %y) {
 define float @fmaxnum32_nsz_intrinsic(float %x, float %y) {
 ; ARMV7-LABEL: fmaxnum32_nsz_intrinsic:
 ; ARMV7:       @ %bb.0:
-; ARMV7-NEXT:    vmov s0, r0
-; ARMV7-NEXT:    vmov s2, r1
-; ARMV7-NEXT:    vcmp.f32 s0, s2
-; ARMV7-NEXT:    vmrs APSR_nzcv, fpscr
-; ARMV7-NEXT:    vmovgt.f32 s2, s0
-; ARMV7-NEXT:    vmov r0, s2
+; ARMV7-NEXT:    vmov s0, r1
+; ARMV7-NEXT:    vmov s2, r0
+; ARMV7-NEXT:    vmax.f32 d0, d1, d0
+; ARMV7-NEXT:    vmov r0, s0
 ; ARMV7-NEXT:    bx lr
 ;
 ; ARMV8-LABEL: fmaxnum32_nsz_intrinsic:
@@ -210,9 +204,7 @@ define float @fmaxnum32_non_zero_intrinsic(float %x) {
 ; ARMV7:       @ %bb.0:
 ; ARMV7-NEXT:    vmov.f32 s0, #1.000000e+00
 ; ARMV7-NEXT:    vmov s2, r0
-; ARMV7-NEXT:    vcmp.f32 s2, s0
-; ARMV7-NEXT:    vmrs APSR_nzcv, fpscr
-; ARMV7-NEXT:    vmovgt.f32 s0, s2
+; ARMV7-NEXT:    vmax.f32 d0, d1, d0
 ; ARMV7-NEXT:    vmov r0, s0
 ; ARMV7-NEXT:    bx lr
 ;
diff --git a/llvm/test/CodeGen/PowerPC/fminimum-fmaximum-f128.ll b/llvm/test/CodeGen/PowerPC/fminimum-fmaximum-f128.ll
new file mode 100644
index 000000000000000..6d9eb1337682740
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/fminimum-fmaximum-f128.ll
@@ -0,0 +1,97 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
+; RUN: llc -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr9 < %s | FileCheck %s
+
+define fp128 @f128_minimum(fp128 %a, fp128 %b) {
+; CHECK-LABEL: f128_minimum:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    xscmpuqp 0, 2, 3
+; CHECK-NEXT:    vmr 4, 2
+; CHECK-NEXT:    bge 0, .LBB0_8
+; CHECK-NEXT:  # %bb.1: # %entry
+; CHECK-NEXT:    bun 0, .LBB0_9
+; CHECK-NEXT:  .LBB0_2: # %entry
+; CHECK-NEXT:    xststdcqp 0, 2, 4
+; CHECK-NEXT:    bc 4, 2, .LBB0_10
+; CHECK-NEXT:  .LBB0_3: # %entry
+; CHECK-NEXT:    xststdcqp 0, 3, 4
+; CHECK-NEXT:    bc 12, 2, .LBB0_5
+; CHECK-NEXT:  .LBB0_4: # %entry
+; CHECK-NEXT:    vmr 3, 2
+; CHECK-NEXT:  .LBB0_5: # %entry
+; CHECK-NEXT:    addis 3, 2, .LCPI0_1@toc@ha
+; CHECK-NEXT:    addi 3, 3, .LCPI0_1@toc@l
+; CHECK-NEXT:    lxv 34, 0(3)
+; CHECK-NEXT:    xscmpuqp 0, 4, 2
+; CHECK-NEXT:    beq 0, .LBB0_7
+; CHECK-NEXT:  # %bb.6: # %entry
+; CHECK-NEXT:    vmr 3, 4
+; CHECK-NEXT:  .LBB0_7: # %entry
+; CHECK-NEXT:    vmr 2, 3
+; CHECK-NEXT:    blr
+; CHECK-NEXT:  .LBB0_8: # %entry
+; CHECK-NEXT:    vmr 4, 3
+; CHECK-NEXT:    bnu 0, .LBB0_2
+; CHECK-NEXT:  .LBB0_9:
+; CHECK-NEXT:    addis 3, 2, .LCPI0_0@toc@ha
+; CHECK-NEXT:    addi 3, 3, .LCPI0_0@toc@l
+; CHECK-NEXT:    lxv 36, 0(3)
+; CHECK-NEXT:    xststdcqp 0, 2, 4
+; CHECK-NEXT:    bc 12, 2, .LBB0_3
+; CHECK-NEXT:  .LBB0_10: # %entry
+; CHECK-NEXT:    vmr 2, 4
+; CHECK-NEXT:    xststdcqp 0, 3, 4
+; CHECK-NEXT:    bc 4, 2, .LBB0_4
+; CHECK-NEXT:    b .LBB0_5
+entry:
+  %m = call fp128 @llvm.minimum.f128(fp128 %a, fp128 %b)
+  ret fp128 %m
+}
+
+define fp128 @f128_maximum(fp128 %a, fp128 %b) {
+; CHECK-LABEL: f128_maximum:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    xscmpuqp 0, 2, 3
+; CHECK-NEXT:    vmr 4, 2
+; CHECK-NEXT:    ble 0, .LBB1_8
+; CHECK-NEXT:  # %bb.1: # %entry
+; CHECK-NEXT:    bun 0, .LBB1_9
+; CHECK-NEXT:  .LBB1_2: # %entry
+; CHECK-NEXT:    xststdcqp 0, 2, 8
+; CHECK-NEXT:    bc 4, 2, .LBB1_10
+; CHECK-NEXT:  .LBB1_3: # %entry
+; CHECK-NEXT:    xststdcqp 0, 3, 8
+; CHECK-NEXT:    bc 12, 2, .LBB1_5
+; CHECK-NEXT:  .LBB1_4: # %entry
+; CHECK-NEXT:    vmr 3, 2
+; CHECK-NEXT:  .LBB1_5: # %entry
+; CHECK-NEXT:    addis 3, 2, .LCPI1_1@toc@ha
+; CHECK-NEXT:    addi 3, 3, .LCPI1_1@toc@l
+; CHECK-NEXT:    lxv 34, 0(3)
+; CHECK-NEXT:    xscmpuqp 0, 4, 2
+; CHECK-NEXT:    beq 0, .LBB1_7
+; CHECK-NEXT:  # %bb.6: # %entry
+; CHECK-NEXT:    vmr 3, 4
+; CHECK-NEXT:  .LBB1_7: # %entry
+; CHECK-NEXT:    vmr 2, 3
+; CHECK-NEXT:    blr
+; CHECK-NEXT:  .LBB1_8: # %entry
+; CHECK-NEXT:    vmr 4, 3
+; CHECK-NEXT:    bnu 0, .LBB1_2
+; CHECK-NEXT:  .LBB1_9:
+; CHECK-NEXT:    addis 3, 2, .LCPI1_0@toc@ha
+; CHECK-NEXT:    addi 3, 3, .LCPI1_0@toc@l
+; CHECK-NEXT:    lxv 36, 0(3)
+; CHECK-NEXT:    xststdcqp 0, 2, 8
+; CHECK-NEXT:    bc 12, 2, .LBB1_3
+; CHECK-NEXT:  .LBB1_10: # %entry
+; CHECK-NEXT:    vmr 2, 4
+; CHECK-NEXT:    xststdcqp 0, 3, 8
+; CHECK-NEXT:    bc 4, 2, .LBB1_4
+; CHECK-NEXT:    b .LBB1_5
+entry:
+  %m = call fp128 @llvm.maximum.f128(fp128 %a, fp128 %b)
+  ret fp128 %m
+}
+
+declare fp128 @llvm.minimum.f128(fp128, fp128)
+declare fp128 @llvm.maximum.f128(fp128, fp128)
diff --git a/llvm/test/CodeGen/PowerPC/fminimum-fmaximum.ll b/llvm/test/CodeGen/PowerPC/fminimum-fmaximum.ll
new file mode 100644
index 000000000000000..24fa7c716ea2951
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/fminimum-fmaximum.ll
@@ -0,0 +1,847 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
+; RUN: llc -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr8 -mattr=-vsx < %s | FileCheck %s --check-prefix=NOVSX
+; RUN: llc -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr8 < %s | FileCheck %s --check-prefix=VSX
+; RUN: llc -mtriple=powerpc64-ibm-aix -mcpu=pwr8 < %s | FileCheck %s --check-prefix=AIX
+
+define float @f32_minimum(float %a, float %b) {
+; NOVSX-LABEL: f32_minimum:
+; NOVSX:       # %bb.0: # %entry
+; NOVSX-NEXT:    fcmpu 0, 1, 2
+; NOVSX-NEXT:    fmr 0, 1
+; NOVSX-NEXT:    stfs 2, -8(1)
+; NOVSX-NEXT:    stfs 1, -4(1)
+; NOVSX-NEXT:    bc 12, 0, .LBB0_2
+; NOVSX-NEXT:  # %bb.1: # %entry
+; NOVSX-NEXT:    fmr 0, 2
+; NOVSX-NEXT:  .LBB0_2: # %entry
+; NOVSX-NEXT:    lwz 3, -4(1)
+; NOVSX-NEXT:    bc 4, 3, .LBB0_4
+; NOVSX-NEXT:  # %bb.3:
+; NOVSX-NEXT:    addis 4, 2, .LCPI0_0@toc@ha
+; NOVSX-NEXT:    lfs 0, .LCPI0_0@toc@l(4)
+; NOVSX-NEXT:  .LBB0_4: # %entry
+; NOVSX-NEXT:    xoris 3, 3, 32768
+; NOVSX-NEXT:    cmplwi 3, 0
+; NOVSX-NEXT:    lwz 3, -8(1)
+; NOVSX-NEXT:    bc 12, 2, .LBB0_6
+; NOVSX-NEXT:  # %bb.5: # %entry
+; NOVSX-NEXT:    fmr 1, 0
+; NOVSX-NEXT:  .LBB0_6: # %entry
+; NOVSX-NEXT:    xoris 3, 3, 32768
+; NOVSX-NEXT:    cmplwi 3, 0
+; NOVSX-NEXT:    bc 12, 2, .LBB0_8
+; NOVSX-NEXT:  # %bb.7: # %entry
+; NOVSX-NEXT:    fmr 2, 1
+; NOVSX-NEXT:  .LBB0_8: # %entry
+; NOVSX-NEXT:    addis 3, 2, .LCPI0_1@toc@ha
+; NOVSX-NEXT:    lfs 1, .LCPI0_1@toc@l(3)
+; NOVSX-NEXT:    fcmpu 0, 0, 1
+; NOVSX-NEXT:    bc 12, 2, .LBB0_10
+; NOVSX-NEXT:  # %bb.9: # %entry
+; NOVSX-NEXT:    fmr 2, 0
+; NOVSX-NEXT:  .LBB0_10: # %entry
+; NOVSX-NEXT:    fmr 1, 2
+; NOVSX-NEXT:    blr
+;
+; VSX-LABEL: f32_minimum:
+; VSX:       # %bb.0: # %entry
+; VSX-NEXT:    fcmpu 0, 1, 2
+; VSX-NEXT:    xscvdpspn 0, 1
+; VSX-NEXT:    xscvdpspn 3, 2
+; VSX-NEXT:    mffprwz 3, 0
+; VSX-NEXT:    bc 12, 3, .LBB0_2
+; VSX-NEXT:  # %bb.1: # %entry
+; VSX-NEXT:    xsmindp 0, 1, 2
+; VSX-NEXT:    b .LBB0_3
+; VSX-NEXT:  .LBB0_2:
+; VSX-NEXT:    addis 4, 2, .LCPI0_0@toc@ha
+; VSX-NEXT:    lfs 0, .LCPI0_0@toc@l(4)
+; VSX-NEXT:  .LBB0_3: # %entry
+; VSX-NEXT:    xoris 3, 3, 32768
+; VSX-NEXT:    cmplwi 3, 0
+; VSX-NEXT:    mffprwz 3, 3
+; VSX-NEXT:    bc 12, 2, .LBB0_5
+; VSX-NEXT:  # %bb.4: # %entry
+; VSX-NEXT:    fmr 1, 0
+; VSX-NEXT:  .LBB0_5: # %entry
+; VSX-NEXT:    xoris 3, 3, 32768
+; VSX-NEXT:    cmplwi 3, 0
+; VSX-NEXT:    bc 12, 2, .LBB0_7
+; VSX-NEXT:  # %bb.6: # %entry
+; VSX-NEXT:    fmr 2, 1
+; VSX-NEXT:  .LBB0_7: # %entry
+; VSX-NEXT:    xxlxor 1, 1, 1
+; VSX-NEXT:    fcmpu 0, 0, 1
+; VSX-NEXT:    bc 12, 2, .LBB0_9
+; VSX-NEXT:  # %bb.8: # %entry
+; VSX-NEXT:    fmr 2, 0
+; VSX-NEXT:  .LBB0_9: # %entry
+; VSX-NEXT:    fmr 1, 2
+; VSX-NEXT:    blr
+;
+; AIX-LABEL: f32_minimum:
+; AIX:       # %bb.0: # %entry
+; AIX-NEXT:    fcmpu 0, 1, 2
+; AIX-NEXT:    xscvdpspn 0, 1
+; AIX-NEXT:    xscvdpspn 3, 2
+; AIX-NEXT:    mffprwz 3, 0
+; AIX-NEXT:    bc 12, 3, L..BB0_2
+; AIX-NEXT:  # %bb.1: # %entry
+; AIX-NEXT:    xsmindp 0, 1, 2
+; AIX-NEXT:    b L..BB0_3
+; AIX-NEXT:  L..BB0_2:
+; AIX-NEXT:    ld 4, L..C0(2) # %const.0
+; AIX-NEXT:    lfs 0, 0(4)
+; AIX-NEXT:  L..BB0_3: # %entry
+; AIX-NEXT:    xoris 3, 3, 32768
+; AIX-NEXT:    cmplwi 3, 0
+; AIX-NEXT:    mffprwz 3, 3
+; AIX-NEXT:    bc 12, 2, L..BB0_5
+; AIX-NEXT:  # %bb.4: # %entry
+; AIX-NEXT:    fmr 1, 0
+; AIX-NEXT:  L..BB0_5: # %entry
+; AIX-NEXT:    xoris 3, 3, 32768
+; AIX-NEXT:    cmplwi 3, 0
+; AIX-NEXT:    bc 12, 2, L..BB0_7
+; AIX-NEXT:  # %bb.6: # %entry
+; AIX-NEXT:    fmr 2, 1
+; AIX-NEXT:  L..BB0_7: # %entry
+; AIX-NEXT:    xxlxor 1, 1, 1
+; AIX-NEXT:    fcmpu 0, 0, 1
+; AIX-NEXT:    bc 12, 2, L..BB0_9
+; AIX-NEXT:  # %bb.8: # %entry
+; AIX-NEXT:    fmr 2, 0
+; AIX-NEXT:  L..BB0_9: # %entry
+; AIX-NEXT:    fmr 1, 2
+; AIX-NEXT:    blr
+entry:
+  %m = call float @llvm.minimum.f32(float %a, float %b)
+  ret float %m
+}
+
+define float @f32_maximum(float %a, float %b) {
+; NOVSX-LABEL: f32_maximum:
+; NOVSX:       # %bb.0: # %entry
+; NOVSX-NEXT:    fcmpu 0, 1, 2
+; NOVSX-NEXT:    fmr 0, 1
+; NOVSX-NEXT:    stfs 2, -8(1)
+; NOVSX-NEXT:    stfs 1, -4(1)
+; NOVSX-NEXT:    bc 12, 1, .LBB1_2
+; NOVSX-NEXT:  # %bb.1: # %entry
+; NOVSX-NEXT:    fmr 0, 2
+; NOVSX-NEXT:  .LBB1_2: # %entry
+; NOVSX-NEXT:    lwz 3, -4(1)
+; NOVSX-NEXT:    bc 4, 3, .LBB1_4
+; NOVSX-NEXT:  # %bb.3:
+; NOVSX-NEXT:    addis 4, 2, .LCPI1_0@toc@ha
+; NOVSX-NEXT:    lfs 0, .LCPI1_0@toc@l(4)
+; NOVSX-NEXT:  .LBB1_4: # %entry
+; NOVSX-NEXT:    cmpwi 3, 0
+; NOVSX-NEXT:    lwz 3, -8(1)
+; NOVSX-NEXT:    bc 12, 2, .LBB1_6
+; NOVSX-NEXT:  # %bb.5: # %entry
+; NOVSX-NEXT:    fmr 1, 0
+; NOVSX-NEXT:  .LBB1_6: # %entry
+; NOVSX-NEXT:    cmpwi 3, 0
+; NOVSX-NEXT:    bc 12, 2, .LBB1_8
+; NOVSX-NEXT:  # %bb.7: # %entry
+; NOVSX-NEXT:    fmr 2, 1
+; NOVSX-NEXT:  .LBB1_8: # %entry
+; NOVSX-NEXT:    addis 3, 2, .LCPI1_1@toc@ha
+; NOVSX-NEXT:    lfs 1, .LCPI1_1@toc@l(3)
+; NOVSX-NEXT:    fcmpu 0, 0, 1
+; NOVSX-NEXT:    bc 12, 2, .LBB1_10
+; NOVSX-NEXT:  # %bb.9: # %entry
+; NOVSX-NEXT:    fmr 2, 0
+; NOVSX-NEXT:  .LBB1_10: # %entry
+; NOVSX-NEXT:    fmr 1, 2
+; NOVSX-NEXT:    blr
+;
+; VSX-LABEL: f32_maximum:
+; VSX:       # %bb.0: # %entry
+; VSX-NEXT:    fcmpu 0, 1, 2
+; VSX-NEXT:    xscvdpspn 0, 1
+; VSX-NEXT:    xscvdpspn 3, 2
+; VSX-NEXT:    mffprwz 3, 0
+; VSX-NEXT:    bc 12, 3, .LBB1_2
+; VSX-NEXT:  # %bb.1: # %entry
+; VSX-NEXT:    xsmaxdp 0, 1, 2
+; VSX-NEXT:    b .LBB1_3
+; VSX-NEXT:  .LBB1_2:
+; VSX-NEXT:    addis 4, 2, .LCPI1_0@toc@ha
+; VSX-NEXT:    lfs 0, .LCPI1_0@toc@l(4)
+; VSX-NEXT:  .LBB1_3: # %entry
+; VSX-NEXT:    cmpwi 3, 0
+; VSX-NEXT:    mffprwz 3, 3
+; VSX-NEXT:    bc 12, 2, .LBB1_5
+; VSX-NEXT:  # %bb.4: # %entry
+; VSX-NEXT:    fmr 1, 0
+; VSX-NEXT:  .LBB1_5: # %entry
+; VSX-NEXT:    cmpwi 3, 0
+; VSX-NEXT:    bc 12, 2, .LBB1_7
+; VSX-NEXT:  # %bb.6: # %entry
+; VSX-NEXT:    fmr 2, 1
+; VSX-NEXT:  .LBB1_7: # %entry
+; VSX-NEXT:    xxlxor 1, 1, 1
+; VSX-NEXT:    fcmpu 0, 0, 1
+; VSX-NEXT:    bc 12, 2, .LBB1_9
+; VSX-NEXT:  # %bb.8: # %entry
+; VSX-NEXT:    fmr 2, 0
+; VSX-NEXT:  .LBB1_9: # %entry
+; VSX-NEXT:    fmr 1, 2
+; VSX-NEXT:    blr
+;
+; AIX-LABEL: f32_maximum:
+; AIX:       # %bb.0: # %entry
+; AIX-NEXT:    fcmpu 0, 1, 2
+; AIX-NEXT:    xscvdpspn 0, 1
+; AIX-NEXT:    xscvdpspn 3, 2
+; AIX-NEXT:    mffprwz 3, 0
+; AIX-NEXT:    bc 12, 3, L..BB1_2
+; AIX-NEXT:  # %bb.1: # %entry
+; AI...
[truncated]

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

Artem-B · 2023-12-11T19:26:58Z

What are the next steps needed to move this patch forward?

It would be great if we could land it, as we already have real-world need for lowering fminimum/fmaximum on targets that do not have native support for it.

mshockwave

LGTM. It will be great if you could add RISCV test too, but please don't let this block you.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

topperc · 2023-12-12T01:28:37Z

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

+    MinMax = DAG.getNode(IsMax ? ISD::FMAXNUM : ISD::FMINNUM, DL, VT, LHS, RHS);
+  } else {
+    SDValue Compare =
+        DAG.getSetCC(DL, CCVT, LHS, RHS, IsMax ? ISD::SETGT : ISD::SETLT);


SETGT/SETLT leave it up to the target whether it returns true or false for NaN. Is that intentional? If so it's probably worth a comment.

Yes here it assumes no NaN exists (if either operand is NaN, it will be propagated in following code)

topperc · 2023-12-12T01:29:57Z

LGTM. It will be great if you could add RISCV test too, but please don't let this block you.

RISC-V uses custom lowering. Our instructions don't propagate NaN but they do order -0.0, +0.0 correctly so we use them with a NaN fixup. We should already have tests.

RKSimon

Is there any existing vector test coverage? There don't seem to be any test changes for them.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

RKSimon · 2023-12-12T09:52:52Z

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

+    SDValue RCmp = DAG.getSelect(
+        DL, VT, DAG.getNode(ISD::IS_FPCLASS, DL, CCVT, RHS, TestZero), RHS,
+        LCmp);
+    MinMax = DAG.getSelect(DL, VT, IsZero, RCmp, MinMax);


random thought - can this be done better with FCOPYSIGN?

We still need isfpclass to know which is negative/positive zero and whether both are zeros, I think.

I think we should try to avoid using is_fpclass here. Additionally, I think we have under-defined the internally used IEEE nodes. As currently defined, minnum_ieee/maxnum_ieee have unspecified signed 0 order. However for AMDGPU at least, the actual hardware instructions have always appropriately ordered 0s. We could either refine the _IEEE node definitions to be IEEE -2019 and require ordered 0 behavior which doesn't require this fixup.

PowerPC implements fmaxnum_ieee into xsmaxdp which repsects zero signs. I did not see AMDGPU backend implements it.

While LoongArch (using fmax.d) ISA just says following the specification of maxNum(x,y) operation in the IEEE 754-2008 standard. I think it's unspecified. (maybe @llvm/pr-subscribers-loongarch )

I opened #85195 to just fix the definitions here. If LoongArch doesn't behave correctly, we can lower it to preserve the sign there

#85195 is landed, so we should just assume the _IEEE flavors preserve the sign

ecnelises · 2023-12-20T06:58:10Z

Is there any existing vector test coverage?

Yes, there are vector tests in PowerPC's fminimum-fmaximum.ll.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

arsenm · 2024-01-09T07:58:13Z

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

+    SDValue RCmp = DAG.getSelect(
+        DL, VT, DAG.getNode(ISD::IS_FPCLASS, DL, CCVT, RHS, TestZero), RHS,
+        LCmp);
+    MinMax = DAG.getSelect(DL, VT, IsZero, RCmp, MinMax);


I think we should try to avoid using is_fpclass here. Additionally, I think we have under-defined the internally used IEEE nodes. As currently defined, minnum_ieee/maxnum_ieee have unspecified signed 0 order. However for AMDGPU at least, the actual hardware instructions have always appropriately ordered 0s. We could either refine the _IEEE node definitions to be IEEE -2019 and require ordered 0 behavior which doesn't require this fixup.

jayfoad · 2024-02-13T09:05:17Z

Reverse ping! Are you still working on this?

ecnelises · 2024-02-20T02:27:45Z

Reverse ping! Are you still working on this?

Sure! Any comments I missed?

krzysz00 · 2024-03-08T20:18:42Z

Ping (as someone who's been carrying around a backport of this patch for a few months now)

arsenm · 2024-03-14T08:32:13Z

Reverse ping! Are you still working on this?

Sure! Any comments I missed?

I still think we should avoid using is_fpclass here

According to langref, llvm.maximum/minimum has -0.0 < +0.0 semantics and propagates NaN. Expand the nodes on targets not supporting the operation, by adding extra check for NaN and using is_fpclass to check zero signs.

ecnelises · 2024-04-10T03:24:00Z

I still think we should avoid using is_fpclass here

fgetsign is not legal or expanded in every targets. To support comparison of signed zeros, should we duplicate similar logic of is_fpclass expansion? (bitcast them to int and do int comparison)

arsenm

I will need this shortly, so will be adding AMDGPU tests and will investigate avoiding is_fpclass

llvmbot added backend:ARM backend:RISC-V llvm:SelectionDAG SelectionDAGISel as well labels Sep 25, 2023

jayfoad reviewed Sep 25, 2023

View reviewed changes

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp Show resolved Hide resolved

dtcxzyw requested a review from topperc September 25, 2023 09:27

jayfoad mentioned this pull request Oct 10, 2023

amdgcn failing to select llvm.vector.reduce.fmaximum.v2f32(<2 x float> intrinsic #67815

Open

arsenm reviewed Nov 2, 2023

View reviewed changes

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp Outdated Show resolved Hide resolved

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp Outdated Show resolved Hide resolved

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp Outdated Show resolved Hide resolved

Artem-B mentioned this pull request Nov 6, 2023

NVPTX backend generating instructions only available on SM80 and above by default. #64606

Open

mshockwave approved these changes Dec 12, 2023

View reviewed changes

llvm/lib/Target/RISCV/RISCVISelLowering.cpp Outdated Show resolved Hide resolved

topperc reviewed Dec 12, 2023

View reviewed changes

RKSimon reviewed Dec 12, 2023

View reviewed changes

arsenm reviewed Jan 9, 2024

View reviewed changes

jayfoad mentioned this pull request Feb 13, 2024

[AMDGPU] Conditionally enable DAG combines for fminimum/fmaximum #81584

Draft

krzysz00 mentioned this pull request Mar 7, 2024

Upstream merge ROCm/rocMLIR#1441

Merged

Cyanoxygen mentioned this pull request Mar 12, 2024

[MIPS] Cannot select llvm.{min,max}imum.{f32,f64} #64207

Open

ecnelises added 6 commits April 10, 2024 10:48

[Legalizer] Expand fmaximum and fminimum

aba6ae1

According to langref, llvm.maximum/minimum has -0.0 < +0.0 semantics and propagates NaN. Expand the nodes on targets not supporting the operation, by adding extra check for NaN and using is_fpclass to check zero signs.

Add small comments and update test

b50ffdc

Address comments

604fd6e

Fix style

4c4a890

Codestyle refactor

8ca3196

Reverse checked operands

ddbc848

ecnelises force-pushed the fminimum_fmaximum branch from d83771f to ddbc848 Compare April 10, 2024 04:49

llvmbot added the backend:PowerPC label Apr 10, 2024

arsenm approved these changes Apr 26, 2024

View reviewed changes

ecnelises merged commit 4a8f2f2 into llvm:main Apr 29, 2024

ecnelises deleted the fminimum_fmaximum branch April 29, 2024 07:10

e-kud mentioned this pull request Feb 12, 2025

[i686] Cannot select llvm.{min,max}imum.{f32,f64} #64204

Open

woruyu mentioned this pull request May 27, 2025

[arm] Cannot select i32 = setcc 0x00000000, Constant:i64 #140445

Open

[Legalizer] Expand fmaximum and fminimum #67301

[Legalizer] Expand fmaximum and fminimum #67301

Uh oh!

Conversation

ecnelises commented Sep 25, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Sep 25, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Artem-B commented Dec 11, 2023

Uh oh!

mshockwave left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

topperc commented Dec 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RKSimon left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ecnelises commented Dec 20, 2023

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jayfoad commented Feb 13, 2024

Uh oh!

ecnelises commented Feb 20, 2024

Uh oh!

krzysz00 commented Mar 8, 2024

Uh oh!

arsenm commented Mar 14, 2024

Uh oh!

ecnelises commented Apr 10, 2024

Uh oh!

arsenm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ecnelises commented Sep 25, 2023 •

edited

Loading

llvmbot commented Sep 25, 2023 •

edited

Loading

topperc commented Dec 12, 2023 •

edited

Loading