Skip to content

Cannot select fmaximum/fminimum #64022

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dcaballe opened this issue Jul 22, 2023 · 11 comments
Closed

Cannot select fmaximum/fminimum #64022

dcaballe opened this issue Jul 22, 2023 · 11 comments
Assignees
Labels
backend:RISC-V bug Indicates an unexpected problem or unintended behavior

Comments

@dcaballe
Copy link
Contributor

Any plans to add support for fmaximum/fminimum and their respective reductions to the RISC-V backend? We would like to target them from MLIR since the main fmax/fmin semantics in MLIR matches these instructions. This is currently blocking: https://reviews.llvm.org/D155877

@dcaballe dcaballe added bug Indicates an unexpected problem or unintended behavior backend:RISC-V labels Jul 22, 2023
@llvmbot
Copy link
Member

llvmbot commented Jul 22, 2023

@llvm/issue-subscribers-backend-risc-v

@llvmbot
Copy link
Member

llvmbot commented Jul 22, 2023

@llvm/issue-subscribers-bug

@unterumarmung
Copy link
Contributor

It looks like the RISC-V backend supports llvm.maximum and llvm.minimum intrinsics, but it's missing support for llvm.vector.reduce.fminimum and @llvm.vector.reduce.fmaximum. This caused an infinite loop when I ran trunk llc in Godbolt: https://godbolt.org/z/GfxMj41fx

I want to make changes to add support for llvm.vector.reduce.fminimum and llvm.vector.reduce.fmaximum in the RISC-V backend, but I'm new to backend development. So, I could really use some guidance on where to make the code changes and what the expected results should be for these intrinsics. I'll try to look at commits from other backends that added support for these intrinsics, but any help or tips you can give me would be super appreciated! Thanks!

P.S. Isn't there supposed to be some kind of fallback code generation for intrinsics that are not supported by a backend?

@topperc
Copy link
Collaborator

topperc commented Jul 22, 2023

@unterumarmung using an assertions build triggers an llvm_unreachable. https://godbolt.org/z/5nY98v53Y I'm going to commit a patch to turn that llvm_unreachable into a fatal error.

@topperc topperc self-assigned this Jul 24, 2023
topperc added a commit that referenced this issue Jul 24, 2023
Unlike fmaxnum and fminnum, these operations propagate nan and
consider -0.0 to be less than +0.0.

Without Zfa, we don't have a single instruction for this. The
lowering I've used forces the other input to nan if one input
is a nan. If both inputs are nan, they get swapped. Then use
the fmax or fmin instruction.

New ISD nodes are needed because fmaxnum/fminnum to not define
the order of -0.0 and +0.0.

This lowering ensures the snans are quieted though that is probably not
required in default environment). Also ensures non-canonical nans
are canonicalized, though I'm also not sure that's needed.

Another option could be to use fmax/fmin and then overwrite the
result based on the inputs being nan, but I'm not sure we can do
that with any less code.

Future work will handle nonans FMF, and handling the case where
we can prove the input isn't nan.

This does fix the crash in #64022, but we need to do more work
to avoid scalarization.

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D156069
@dcaballe
Copy link
Contributor Author

I'm actually getting and error for llvm.fmaximum, not the reduction one:

LLVM ERROR: Cannot select: t21: f32 = fmaximum t105, t268
  t105: f32 = fmaximum t208, t212
    t208: f32 = extract_vector_elt t214, Constant:i32<0>
      t214: nxv2f32,ch = llvm.riscv.vle<(load (s64) from %ir.5, align 64)> t0, TargetConstant:i32<8507>, undef:nxv2f32, t14, Constant:i32<2>
        t213: i32 = TargetConstant<8507>
        t106: nxv2f32 = undef
        t14: i32 = add t12, Constant:i32<64>
          t12: i32,ch = load<(load (s32) from %ir..unpack21, align 8)> t0, t11, undef:i32
            t11: i32,ch = load<(load (s32) from %ir..elt20)> t0, t8, undef:i32
              t8: i32 = add nuw t4, Constant:i32<28>
                t4: i32,ch = CopyFromReg t0, Register:i32 %1
                  t3: i32 = Register %1
                t7: i32 = Constant<28>
              t10: i32 = undef
            t10: i32 = undef
          t13: i32 = Constant<64>
        t109: i32 = Constant<2>
      t9: i32 = Constant<0>
    t212: f32 = extract_vector_elt t211, Constant:i32<0>
      t211: nxv2f32 = RISCVISD::VSLIDEDOWN_VL undef:nxv2f32, t214, Constant:i32<1>, t209, Constant:i32<1>, TargetConstant:i32<3>
        t106: nxv2f32 = undef
        t214: nxv2f32,ch = llvm.riscv.vle<(load (s64) from %ir.5, align 64)> t0, TargetConstant:i32<8507>, undef:nxv2f32, t14, Constant:i32<2>
          t213: i32 = TargetConstant<8507>
          t106: nxv2f32 = undef
          t14: i32 = add t12, Constant:i32<64>
            t12: i32,ch = load<(load (s32) from %ir..unpack21, align 8)> t0, t11, undef:i32
              t11: i32,ch = load<(load (s32) from %ir..elt20)> t0, t8, undef:i32
                t8: i32 = add nuw t4, Constant:i32<28>
                  t4: i32,ch = CopyFromReg t0, Register:i32 %1

                  t7: i32 = Constant<28>
                t10: i32 = undef
              t10: i32 = undef
            t13: i32 = Constant<64>
          t109: i32 = Constant<2>
        t103: i32 = Constant<1>
        t209: nxv2i1 = RISCVISD::VMSET_VL Constant:i32<1>
          t103: i32 = Constant<1>
        t103: i32 = Constant<1>
        t210: i32 = TargetConstant<3>
      t9: i32 = Constant<0>
  t268: f32,ch = load<(load (s32) from constant-pool)> t0, t270, undef:i32
    t270: i32 = RISCVISD::LLA TargetConstantPool:i32<float 0xC6293E5940000000> 0
      t269: i32 = TargetConstantPool<float 0xC6293E5940000000> 0
    t10: i32 = undef

If llvm.fmaximum is supported on f32, please, let me know and I'll try to get a repro. You can see the full log report from our testing here: https://github.com/openxla/iree/actions/runs/5626828153/job/15248556223?pr=14472#logs

@topperc
Copy link
Collaborator

topperc commented Jul 25, 2023

I fixed llvm.fmaximum for scalar f32 when the F extension is enabled earlier today.

@dcaballe
Copy link
Contributor Author

Thanks a lot, Craig! What is the state or planning for the reduction ones?

@topperc
Copy link
Collaborator

topperc commented Jul 28, 2023

Need to figure out a good lowering for reductions. We don't have a hardware reduction instruction that does this. So we need to do something like vredmax, and vfclass to find all the nans, and vcpop to count if there are any nans. If there are any nans replace the vredmax result with nan. This is not going to be efficient.

@dcaballe
Copy link
Contributor Author

That should be fine. We would like to remove some hacks and fix some issues related to this in MLIR and for that we need at least the basic functionality to be implemented on x86, ARM and RISC-V. It looks like RISC-V is the only one missing.

@topperc
Copy link
Collaborator

topperc commented Jul 28, 2023

After my scalar fix, the reductions no longer crash. They just generate terrible code https://godbolt.org/z/YfzYv4o8G

@dcaballe
Copy link
Contributor Author

Awesome! Let me integrate and run more extensive benchmarks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:RISC-V bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

4 participants