NVPTX backend generating instructions only available on SM80 and above by default. #64606
Comments
Thank you for the bug report. I'll take a closer look tomorrow. I suspect the issue is not the lack of constraints (they would mostly affect automatic pattern matching), but rather improper setup of the instruction lowering: we should've expanded fminimum/fmaximum on older GPUs.
That is what I was suspecting. Thanks a lot for your response and for taking a look!
It turns out that LLVM cannot automatically expand fminimum/fmaximum.
cc @manishucsd if he has any inputs.
@Artem-B do you have any update on this? If this is going to take a while, we will have to revert dad9de0ae536 to avoid issues downstream. So just checking whether I should do that, or if the fix is imminent. Thanks again for taking a look!
Not generating fminimum/fmaximum on older GPUs is probably the best option we have right now. The PTX snippet above is a bit naive. A correct implementation should look closer to this: llvm/include/llvm/ADT/APFloat.h, line 1401 (at 64473f1).
I guess I could compile that function to PTX and use the resulting snippet as a template for lowering the instruction.
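For reference, the semantics that APFloat's maximum implements look roughly like this in plain C++ (a sketch of the IEEE 754-2019 maximum operation, not the APFloat code itself; fmaximum_ref is a made-up name):

```cpp
#include <cmath>
#include <limits>

// IEEE 754-2019 "maximum": a NaN input propagates (unlike fmaxf and PTX
// max.f32, which return the non-NaN operand), and -0.0 orders below +0.0.
float fmaximum_ref(float a, float b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<float>::quiet_NaN();
  if (a == 0.0f && b == 0.0f)  // covers differently-signed zeros
    return std::signbit(a) ? b : a;
  return a > b ? a : b;
}
```

The two special cases are exactly what plain max.f32 gets wrong, which is why a naive PTX expansion doesn't suffice on pre-sm_80 targets.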
It ends up being a bit too complicated to lower as assembly: https://godbolt.org/z/r37a999jj
Floating point and IEEE 754 conformance issues are always deeper rabbit holes than they look at first glance. JuliaLang/julia#7866 (comment) I'll send a patch to properly constrain these instructions.
Thanks @Artem-B, I'll try to see how to fix the patch that uncovered this issue in MLIR.
On second thought, it appears that X86 does have a fairly generic implementation of the lowering for fminimum/fmaximum: https://llvm.org/doxygen/X86ISelLowering_8cpp_source.html#l27875 I may be able to port it to NVPTX. I'll need to figure out a good way to test it for correctness. Ideally, I'd want to compare actual execution results between an x86 host and a GPU.
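A rough sketch of such a host-vs-GPU comparison harness (assumptions: device_fmaximum is a placeholder for whatever the ported lowering ends up producing, and the input list is illustrative, not exhaustive):

```cpp
#include <cmath>
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Host reference with IEEE 754-2019 maximum semantics.
static float fmaximum_ref(float a, float b) {
  if (std::isnan(a) || std::isnan(b)) return NAN;
  if (a == 0.0f && b == 0.0f) return std::signbit(a) ? b : a;
  return a > b ? a : b;
}

// Placeholder for a call that selects to the new NVPTX lowering. fmaxf is
// deliberately wrong for NaN and -0.0, so the harness should flag it.
__device__ float device_fmaximum(float a, float b) { return fmaxf(a, b); }

__global__ void run(const float *a, const float *b, float *out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = device_fmaximum(a[i], b[i]);
}

int main() {
  const float cases[][2] = {{NAN, 1.0f},    {1.0f, NAN},  {+0.0f, -0.0f},
                            {-0.0f, +0.0f}, {1.0f, 2.0f}, {INFINITY, -INFINITY}};
  const int n = sizeof(cases) / sizeof(cases[0]);
  float ha[n], hb[n], hout[n];
  for (int i = 0; i < n; ++i) { ha[i] = cases[i][0]; hb[i] = cases[i][1]; }

  float *da, *db, *dout;
  cudaMalloc(&da, sizeof(ha));
  cudaMalloc(&db, sizeof(hb));
  cudaMalloc(&dout, sizeof(hout));
  cudaMemcpy(da, ha, sizeof(ha), cudaMemcpyHostToDevice);
  cudaMemcpy(db, hb, sizeof(hb), cudaMemcpyHostToDevice);
  run<<<1, n>>>(da, db, dout, n);
  cudaMemcpy(hout, dout, sizeof(hout), cudaMemcpyDeviceToHost);

  for (int i = 0; i < n; ++i) {
    float want = fmaximum_ref(ha[i], hb[i]);
    bool both_nan = std::isnan(hout[i]) && std::isnan(want);
    // Compare bitwise so that -0.0 and +0.0 are distinguished.
    if (!both_nan && std::memcmp(&hout[i], &want, sizeof(float)) != 0)
      std::printf("mismatch: maximum(%g, %g) = %g, want %g\n",
                  ha[i], hb[i], hout[i], want);
  }
  return 0;
}
```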
If you have a patch, I can test it in IREE, where things get exercised. It's not a complete check, though. If you have a simple example, I can try that too.
Step 1: implement __builtin_fmaximum/fminimum, so we can use NaN-propagating min/max from C/C++.
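Hypothetical usage once those builtins exist (the names come from this proposal and were not a clang API at the time; semantics assumed to match the C23 fmaximum/fminimum functions):

```cpp
// __builtin_fmaximum/__builtin_fminimum are the proposed, not-yet-existing
// builtins. NaN inputs would propagate, and +0.0 would order above -0.0.
float clamp01(float x) {
  return __builtin_fminimum(__builtin_fmaximum(x, 0.0f), 1.0f);  // NaN stays NaN
}
```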
I think we also have to check for the +0.0/-0.0 case. There's some information on how this was implemented for RISC-V: #64022
Related to the above, it seems like LLVM/NVPTX also emits these: https://godbolt.org/z/EdWa5nv9W
Once #67301 lands, we can use it on older GPUs. For now, the work-around is to avoid generating fmaximum/fminimum intrinsics. |
#67301 has landed; is this done?
Hey, we aren't actively developing the NVPTX backend, and the active developers right now do not have a setup to verify this. If someone wants to try this out and let us know, that would be very much appreciated.
Starting with https://github.com/llvm/llvm-project/commit/dad9de0ae536, it seems like the NVPTX backend is generating instructions that are only available on SM80 and above. Specifically, starting from this input, I get this PTX for all architectures (here sm_60).
As specified in https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#floating-point-instructions-max, max.NaN.f32 is available only on sm_80 or higher. The issue seems to be here: llvm/lib/Target/NVPTX/NVPTXInstrInfo.td, line 1110 (at 2b8542c). These definitions are missing the hasSM<80> predicate that is needed to guard them. I tried MaheshRavishankar@5f0385d, and that just gave me a compilation error indicating that instruction selection failed. So I'm wondering if someone could either fix this or help direct me to how to fix it.
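Since the original input snippet didn't survive, here is a hypothetical reproducer along the same lines. It assumes a clang recent enough to have __builtin_elementwise_maximum, which lowers to the llvm.maximum intrinsic (the NaN-propagating maximum that was being selected to max.NaN.f32):

```cpp
// Hypothetical reproducer; compile the device side to PTX with something like:
//   clang++ --cuda-device-only --cuda-gpu-arch=sm_60 -S repro.cu -o repro.ptx
// Before a fix, the emitted PTX contains max.NaN.f32, which the PTX ISA
// only defines for sm_80 and above.
__device__ float nan_max(float a, float b) {
  return __builtin_elementwise_maximum(a, b);  // -> llvm.maximum.f32
}
```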