HIP: Disable ROCWMMA fattn on CDNA when compiled against ROCWMMA 2.0.0 #16221
    if (NOT ${FOUND_ROCWMMA})
        message(FATAL_ERROR "rocwmma has not been found")
    endif()
endif()
This condition never worked. For CHECK_INCLUDE_FILE_CXX, CMake generates a cpp file that includes the header, compiles it with the C++ compiler, and checks whether the compile succeeds. In this case the compile can never succeed, because rocwmma.hpp uses HIP extensions to C++, so FOUND_ROCWMMA was never set.
This was then masked by the condition NOT ${FOUND_ROCWMMA} being wrong. It should be NOT FOUND_ROCWMMA, as NOT ${FOUND_ROCWMMA} expands to NOT "" when FOUND_ROCWMMA is not set, which evaluates to TRUE.
Fixes #16153
Unfortunately we can't just accumulate in fp32 in the WMMA kernel on CDNA to avoid this bug, even though this would be more performant, as we don't have enough shared memory for this.
Currently this can't build on CI, as the rocwmma installation on CI is incorrect. llama.cpp/.github/workflows/build.yml Line 1072 in 63b54c8
and then use rocwmma, which is header-only, from there.
FYI one of my long-term goals is to remove the WMMA kernel by expanding support for the mma kernel. The instructions that are still missing support are Volta tensor cores, AMD WMMA, and AMD MFMA. I'll need to think about how to organize my hardware; I'll definitely procure an RDNA GPU. For V100/MI100 I'm not yet sure how to best obtain access.
Do not commit this until the CI is fixed by properly installing rocwmma on CI. (A PR for this will follow.)
@deepsek You have previously expressed interest in adding MFMA support to the fattn mma path. It would be helpful if you could share your current plans in this direction, if any.
@IMbackK, I was targeting a November PR to address fattn for MMA along with some other changes. But I'm currently stretched thin with other open-source projects. We might be delayed until 2026. If anyone in the community is taking up this effort, I would be happy to assist with issues!
rocwmma 2.0.0 includes a bug in the code faking fp16 accumulation on CDNA
@slaren It seems @JohannesGaessler's approval is no longer sufficient, I believe due to the recent changes to codeowners, or perhaps some other configuration change I'm not aware of.
The number of people with write access has been reduced, see #16113 for more details. Merging based on Johannes' approval.
…MMA 2.0.0 (ggml-org#16221)" This reverts commit e95fec6.
rocwmma 2.0.0 includes a bug in the code faking fp16 accumulation on CDNA
The current rocwmma, as released with ROCm 7.0.0 and 7.0.1, includes an embarrassing compile-time bug in the code that emulates fp16 accumulation via downcast on devices that do not support it in hardware.
This PR redesigns the conditions under which the WMMA fattn kernel is selected and avoids compiling and using the kernel on the following broken configurations:
CDNA and ROCWMMA 2.0.0
RDNA4 and ROCWMMA <2.0.0
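The two exclusions listed above can be modeled as a small predicate. This is only an illustrative sketch, not the code from this PR: `GpuFamily`, `RocwmmaVersion`, and `wmma_fattn_usable` are hypothetical names, and the sketch assumes that on CDNA only the exact version 2.0.0 is affected, since that is all the list states. It also ignores whether the WMMA kernel is supported on a given GPU in the first place.

```cpp
// Hypothetical GPU family tags for this sketch; not ggml's real identifiers.
enum class GpuFamily { CDNA, RDNA4 };

// Hypothetical rocWMMA version triple (major.minor.patch).
struct RocwmmaVersion {
    int ver_major;
    int ver_minor;
    int ver_patch;
};

// True when a and b are the same version.
static bool version_equal(const RocwmmaVersion & a, const RocwmmaVersion & b) {
    return a.ver_major == b.ver_major &&
           a.ver_minor == b.ver_minor &&
           a.ver_patch == b.ver_patch;
}

// True when a is strictly older than b, comparing major, then minor, then patch.
static bool version_less(const RocwmmaVersion & a, const RocwmmaVersion & b) {
    if (a.ver_major != b.ver_major) {
        return a.ver_major < b.ver_major;
    }
    if (a.ver_minor != b.ver_minor) {
        return a.ver_minor < b.ver_minor;
    }
    return a.ver_patch < b.ver_patch;
}

// Returns false for the two configurations named as broken in the description:
//   - CDNA  with rocWMMA == 2.0.0 (assumption: only this exact version)
//   - RDNA4 with rocWMMA <  2.0.0
// Returns true otherwise; general kernel support is out of scope here.
static bool wmma_fattn_usable(GpuFamily family, const RocwmmaVersion & version) {
    const RocwmmaVersion v2_0_0 = {2, 0, 0};

    if (family == GpuFamily::CDNA && version_equal(version, v2_0_0)) {
        return false;
    }
    if (family == GpuFamily::RDNA4 && version_less(version, v2_0_0)) {
        return false;
    }
    return true;
}
```

Under these assumptions, `wmma_fattn_usable(GpuFamily::CDNA, {2, 0, 0})` and `wmma_fattn_usable(GpuFamily::RDNA4, {1, 7, 0})` both return false, while CDNA with an older rocWMMA and RDNA4 with 2.0.0 return true.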