
[deepseek][kernels][blackwell] Cutlass blackwell grouped gemm using cute dsl (forward,backward) #1276


Open
wants to merge 34 commits into
base: main


lessw2020
Contributor

@lessw2020 lessw2020 commented Jun 8, 2025

This PR integrates the new CUTLASS DSL grouped GEMM into PyTorch via the CUTLASSGroupedGemmStrategy.
The strategy handles the required conversions and builds the pointer and metadata arrays needed.
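To illustrate what "pointer and metadata arrays" means for a grouped GEMM, here is a minimal pure-Python sketch. It is not the code in this PR: `build_group_metadata` and `grouped_gemm_reference` are hypothetical names, row offsets stand in for device pointers, and the packed layout (all experts' rows stacked in one input buffer) is an assumption.

```python
# Hypothetical illustration only; not the PR's CUTLASSGroupedGemmStrategy.

def build_group_metadata(m_sizes, k, n):
    """Return per-group problem sizes (M_i, N, K) and row offsets.

    m_sizes[i] is the number of rows (tokens) routed to group i.
    The row offsets play the role of a pointer array: they say where
    each group's slice starts inside the packed input buffer.
    """
    problem_sizes = []
    row_offsets = []
    start = 0
    for m in m_sizes:
        problem_sizes.append((m, n, k))
        row_offsets.append(start)
        start += m
    return problem_sizes, row_offsets


def grouped_gemm_reference(a, weights, m_sizes):
    """Reference grouped GEMM on nested lists.

    a:       packed input, sum(m_sizes) rows of length K
    weights: one K x N matrix per group
    Returns the packed output, sum(m_sizes) rows of length N.
    """
    k = len(weights[0])
    n = len(weights[0][0])
    problem_sizes, row_offsets = build_group_metadata(m_sizes, k, n)
    out = []
    for (m, n_i, k_i), start, w in zip(problem_sizes, row_offsets, weights):
        # Each group is an independent (M_i x K) @ (K x N) product.
        for row in a[start:start + m]:
            out.append(
                [sum(row[p] * w[p][j] for p in range(k_i)) for j in range(n_i)]
            )
    return out
```

A real kernel launch would pass device addresses and strides instead of Python offsets, but the bookkeeping has the same shape: one problem size and one starting location per group.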

Testing:
verified via the benchmarking file as a standalone grouped GEMM
verified the grouped GEMM strategy integration with the testMoe.

[Screenshots: 2025-06-08 at 9:34:01 PM and 2025-06-07 at 10:37:18 PM]

lessw2020 added 2 commits June 8, 2025 08:39

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jun 8, 2025
@drisspg
Contributor

drisspg commented Jun 9, 2025

OOC, are you not comparing against grouped_gemm because we aren't building on sm100?

@lessw2020
Contributor Author

> OOC, are you not comparing against grouped_gemm because we aren't building on sm100?

yes, exactly:
"Error using torch strategy: torch._grouped_mm is only supported on CUDA devices with compute capability = 9.0"
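As a rough sketch of how a caller might guard against this error, the function below picks a backend from the device's compute capability. The backend names and the function itself are hypothetical; the `(9, 0)` condition is taken from the error message above, and treating major version 10 as Blackwell (sm100) is an assumption.

```python
# Hypothetical dispatch sketch; names are illustrative, not from the PR.

def pick_grouped_gemm_backend(capability):
    """Choose a grouped GEMM backend from a (major, minor) capability tuple."""
    major, minor = capability
    if (major, minor) == (9, 0):
        # Matches the condition quoted in the torch._grouped_mm error.
        return "torch_grouped_mm"
    if major == 10:
        # Assumption: sm100 (Blackwell) reports compute capability 10.x.
        return "cutlass_cute_dsl"
    # Fallback assumption: run one matmul per group in a loop.
    return "per_group_loop"
```

On a machine with PyTorch and a GPU, the tuple could come from `torch.cuda.get_device_capability()`, which returns `(major, minor)` for the current device.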

@drisspg
Contributor

drisspg commented Jun 9, 2025

Will open an issue for this

lessw2020 added 3 commits June 8, 2025 20:04
@lessw2020 lessw2020 changed the title [WIP][kernels][blackwell] Cutlass blackwell grouped gemm using cute dsl [deepseek][kernels][blackwell] Cutlass blackwell grouped gemm using cute dsl (forward) Jun 9, 2025
lessw2020 added 13 commits June 11, 2025 15:50
lessw2020 added 13 commits June 20, 2025 09:49
@lessw2020 lessw2020 changed the title [deepseek][kernels][blackwell] Cutlass blackwell grouped gemm using cute dsl (forward) [deepseek][kernels][blackwell] Cutlass blackwell grouped gemm using cute dsl (forward,backward) Jun 23, 2025