[deepseek][blackwell] add Cutlass cute dsl blackwell dense based looping group gemm #1274

Open
wants to merge 3 commits into
base: main

lessw2020
Contributor

This PR implements the MoE grouped GEMM by manually looping over the groups and running a CUTLASS 4.0 dense Blackwell GEMM for each one.
It is a stepping-stone PR toward using dedicated CUTLASS 4.0 Blackwell GEMMs on Blackwell GPUs.
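
For readers unfamiliar with the looping approach, here is a minimal pure-Python sketch of the idea, not the code in this PR. It assumes tokens are already sorted so that each group's rows are contiguous, and that each group has its own weight matrix. The names `dense_gemm`, `looped_grouped_gemm`, `tokens`, `expert_weights`, and `group_sizes` are hypothetical; `dense_gemm` is a plain reference matmul standing in for the dense Blackwell kernel call.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def dense_gemm(a: Matrix, b: Matrix) -> Matrix:
    """Reference dense GEMM (A @ B). Hypothetical stand-in for a dense kernel call."""
    cols = list(zip(*b))  # columns of B
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def looped_grouped_gemm(
    tokens: Matrix,
    expert_weights: Sequence[Matrix],
    group_sizes: Sequence[int],
    gemm: Callable[[Matrix, Matrix], Matrix] = dense_gemm,
) -> Matrix:
    """Grouped GEMM as a loop of dense GEMMs, one per non-empty group.

    Assumes rows of `tokens` are sorted by group, so group i owns the
    next `group_sizes[i]` contiguous rows.
    """
    if len(expert_weights) != len(group_sizes):
        raise ValueError("need one weight matrix per group")
    if sum(group_sizes) != len(tokens):
        raise ValueError("group sizes must sum to the number of token rows")

    out: Matrix = []
    start = 0
    for weight, size in zip(expert_weights, group_sizes):
        if size > 0:  # skip groups that received no tokens
            out.extend(gemm(tokens[start:start + size], weight))
        start += size
    return out


# Three token rows, three groups; the middle group is empty.
tokens = [[1, 2], [3, 4], [5, 6]]
expert_weights = [
    [[1, 0], [0, 1]],  # group 0: identity
    [[2, 0], [0, 2]],  # group 1: never used here
    [[0, 1], [1, 0]],  # group 2: swaps the two columns
]
result = looped_grouped_gemm(tokens, expert_weights, group_sizes=[2, 0, 1])
print(result)  # [[1, 2], [3, 4], [6, 5]]
```

The loop issues one dense call per non-empty group, which is simple to reason about but gives up the single-launch batching that a dedicated grouped kernel would provide.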

Note: this adds a new dependency; install it with pip install nvidia-cutlass-dsl.

[Image: Screenshot 2025-06-07 at 7 26 33 PM]

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) Jun 8, 2025
@lessw2020 changed the title from "[deepseek][blackwell] add cute dsl dense based looping group gemm" to "[deepseek][blackwell] add Cutlass cute dsl blackwell dense based looping group gemm" Jun 8, 2025
@lessw2020 requested a review from kwen2501 June 8, 2025 02:51
Labels: CLA Signed
2 participants