[CUDA][Blackwell] Blackwell Tracking Issue #145949


Open
11 tasks done
eqy opened this issue Jan 29, 2025 · 10 comments
Labels
module: build Build system issues module: cuda Related to torch.cuda, and CUDA support in general triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@eqy
Collaborator

eqy commented Jan 29, 2025

🚀 The feature, motivation and pitch

Blackwell's CUDA toolkit has been released, and we're working to rapidly upstream the fixes and upgrades required to support Blackwell (e.g., SM 10.0, SM 12.0).

Build fixes (these are needed to prevent kernels from crashing or enable existing backend support):

Library upgrades (these are needed to enable Blackwell support on math libraries):

Performance upgrades (existing kernels w/ improved implementation on Blackwell):
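For quick triage of whether a given build covers these compute capabilities, here is a minimal sketch. It assumes the arch-string format reported by `torch.cuda.get_arch_list()` (e.g. `'sm_90'`, `'sm_100a'`); the sample lists below are illustrative, not statements about any particular wheel.

```python
# Sketch: check whether a PyTorch build's compiled arch list covers Blackwell.
# In a live session you would pass torch.cuda.get_arch_list() to this helper.

BLACKWELL_ARCHES = {"sm_100", "sm_120"}  # SM 10.0 (B100/B200), SM 12.0 (RTX 50xx)

def supports_blackwell(arch_list):
    """True if any compiled arch (plain or arch-conditional 'a' variant)
    targets a Blackwell compute capability."""
    return any(a.rstrip("a") in BLACKWELL_ARCHES for a in arch_list)

print(supports_blackwell(["sm_80", "sm_90"]))                       # False
print(supports_blackwell(["sm_80", "sm_90", "sm_100a", "sm_120"]))  # True
```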

cc @malfet @seemethere @ptrblck @msaroufim

@eqy eqy added the module: build, module: cuda, and triaged labels Jan 29, 2025
@vince62s

vince62s commented Feb 8, 2025

Is torch.compile supposed to work with nightly / sm_120?

'sm_120' is not a recognized processor for this target (ignoring processor)
LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32
E0208 11:41:41.548000 1943928 subproc_pool.py:321] Error in subprocess
E0208 11:41:41.548000 1943928 subproc_pool.py:321] Traceback (most recent call last):
E0208 11:41:41.548000 1943928 subproc_pool.py:321]   File "/home/vincent/miniconda3/envs/pt2.5/lib/python3.11/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 319, in callback
E0208 11:41:41.548000 1943928 subproc_pool.py:321]     result = future.result()
E0208 11:41:41.548000 1943928 subproc_pool.py:321]              ^^^^^^^^^^^^^^^
E0208 11:41:41.548000 1943928 subproc_pool.py:321]   File "/home/vincent/miniconda3/envs/pt2.5/lib/python3.11/concurrent/futures/_base.py", line 449, in result
E0208 11:41:41.548000 1943928 subproc_pool.py:321]     return self.__get_result()
E0208 11:41:41.548000 1943928 subproc_pool.py:321]            ^^^^^^^^^^^^^^^^^^^
E0208 11:41:41.548000 1943928 subproc_pool.py:321]   File "/home/vincent/miniconda3/envs/pt2.5/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
E0208 11:41:41.548000 1943928 subproc_pool.py:321]     raise self._exception
E0208 11:41:41.548000 1943928 subproc_pool.py:321] concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

@drisspg
Contributor

drisspg commented Feb 8, 2025

@vince62s please see #146518

@faruknane

I have encountered the same issue with the RTX 5090. I can't use bitsandbytes, torchao, torch.compile, or torch inductor. It always fails because the new compute capability isn't recognized: torch/_inductor/runtime/triton_heuristics.py:515] ptxas fatal : Value 'sm_120' is not defined for option 'gpu-name'.

@bryantbiggs
Contributor

What is the plan to handle this statement in the CUDA docs

Application binaries that include PTX version of kernels with architecture conditional features using sm_100a or compute_100a in order to take full advantage of Blackwell GPU architecture, are not forward or backward compatible. For example, PTX compiled for compute_90a (Hopper) are not supported on the Blackwell architecture.

Reference https://docs.nvidia.com/cuda/blackwell-compatibility-guide/#application-compatibility-on-blackwell-architecture

@eqy
Collaborator Author

eqy commented Mar 10, 2025

What is the plan to handle this statement in the CUDA docs

Application binaries that include PTX version of kernels with architecture conditional features using sm_100a or compute_100a in order to take full advantage of Blackwell GPU architecture, are not forward or backward compatible. For example, PTX compiled for compute_90a (Hopper) are not supported on the Blackwell architecture.

Reference https://docs.nvidia.com/cuda/blackwell-compatibility-guide/#application-compatibility-on-blackwell-architecture

Currently only a very small portion of the kernels in PyTorch use arch-conditional features (built with "sm_XXa"): just the rowwise scaling kernel, IIRC. The short-term plan for these is to simply implement the same functionality for each arch-conditional compute capability, e.g., sm100(a) here: #148421

Longer term, we would want to revisit things such as a more generalized build process for arch-conditionals, since adding specialized compilation options for each compilation unit doesn't scale well.
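The compatibility rule from the NVIDIA guide quoted above can be sketched as a small predicate. This is an illustration of the documented rule, not PyTorch's actual dispatch logic; the target-string parsing is an assumption based on the usual `compute_XY` / `sm_XY` / `sm_XYa` naming.

```python
# Sketch of the CUDA compatibility rule:
# - plain PTX (compute_XY) is forward compatible: it can be JIT-compiled for
#   any device with compute capability >= XY;
# - arch-conditional targets (compute_XYa / sm_XYa) run ONLY on that exact
#   architecture, with no forward or backward compatibility;
# - plain SASS (sm_XY) is binary compatible only within the same major
#   compute capability (so sm_90 cubins do not run on sm_100 devices).

def runs_on(target: str, device_cc: int) -> bool:
    """target e.g. 'compute_90', 'compute_90a', 'sm_100a';
    device_cc e.g. 100 for SM 10.0 (Blackwell)."""
    arch_conditional = target.endswith("a")
    kind, num = target.rstrip("a").split("_")
    cc = int(num)
    if arch_conditional:
        return cc == device_cc                 # 'a' variants: exact arch only
    if kind == "compute":                      # plain PTX: JIT forward compatible
        return device_cc >= cc
    return cc // 10 == device_cc // 10 and device_cc >= cc  # SASS: same major only

print(runs_on("compute_90", 100))   # True: plain Hopper PTX JITs on Blackwell
print(runs_on("compute_90a", 100))  # False: the example from the guide
print(runs_on("sm_100a", 100))      # True: exact-arch match
```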

@fayezsalka

On Windows, PyTorch with the RTX 5090 (latest nightly with CUDA 12.8 and latest drivers) is substantially slower than the RTX 4090, in some cases half as fast. Tested architectures: ResNet / 1D and 2D UNets / Transformer Encoder.

With the Windows Subsystem for Linux (same machine, same drivers), the RTX 5090 performs faster than the 4090, as expected.

@atalman atalman removed this from the 2.7.0 milestone Mar 31, 2025
@atalman
Contributor

atalman commented Mar 31, 2025

Removing from the 2.7.0 milestone, since the work for that release has been completed.

@152334H

152334H commented Apr 7, 2025

Is the cause of #150725 known?

@JelllyS

JelllyS commented Apr 22, 2025

Hey, it's my first time posting something here. I wanted to use my new RTX 5090 for image generation, and for that I need Python, PyTorch, and CUDA, but my PyTorch version is not compatible with my RTX 5090 and InvokeAI. Maybe I'm wrong here, but I hope someone can help.

@eqy
Collaborator Author

eqy commented Apr 22, 2025

Hey, it's my first time posting something here. I wanted to use my new RTX 5090 for image generation, and for that I need Python, PyTorch, and CUDA, but my PyTorch version is not compatible with my RTX 5090 and InvokeAI. Maybe I'm wrong here, but I hope someone can help.

@JelllyS this issue is not intended for user support. Please search through existing issues; there are many open and closed issues regarding the 5090, often caused by a PyTorch install that is too old (e.g., 2.6, CUDA < 12.8, or a nightly wheel from before ~February). e.g., #151376
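A quick sanity check along those lines, as a sketch: the minimum versions gated on here (PyTorch 2.7+ with CUDA 12.8+) are the ones mentioned in this thread, and the helper only assumes `"major.minor..."`-formatted version strings like `torch.__version__` and `torch.version.cuda`.

```python
# Sketch: decide from torch's version strings whether an install is likely
# new enough for RTX 50xx (sm_120). In a live session, call
# install_new_enough(torch.__version__, torch.version.cuda).

def version_tuple(v: str):
    """Parse leading 'major.minor' from strings like '2.7.0.dev20250301+cu128'."""
    parts = []
    for p in v.split("."):
        digits = "".join(ch for ch in p if ch.isdigit())
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts[:2])

def install_new_enough(torch_version: str, cuda_version: str) -> bool:
    return version_tuple(torch_version) >= (2, 7) and version_tuple(cuda_version) >= (12, 8)

print(install_new_enough("2.6.0+cu124", "12.4"))              # False: too old for sm_120
print(install_new_enough("2.7.0.dev20250301+cu128", "12.8"))  # True
```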
