[CUDA][Blackwell] Blackwell Tracking Issue #145949
Comments
Is torch.compile supposed to work with nightly / sm_120?

I have encountered the same issue with an RTX 5090. I can't use bitsandbytes, torchao, torch.compile, or torch inductor. It always errors out because it doesn't recognize the new compute capability.
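A minimal, self-contained sketch of the check behind this class of error: whether the arch list a wheel was built with covers a device's compute capability. The helper name is hypothetical; in a real session the two inputs would come from `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()`.

```python
def wheel_supports_device(arch_list, capability):
    """Hypothetical helper: True if a built arch list covers a device.

    arch_list  -- e.g. torch.cuda.get_arch_list() -> ["sm_80", ..., "sm_120"]
    capability -- e.g. torch.cuda.get_device_capability() -> (12, 0) for a 5090
    """
    major, minor = capability
    target = f"sm_{major}{minor}"
    # Arch-conditional builds may report e.g. "sm_100a"; treat those as a match.
    return any(a.rstrip("a") == target for a in arch_list)


# A CUDA 12.8 nightly built with Blackwell archs recognizes sm_120:
print(wheel_supports_device(["sm_80", "sm_90", "sm_100", "sm_120"], (12, 0)))  # True
# An older (e.g. 2.6 / CUDA < 12.8) wheel does not:
print(wheel_supports_device(["sm_80", "sm_90"], (12, 0)))  # False
```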
What is the plan to handle this statement in the CUDA docs?

Currently only a very small portion of the kernels in PyTorch use arch-conditional features (built with "[smXX]a"); just the rowwise scaling kernel, IIRC. The short-term plan for these is to simply implement the same functionality for each arch-conditional compute capability, e.g., sm100(a) here: #148421. Longer term, we would want to revisit things such as a more generalized build process for arch-conditionals, as adding specialized compilation options for each compilation unit doesn't scale very well.
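To make the arch-conditional naming concrete, here is a small sketch (not PyTorch's actual build code) of how `TORCH_CUDA_ARCH_LIST`-style entries could map to nvcc `-gencode` flags, with the "a" suffix marking an arch-conditional variant such as 10.0a or 12.0a:

```python
def gencode_flags(arch_list):
    """Sketch: map semicolon-separated arch entries to nvcc -gencode flags.

    "10.0a" denotes an arch-conditional build (compute_100a/sm_100a), which
    only runs on that exact architecture, unlike the portable "10.0".
    """
    flags = []
    for entry in arch_list.split(";"):
        conditional = entry.endswith("a")
        num = entry.rstrip("a").replace(".", "")  # "10.0" -> "100"
        sm = f"{num}a" if conditional else num
        flags.append(f"-gencode=arch=compute_{sm},code=sm_{sm}")
    return flags


print(gencode_flags("9.0;10.0a;12.0"))
# ['-gencode=arch=compute_90,code=sm_90',
#  '-gencode=arch=compute_100a,code=sm_100a',
#  '-gencode=arch=compute_120,code=sm_120']
```

The scaling problem described above follows directly: each new "Xa" capability adds another per-compilation-unit flag variant rather than reusing one portable binary.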
On Windows, PyTorch on an RTX 5090 (latest nightly with CUDA 12.8 and latest drivers) is substantially slower than on an RTX 4090, in some cases half as fast. Tested architectures: ResNet, 1D and 2D UNets, Transformer encoder. Under the Windows Subsystem for Linux (same machine, same drivers), the RTX 5090 performs faster than the 4090, as expected.
Removing from the 2.7.0 milestone, since work for the release has been completed.

Is the cause of #150725 known?

hey ... it's my first time posting something here ...
@JelllyS this issue is not intended for user support. Please search through existing issues; there are many open and closed issues regarding the 5090, often caused by a PyTorch install that is too old, such as 2.6, CUDA < 12.8, or a nightly wheel from before ~February. E.g., #151376.
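A sketch of the version sanity check behind the "install too old" failures mentioned above: Blackwell (sm_120) needs a wheel built against CUDA >= 12.8. The helper name is hypothetical; the version string would come from `torch.version.cuda`.

```python
def cuda_version_ok(version_str, minimum=(12, 8)):
    """Hypothetical check: is this CUDA build new enough for sm_120?

    version_str -- e.g. torch.version.cuda, such as "12.8"
    """
    major, minor = (int(x) for x in version_str.split(".")[:2])
    return (major, minor) >= minimum


print(cuda_version_ok("12.8"))  # True  -- recent nightly, Blackwell-capable
print(cuda_version_ok("12.4"))  # False -- e.g. 2.6-era wheel, too old for sm_120
```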
🚀 The feature, motivation and pitch
Blackwell's CUDA toolkit has been released, and we're working to rapidly upstream the fixes/upgrades required to support Blackwell (e.g., SM 10.0, SM 12.0).

Build fixes (needed to prevent kernels from crashing or to enable existing backend support):
- avg_pool2d backward for SM 10.0 #145669

Library upgrades (needed to enable Blackwell support in the math libraries):

Performance upgrades (existing kernels with improved implementations on Blackwell):
cc @malfet @seemethere @ptrblck @msaroufim