[CUDA][Blackwell] Blackwell Tracking Issue #145949


Open
11 tasks done
eqy opened this issue Jan 29, 2025 · 10 comments
Labels
module: build Build system issues module: cuda Related to torch.cuda, and CUDA support in general triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@eqy
Collaborator

eqy commented Jan 29, 2025

🚀 The feature, motivation and pitch

Blackwell's CUDA toolkit has been released, and we're working to rapidly upstream the fixes and upgrades required to support Blackwell (e.g., SM 10.0, SM 12.0).

Build fixes (these are needed to prevent kernels from crashing or enable existing backend support):

Library upgrades (these are needed to enable Blackwell support on math libraries):

Performance upgrades (existing kernels w/ improved implementation on Blackwell):
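For quick triage of whether a given build covers these compute capabilities, here is a minimal sketch. It assumes the arch-string format reported by `torch.cuda.get_arch_list()` (e.g. `'sm_90'`, `'sm_100a'`); the sample lists below are illustrative, not statements about any particular wheel.

```python
# Sketch: check whether a PyTorch build's compiled arch list covers Blackwell.
# In a live session you would pass torch.cuda.get_arch_list() to this helper.

BLACKWELL_ARCHES = {"sm_100", "sm_120"}  # SM 10.0 (B100/B200), SM 12.0 (RTX 50xx)

def supports_blackwell(arch_list):
    """True if any compiled arch (plain or arch-conditional 'a' variant)
    targets a Blackwell compute capability."""
    return any(a.rstrip("a") in BLACKWELL_ARCHES for a in arch_list)

print(supports_blackwell(["sm_80", "sm_90"]))                       # False
print(supports_blackwell(["sm_80", "sm_90", "sm_100a", "sm_120"]))  # True
```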

cc @malfet @seemethere @ptrblck @msaroufim

@eqy eqy added the module: build, module: cuda, and triaged labels Jan 29, 2025
@vince62s

vince62s commented Feb 8, 2025

Is torch.compile supposed to work with nightly / sm_120?

'sm_120' is not a recognized processor for this target (ignoring processor)
LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32
E0208 11:41:41.548000 1943928 subproc_pool.py:321] Error in subprocess
E0208 11:41:41.548000 1943928 subproc_pool.py:321] Traceback (most recent call last):
E0208 11:41:41.548000 1943928 subproc_pool.py:321]   File "/home/vincent/miniconda3/envs/pt2.5/lib/python3.11/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 319, in callback
E0208 11:41:41.548000 1943928 subproc_pool.py:321]     result = future.result()
E0208 11:41:41.548000 1943928 subproc_pool.py:321]              ^^^^^^^^^^^^^^^
E0208 11:41:41.548000 1943928 subproc_pool.py:321]   File "/home/vincent/miniconda3/envs/pt2.5/lib/python3.11/concurrent/futures/_base.py", line 449, in result
E0208 11:41:41.548000 1943928 subproc_pool.py:321]     return self.__get_result()
E0208 11:41:41.548000 1943928 subproc_pool.py:321]            ^^^^^^^^^^^^^^^^^^^
E0208 11:41:41.548000 1943928 subproc_pool.py:321]   File "/home/vincent/miniconda3/envs/pt2.5/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
E0208 11:41:41.548000 1943928 subproc_pool.py:321]     raise self._exception
E0208 11:41:41.548000 1943928 subproc_pool.py:321] concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

@drisspg
Contributor

drisspg commented Feb 8, 2025

@vince62s please see #146518

@faruknane

I have encountered the same issue with the RTX 5090. I can't use bitsandbytes, torchao, torch.compile, or torch inductor. It always fails because the new compute capability isn't recognized: torch/_inductor/runtime/triton_heuristics.py:515] ptxas fatal : Value 'sm_120' is not defined for option 'gpu-name'.

@bryantbiggs
Contributor

What is the plan to handle this statement in the CUDA docs

Application binaries that include PTX version of kernels with architecture conditional features using sm_100a or compute_100a in order to take full advantage of Blackwell GPU architecture, are not forward or backward compatible. For example, PTX compiled for compute_90a (Hopper) are not supported on the Blackwell architecture.

Reference https://docs.nvidia.com/cuda/blackwell-compatibility-guide/#application-compatibility-on-blackwell-architecture

@eqy
Collaborator Author

eqy commented Mar 10, 2025

What is the plan to handle this statement in the CUDA docs

Application binaries that include PTX version of kernels with architecture conditional features using sm_100a or compute_100a in order to take full advantage of Blackwell GPU architecture, are not forward or backward compatible. For example, PTX compiled for compute_90a (Hopper) are not supported on the Blackwell architecture.

Reference https://docs.nvidia.com/cuda/blackwell-compatibility-guide/#application-compatibility-on-blackwell-architecture

Currently only a very small portion of the kernels in PyTorch use arch-conditional features (built with "sm_XXa"): just the rowwise scaling kernel, IIRC. The short-term plan for these is to simply implement the same functionality for each arch-conditional compute capability, e.g., sm100(a) here: #148421

Longer term, we would want to revisit things such as a more generalized build process for arch-conditionals, since adding specialized compilation options for each compilation unit doesn't scale well.
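The compatibility rule from the NVIDIA guide quoted above can be sketched as a small predicate. This is an illustration of the documented rule, not PyTorch's actual dispatch logic; the target-string parsing is an assumption based on the usual `compute_XY` / `sm_XY` / `sm_XYa` naming.

```python
# Sketch of the CUDA compatibility rule:
# - plain PTX (compute_XY) is forward compatible: it can be JIT-compiled for
#   any device with compute capability >= XY;
# - arch-conditional targets (compute_XYa / sm_XYa) run ONLY on that exact
#   architecture, with no forward or backward compatibility;
# - plain SASS (sm_XY) is binary compatible only within the same major
#   compute capability (so sm_90 cubins do not run on sm_100 devices).

def runs_on(target: str, device_cc: int) -> bool:
    """target e.g. 'compute_90', 'compute_90a', 'sm_100a';
    device_cc e.g. 100 for SM 10.0 (Blackwell)."""
    arch_conditional = target.endswith("a")
    kind, num = target.rstrip("a").split("_")
    cc = int(num)
    if arch_conditional:
        return cc == device_cc                 # 'a' variants: exact arch only
    if kind == "compute":                      # plain PTX: JIT forward compatible
        return device_cc >= cc
    return cc // 10 == device_cc // 10 and device_cc >= cc  # SASS: same major only

print(runs_on("compute_90", 100))   # True: plain Hopper PTX JITs on Blackwell
print(runs_on("compute_90a", 100))  # False: the example from the guide
print(runs_on("sm_100a", 100))      # True: exact-arch match
```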

@fayezsalka

On Windows, PyTorch with the RTX 5090 (latest nightly with CUDA 12.8 and latest drivers) is substantially slower than the RTX 4090, in some cases half as fast. Tested architectures: ResNet / 1D and 2D UNets / Transformer Encoder.

With the Windows Subsystem for Linux (same machine, same drivers), the RTX 5090 performs faster than the 4090, as expected.

@atalman atalman removed this from the 2.7.0 milestone Mar 31, 2025
@atalman
Contributor

atalman commented Mar 31, 2025

Removing from the 2.7.0 milestone, since the work for that release has been completed.

@152334H

152334H commented Apr 7, 2025

Is the cause of #150725 known?

@JelllyS

JelllyS commented Apr 22, 2025

Hey, it's my first time posting something here. I wanted to use my new RTX 5090 for image generation, and for that I need Python, PyTorch, and CUDA, but my PyTorch version is not compatible with my RTX 5090 and InvokeAI. Maybe I'm wrong here, but I hope someone can help.

@eqy
Collaborator Author

eqy commented Apr 22, 2025

Hey, it's my first time posting something here. I wanted to use my new RTX 5090 for image generation, and for that I need Python, PyTorch, and CUDA, but my PyTorch version is not compatible with my RTX 5090 and InvokeAI. Maybe I'm wrong here, but I hope someone can help.

@JelllyS this issue is not intended for user support. Please search through existing issues; there are many open and closed issues regarding the 5090, often caused by a PyTorch install that is too old (e.g., 2.6, CUDA < 12.8, or a nightly wheel from before ~February). e.g., #151376
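A quick sanity check along those lines, as a sketch: the minimum versions gated on here (PyTorch 2.7+ with CUDA 12.8+) are the ones mentioned in this thread, and the helper only assumes `"major.minor..."`-formatted version strings like `torch.__version__` and `torch.version.cuda`.

```python
# Sketch: decide from torch's version strings whether an install is likely
# new enough for RTX 50xx (sm_120). In a live session, call
# install_new_enough(torch.__version__, torch.version.cuda).

def version_tuple(v: str):
    """Parse leading 'major.minor' from strings like '2.7.0.dev20250301+cu128'."""
    parts = []
    for p in v.split("."):
        digits = "".join(ch for ch in p if ch.isdigit())
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts[:2])

def install_new_enough(torch_version: str, cuda_version: str) -> bool:
    return version_tuple(torch_version) >= (2, 7) and version_tuple(cuda_version) >= (12, 8)

print(install_new_enough("2.6.0+cu124", "12.4"))              # False: too old for sm_120
print(install_new_enough("2.7.0.dev20250301+cu128", "12.8"))  # True
```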
