vulkan: Support FA with any multiple of 8 head sizes #15537

jeffbolznv · 2025-08-24T03:37:30Z

The scalar FA shader already handled head sizes that are multiples of 8. The coopmat1 FA shader assumed 16x16x16, and its shared memory allocations need the HSK dimensions padded to a multiple of 16. NVIDIA's coopmat2 implementation requires multiples of 16 for N and K, so it needs the matrix dimensions padded and the loads clamped.
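As a rough host-side illustration of the padding idea (not the shader code from this PR), the sketch below rounds a head size up to a multiple of 16 and copies a row into the padded buffer, zero-filling the columns past the real head size. `round_up` and `load_row_padded` are hypothetical names, and zero fill is an assumption about what a clamped load should produce.

```cpp
#include <cstdint>
#include <vector>

// Round x up to the next multiple of `align` (align must be > 0).
constexpr uint32_t round_up(uint32_t x, uint32_t align) {
    return (x + align - 1) / align * align;
}

// Hypothetical helper: copy the first `hs` elements of `row` into a buffer
// whose length is padded to a multiple of 16. Columns past `hs` are never
// read from `row`; they stay zero so they add nothing to a dot product.
// Assumes row.size() >= hs.
std::vector<float> load_row_padded(const std::vector<float>& row, uint32_t hs) {
    const uint32_t padded = round_up(hs, 16);
    std::vector<float> out(padded, 0.0f);
    for (uint32_t i = 0; i < hs; ++i) {
        out[i] = row[i];
    }
    return out;
}
```

With the head sizes mentioned for stable diffusion 1.5, 40 pads to 48 while 160 is already a multiple of 16 and stays unchanged.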

Store the FA pipelines in a map, indexed by the pipeline state.
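A minimal sketch of what a pipeline map keyed by state could look like, assuming a key made of the head sizes plus a couple of variant flags. `FaPipelineState`, its fields, `Pipeline`, and `FaPipelineCache` are all hypothetical stand-ins and do not reflect the actual types in the Vulkan backend.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical key: everything that selects a distinct FA pipeline variant.
struct FaPipelineState {
    uint32_t hsk;        // K head size
    uint32_t hsv;        // V head size
    bool     small_rows; // illustrative variant flag
    bool     aligned;    // illustrative variant flag

    // std::map needs a strict weak ordering on the key.
    bool operator<(const FaPipelineState& o) const {
        return std::tie(hsk, hsv, small_rows, aligned) <
               std::tie(o.hsk, o.hsv, o.small_rows, o.aligned);
    }
};

// Stand-in for a real pipeline object.
struct Pipeline {
    int id;
};

// Creates a pipeline the first time a state is requested and reuses it after.
class FaPipelineCache {
public:
    Pipeline& get(const FaPipelineState& state) {
        auto it = pipelines_.find(state);
        if (it == pipelines_.end()) {
            it = pipelines_.emplace(state, Pipeline{next_id_++}).first;
        }
        return it->second;
    }

    std::size_t size() const { return pipelines_.size(); }

private:
    std::map<FaPipelineState, Pipeline> pipelines_;
    int next_id_ = 0;
};
```

Keying on the state avoids enumerating a fixed array of supported head sizes, so a new size such as 40 only adds one map entry when it is first requested.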

I was looking at stable-diffusion.cpp and noticed that some models (stable diffusion 1.5) use attention head sizes that we don't support (40 and 160). Enabling FA is 2.5x faster than the full attention calculation.

0cc4m

LGTM

jeffbolznv requested a review from 0cc4m as a code owner August 24, 2025 03:37

github-actions bot added the testing, Vulkan, and ggml labels Aug 24, 2025

0cc4m approved these changes Aug 24, 2025

0cc4m merged commit c9a24fb into ggml-org:master Aug 24, 2025
87 of 89 checks passed

ggerganov mentioned this pull request Aug 25, 2025

metal : add FA kernels for HS=40 #15559

Merged

rmatif mentioned this pull request Sep 2, 2025

OpenCL: add hs=40 support to FA #15758

Merged

Green-Sky mentioned this pull request Sep 7, 2025

CUDA: flashattn head dim issues leejet/stable-diffusion.cpp#802

Closed
