CUDA: flashattn head dim issues #802

@Green-Sky

Removing this d_head check causes sd1 with flash attention (fa) to crash, because the CUDA backend does not support a head dimension of 40.
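
For context, a head dimension of 40 is what one would expect at SD1's first UNet attention level, assuming the commonly cited configuration of 8 attention heads over 320/640/1280 channels. The sketch below works through that arithmetic and shows the kind of guard the d_head check amounts to. The names `SUPPORTED_CUDA_HEAD_DIMS` and `can_use_flash_attn` are hypothetical, and the supported set is a placeholder rather than the actual list from the CUDA kernels; the only thing taken from this issue is that 40 is not in it.

```python
# Hypothetical sketch: why SD1 hits d_head = 40, and a guard that falls back
# to regular attention when the backend has no kernel for that size.

# Assumed SD1 UNet layout: 8 heads, attention at 320/640/1280 channels.
SD1_NUM_HEADS = 8
SD1_ATTN_CHANNELS = [320, 640, 1280]

# Placeholder set, NOT the real CUDA kernel list; 40 is absent per this issue.
SUPPORTED_CUDA_HEAD_DIMS = {64, 80, 128}


def head_dims(channels, num_heads):
    """Per-level head dimension: channels split evenly across heads."""
    return [c // num_heads for c in channels]


def can_use_flash_attn(d_head, supported):
    """True if the backend has a flash attention kernel for d_head."""
    return d_head in supported


dims = head_dims(SD1_ATTN_CHANNELS, SD1_NUM_HEADS)
for d in dims:
    if can_use_flash_attn(d, SUPPORTED_CUDA_HEAD_DIMS):
        path = "flash attention"
    else:
        path = "fallback attention"
    print(f"d_head={d}: {path}")
```

Under these assumptions the first level gets d_head = 320 / 8 = 40, which is the size that trips the CUDA path once the check is removed.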

However, Vulkan added support for this, so it works just fine there.
Surprisingly, this makes the Vulkan backend much faster than CUDA for sd1+fa.

768x1024:
cuda: 2.15 s/it
vulkan: 1.20 it/s, 1.27 it/s (with diffusion-conv-direct)
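
Note that the two timings use different units (s/it for CUDA, it/s for Vulkan), so comparing them needs a conversion. A quick sketch of that arithmetic, using only the numbers quoted above (the helper name is made up for illustration):

```python
def to_it_per_s(seconds_per_it):
    """Convert seconds-per-iteration to iterations-per-second."""
    return 1.0 / seconds_per_it


cuda_it_s = to_it_per_s(2.15)   # CUDA figure was reported in s/it
vulkan_it_s = 1.20              # Vulkan figures were reported in it/s
vulkan_direct_it_s = 1.27       # with diffusion-conv-direct

print(f"cuda: {cuda_it_s:.2f} it/s")
print(f"vulkan: {vulkan_it_s / cuda_it_s:.2f}x cuda")
print(f"vulkan + conv-direct: {vulkan_direct_it_s / cuda_it_s:.2f}x cuda")
```

That works out to roughly 0.47 it/s on CUDA, so Vulkan is about 2.6x faster here, or about 2.7x with diffusion-conv-direct.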

Since Vulkan also has a better conv2d implementation, it is now the obvious choice for sd1:
--diffusion-fa --vae-conv-direct --diffusion-conv-direct

Originally posted by @Green-Sky in cb1d975

(Commenting on a commit felt wrong.)
