[core] support flash attention through `kernels` #12387

sayakpaul · 2025-09-25T08:01:49Z

What does this PR do?

Follow-up of #12236.

Testing code:

import torch
from diffusers import FluxPipeline

model_id = "black-forest-labs/FLUX.1-dev"
pipe = FluxPipeline.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("cuda")

pipe.transformer.set_attention_backend("flash_hub")
pipe.transformer.compile(fullgraph=True)

prompt = "A cat holding a sign that says 'hello world'"

with torch._dynamo.config.patch(error_on_recompile=True):
    image = pipe(
        prompt, num_inference_steps=28, guidance_scale=4.0, generator=torch.manual_seed(0)
    ).images[0]
    image.save("output.png")

Tip

The `flash_hub` backend is compatible with `torch.compile(fullgraph=True)`.

I have tested the code on H100 and A100, and it works.

sayakpaul · 2025-09-25T08:06:28Z

src/diffusers/models/attention_dispatch.py

    # `flash-attn`
    FLASH = "flash"
    FLASH_VARLEN = "flash_varlen"
+    FLASH_HUB = "flash_hub"


Flash Attention is stable, so we don't have to mark it private like FA3.
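The public/private naming convention mentioned here can be sketched roughly as follows. This is a standalone illustration, not the contents of `attention_dispatch.py`: the enum name `AttentionBackendName`, the `_flash_3_hub` spelling, and the `is_private` / `public_backends` helpers are assumptions made for the example. Only the `flash`, `flash_varlen` and `flash_hub` strings come from the diff.

```python
from enum import Enum
from typing import List


class AttentionBackendName(str, Enum):
    # Stable backends: plain names (strings taken from the diff).
    FLASH = "flash"
    FLASH_VARLEN = "flash_varlen"
    FLASH_HUB = "flash_hub"
    # Assumed spelling of a private backend: a leading underscore
    # signals that it may still change.
    _FLASH_3_HUB = "_flash_3_hub"


def is_private(backend: AttentionBackendName) -> bool:
    """A backend counts as private when its string value starts with '_'."""
    return backend.value.startswith("_")


def public_backends() -> List[str]:
    """List the backend strings that carry no 'private' marker."""
    return [b.value for b in AttentionBackendName if not is_private(b)]
```

Under this convention, `flash_hub` is simply listed alongside `flash` and `flash_varlen`, while an FA3-style entry would stay hidden behind the underscore.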

HuggingFaceDocBuilderDev · 2025-09-25T08:09:16Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

MekkCyber

Very cool integration 🔥! I just left some nits.

MekkCyber · 2025-09-25T08:12:17Z

src/diffusers/models/attention_dispatch.py

+    fa3_interface_hub = _get_fa3_from_hub()
+    flash_attn_3_func_hub = fa3_interface_hub.flash_attn_func
+    fa_interface_hub = _get_fa_from_hub()
+    flash_attn_func_hub = fa_interface_hub.flash_attn_func


Why are we fetching both kernels here?

sayakpaul · 2025-09-25

Because of the way the APIs for attention backends are designed, and also to support `torch.compile` with fullgraph traceability (when possible).

We will let it grow a bit and, based on feedback, revisit how to better deal with this.
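One way to read this reply: if a kernel handle were looked up lazily inside the attention function, the lookup would sit on the path that `torch.compile` has to trace. Resolving every hub kernel once, up front, leaves only a plain function reference at call time. The sketch below illustrates that pattern using the standard library only. `load_kernel`, the `fa` / `fa3` keys and the placeholder computation are hypothetical stand-ins, not the `kernels` or diffusers API, and this is an interpretation of the reply rather than a confirmed description of the implementation.

```python
from typing import Callable, Dict, List

# Records every simulated hub fetch so the eager behaviour is visible.
LOAD_LOG: List[str] = []


def load_kernel(name: str) -> Callable[[List[float]], List[float]]:
    """Hypothetical stand-in for fetching a kernel from the Hub."""
    LOAD_LOG.append(name)

    def attention(values: List[float]) -> List[float]:
        # Placeholder computation; a real kernel would run attention here.
        return list(values)

    return attention


# Eager resolution: both handles are fetched once, at import time,
# regardless of which backend is selected later.
KERNELS: Dict[str, Callable[[List[float]], List[float]]] = {
    "fa": load_kernel("fa"),
    "fa3": load_kernel("fa3"),
}


def dispatch(backend: str, values: List[float]) -> List[float]:
    # Call time involves only a dict lookup and a plain function call;
    # no fetch happens on the path that would be traced.
    return KERNELS[backend](values)
```

The cost is that both kernels are fetched even when only one backend is used, which is the trade-off the question points at.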

MekkCyber · 2025-09-25T08:16:48Z

src/diffusers/models/attention_dispatch.py

    FLASH = "flash"
    FLASH_VARLEN = "flash_varlen"
+    FLASH_HUB = "flash_hub"
+    # FLASH_VARLEN_HUB = "flash_varlen_hub" # not supported yet.


Is this related to the kernel, or does it just need more time to be integrated?

sayakpaul · 2025-09-25

We don't have models that use varlen.

bghira · 2025-09-29

@sayakpaul Qwen Image uses varlen. Also, native fused qkv+mlp attn requires the varlen function.
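For context, variable-length ("varlen") attention packs sequences of different lengths into one tensor without padding, and passes cumulative sequence offsets so the kernel knows where each sequence starts and ends. The `flash-attn` package exposes this as `flash_attn_varlen_func`, which takes `cu_seqlens_q` / `cu_seqlens_k` and maximum-length arguments. Below is a minimal, library-free sketch of how such offsets are derived. The helper names `cumulative_seqlens` and `sequence_slice` are made up for illustration.

```python
from itertools import accumulate
from typing import List, Tuple


def cumulative_seqlens(lengths: List[int]) -> Tuple[List[int], int]:
    """Return (cumulative offsets, max length) for a batch of sequences.

    The offsets list has len(lengths) + 1 entries and starts at 0.
    """
    offsets = [0, *accumulate(lengths)]
    return offsets, max(lengths, default=0)


def sequence_slice(offsets: List[int], index: int) -> slice:
    """Rows of the packed tensor that belong to sequence `index`."""
    return slice(offsets[index], offsets[index + 1])
```

With lengths `[3, 5, 2]`, the packed tensor has 10 rows and the second sequence occupies rows 3 through 7.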

sayakpaul added 2 commits September 25, 2025 13:05

up

c386f22

support fa (2) through kernels.

d252c02

sayakpaul requested a review from DN6 September 25, 2025 08:01

sayakpaul added the `performance` label (anything related to performance improvements, profiling and benchmarking) Sep 25, 2025

sayakpaul mentioned this pull request Sep 25, 2025

use kernels to support _flash_hub attention backend #12318

Closed


sayakpaul mentioned this pull request Sep 25, 2025

[tests] Test attention backends #12388

Open

sayakpaul added 2 commits September 26, 2025 11:10

up

1b96ed7

Merge branch 'main' into fa-hub

474b995
