use kernels to support _flash_hub attention backend #12318
Conversation
Thanks for this PR. Could you update it with some code examples and results? |
This is the test command, but it is unable to generate images.

import os

os.environ["DIFFUSERS_ENABLE_HUB_KERNELS"] = "yes"

# Debug: verify the env var is set
print(f"DIFFUSERS_ENABLE_HUB_KERNELS = {os.environ.get('DIFFUSERS_ENABLE_HUB_KERNELS')}")

import torch
from diffusers import FluxPipeline
from diffusers.quantizers import PipelineQuantizationConfig

# Debug: check whether diffusers sees the env var
from diffusers.models.attention_dispatch import DIFFUSERS_ENABLE_HUB_KERNELS
print(f"Diffusers sees DIFFUSERS_ENABLE_HUB_KERNELS = {DIFFUSERS_ENABLE_HUB_KERNELS}")

# Load the pipeline with the transformer quantized to 4-bit
model_id = "black-forest-labs/FLUX.1-dev"
pipe = FluxPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    quantization_config=PipelineQuantizationConfig(
        quant_backend="bitsandbytes_4bit",
        quant_kwargs={
            "load_in_4bit": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": torch.bfloat16,
        },
        components_to_quantize=["transformer"],
    ),
).to("cuda")

# Switch the transformer to the hub-provided flash-attn backend
pipe.transformer.set_attention_backend("_flash_hub")

prompt = "A cat holding a sign that says 'hello world'"
image = pipe(prompt, num_inference_steps=28, guidance_scale=4.0).images[0]
image.save("output.png") |
I'm running into issues with some of the parameters; here is the traceback:
The same error occurs with the dropout_p parameter as well. WDYT? cc: @sayakpaul |
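As a rough sketch, assuming the traceback is an unexpected-keyword-argument error from the hub kernel, one generic way to tolerate kwargs the kernel function does not accept (such as dropout_p) is to filter them against its signature before the call; the helper below is hypothetical and only illustrates the idea:

import inspect

def _call_with_supported_kwargs(fn, *args, **kwargs):
    # Hypothetical helper: drop keyword arguments (e.g. dropout_p) that the
    # loaded kernel's attention function does not declare, instead of letting
    # the call fail with a TypeError.
    accepted = inspect.signature(fn).parameters
    filtered = {key: value for key, value in kwargs.items() if key in accepted}
    return fn(*args, **filtered)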
@ParagEkbote I think we can close this PR in favor of #12387. You're more than welcome to test the PR and let us know of any feedback. |
@sayakpaul Thanks for letting me know and for being a patient reviewer. Closing the PR. |
What does this PR do?
As discussed in the issue, this PR adds support for the kernels-community/flash-attn kernel. Could you please review?
Fixes #12308
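For context, a minimal sketch of pulling this kernel from the Hub with the kernels library; the entry point name flash_attn_func is an assumption (mirroring the upstream flash-attn API) and may differ from what the dispatcher in this PR actually calls:

import torch
from kernels import get_kernel

# Download (and cache) the pre-built kernel from the Hugging Face Hub.
flash_attn = get_kernel("kernels-community/flash-attn")

# Tensors follow the flash-attn layout: (batch, seq_len, num_heads, head_dim).
q = torch.randn(1, 1024, 24, 128, dtype=torch.bfloat16, device="cuda")
k = torch.randn(1, 1024, 24, 128, dtype=torch.bfloat16, device="cuda")
v = torch.randn(1, 1024, 24, 128, dtype=torch.bfloat16, device="cuda")

# Assumed entry point; the _flash_hub backend would route diffusers' attention
# calls to a kernel function like this instead of torch's scaled_dot_product_attention.
out = flash_attn.flash_attn_func(q, k, v, causal=False)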
Before submitting
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@sayakpaul