[v1] - Mamba1 Attention Metadata #21249
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Code Review
This pull request introduces v1-style attention metadata support for Mamba-1 models and refactors the Mamba state shape calculation into a centralized `MambaStateShapeCalculator` class. The refactoring improves code organization and maintainability. The v1 support is well integrated, with clear logic separation based on the `VLLM_USE_V1` environment variable.
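As a rough illustration of that separation, the kind of branching described might look like the sketch below; the helper name, shapes, and the direction of the layout change are assumptions for illustration, not the actual vLLM code.

```python
import os

def get_mamba1_cache_shape(intermediate_size: int, state_size: int,
                           conv_kernel: int):
    """Illustrative only: pick the cache layout based on the engine version."""
    use_v1 = os.environ.get("VLLM_USE_V1", "0") == "1"

    conv_state_shape = (conv_kernel - 1, intermediate_size)
    temporal_state_shape = (intermediate_size, state_size)

    if use_v1:
        # Assumption for illustration: the v1 kernels expect the conv state
        # transposed relative to the v0 layout.
        conv_state_shape = conv_state_shape[::-1]
    return conv_state_shape, temporal_state_shape
```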
Thanks for the great work.
Some questions:
- Do we need `enforce-eager` to run mamba1? I'm OK with supporting CUDA graphs in a future PR.
- Can you show some lm-eval results on a mamba1 model?
- Please update `tests/models/language/generation/test_hybrid.py` and `tests/v1/test_oracle.py`.
- Is Jamba supported now?
- Please update the docs, e.g. https://docs.vllm.ai/en/latest/usage/v1_guide.html#mamba-models
This pull request has merge conflicts that must be resolved before it can be merged.
How different is `Mamba1AttentionMetadata` from `Mamba2AttentionMetadata`? Do we really need two separate classes?
Good question. I think we should keep both, since `Mamba2AttentionMetadata` contains fields that aren't relevant for mamba1, like `chunk_indices`, `chunk_offsets`, and other fields related to the Triton kernels that mamba1 doesn't need, which would add overhead to the class.
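To illustrate the difference being described, a rough sketch follows; apart from `chunk_indices` and `chunk_offsets`, the field names are placeholders rather than the actual vLLM fields.

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class Mamba1AttentionMetadataSketch:
    # The minimal per-batch bookkeeping a mamba1 layer needs (illustrative).
    num_prefills: int
    num_decodes: int
    state_indices_tensor: torch.Tensor        # cache slot used by each request
    has_initial_states: Optional[torch.Tensor]


@dataclass
class Mamba2AttentionMetadataSketch:
    # Same bookkeeping, plus extras that only the chunked mamba2 kernels use.
    num_prefills: int
    num_decodes: int
    state_indices_tensor: torch.Tensor
    has_initial_states: Optional[torch.Tensor]
    chunk_size: int
    chunk_indices: torch.Tensor               # per-chunk mapping for the scan
    chunk_offsets: torch.Tensor
```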
I prefer to keep mamba1 and mamba2 as separate metadata classes. Compared with one metadata class with many optional entries and branches for different types of layers, I prefer this pluggable design. We can extract common logic to a parent class if we find some after more models like minimax are added. @tdoublep WDYT?
@heheda12345 Sorry I missed this question. Yes, I agree: let's keep the metadata classes separate and then factor the common things into a `CommonMambaAttentionMetadata` or something once it is clear what is truly common. This one is ready, and we have the LFM2 and MiniMax-Text ones nearly there too, so we should be able to look at that soon.
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: asafg <[email protected]>
I have a few (mostly minor) comments, but this looks nearly ready to go in my view. Great work.
(Review thread on the new `class MambaStateShapeCalculator:` definition, following `return conv_state_shape, temporal_state_shape` in the diff.)
Is there any real reason to introduce the `MambaStateShapeCalculator` class? Couldn't these just be separate utility functions? It creates a lot of diff in the other files for little benefit as far as I can see. Right now it is making the PR look more intrusive than it really is (with 20 files changed).
The reason I added the class is that I needed to add a new function to calculate the mamba1 state shape. The file already had a `get_mamba_state_shape` function, but it was mamba2-only, and I didn't want to introduce branching logic within it to handle both architectures.
I considered loose utility functions like `get_mamba1_state_shape` and `get_mamba2_state_shape`, but the class provides clearer grouping (rough sketch below) since:
- These functions are conceptually related (all calculate Mamba state shapes)
- It makes the API more discoverable - you know all state shape calculations are in one place

What do you think?
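Roughly, the grouping looks like this; the shape formulas are simplified placeholders (no tensor-parallel sharding, head dims, etc.) and not the exact vLLM signatures.

```python
class MambaStateShapeCalculator:
    """Sketch: one place for per-architecture Mamba state-shape helpers."""

    @staticmethod
    def mamba1_state_shape(intermediate_size: int, state_size: int,
                           conv_kernel: int):
        # Conv state holds the last (conv_kernel - 1) inputs per channel;
        # temporal state is one SSM state vector per channel.
        conv_state_shape = (intermediate_size, conv_kernel - 1)
        temporal_state_shape = (intermediate_size, state_size)
        return conv_state_shape, temporal_state_shape

    @staticmethod
    def mamba2_state_shape(intermediate_size: int, state_size: int,
                           conv_kernel: int, num_heads: int, head_dim: int):
        # Mamba2 tracks its temporal state per head rather than per channel.
        conv_state_shape = (intermediate_size, conv_kernel - 1)
        temporal_state_shape = (num_heads, head_dim, state_size)
        return conv_state_shape, temporal_state_shape
```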
I think it's reasonable. My main argument against it was that the change touches a lot of files, but we would have to change the function name to mention mamba2 anyway, which would create a similar level of diff.
Signed-off-by: asafg <[email protected]>
LGTM - thanks for the great work on this PR. Debugging the kernel-level stuff must not have been straightforward. Great that it works now and only needs to slightly extend the abstractions that were put in place for mamba2.
params.ssm_states_batch_stride = ssm_states.stride(0);
params.ssm_states_dim_stride = ssm_states.stride(1);
params.ssm_states_dstate_stride = ssm_states.stride(2);
can this be pulled out of the if/else?
PR looks good. Thanks for adding Jamba and Mamba1 to V1!
The V1 test failure is a known flaky test (see #22385), and the quantization test and Blackwell tests look unrelated. This one is good to merge imo.
Essential Elements of an Effective PR Description Checklist
- `supported_models.md` and `examples` for a new model.

Purpose
Add full v1-style attention-metadata support to Mamba-1 models. The following was added:
- `Mamba1AttentionMetadataBuilder` to create the attention metadata for mamba-1 models
- `MambaStateShapeCalculator` - aggregates the mamba state shape calculation logic for all mamba models under the same static class, for better readability and navigation
- `selective_scan_fwd` updated to support the V1 memory layout (see the sketch below)
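As a rough illustration of the V1 state layout mentioned in the last point: the SSM state cache is indexed by per-request cache slots rather than a dense batch dimension, which is why the kernel takes explicit batch/dim/dstate strides. All names and shapes below are illustrative, not the actual kernel interface.

```python
import torch

# Illustrative cache: one slot per possible request, not a dense batch.
num_cache_slots, dim, dstate = 8, 16, 4
ssm_states = torch.zeros(num_cache_slots, dim, dstate)

# Each running request points at its own slot in the cache.
state_indices = torch.tensor([5, 2, 7])

def reference_state_update(ssm_states: torch.Tensor,
                           state_indices: torch.Tensor,
                           dA: torch.Tensor, dBx: torch.Tensor) -> None:
    # Pure-PyTorch reference of the per-step state update the CUDA kernel
    # performs in place on the selected slots: h <- h * dA + dB * x.
    ssm_states[state_indices] = ssm_states[state_indices] * dA + dBx

batch = state_indices.numel()
reference_state_update(ssm_states, state_indices,
                       dA=torch.rand(batch, dim, dstate),
                       dBx=torch.randn(batch, dim, dstate))
```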
Test Plan
Updated the following tests:
- `test_hybrid.py` - now tests Mamba1 and Jamba in V1
- `test_oracle.py` - removed Mamba1 from the unsupported V1 models

Running all tests in `test_hybrid.py`, `test_oracle.py`, and `test_mamba_ssm.py` (due to the kernel change) passes. Running mamba1 in the main branch would raise an error.
lm_eval results with `state-spaces/mamba-130m-hf`:
- vLLM V0 with Mamba1
- vLLM V1 with Mamba1
This PR now enables vLLM V1 to work with models that use Mamba1, such as pure Mamba1 models and Jamba.
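For reference, a minimal usage sketch; selecting V1 via the environment variable and running in eager mode follow the discussion above, and defaults may differ across vLLM versions.

```python
import os

# Assumption from the discussion above: V1 is opted into via VLLM_USE_V1 and
# CUDA graphs are not yet supported, so eager mode is used.
os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM, SamplingParams

llm = LLM(model="state-spaces/mamba-130m-hf", enforce_eager=True)
outputs = llm.generate(["The capital of France is"],
                       SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```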