
Conversation

@Josephasafg (Contributor) commented Jul 20, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Add full v1-style attention-metadata support to Mamba-1 models. The following was added:

  1. Mamba1AttentionMetadataBuilder to create the attention metadata for Mamba-1 models (see the sketch after this list)
  2. MambaStateShapeCalculator, which aggregates the Mamba state-shape calculation logic for all Mamba models under one static class for better readability and navigation
  3. Updated selective_scan_fwd to support the V1 memory layout
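For the first item, here is a minimal sketch of what the per-batch metadata for a Mamba-1 layer might carry. The field names are illustrative assumptions based on this discussion, not the exact Mamba1AttentionMetadata definition merged in this PR.

```python
# Illustrative sketch only -- field names are assumptions, not the exact
# Mamba1AttentionMetadata merged in this PR.
from dataclasses import dataclass

import torch


@dataclass
class Mamba1AttentionMetadataSketch:
    # Cumulative token offsets per request in the flattened V1 batch,
    # shape (num_requests + 1,).
    query_start_loc: torch.Tensor
    # Slot index into the persistent conv/SSM state cache for each request.
    state_indices: torch.Tensor
    # V1 batches mix prefill and decode requests, so the builder records
    # the split for the kernels.
    num_prefills: int
    num_decodes: int
```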

Test Plan

Updated the following tests:
test_hybrid.py - now tests Mamba1 and Jamba in V1
test_oracle.py - removed Mamba1 from the list of models unsupported in V1
All tests in test_hybrid.py, test_oracle.py, and test_mamba_ssm.py (the latter due to the kernel change) pass.

Running Mamba1 with V1 on the main branch raises an error.

lm_eval results with state-spaces/mamba-130m-hf

vLLM V0 with Mamba1

VLLM_USE_V1=0 HF_ALLOW_CODE_EVAL=1  lm_eval --model vllm \
    --model_args pretrained=state-spaces/mamba-130m-hf,enforce_eager=True,enable_prefix_caching=False,tensor_parallel_size=1 \
    --tasks humaneval \
    --batch_size auto \
    --confirm_run_unsafe_code
|  Tasks  |Version|  Filter   |n-shot|Metric|   |Value |   |Stderr|
|---------|------:|-----------|-----:|------|---|-----:|---|-----:|
|humaneval|      1|create_test|     0|pass@1|   |0.0244|±  |0.0121|

vLLM V1 with Mamba1

VLLM_USE_V1=1 HF_ALLOW_CODE_EVAL=1  lm_eval --model vllm \
    --model_args pretrained=state-spaces/mamba-130m-hf,enable_prefix_caching=False,tensor_parallel_size=1 \
    --tasks humaneval \
    --batch_size auto \
    --confirm_run_unsafe_code
|  Tasks  |Version|  Filter   |n-shot|Metric|   |Value |   |Stderr|
|---------|------:|-----------|-----:|------|---|-----:|---|-----:|
|humaneval|      1|create_test|     0|pass@1|   |0.0244|±  |0.0121|

This PR enables vLLM V1 to work with models that use Mamba1 layers, such as pure Mamba1 models and Jamba.


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label Jul 20, 2025
@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces v1-style attention metadata support for Mamba-1 models and refactors the Mamba state shape calculation into a centralized MambaStateShapeCalculator class. The refactoring improves code organization and maintainability. The v1 support is well-integrated, with clear logic separation based on the VLLM_USE_V1 environment variable.
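As a rough illustration of the VLLM_USE_V1 dispatch pattern that summary mentions (the helper and the layout labels below are assumptions, not vLLM's actual internals):

```python
# Hedged sketch of branching on VLLM_USE_V1 -- helper name and layout
# labels are illustrative, not vLLM's actual code.
import os


def use_v1() -> bool:
    # vLLM selects the engine version via this environment variable.
    return os.environ.get("VLLM_USE_V1", "1") == "1"


def select_state_layout() -> str:
    # V1 flattens requests into one contiguous token batch; V0 keeps the
    # legacy per-sequence layout (labels are placeholders).
    return "v1_flat" if use_v1() else "v0_legacy"
```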

@heheda12345 (Collaborator) left a comment

Thanks for the great work.

Some questions:

  1. Do we need enforce-eager to run mamba1? I'm OK with supporting CUDA graphs in a future PR.
  2. Can you show some lm-eval result on mamba1 model?
  3. Please update tests/models/language/generation/test_hybrid.py and tests/v1/test_oracle.py
  4. Is Jamba supported now?
  5. Please update the docs, such as https://docs.vllm.ai/en/latest/usage/v1_guide.html#mamba-models


mergify bot commented Jul 22, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @Josephasafg.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Comment on lines 29 to 38
Member


How different is the Mamba1AttentionMetadata from the Mamba2AttentionMetadata? Do we really need two separate classes?

Contributor Author


Good question. I think we should keep both, since Mamba2AttentionMetadata contains fields that aren't relevant for Mamba1, like chunk_indices, chunk_offsets, and Triton-kernel-related fields that Mamba1 doesn't need; folding them into one class would add overhead. A rough sketch of the split is below.
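For illustration, the split argued for here could look roughly like the following. chunk_indices and chunk_offsets come from the comment above; the other fields are assumptions, not the merged definitions.

```python
# Hedged sketch of the two-class split -- chunk_indices/chunk_offsets are
# named in the discussion; everything else is illustrative.
from dataclasses import dataclass

import torch


@dataclass
class Mamba1MetadataSketch:
    # Lean: only what the selective-scan kernel needs.
    query_start_loc: torch.Tensor
    state_indices: torch.Tensor


@dataclass
class Mamba2MetadataSketch:
    query_start_loc: torch.Tensor
    state_indices: torch.Tensor
    # Mamba2's chunked-scan Triton kernels need extra bookkeeping that
    # Mamba1 never touches; one merged class would carry these as dead
    # optional fields for Mamba1 layers.
    chunk_indices: torch.Tensor
    chunk_offsets: torch.Tensor
```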

Collaborator


I prefer to keep mamba1 and mamba2 as separate metadata classes. Compared with a single metadata class with many optional entries and branches for different layer types, I prefer this pluggable design. We can extract common logic into a parent class if we find some after more models like minimax are added. @tdoublep WDYT?

Member

@tdoublep tdoublep Aug 6, 2025


@heheda12345 Sorry I missed this question. Yes, I agree: let's keep the metadata classes separate and then factor the common things into a CommonMambaAttentionMetadata or something once it is clear what is truly common (a rough sketch follows). This one is ready, and we have the LFM2 and MiniMax-Text ones nearly there too, so we should be able to look at that soon.
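A rough sketch of that future factoring, under the same caveat: CommonMambaAttentionMetadata is named in the comment above, but the fields are assumptions.

```python
# Hedged sketch of the factoring discussed above -- fields are assumptions.
from dataclasses import dataclass

import torch


@dataclass
class CommonMambaAttentionMetadata:
    # Whatever proves truly common across Mamba-1/Mamba-2 (and later LFM2,
    # MiniMax-Text) would migrate into this parent class.
    query_start_loc: torch.Tensor
    state_indices: torch.Tensor


@dataclass
class Mamba2MetadataWithCommon(CommonMambaAttentionMetadata):
    # Variant-specific bookkeeping stays in the subclass.
    chunk_indices: torch.Tensor
    chunk_offsets: torch.Tensor
```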



mergify bot commented Jul 26, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @Josephasafg.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


mergify bot commented Jul 29, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @Josephasafg.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 29, 2025
Member

@tdoublep tdoublep left a comment


I have a few (mostly minor) comments, but this looks nearly ready to go in my view. Great work.

)

return conv_state_shape, temporal_state_shape
class MambaStateShapeCalculator:
Member

@tdoublep tdoublep Aug 5, 2025


Is there any real reason to introduce class MambaStateShapeCalculator? Couldn't these just be different utility functions? It creates a lot of diff in the other files for little benefit as far as I can see. Right now it is making the PR look more intrusive than it really is (20 files changed).

Contributor Author


The reason I added the class is that I needed a new function to calculate the Mamba1 state shape. The file already had a get_mamba_state_shape function, but it was Mamba2-only, and I didn't want to introduce branching logic within it to handle both architectures.

I considered loose utility functions like get_mamba1_state_shape and get_mamba2_state_shape, but the class provides clearer grouping since:

  1. These functions are conceptually related (all calculate Mamba state shapes)
  2. It makes the API more discoverable - you know all state shape calculations are in one place

What do you think? (A rough sketch of the grouping follows.)
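A sketch of the grouping under discussion; the method and parameter names are illustrative assumptions, not the exact API merged in this PR.

```python
# Hedged sketch of the grouping idea -- method and parameter names are
# illustrative assumptions, not the exact vLLM API.
class MambaStateShapeCalculator:

    @staticmethod
    def mamba1_state_shape(
        intermediate_size: int,
        state_size: int,
        conv_kernel: int,
    ) -> tuple[tuple[int, int], tuple[int, int]]:
        # Mamba-1 caches a short conv window plus an SSM state per channel.
        conv_state_shape = (intermediate_size, conv_kernel - 1)
        temporal_state_shape = (intermediate_size, state_size)
        return conv_state_shape, temporal_state_shape

    @staticmethod
    def mamba2_state_shape() -> None:
        # The Mamba-2 variant lives alongside it, keeping all state-shape
        # calculations discoverable in one place.
        ...
```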

Member


I think it's reasonable. My main argument against it was that the change touches a lot of files, but we would have to rename the function to mamba2 anyway, which would create a similar level of diff.

Signed-off-by: asafg <[email protected]>
Member

@tdoublep tdoublep left a comment


LGTM - thanks for the great work on this PR. Debugging the kernel-level stuff must not have been straightforward. Great that it works now and only needed a slight extension of the abstractions that were put in place for mamba2.


Comment on lines +486 to +488
params.ssm_states_batch_stride = ssm_states.stride(0);
params.ssm_states_dim_stride = ssm_states.stride(1);
params.ssm_states_dstate_stride = ssm_states.stride(2);
Member


Can this be pulled out of the if/else?

@tlrmchlsmth (Member)

PR looks good. Thanks for adding Jamba and Mamba1 to V1!

@tlrmchlsmth tlrmchlsmth added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 6, 2025
@tlrmchlsmth tlrmchlsmth enabled auto-merge (squash) August 6, 2025 15:17
@tdoublep (Member) commented Aug 6, 2025

The V1 test failure is a known flaky test (see #22385), and the quantization and Blackwell test failures look unrelated. This one is good to merge imo.

@simon-mo simon-mo disabled auto-merge August 7, 2025 00:03
@simon-mo simon-mo merged commit 46a1394 into vllm-project:main Aug 7, 2025
79 of 84 checks passed
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
Signed-off-by: asafg <[email protected]>
Co-authored-by: asafg <[email protected]>
Signed-off-by: Jinzhen Lin <[email protected]>
noamgat pushed a commit to noamgat/vllm that referenced this pull request Aug 9, 2025
Signed-off-by: asafg <[email protected]>
Co-authored-by: asafg <[email protected]>
Signed-off-by: Noam Gat <[email protected]>
wuhang2014 pushed a commit to wuhang2014/vllm that referenced this pull request Aug 12, 2025
paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
Signed-off-by: asafg <[email protected]>
Co-authored-by: asafg <[email protected]>
Signed-off-by: Diego-Castan <[email protected]>
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
Signed-off-by: asafg <[email protected]>
Co-authored-by: asafg <[email protected]>
Signed-off-by: Xiao Yu <[email protected]>
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
Labels
documentation (Improvements or additions to documentation), ready (ONLY add when PR is ready to merge/full CI is needed), v1
Projects
None yet
7 participants