[V1] [Hybrid] Disable prefix caching by default for hybrid or mamba-based models #23716
Conversation
Signed-off-by: Thomas Parnell <[email protected]>
Code Review
This pull request disables prefix caching for Mamba-based and hybrid models to prevent crashes, as this feature is not yet supported for them. This is a valuable user experience improvement. My review includes a suggestion to refine the implementation. Instead of unconditionally disabling the feature, I recommend checking if the user has explicitly enabled it and then issuing a warning before disabling. This approach enhances clarity for the user and aligns better with existing configuration handling practices in the codebase.
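For illustration, here is a minimal sketch of the warn-then-disable pattern the review suggests. It assumes a `cache_config` object whose `enable_prefix_caching` field uses `None` to mean "not explicitly set" (the field name matches vLLM's CLI flag, but the helper and its signature are hypothetical, not vLLM's actual API):

```python
import logging

logger = logging.getLogger(__name__)


def maybe_disable_prefix_caching(cache_config, is_hybrid_model: bool) -> None:
    # Hypothetical helper: only relevant for hybrid/mamba-based models,
    # which do not yet support prefix caching.
    if not is_hybrid_model:
        return
    if cache_config.enable_prefix_caching:
        # The user explicitly opted in, so explain why it is overridden
        # rather than silently changing the setting.
        logger.warning(
            "Prefix caching is not yet supported for hybrid or "
            "mamba-based models; disabling it.")
    # Whether unset (None) or explicitly enabled, fall back to the safe
    # default so the server does not crash at runtime.
    cache_config.enable_prefix_caching = False
```

The design point is the distinction between the two cases: an unset value gets the safe default quietly, while an explicit opt-in triggers a warning before being overridden.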
LGTM
@tdoublep Should we also update v1_guide? Since users don't need to disable prefix caching after this change.
Signed-off-by: Thomas Parnell <[email protected]>
@Josephasafg Good catch, thanks - I have updated the language accordingly.
Signed-off-by: Thomas Parnell <[email protected]>
…ased models (vllm-project#23716) Signed-off-by: Thomas Parnell <[email protected]>
Purpose
We would like to enable V1 by default for hybrid models (or models based on "mamba" layers, where "mamba" is a stand-in for: mamba1, mamba2, linear_attention or short_conv). However, these models do not yet support prefix caching. This PR disables prefix caching by default for these models, ensuring that users do not hit a crash when running a plain
vllm serve ...
. This is just a user experience improvement until we enable prefix caching; we are aiming to put up a first PR for that later this week.
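For illustration, a minimal sketch of what the "mamba-like" check behind this default could look like. The layer-type names come from the list above, but the helper itself and the string names are hypothetical, not vLLM's actual API:

```python
# Layer types treated as "mamba" for the purposes of this default
# (the stand-in list from the paragraph above).
MAMBA_LIKE_LAYER_TYPES = {"mamba1", "mamba2", "linear_attention", "short_conv"}


def uses_mamba_like_layers(layer_types: list[str]) -> bool:
    """Return True if any layer is mamba-based, i.e. carries recurrent
    state that the current prefix-caching implementation cannot reuse."""
    return any(t in MAMBA_LIKE_LAYER_TYPES for t in layer_types)


# Example: a hybrid model mixing attention and mamba2 layers.
assert uses_mamba_like_layers(["attention", "mamba2", "attention"])
```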
cc @heheda12345 @asafgardin
Test Plan
n/a
Test Result
n/a
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.