Automatic Prefix Caching (#2792) might conflict with multi-LoRA (#1804) #3264

#2762 provides a great way to improve efficiency when multiple requests share the same prefix, by reusing their KV-cache blocks. However, users probably do not want KV-cache blocks shared across two different LoRA adapters: the adapter weights change the computed keys and values, so blocks produced under one adapter would most likely be wrong for another.
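To make the concern concrete, here is a minimal, hypothetical sketch of prefix-keyed block reuse. It is not vLLM's code: `BLOCK_SIZE`, `block_hashes`, and `PrefixCache` are illustrative names, and a string tag stands in for real KV tensors. Because the hash key contains only token IDs, a request served by a second adapter silently reuses blocks computed under the first.

```python
from typing import Dict, List

BLOCK_SIZE = 4  # hypothetical number of tokens per KV-cache block


def block_hashes(token_ids: List[int], block_size: int = BLOCK_SIZE) -> List[int]:
    """Return one hash per full block, keyed on the whole prefix up to that block."""
    hashes = []
    for end in range(block_size, len(token_ids) + 1, block_size):
        # Hashing everything up to `end` means a block is shared only when
        # all earlier tokens match as well. Note: no adapter id in the key.
        hashes.append(hash(tuple(token_ids[:end])))
    return hashes


class PrefixCache:
    """Toy block cache: maps a block hash to a tag standing in for its KV data."""

    def __init__(self) -> None:
        self._blocks: Dict[int, str] = {}

    def lookup_or_fill(self, token_ids: List[int], kv_tag: str) -> List[str]:
        # setdefault reuses a cached block on a hit and stores kv_tag on a miss,
        # so the returned list shows whose KV data each block actually holds.
        return [self._blocks.setdefault(h, kv_tag) for h in block_hashes(token_ids)]


cache = PrefixCache()
prompt = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # 9 tokens -> 2 full blocks of 4
print(cache.lookup_or_fill(prompt, "adapter-A"))  # ['adapter-A', 'adapter-A']
print(cache.lookup_or_fill(prompt, "adapter-B"))  # ['adapter-A', 'adapter-A']: stale reuse
```

The second call is the problematic case: the request runs under adapter B, yet every full block it receives holds KV data computed under adapter A.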

As the test cases in PR #3263 suggest, the changes in #2762 may need a bit more work to distinguish between blocks produced under different LoRA adapters. Previously, #1804 avoided this conflict by including `adapter_id` in the tuple used to hash each prefix ([source](https://github.com/vllm-project/vllm/blob/9b945daaf1ce03b8b02d68b37c59baf28566b535/vllm/prefix.py#L84)). The fix proposed in #3263 drew inspiration from this approach.
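The idea of mixing the adapter identity into the prefix hash can be sketched as follows. This is an illustrative toy, not the code from #1804 or #3263: `block_hashes`, `PrefixCache`, and the `lora_id` parameter (assumed to be `None` for the base model) are hypothetical names, and the stored value merely records which adapter produced each block.

```python
from typing import Dict, List, Optional

BLOCK_SIZE = 4  # hypothetical number of tokens per KV-cache block


def block_hashes(
    token_ids: List[int], lora_id: Optional[int], block_size: int = BLOCK_SIZE
) -> List[int]:
    """Return one hash per full block, keyed on (prefix tokens, adapter id)."""
    hashes = []
    for end in range(block_size, len(token_ids) + 1, block_size):
        # Including lora_id (None for the base model) in the hashed tuple means
        # identical prefixes under different adapters map to different blocks.
        hashes.append(hash((tuple(token_ids[:end]), lora_id)))
    return hashes


class PrefixCache:
    """Toy block cache: the stored value records which adapter produced the KV data."""

    def __init__(self) -> None:
        self._blocks: Dict[int, Optional[int]] = {}
        self.hits = 0

    def lookup_or_fill(
        self, token_ids: List[int], lora_id: Optional[int]
    ) -> List[Optional[int]]:
        owners = []
        for h in block_hashes(token_ids, lora_id):
            if h in self._blocks:
                self.hits += 1  # reuse a block computed earlier under the same adapter
            else:
                self._blocks[h] = lora_id  # stand-in for freshly computed KV data
            owners.append(self._blocks[h])
        return owners


cache = PrefixCache()
prompt = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(cache.lookup_or_fill(prompt, 1), cache.hits)  # [1, 1] 0
print(cache.lookup_or_fill(prompt, 2), cache.hits)  # [2, 2] 0
print(cache.lookup_or_fill(prompt, 1), cache.hits)  # [1, 1] 2
```

The trade-off is deliberate: an identical prompt is cached once per adapter, so cross-adapter sharing is lost, but requests under the same adapter still hit the cache.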
