Automatic Prefix Caching (#2762) might conflict with multi-LoRA (#1804) #3264

@jacobthebanana

#2762 provides a great way to improve efficiency when multiple requests share the same prefix, by reusing the KV cache. However, the KV cache should probably not be shared across two different LoRA adapters: the same prefix tokens would most likely produce different key/value tensors under each adapter.

As the test cases in PR #3263 suggest, the changes in #2762 might need a bit more work to distinguish between blocks that belong to different LoRA adapters. Previously, #1804 avoided this conflict by including adapter_id in the tuple used to generate prefix hashes (source). The fix proposed in #3263 draws inspiration from this approach.
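
To illustrate the idea, here is a minimal sketch. It is not the actual code from #1804 or #3263: `block_key`, `get_or_allocate`, and the dict-based cache are hypothetical names made up for this example, and using 0 to mean "no adapter" is an assumption. The point is only that once the adapter id is part of the key, identical prefixes under different adapters no longer map to the same cached block.

```python
from typing import Dict, Sequence, Tuple

# (prefix token ids, adapter id) -- hypothetical key layout for illustration.
BlockKey = Tuple[Tuple[int, ...], int]


def block_key(prefix_token_ids: Sequence[int], lora_id: int = 0) -> BlockKey:
    """Build a cache key from the prefix tokens AND the adapter id.

    lora_id=0 is assumed here to mean "base model, no adapter".
    """
    return (tuple(prefix_token_ids), lora_id)


# Toy stand-in for the block table: key -> physical block number.
cache: Dict[BlockKey, int] = {}


def get_or_allocate(prefix_token_ids: Sequence[int], lora_id: int = 0) -> int:
    """Return the cached block for this key, allocating a new one on a miss."""
    key = block_key(prefix_token_ids, lora_id)
    if key not in cache:
        cache[key] = len(cache)  # next free block number in this toy model
    return cache[key]


prompt = [101, 2023, 2003, 1037]

base = get_or_allocate(prompt)                        # base model
adapter_1 = get_or_allocate(prompt, lora_id=1)        # same prefix, adapter 1
adapter_1_again = get_or_allocate(prompt, lora_id=1)  # reuse within adapter 1

print(base, adapter_1, adapter_1_again)
```

With the adapter id in the key, the base model and adapter 1 get separate blocks for the same prompt, while a second request under adapter 1 still reuses its block. Dropping `lora_id` from the tuple would make all three calls return the same block, which is the conflict described above.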
