kv-cache : support layer reuse #15504

ggerganov · 2025-08-22T10:49:53Z

The KV cache layer reuse logic was hacked together quickly for the Gemma-3n release. This PR refactors the implementation to support this functionality more generically.

* Introduce `llama_memory_i::layer_reuse_cb`, similar to `llama_memory_i::layer_filter_cb`
* Add `bool hparams.has_kv(il)`
* Remove per-model special-casing in `llama_kv_cache`
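To illustrate how a filter callback, a reuse callback, and a per-layer `has_kv` check could fit together, here is a minimal standalone sketch. It is not the code from this PR: `toy_hparams`, `map_layers`, the callback signatures, and the convention that a negative return value means "no reuse" are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-ins for the callbacks named in the PR description.
// Assumed semantics: filter returns true if layer il takes part in the cache;
// reuse returns the index of the layer whose cache il shares, or < 0 for none.
using layer_filter_cb = std::function<bool(int32_t il)>;
using layer_reuse_cb  = std::function<int32_t(int32_t il)>;

// Toy hyperparameters: layers [0, n_layer_kv) own a KV cache, the rest do not.
struct toy_hparams {
    int32_t n_layer;
    int32_t n_layer_kv;

    bool has_kv(int32_t il) const { return il < n_layer_kv; }
};

// Returns, per layer, the index of the cache slot it uses, or -1 if it has none.
std::vector<int32_t> map_layers(
        const toy_hparams     & hp,
        const layer_filter_cb & filter,
        const layer_reuse_cb  & reuse) {
    std::vector<int32_t> slot(hp.n_layer, -1);
    int32_t n_slots = 0;

    // Pass 1: allocate a slot for every unfiltered layer that owns a KV cache.
    for (int32_t il = 0; il < hp.n_layer; ++il) {
        if (filter && !filter(il)) {
            continue;
        }
        if (hp.has_kv(il)) {
            slot[il] = n_slots++;
        }
    }

    // Pass 2: point the remaining unfiltered layers at the slot of the layer they reuse.
    for (int32_t il = 0; il < hp.n_layer; ++il) {
        if (filter && !filter(il)) {
            continue;
        }
        if (slot[il] >= 0 || !reuse) {
            continue;
        }
        const int32_t il_src = reuse(il);
        if (il_src < 0) {
            continue; // this layer does not reuse another layer's cache
        }
        assert(il_src < hp.n_layer);
        slot[il] = slot[il_src];
    }

    return slot;
}
```

With this shape, the model-specific knowledge lives in the two callbacks and in `has_kv`, so the mapping routine itself needs no per-model branches. The layer counts and reuse pattern a caller passes in are up to the model definition; none are taken from the PR.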

ggml-ci

* kv-cache : support layer reuse

ggml-ci

* cont : update comments [no ci]

ggerganov added 2 commits August 22, 2025 13:48

kv-cache : support layer reuse

d6d5e95

ggml-ci

cont : update comments [no ci]

2c2fbbd

ggerganov mentioned this pull request Aug 22, 2025

kv-cache : remove LLAMA_SET_ROWS checks #15505

Merged

ggerganov merged commit b730706 into master Aug 24, 2025
1 check passed

ggerganov deleted the gg/kv-cache-reuse-layers branch August 24, 2025 10:07

qnixsynapse pushed a commit to menloresearch/llama.cpp that referenced this pull request Aug 25, 2025

kv-cache : support layer reuse (ggml-org#15504)

0289568

* kv-cache : support layer reuse

ggml-ci

* cont : update comments [no ci]
