
Conversation

ggerganov
Member

The logic for KV cache layer reuse was hacked together quickly for the Gemma-3n release. This PR refactors the implementation to provide more generic support for this functionality.

  • Introduce llama_memory_i::layer_reuse_cb, analogous to the existing llama_memory_i::layer_filter_cb (a sketch of both callbacks follows this list)
  • Add bool hparams.has_kv(il)
  • Remove the per-model special-casing in llama_kv_cache
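
A minimal sketch of what the callback plumbing could look like, assuming std::function-based aliases; the exact signatures, the -1 "no reuse" convention, and the make_reuse_cb / llama_hparams_sketch helpers are illustrative assumptions, not the PR's verbatim API:

```cpp
#include <cstdint>
#include <functional>

// Hypothetical callback aliases modeled on the PR description.
struct llama_memory_i {
    // Decides whether layer il gets its own entry in the KV cache.
    using layer_filter_cb = std::function<bool(int32_t il)>;

    // Returns the index of an earlier layer whose KV cache layer il
    // should reuse, or -1 if the layer keeps its own cache (assumed
    // convention, not confirmed by the PR).
    using layer_reuse_cb = std::function<int32_t(int32_t il)>;
};

// Hypothetical counterpart to the PR's "bool hparams.has_kv(il)":
// layers below n_layer_kv own a KV entry, the rest do not.
struct llama_hparams_sketch {
    int32_t n_layer_kv;
    bool has_kv(int32_t il) const { return il < n_layer_kv; }
};

// Example: a Gemma-3n-style mapping where every layer past n_layer_kv
// reuses the KV cache of the last layer that owns one.
static llama_memory_i::layer_reuse_cb make_reuse_cb(int32_t n_layer_kv) {
    return [n_layer_kv](int32_t il) -> int32_t {
        return il < n_layer_kv ? -1 : n_layer_kv - 1;
    };
}

int main() {
    const auto reuse = make_reuse_cb(/*n_layer_kv=*/20);
    // Layers 0..19 keep their own KV cache; layer 30 reuses layer 19's.
    return reuse(30) == 19 ? 0 : 1;
}
```

A callback like this would let each model supply its reuse pattern at cache-construction time, which is what makes the per-model special-casing in llama_kv_cache unnecessary.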

ggerganov merged commit b730706 into master on Aug 24, 2025
1 check passed
ggerganov deleted the gg/kv-cache-reuse-layers branch on August 24, 2025 at 10:07
qnixsynapse pushed a commit to menloresearch/llama.cpp that referenced this pull request on Aug 25, 2025:
* kv-cache : support layer reuse

ggml-ci

* cont : update comments [no ci]
