### Name and Version

716bd6dec3e044e5c325386b5b0483392b24cefe (bisected)

### Operating systems

Linux

### GGML backends

Vulkan

### Hardware

amdgpu, 8 GB VRAM

### Models

Qwen2.5-Coder-14B-Instruct-Q4_K_M, or any model of similar size

### Problem description & steps to reproduce

On c250ecb3157f3bae0a45f44c3c953b5414d4c2f7, the model weights fit into VRAM, leaving only the context/KV cache in GTT. Memory usage is 8166M VRAM + 2271M GTT.

On 716bd6dec3e044e5c325386b5b0483392b24cefe, memory usage is 6342M VRAM + 4107M GTT. Roughly 1.8G that used to sit in VRAM now lands in GTT, which significantly slows down token generation (tg) speed.

### First Bad Commit

716bd6dec3e044e5c325386b5b0483392b24cefe

### Relevant log output

```shell
no difference in log output
```
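
Since the log output is identical, the VRAM/GTT split has to be read from the driver instead. Below is a minimal sketch of one way to do that, assuming the GPU is exposed as `card0` and that the amdgpu driver provides the `mem_info_vram_used` and `mem_info_gtt_used` sysfs counters (in bytes). The helper names are hypothetical and not part of llama.cpp, and treating the reported figures as MiB is an assumption.

```python
from pathlib import Path

# Assumption: the GPU is card0; adjust the index on multi-GPU systems.
SYSFS_DIR = Path("/sys/class/drm/card0/device")


def bytes_to_mib(n: int) -> int:
    """Convert a byte count to whole MiB."""
    return n // (1024 * 1024)


def read_usage(sysfs_dir: Path = SYSFS_DIR) -> dict:
    """Return used VRAM and GTT in MiB, as reported by the amdgpu driver."""
    usage = {}
    for pool in ("vram", "gtt"):
        raw = (sysfs_dir / f"mem_info_{pool}_used").read_text().strip()
        usage[pool] = bytes_to_mib(int(raw))
    return usage


def placement_delta(good: dict, bad: dict) -> dict:
    """Per-pool change in usage going from the good build to the bad build."""
    return {pool: bad[pool] - good[pool] for pool in ("vram", "gtt")}


if __name__ == "__main__":
    # Live reading; fails gracefully on machines without an amdgpu card0.
    try:
        print("current usage (MiB):", read_usage())
    except (OSError, ValueError) as exc:
        print(f"could not read amdgpu sysfs counters: {exc}")

    # Figures reported in this issue, in the units given there.
    good = {"vram": 8166, "gtt": 2271}  # c250ecb
    bad = {"vram": 6342, "gtt": 4107}   # 716bd6d
    print("change from good to bad:", placement_delta(good, bad))
```

Running it once per build, with the same model loaded, gives two readings that can be compared directly. With the numbers from this report, the total footprint is nearly unchanged while the VRAM share drops.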