Name and Version
[docker@a242c844efbf ~]$ llama-cli-vulkan --version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | warp size: 64 | matrix cores: KHR_coopmat
version: 4384 (14b699e)
built with cc (GCC) 14.2.1 20240910 for x86_64-pc-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-bench
Problem description & steps to reproduce
llama-batched-bench-vulkan -m /models/Qwen2.5-Coder-32B-Instruct-Q4_K_S.gguf -ngl 99 -npp 512 -ntg 128 -npl 1,2,4,8,16 -pps
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | warp size: 64 | matrix cores: KHR_coopmat
build: 4384 (14b699e) with cc (GCC) 14.2.1 20240910 for x86_64-pc-linux-gnu
main: n_kv_max = 4096, n_batch = 2048, n_ubatch = 512, flash_attn = 0, is_pp_shared = 1, n_gpu_layers = 99, n_threads = 12, n_threads_batch = 12
| PP  | TG  | B  | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s    | S t/s  |
|-----|-----|----|------|--------|----------|--------|----------|--------|--------|
| 512 | 128 | 1  | 640  | 1.578  | 324.39   | 3.838  | 33.35    | 5.416  | 118.17 |
| 512 | 128 | 2  | 768  | 1.555  | 329.33   | 31.047 | 8.25     | 32.602 | 23.56  |
| 512 | 128 | 4  | 1024 | 1.570  | 326.11   | 33.209 | 15.42    | 34.779 | 29.44  |
| 512 | 128 | 8  | 1536 | 1.571  | 325.94   | 37.241 | 27.50    | 38.812 | 39.58  |
| 512 | 128 | 16 | 2560 | 1.575  | 325.05   | 28.106 | 72.87    | 29.681 | 86.25  |
I understand that scaling at some batch sizes might be less than ideal, but at worst I would expect only small regressions when no scaling can be achieved at all (due to the overhead of batched processing). Right now there is a massive performance loss, especially at batch sizes 2 and 4: aggregate generation speed drops from 33.35 t/s at B=1 to 8.25 t/s at B=2, so each sequence runs far slower than a single unbatched sequence would (see the per-sequence breakdown sketched below). Can anything be done to improve this situation? The poor batched performance unfortunately makes speculative decoding on the Vulkan backend unusable.
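For reference, a small Python sketch (not part of any llama.cpp tooling) that recomputes per-sequence generation throughput from the S_TG column of the table above; the numbers are copied verbatim from the benchmark run and only restated per sequence:

```python
# Per-sequence generation throughput derived from the benchmark table above.
# (Illustrative only; the (B, S_TG) pairs are copied from the table.)

results = [(1, 33.35), (2, 8.25), (4, 15.42), (8, 27.50), (16, 72.87)]

baseline = results[0][1]  # single-sequence speed at B=1

for b, s_tg in results:
    per_seq = s_tg / b                # tokens/s seen by each individual sequence
    slowdown = baseline / per_seq     # slowdown factor vs. the B=1 case
    print(f"B={b:2d}: {per_seq:6.2f} t/s per sequence ({slowdown:.1f}x slower than B=1)")
```

Every batched configuration ends up at roughly 3–5 t/s per sequence versus 33 t/s unbatched, which is why small-batch workloads such as speculative decoding are hit so hard.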
First Bad Commit
No response
Relevant log output
No response