
Misc. bug: Performance degradation after attention sinks merge #15174

@abrimogard

Description


Name and Version

./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
version: 6119 (cd6983d)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server \
--timeout 3000 \
--n-gpu-layers 999 \
--host 0.0.0.0 \
--port 9999 \
--ctx_size 24576 \
--flash_attn \
--temp 0.60 \
--top_k 20 \
--top_p 0.95 \
--min_p 0 \
--presence_penalty 1.5 \
--no-mmap \
--model /Qwen_Qwen3-30B-A3B-Q5_K_L.gguf

Problem description & steps to reproduce

Token generation performance degraded after building with the merged PR "CUDA: attention sinks for mma FlashAttention" #15157.
Model: Qwen_Qwen3-30B-A3B-Q5_K_L.gguf

Before: ~120 TPS
After: ~60 TPS
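
If it helps, a decode-throughput measurement along these lines (a sketch; llama-bench ships with llama.cpp, and the prompt/generation token counts here are arbitrary) should reproduce the drop without any llama-server overhead:

./llama-bench \
  -m /Qwen_Qwen3-30B-A3B-Q5_K_L.gguf \
  -ngl 999 \
  -fa 1 \
  -p 512 \
  -n 128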

First Bad Commit

CUDA: attention sinks for mma FlashAttention #15157
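
To pin the regression to an exact commit rather than the PR as a whole, a bisect along these lines could be run between the two builds (a sketch; <good-commit> is a placeholder for the last build that still gave ~120 TPS, and the bench command is the one above):

git bisect start
git bisect bad cd6983d              # current build, ~60 TPS
git bisect good <good-commit>       # last known-good build, ~120 TPS
cmake -B build -DGGML_CUDA=ON && cmake --build build -j
./build/bin/llama-bench -m /Qwen_Qwen3-30B-A3B-Q5_K_L.gguf -ngl 999 -fa 1 -n 128
git bisect good                     # or 'git bisect bad', depending on the measured TPS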

Relevant log output
