@ggerganov ggerganov commented Aug 24, 2025

cont #13388

This PR improves the performance of MUL_MAT_ID and reduces memory usage.

  • Faster kernel_mul_mm_id_map0 kernel for preparing expert id maps
  • Avoid intermediate buffers src1 and dst
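The two points above can be illustrated with a small CPU reference. This is only a conceptual sketch in Python, not the Metal kernel from this PR: the helper names `build_expert_maps` and `mul_mat_id`, the flattened row index `t * n_used + s`, and the one-input-row-per-token layout are all assumptions made for illustration. It groups the routed rows by expert first, then reads inputs and writes outputs through that map instead of gathering them into temporary copies.

```python
def build_expert_maps(ids, n_expert):
    """Group routed rows by expert (hypothetical helper, not the Metal kernel).

    ids[t][s] is the expert selected for token t in slot s.
    Returns (counts, rows): counts[e] is how many rows go to expert e and
    rows[e] lists the flattened row indices t * n_used + s in token order.
    """
    n_used = len(ids[0]) if ids else 0
    counts = [0] * n_expert
    rows = [[] for _ in range(n_expert)]
    for t, chosen in enumerate(ids):
        for s, e in enumerate(chosen):
            rows[e].append(t * n_used + s)
            counts[e] += 1
    return counts, rows


def mul_mat_id(weights, x, ids):
    """Reference result: dst[t][s] = weights[ids[t][s]] @ x[t].

    Inputs are read and outputs are written through the per-expert map,
    so no gathered copy of x or dst is materialized.
    """
    n_used = len(ids[0])
    _, rows = build_expert_maps(ids, len(weights))
    dst = [[None] * n_used for _ in ids]
    for e, w in enumerate(weights):
        for r in rows[e]:
            t, s = divmod(r, n_used)  # recover (token, slot) from the flat index
            dst[t][s] = [sum(a * b for a, b in zip(w_row, x[t])) for w_row in w]
    return dst


if __name__ == "__main__":
    ids = [[0, 2], [2, 1], [0, 1]]               # 3 tokens, 2 experts used per token
    weights = [[[1, 0]], [[0, 1]], [[1, 1]]]     # 3 experts, each a 1x2 matrix
    x = [[1, 2], [3, 4], [5, 6]]                 # one input row per token
    print(build_expert_maps(ids, 3))
    print(mul_mat_id(weights, x, ids))
```

Processing one expert at a time over its own row list is what lets each expert's weights be reused across all tokens routed to it; the map is the only extra state needed.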

scripts/compare-commits.sh master gg/metal-mmid-opt llama-bench \
  -m ./models/gpt-oss-20b/ggml-model-mxfp4.gguf \
  -m ./models/gpt-oss-120b/ggml-model-mxfp4.gguf \
  -m ./models/qwen3-30b-a3b-coder/ggml-model-q4_0 \
  -m models/qwen3-30b-a3b-coder/ggml-model-q8_0 \
  -m ./models/deepseek-v2-lite-chat/ggml-model-q4_k.gguf \
  -m models/deepseek-v2-lite-chat/ggml-model-q8_0.gguf \
  -m models/nomic-embed-text-v2-moe/ggml-model-f16.gguf \
  -fa 1 -t 1 -ub 2048 -p 512,1024,2048 -n 0

M2 Ultra:

| Model | Test | t/s master | t/s gg/metal-mmid-opt | Speedup |
| --- | --- | ---: | ---: | ---: |
| deepseek2 16B Q4_K_M | pp512 | 2193.10 | 2619.13 | 1.19 |
| deepseek2 16B Q4_K_M | pp1024 | 2426.55 | 2957.49 | 1.22 |
| deepseek2 16B Q4_K_M | pp2048 | 2469.51 | 3055.70 | 1.24 |
| deepseek2 16B Q8_0 | pp512 | 2390.00 | 2921.00 | 1.22 |
| deepseek2 16B Q8_0 | pp1024 | 2632.12 | 3284.46 | 1.25 |
| deepseek2 16B Q8_0 | pp2048 | 2664.82 | 3359.66 | 1.26 |
| gpt-oss 120B MXFP4 MoE | pp512 | 1040.39 | 1210.99 | 1.16 |
| gpt-oss 120B MXFP4 MoE | pp1024 | 1220.41 | 1457.68 | 1.19 |
| gpt-oss 120B MXFP4 MoE | pp2048 | 1260.48 | 1594.48 | 1.26 |
| gpt-oss 20B MXFP4 MoE | pp512 | 1989.54 | 2354.88 | 1.18 |
| gpt-oss 20B MXFP4 MoE | pp1024 | 2161.20 | 2593.10 | 1.20 |
| gpt-oss 20B MXFP4 MoE | pp2048 | 2188.62 | 2635.97 | 1.20 |
| qwen3moe 30B.A3B Q4_0 | pp512 | 1420.95 | 2064.90 | 1.45 |
| qwen3moe 30B.A3B Q4_0 | pp1024 | 1547.53 | 2319.13 | 1.50 |
| qwen3moe 30B.A3B Q4_0 | pp2048 | 1448.73 | 2282.56 | 1.58 |
| qwen3moe 30B.A3B Q8_0 | pp512 | 1412.15 | 2046.45 | 1.45 |
| qwen3moe 30B.A3B Q8_0 | pp1024 | 1539.23 | 2304.05 | 1.50 |
| qwen3moe 30B.A3B Q8_0 | pp2048 | 1439.45 | 2274.05 | 1.58 |
| nomic-bert-moe 475M F16 | pp512 | 24385.12 | 28604.40 | 1.17 |
| nomic-bert-moe 475M F16 | pp1024 | 25444.10 | 31073.90 | 1.22 |
| nomic-bert-moe 475M F16 | pp2048 | 23050.69 | 28660.55 | 1.24 |
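
For reading the results above, the Speedup column appears to be the branch throughput divided by the master throughput, rounded to two decimals. That interpretation is an assumption, but it is consistent with the listed rows. A minimal sketch that recomputes a few of them (the `speedup` helper is hypothetical and not part of the benchmark scripts):

```python
def speedup(master_tps, branch_tps):
    """Ratio of branch throughput to master throughput, rounded to 2 decimals."""
    return round(branch_tps / master_tps, 2)


# (model, test, t/s master, t/s gg/metal-mmid-opt) copied from the results above
rows = [
    ("deepseek2 16B Q4_K_M", "pp512", 2193.10, 2619.13),
    ("gpt-oss 120B MXFP4 MoE", "pp2048", 1260.48, 1594.48),
    ("qwen3moe 30B.A3B Q4_0", "pp2048", 1448.73, 2282.56),
    ("nomic-bert-moe 475M F16", "pp1024", 25444.10, 31073.90),
]

if __name__ == "__main__":
    for model, test, master_tps, branch_tps in rows:
        print(f"{model} {test}: {speedup(master_tps, branch_tps):.2f}")
```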

@github-actions github-actions bot added the testing, ggml, and Apple Metal labels Aug 24, 2025
@ggerganov ggerganov force-pushed the gg/metal-mmid-opt branch 3 times, most recently from 5e15f0b to 80aa73a August 25, 2025 10:57
@ggerganov ggerganov merged commit 1d8d83d into master Aug 26, 2025
53 of 57 checks passed
@ggerganov ggerganov deleted the gg/metal-mmid-opt branch August 26, 2025 09:46
Minh141120 pushed a commit to menloresearch/llama.cpp that referenced this pull request Aug 27, 2025
* metal : mul_mm_id remove hdst

* metal : remove mul_mm_id hsrc1

* metal : mul_mm_id simplify + add test

* metal : opt mul_mm_id map0

* metal : optimize mul_mm_id id gathering

* metal : mul/div opt

* metal : optimize mul_mm_id_map0

ggml-ci