context : restore preemptive sched reset when LLAMA_SET_ROWS=0 #14870

ggerganov · 2025-07-25T08:05:52Z

ggml-ci

slaren · 2025-07-25T09:36:37Z

| Model | Test | t/s master | t/s gg/sched-reset-preemt | Speedup |
| --- | --- | ---: | ---: | ---: |
| llama 7B Q4_0 | pp512 | 5676.37 | 5614.36 | 0.99 |
| llama 7B Q4_0 | tg128 | 123.60 | 148.16 | 1.20 |
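
For readers checking the numbers: the Speedup column appears to be the branch throughput divided by the master throughput, rounded to two decimals. The sketch below recomputes it under that assumption. The helper name `speedup` and the `results` dictionary are illustrative only and are not part of llama.cpp or its benchmark scripts.

```python
def speedup(baseline_tps: float, candidate_tps: float) -> float:
    """Ratio of candidate to baseline throughput, rounded to two decimals.

    Assumption: the table's Speedup column is computed this way.
    """
    return round(candidate_tps / baseline_tps, 2)


# (t/s on master, t/s on gg/sched-reset-preemt), copied from the table above
results = {
    "pp512": (5676.37, 5614.36),
    "tg128": (123.60, 148.16),
}

speedups = {test: speedup(base, cand) for test, (base, cand) in results.items()}

for test, ratio in speedups.items():
    print(f"{test}: {ratio:.2f}x")
```

Under this reading, prompt processing (pp512) is essentially unchanged, while token generation (tg128) is about 20% faster on the branch.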

…org#14870) ggml-ci

* origin/master:
  docs : update HOWTO‑add‑model.md for ModelBase and new model classes (ggml-org#14874)
  ggml : remove invalid portPos specifiers from dot files (ggml-org#14838)
  context : restore preemptive sched reset when LLAMA_SET_ROWS=0 (ggml-org#14870)
  mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip (ggml-org#14503)
  rpc : check for null buffers in get/set/copy tensor endpoints (ggml-org#14868)
  sched : fix multiple evaluations of the same graph with pipeline parallelism (ggml-org#14855)
  musa: upgrade musa sdk to rc4.2.0 (ggml-org#14498)
  sync : ggml
  cmake : fix usage issues (ggml/1257)
  ggml-cpu : remove stdlib include from repack.cpp (ggml/1276)
  context : perform output reorder lazily upon access after sync (ggml-org#14853)
  chat : fix kimi-k2 chat template (ggml-org#14852)
  sycl: fixed semantics of block offset calculation (ggml-org#14814)
  llama : fix MiniCPM inference after Granite Four changes (ggml-org#14850)
  docs: add libcurl-dev install hint for Linux distros (ggml-org#14801)
  metal : fix fusion across different encoders (ggml-org#14849)
  sycl: fix undefined variable in work group size check (ggml-org#14843)
  convert : text-only support for GLM-4.1V-9B-Thinking (ggml-org#14823)
  CUDA: fix overflow in FA, tune performance (ggml-org#14840)
  CUDA: fix compilation with GGML_CUDA_F16 (ggml-org#14837)

context : restore preemptive sched reset when LLAMA_SET_ROWS=0

f670c91

ggml-ci

ggerganov requested a review from slaren July 25, 2025 08:05

ggerganov mentioned this pull request Jul 25, 2025

Performance regression with multiple GPUs in commit 01612b7 #14863

Closed

slaren approved these changes Jul 25, 2025


ggerganov merged commit c1dbea7 into master Jul 25, 2025
54 of 55 checks passed

ggerganov deleted the gg/sched-reset-preemt branch July 25, 2025 11:28

taronaeo pushed a commit to taronaeo/llama.cpp-s390x that referenced this pull request Jul 25, 2025

context : restore preemptive sched reset when LLAMA_SET_ROWS=0 (ggml-…

a6357ac

…org#14870) ggml-ci
