llama : add getters for n_threads/n_threads_batch #7464

danbev · 2024-05-22T13:21:34Z

This commit adds two new functions to the llama API. The functions can be used to get the number of threads used for generating a single token and the number of threads used for prompt and batch processing (multiple tokens).

The motivation for this is that we want to be able to get the number of threads that a context is using. The main use case is testing/verifying that the number of threads is set correctly.
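A minimal sketch of the getter pattern and the testing use case described above. It uses a simplified stand-in for `struct llama_context`. The real struct is internal to llama.cpp and holds far more state. The sketch assumes the getters take a `struct llama_context *` and return `uint32_t`, mirroring the setter `llama_set_n_threads`, so check `llama.h` for the exact signatures. The helper `check_thread_counts` is hypothetical and only illustrates how a test could verify the configuration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for llama.cpp's internal context (assumption):
 * only the two thread counts relevant to this PR are modelled. */
struct llama_context {
    uint32_t n_threads;       /* threads used to generate a single token */
    uint32_t n_threads_batch; /* threads used for prompt/batch processing (multiple tokens) */
};

/* Setter modelled on llama_set_n_threads(): both counts are set together. */
void llama_set_n_threads(struct llama_context * ctx, uint32_t n_threads, uint32_t n_threads_batch) {
    ctx->n_threads       = n_threads;
    ctx->n_threads_batch = n_threads_batch;
}

/* Getter: number of threads used for generating a single token. */
uint32_t llama_n_threads(struct llama_context * ctx) {
    return ctx->n_threads;
}

/* Getter: number of threads used for prompt and batch processing. */
uint32_t llama_n_threads_batch(struct llama_context * ctx) {
    return ctx->n_threads_batch;
}

/* Hypothetical test helper: returns true if the context uses the expected
 * thread counts, and prints a diagnostic to stderr for each mismatch. */
bool check_thread_counts(struct llama_context * ctx, uint32_t expected, uint32_t expected_batch) {
    bool ok = true;
    if (llama_n_threads(ctx) != expected) {
        fprintf(stderr, "n_threads: got %u, expected %u\n",
                (unsigned) llama_n_threads(ctx), (unsigned) expected);
        ok = false;
    }
    if (llama_n_threads_batch(ctx) != expected_batch) {
        fprintf(stderr, "n_threads_batch: got %u, expected %u\n",
                (unsigned) llama_n_threads_batch(ctx), (unsigned) expected_batch);
        ok = false;
    }
    return ok;
}
```

Against the real library, `ctx` would be a context created from a loaded model rather than this stand-in struct. The test would call `llama_set_n_threads` (or pass the counts via the context parameters) and then assert on the two getters.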

llama.h

Signed-off-by: Daniel Bevenius

Rename the getters to llama_n_threads and llama_n_threads_batch. Signed-off-by: Daniel Bevenius

github-actions · 2024-05-22T19:50:37Z

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 523 iterations 🚀


Concurrent users: 8, duration: 10m
HTTP request : avg=8957.82ms p(95)=22271.11ms fails=, finish reason: stop=463 truncated=60
Prompt processing (pp): avg=107.41tk/s p(95)=498.2tk/s
Token generation (tg): avg=45.14tk/s p(95)=48.67tk/s
ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=n_threads_getter commit=43bcb50f13abb9b20a0df6e718d3edf47f5f24e9

Chart: llamacpp:prompt_tokens_seconds over the run (llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 523 iterations)

Chart: llamacpp:predicted_tokens_seconds over the run (llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 523 iterations)

Chart: llamacpp:kv_cache_usage_ratio over the run (llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 523 iterations)

Chart: llamacpp:requests_processing over the run (llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 523 iterations)

cebtenzzre reviewed May 22, 2024


llama.h (outdated, resolved)

mofosyne added the enhancement and Review Complexity : Low labels May 22, 2024

github-actions bot added the devops label May 22, 2024

danbev added 2 commits May 22, 2024 20:33

squash! llama : add getters for n_threads/n_threads_batch

43bcb50


danbev force-pushed the n_threads_getter branch from 425465c to 43bcb50 Compare May 22, 2024 18:34

slaren approved these changes May 22, 2024


ggerganov merged commit 3015851 into ggml-org:master May 23, 2024

danbev deleted the n_threads_getter branch August 13, 2025 09:09


5 participants
