@danbev (Member) commented May 22, 2024

This commit adds two new functions to the llama API. The functions can be used to get the number of threads used for generating a single token and the number of threads used for prompt and batch processing (multiple tokens).

The motivation is to make it possible to query the number of threads that a context is using. The main use case is testing/verification that the number of threads is set correctly.
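
To illustrate the verification use case, here is a minimal sketch in C. It does not link against llama.cpp: `struct demo_context` and every `demo_*` function are hypothetical stand-ins invented for this example, and the `uint32_t` types are an assumption. Only the getter names `llama_n_threads` and `llama_n_threads_batch` come from this PR; the setter is assumed to take both counts in one call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the thread settings held by a context.
 * The real llama context is opaque; this struct exists only for the sketch. */
struct demo_context {
    uint32_t n_threads;       /* threads used when generating a single token */
    uint32_t n_threads_batch; /* threads used for prompt and batch processing */
};

/* Hypothetical setter: stores both thread counts in the context. */
static void demo_set_n_threads(struct demo_context * ctx,
                               uint32_t n_threads, uint32_t n_threads_batch) {
    ctx->n_threads       = n_threads;
    ctx->n_threads_batch = n_threads_batch;
}

/* Stand-in for a getter in the style of llama_n_threads. */
static uint32_t demo_n_threads(const struct demo_context * ctx) {
    return ctx->n_threads;
}

/* Stand-in for a getter in the style of llama_n_threads_batch. */
static uint32_t demo_n_threads_batch(const struct demo_context * ctx) {
    return ctx->n_threads_batch;
}

/* Returns 1 if both reported counts equal the expected values, otherwise 0. */
static int demo_threads_match(const struct demo_context * ctx,
                              uint32_t expected, uint32_t expected_batch) {
    return demo_n_threads(ctx) == expected &&
           demo_n_threads_batch(ctx) == expected_batch;
}

/* Test-style check: aborts through assert when the counts do not match. */
static void demo_check_threads(const struct demo_context * ctx,
                               uint32_t expected, uint32_t expected_batch) {
    assert(demo_threads_match(ctx, expected, expected_batch));
}
```

A test would configure the context with `demo_set_n_threads(&ctx, 4, 8)` and then call `demo_check_threads(&ctx, 4, 8)`. Without getters, such a check would have to rely on the setter's arguments rather than on what the context actually reports.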

@mofosyne mofosyne added enhancement New feature or request Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix labels May 22, 2024
@github-actions github-actions bot added the devops improvements to build systems and github actions label May 22, 2024
danbev added 2 commits May 22, 2024 20:33
This commit adds two new functions to the llama API. The functions
can be used to get the number of threads used for generating a single
token and the number of threads used for prompt and batch processing
(multiple tokens).

The motivation for this is that we want to be able to get the number of
threads that a context is using. The main use case is
testing/verification that the number of threads is set correctly.

Signed-off-by: Daniel Bevenius <[email protected]>
Rename the getters to llama_n_threads and llama_n_threads_batch.

Signed-off-by: Daniel Bevenius <[email protected]>
@danbev danbev force-pushed the n_threads_getter branch from 425465c to 43bcb50 Compare May 22, 2024 18:34
Contributor

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 523 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8957.82ms p(95)=22271.11ms fails=, finish reason: stop=463 truncated=60
  • Prompt processing (pp): avg=107.41tk/s p(95)=498.2tk/s
  • Token generation (tg): avg=45.14tk/s p(95)=48.67tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=n_threads_getter commit=43bcb50f13abb9b20a0df6e718d3edf47f5f24e9

[Chart: prompt_tokens_seconds. Title: "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 duration=10m 523 iterations"; y-axis: llamacpp:prompt_tokens_seconds; x-axis range: 1716406800 --> 1716407432]
[Chart: predicted_tokens_seconds. Title: "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 duration=10m 523 iterations"; y-axis: llamacpp:predicted_tokens_seconds; x-axis range: 1716406800 --> 1716407432]

[Chart: kv_cache_usage_ratio. Title: "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 duration=10m 523 iterations"; y-axis: llamacpp:kv_cache_usage_ratio; x-axis range: 1716406800 --> 1716407432]
[Chart: requests_processing. Title: "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 duration=10m 523 iterations"; y-axis: llamacpp:requests_processing; x-axis range: 1716406800 --> 1716407432]

@ggerganov ggerganov merged commit 3015851 into ggml-org:master May 23, 2024
@danbev danbev deleted the n_threads_getter branch August 13, 2025 09:09
5 participants