server: add exceed_context_size_error type #15780

ngxson · 2025-09-03T23:27:31Z

Prepare for #14839

Added a new error type, exceed_context_size_error.

When a request exceeds the slot's context size, the error response now includes the prompt token count (n_prompt_tokens) and the context size (n_ctx):

{
    "error": {
        "code": 500,
        "message": "the request exceeds the available context size. try increasing the context size or enable context shift",
        "type": "exceed_context_size_error",
        "n_prompt_tokens": 1407,
        "n_ctx": 256
    }
}
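A client could use the new fields to tell the user how far over the limit a prompt is. The following is a minimal sketch, assuming the response body has exactly the shape shown above; the helper names `parse_context_error` and `tokens_over_limit` are hypothetical and not part of llama.cpp. It matches on the `type` field rather than on `code`, because the sample above shows 500 while a later commit in this PR changes the code to 400.

```python
import json


def parse_context_error(body: str):
    """Return (n_prompt_tokens, n_ctx) for an exceed_context_size_error body.

    Returns None when the body is not valid JSON or carries a different error.
    Hypothetical helper: the field names follow the sample response above.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return None
    # Key on the error type, not the numeric code.
    if error.get("type") != "exceed_context_size_error":
        return None
    return error.get("n_prompt_tokens"), error.get("n_ctx")


def tokens_over_limit(body: str):
    """Return how many tokens the prompt exceeds the context by, or None."""
    info = parse_context_error(body)
    if info is None or None in info:
        return None
    n_prompt_tokens, n_ctx = info
    return n_prompt_tokens - n_ctx


sample = json.dumps({
    "error": {
        "code": 400,
        "message": "the request exceeds the available context size.",
        "type": "exceed_context_size_error",
        "n_prompt_tokens": 1407,
        "n_ctx": 256,
    }
})
print(tokens_over_limit(sample))
```

With the numbers from the sample response, the prompt is 1407 - 256 = 1151 tokens over the limit, which a client could use to decide how much input to trim before retrying.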

* server: add exceed_context_size_error type
* change error code to 400

server: add exceed_context_size_error type

6051d69

ngxson requested review from ggerganov and allozaur September 3, 2025 23:27

github-actions bot added the examples, python, and server labels Sep 3, 2025

change error code to 400

6aaf4f3

ggerganov approved these changes Sep 4, 2025

allozaur approved these changes Sep 4, 2025

ngxson merged commit a68d914 into ggml-org:master Sep 4, 2025
50 checks passed

walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request Sep 7, 2025

server: add exceed_context_size_error type (ggml-org#15780)

8c1c4ea

* server: add exceed_context_size_error type
* change error code to 400
