Misc. bug: "response_format" on the OpenAI compatible "v1/chat/completions" issue #11847

@tulang3587

Description

Name and Version

>llama-server --version
version: 4689 (90e4dba4)
built with MSVC 19.42.34436.0 for x64

Operating systems

Windows

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server -m Hermes-3-Llama-3.1-8B.Q4_K_M.gguf -a hermes-3-llama-3.1-8b --port 1234 --jinja -fa

Problem description & steps to reproduce

Using "response_format" to get the structured output doesn't seem to work properly when using the OpenAI compatible "v1/chat/completions" API.
It keeps returning the "Either "json_schema" or "grammar" can be specified, but not both" error message.

I've tried using several different models from HF, and this issue happens no matter which model I loaded.
The model that I used in the below samples are this one https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B

Request:

curl --location 'http://localhost:1234/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Cookie: frontend_lang=en_US' \
--data '{
    "model": "hermes-3-llama-3.1-8b",
    "messages": [
        {
            "role": "user",
            "content": "hello"
        }
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "chat_response",
            "strict": true,
            "schema": {
                "type": "object",
                "properties": {
                    "response": {
                        "type": "string"
                    }
                },
                "required": [
                    "response"
                ],
                "additionalProperties": false
            }
        }
    }
}'

Response:

{
    "error": {
        "code": 400,
        "message": "Either \"json_schema\" or \"grammar\" can be specified, but not both",
        "type": "invalid_request_error"
    }
}
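To reproduce the failing request outside of curl, the payload can be built programmatically. A minimal Python sketch (illustrative, not part of the original report; it only constructs the exact JSON body from the curl command above — actually POSTing it to the local server is left out):

```python
import json

def build_chat_payload() -> dict:
    # Exact chat-completions payload from the curl example above.
    # Model alias matches the one passed to llama-server with -a.
    return {
        "model": "hermes-3-llama-3.1-8b",
        "messages": [{"role": "user", "content": "hello"}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "chat_response",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {"response": {"type": "string"}},
                    "required": ["response"],
                    "additionalProperties": False,
                },
            },
        },
    }

payload = build_chat_payload()
body = json.dumps(payload)  # this is the string curl sends via --data
```

Sending `body` to http://localhost:1234/v1/chat/completions with Content-Type: application/json produces the 400 error shown above.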

I've tried varying the "response_format" payload as shown below, but it keeps returning the same error.

"response_format": {
    "type": "json_schema", // either "json_schema" or "json_object" shows the same error
    "json_schema": {
        "name": "chat_response",
        "strict": true,
        "schema": {
            "type": "object",
            "properties": {
                "response": {
                    "type": "string"
                }
            },
            "required": [
                "response"
            ],
            "additionalProperties": false
        }
    }
}
"response_format": {
    "type": "json_schema", // either "json_schema" or "json_object" shows the same error
    "schema": {
        "type": "object",
        "properties": {
            "response": {
                "type": "string"
            }
        },
        "required": [
            "response"
        ],
        "additionalProperties": false
    }
}

Even using the example from the documentation ({"type": "json_object"}) returns the same error:

{
    "model": "hermes-3-llama-3.1-8b",
    "messages": [
        {
            "role": "user",
            "content": "hello"
        }
    ],
    "response_format": {"type": "json_object"}
}

In addition, when I use the POST /completions API with the same GGUF model, the server does return output that follows the given JSON schema:

Request:

curl --location 'http://localhost:1234/completions' \
--header 'Content-Type: application/json' \
--header 'Cookie: frontend_lang=en_US' \
--data '{
    "prompt": "<|im_start|>user\nhello<|im_end|>",
    "json_schema": {
        "type": "object",
        "properties": {
            "response": {
                "type": "string"
            }
        },
        "required": [
            "response"
        ],
        "additionalProperties": false
    }
}'

Response:

{
    "index": 0,
    "content": "{\n    \"response\": \"Hello! How can I assist you today?\"\n}",
    "tokens": [],
    "id_slot": 0,
    "stop": true,
    "model": "hermes-3-llama-3.1-8b",
    "tokens_predicted": 17,
    "tokens_evaluated": 6,
    "generation_settings": {
        "n_predict": -1,
        "seed": 4294967295,
        "temperature": 0.800000011920929,
        "dynatemp_range": 0.0,
        "dynatemp_exponent": 1.0,
        "top_k": 40,
        "top_p": 0.949999988079071,
        "min_p": 0.05000000074505806,
        "xtc_probability": 0.0,
        "xtc_threshold": 0.10000000149011612,
        "typical_p": 1.0,
        "repeat_last_n": 64,
        "repeat_penalty": 1.0,
        "presence_penalty": 0.0,
        "frequency_penalty": 0.0,
        "dry_multiplier": 0.0,
        "dry_base": 1.75,
        "dry_allowed_length": 2,
        "dry_penalty_last_n": 4096,
        "dry_sequence_breakers": [
            "\n",
            ":",
            "\"",
            "*"
        ],
        "mirostat": 0,
        "mirostat_tau": 5.0,
        "mirostat_eta": 0.10000000149011612,
        "stop": [],
        "max_tokens": -1,
        "n_keep": 0,
        "n_discard": 0,
        "ignore_eos": false,
        "stream": false,
        "logit_bias": [],
        "n_probs": 0,
        "min_keep": 0,
        "grammar": "char ::= [^\"\\\\\\x7F\\x00-\\x1F] | [\\\\] ([\"\\\\bfnrt] | \"u\" [0-9a-fA-F]{4})\nresponse-kv ::= \"\\\"response\\\"\" space \":\" space string\nroot ::= \"{\" space response-kv \"}\" space\nspace ::= | \" \" | \"\\n\" [ \\t]{0,20}\nstring ::= \"\\\"\" char* \"\\\"\" space\n",
        "grammar_trigger_words": [],
        "grammar_trigger_tokens": [],
        "preserved_tokens": [],
        "samplers": [
            "penalties",
            "dry",
            "top_k",
            "typ_p",
            "top_p",
            "min_p",
            "xtc",
            "temperature"
        ],
        "speculative.n_max": 16,
        "speculative.n_min": 5,
        "speculative.p_min": 0.8999999761581421,
        "timings_per_token": false,
        "post_sampling_probs": false,
        "lora": []
    },
    "prompt": "<|begin_of_text|><|im_start|>user\nhello<|im_end|>",
    "has_new_line": true,
    "truncated": false,
    "stop_type": "eos",
    "stopping_word": "",
    "tokens_cached": 22,
    "timings": {
        "prompt_n": 6,
        "prompt_ms": 1098.932,
        "prompt_per_token_ms": 183.15533333333335,
        "prompt_per_second": 5.459846469117288,
        "predicted_n": 17,
        "predicted_ms": 7322.017,
        "predicted_per_token_ms": 430.7068823529412,
        "predicted_per_second": 2.3217646175910276
    }
}
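Note that the successful /completions response carries the schema-conforming JSON as a string in the "content" field. A small Python sketch (illustrative only; the sample string is copied verbatim from the response above) of how a client would parse it and check the schema's "required" list:

```python
import json

# "content" field copied from the successful /completions response above.
content = "{\n    \"response\": \"Hello! How can I assist you today?\"\n}"

# The request's schema required a single string property "response".
required = ["response"]

parsed = json.loads(content)
missing = [key for key in required if key not in parsed]

print(parsed["response"])          # the model's actual reply text
print("schema satisfied:", not missing)
```

This confirms the grammar shown in "generation_settings" above did constrain the output to the requested shape — the problem is only in how /v1/chat/completions translates "response_format".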

First Bad Commit

No response

Relevant log output
