Name and Version
version: 6497 (cd08fc3)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server -m models/gemma-3-1b-it-q4_0.gguf
Problem description & steps to reproduce
The server always sends a final chunk with an empty list of chat completion choices that carries usage statistics, e.g.:
...
data: {"choices":[],"created":1758108431,"id":"chatcmpl-Ca3q6j8NlTmm9h34TBbwr3wjgO47kWpz","model":"google/gemma-3-1b-it-qat-q4_0","system_fingerprint":"b6497-cd08fc3e","object":"chat.completion.chunk","usage":{"completion_tokens":31,"prompt_tokens":75,"total_tokens":106},"timings":{"cache_n":0,"prompt_n":75,"prompt_ms":180.025,"prompt_per_token_ms":2.400333333333333,"prompt_per_second":416.60880433273155,"predicted_n":31,"predicted_ms":875.429,"predicted_per_token_ms":28.239645161290323,"predicted_per_second":35.41120981827196}}
data: [DONE]
This was introduced with PR #15444, which makes the server always send these stats at the end of the stream. The spec, however, says they should only be sent when "stream_options": {"include_usage": true}
is set in the request.
We should change the server to send usage stats only when the user requests them, not always.
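For reference, a minimal request that opts into usage stats per the spec could look like this (the model name is taken from the chunk above; the port and endpoint are llama-server defaults):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemma-3-1b-it-qat-q4_0",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

With "include_usage" omitted, a spec-compliant server would end the stream with data: [DONE] and no empty-choices usage chunk.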
First Bad Commit
No response