Misc. bug: server is always sending usage statistic #16048

@rgerganov

Name and Version

version: 6497 (cd08fc3)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server -m models/gemma-3-1b-it-q4_0.gguf

Problem description & steps to reproduce

The server always sends a special chunk with an empty list of chat completion choices that contains usage statistics, e.g.:

...

data: {"choices":[],"created":1758108431,"id":"chatcmpl-Ca3q6j8NlTmm9h34TBbwr3wjgO47kWpz","model":"google/gemma-3-1b-it-qat-q4_0","system_fingerprint":"b6497-cd08fc3e","object":"chat.completion.chunk","usage":{"completion_tokens":31,"prompt_tokens":75,"total_tokens":106},"timings":{"cache_n":0,"prompt_n":75,"prompt_ms":180.025,"prompt_per_token_ms":2.400333333333333,"prompt_per_second":416.60880433273155,"predicted_n":31,"predicted_ms":875.429,"predicted_per_token_ms":28.239645161290323,"predicted_per_second":35.41120981827196}}

data: [DONE]

This was introduced in PR #15444, which makes the server always send them at the end, whereas the spec says they should be sent only if "stream_options": {"include_usage": true} is set in the request.
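
For illustration, here is roughly what the opt-in looks like from the client side. This is a minimal Python sketch, assuming an OpenAI-style streaming response. The helper names (`build_request`, `parse_sse_line`, `collect`) are made up for this example, and no network call is made:

```python
import json

def build_request(messages, include_usage=False):
    """Build a streaming chat completion request body (hypothetical helper)."""
    body = {"messages": messages, "stream": True}
    if include_usage:
        # Per the spec, usage is only expected when this option is set.
        body["stream_options"] = {"include_usage": True}
    return body

def parse_sse_line(line):
    """Decode one 'data: ...' line; return None for blanks and [DONE]."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None
    return json.loads(payload)

def collect(lines):
    """Concatenate streamed content and pick up the usage chunk, if any."""
    text, usage = [], None
    for line in lines:
        chunk = parse_sse_line(line)
        if chunk is None:
            continue
        if chunk.get("usage"):
            usage = chunk["usage"]
        # The usage chunk has "choices": [], so this loop simply does nothing.
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                text.append(content)
    return "".join(text), usage
```

A client that indexes `choices[0]` unconditionally would fail on the final chunk shown above, which is why an unrequested usage chunk can break existing consumers.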

We should change the server to send the stats only when the user requests them, not always.
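
The requested behavior could be sketched as follows. This is Python pseudologic for discussion only, not the llama-server code. `wants_usage` and `final_chunks` are hypothetical names, and the request is assumed to be the already-parsed JSON body:

```python
def wants_usage(request):
    """True only if the client explicitly opted in to usage in a streaming request."""
    if not request.get("stream", False):
        return False
    opts = request.get("stream_options") or {}
    return opts.get("include_usage") is True

def final_chunks(request, usage):
    """Return the trailing items of the stream (hypothetical shape)."""
    chunks = []
    if wants_usage(request):
        # Usage chunk with an empty choices list, sent just before [DONE].
        chunks.append({
            "choices": [],
            "object": "chat.completion.chunk",
            "usage": usage,
        })
    chunks.append("[DONE]")
    return chunks
```

With this gating, a request without `stream_options` ends with only `[DONE]`, while a request that sets `"include_usage": true` gets the extra usage chunk first.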

First Bad Commit

No response

Relevant log output
