Misc. bug: server: Usage statistics in chat streams added to slightly different chunk from OpenAI Streaming API #15443

@TeoZosa

Description

Name and Version

$./llama-server --version
version: 6210 (a094f38)
built with Apple clang version 17.0.0 (clang-1700.0.13.5) for arm64-apple-darwin24.5.0

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

curl -X POST -H "Content-Type: application/json"  http://localhost:8080/v1/chat/completions  \
-d '{
"stream": true,
"stream_options": {"include_usage": true}, 
"model": "LiquidAI/LFM2-1.2",
"messages": [{"role": "user", "content": "What is an interesting example for this GitHub issue?"}]
}'

Problem description & steps to reproduce

The llama-server streaming response differs from the OpenAI Streaming API spec. From the OpenAI API docs on choices for completion chunk objects (emphasis mine):

choices [array]
A list of chat completion choices. Can contain more than one elements if n is greater than 1. Can also be empty for the last chunk if you set stream_options: {"include_usage": true}.

llama-server streaming response

Currently, usage (and timings) are included in the final llama-server chat.completion.chunk, i.e. the chunk whose single-element choices array carries the "stop" finish_reason and an empty delta object.

Example
...
data: {
  "choices": [
    {
      "finish_reason": null,
      "index": 0,
      "delta": {
        "content": "?"
      }
    }
  ],
  "created": 1755667673,
  "id": "chatcmpl-DwWJriZJ4TnyeMBHO3WmhUJhYUqQg1RP",
  "model": "LiquidAI/LFM2-1.2",
  "system_fingerprint": "b6210-a094f381",
  "object": "chat.completion.chunk"
}
data: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "delta": {}
    }
  ],
  "created": 1755667673,
  "id": "chatcmpl-DwWJriZJ4TnyeMBHO3WmhUJhYUqQg1RP",
  "model": "LiquidAI/LFM2-1.2",
  "system_fingerprint": "b6210-a094f381",
  "object": "chat.completion.chunk",
  "usage": {
    "completion_tokens": 11,
    "prompt_tokens": 14,
    "total_tokens": 25
  },
  "timings": {
    "prompt_n": 14,
    "prompt_ms": 53.734,
    "prompt_per_token_ms": 3.838142857142857,
    "prompt_per_second": 260.5426731678267,
    "predicted_n": 11,
    "predicted_ms": 44.285,
    "predicted_per_token_ms": 4.02590909090909,
    "predicted_per_second": 248.3911030823078
  }
}
data: [DONE]
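To make the placement concrete, here is a minimal sketch (not from the report; the helper name `extract_usage` and the trimmed chunks are illustrative) of a lenient client that accepts usage on any chunk, which is what it takes to pick up llama-server's placement:

```python
import json

def extract_usage(sse_lines):
    """Collect usage from 'data: ...' SSE lines, accepting it on any
    chunk. llama-server currently places it on the same final chunk
    that carries finish_reason, so a lenient scan like this finds it."""
    usage = None
    for line in sse_lines:
        payload = line.removeprefix("data: ").strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # "usage" is absent (or null) on intermediate chunks
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage

# Trimmed-down llama-server-style stream: usage rides on the
# finish_reason chunk, and no empty-choices chunk follows.
stream = [
    'data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":"?"}}]}',
    'data: {"choices":[{"finish_reason":"stop","index":0,"delta":{}}],'
    '"usage":{"completion_tokens":11,"prompt_tokens":14,"total_tokens":25}}',
    'data: [DONE]',
]
print(extract_usage(stream))
```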

OpenAI streaming response

OpenAI sends usage in a separate final chunk with an empty choices array, after the chunk containing the finish_reason.

Example
data: {
  "id": "chatcmpl-C6XHQgbtRbhg8LRSyYGGXixdAGuZn",
  "object": "chat.completion.chunk",
  "created": 1755673932,
  "model": "gpt-4.1-mini-2025-04-14",
  "service_tier": "default",
  "system_fingerprint": "fp_37c45ea698",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "?"
      },
      "logprobs": null,
      "finish_reason": null
    }
  ],
  "usage": null,
  "obfuscation": "A9QOO2mww"
}
data: {
  "id": "chatcmpl-C6XHQgbtRbhg8LRSyYGGXixdAGuZn",
  "object": "chat.completion.chunk",
  "created": 1755673932,
  "model": "gpt-4.1-mini-2025-04-14",
  "service_tier": "default",
  "system_fingerprint": "fp_37c45ea698",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": null,
  "obfuscation": "dZtx"
}
data: {
  "id": "chatcmpl-C6XHQgbtRbhg8LRSyYGGXixdAGuZn",
  "object": "chat.completion.chunk",
  "created": 1755673932,
  "model": "gpt-4.1-mini-2025-04-14",
  "service_tier": "default",
  "system_fingerprint": "fp_37c45ea698",
  "choices": [],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 9,
    "total_tokens": 28,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "obfuscation": "JjMN5JASlm"
}

data: [DONE]
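This difference matters to strict clients. A sketch of the pattern the OpenAI docs describe (read usage only from the chunk whose choices array is empty; the helper name and the trimmed chunks below are illustrative, not from the report) silently misses llama-server's usage:

```python
import json

def usage_openai_pattern(sse_lines):
    """Read usage only from a chunk with an empty choices array,
    as the OpenAI streaming docs describe for include_usage."""
    for line in sse_lines:
        payload = line.removeprefix("data: ").strip()
        if payload == "[DONE]":
            continue
        chunk = json.loads(payload)
        if not chunk["choices"] and chunk.get("usage"):
            return chunk["usage"]
    return None

# OpenAI-style: usage arrives in a trailing empty-choices chunk.
openai_stream = [
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":null}',
    'data: {"choices":[],"usage":{"prompt_tokens":19,"completion_tokens":9,"total_tokens":28}}',
    'data: [DONE]',
]
# llama-server-style: usage rides on the finish_reason chunk itself.
llama_stream = [
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],'
    '"usage":{"prompt_tokens":14,"completion_tokens":11,"total_tokens":25}}',
    'data: [DONE]',
]
print(usage_openai_pattern(openai_stream))  # usage dict is found
print(usage_openai_pattern(llama_stream))   # None: usage is missed
```

A client written strictly against the documented OpenAI behavior therefore reports no usage at all when pointed at llama-server.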

First Bad Commit

I don't think there was a bad commit per se; this appears to be how it was implemented from the beginning (it took me quite a while to track down).

a0a08ee (lines 2329-2350)

Relevant log output

(from `Problem description & steps to reproduce`)

