slekkala1 (Contributor) commented Sep 9, 2025

What does this PR do?

Fix Fireworks chat completion, which was broken because telemetry expects the response to have a usage field (response.usage).
Closes #3391
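A minimal sketch of the failure mode, assuming telemetry reads token counts directly off the completion object: if a provider's response carries no usage field, an unguarded `response.usage.prompt_tokens` raises. `ChatCompletion`, `Usage`, and `token_metrics` are hypothetical names for illustration, not llama-stack's actual telemetry code. The guard shown is one possible approach, and its cost is that token metrics silently go missing for that call.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


@dataclass
class ChatCompletion:
    # Hypothetical stand-in for an OpenAI-compatible chat completion response.
    id: str
    usage: Optional[Usage] = None


def token_metrics(response: Any) -> dict[str, int]:
    """Return token-count metrics, or an empty dict when no usage was returned."""
    usage = getattr(response, "usage", None)
    if usage is None:
        # Skipping avoids an AttributeError on usage.prompt_tokens, but the
        # call is then recorded without any token counts.
        return {}
    return {
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }
```

For example, `token_metrics(ChatCompletion(id="x"))` returns `{}` instead of raising.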

Test Plan

  1. uv run --with llama-stack llama stack build --distro starter --image-type venv --run
    Then try:
curl -X POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct",
      "messages": [{"role": "user", "content": "Hello!"}]
    }'
{"id":"chatcmpl-ee922a08-0df0-4974-b0d3-b322113e8bc0","choices":[{"message":{"role":"assistant","content":"Hello! How can I assist you today?","name":null,"tool_calls":null},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","created":1757456375,"model":"fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct"}

Without the fix, the request fails as described in #3391.

The meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Sep 9, 2025.
franciscojavierarceo (Collaborator) left a comment

lgtm

slekkala1 merged commit 935b8e2 into main on Sep 10, 2025.
22 checks passed.
slekkala1 deleted the fix-fireworks branch on Sep 10, 2025 at 15:48.
mattf (Collaborator) commented Sep 10, 2025

$ curl https://api.fireworks.ai/inference/v1/chat/completions -s \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ..." \
-d '{
  "model": "accounts/fireworks/models/kimi-k2-instruct-0905",
  "messages": [{
      "role": "user",
      "content": "Explain the importance of fast language models"
  }]
}' | jq
{
  "id": "c07ea231-a59d-4828-a169-a8e4243f907f",
  "object": "chat.completion",
  "created": 1757520350,
  "model": "accounts/fireworks/models/kimi-k2-instruct-0905",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 23,
    "total_tokens": 727,
    "completion_tokens": 704
  }
}

The Fireworks endpoint returns usage information. Before making a project-wide change, especially one that may quietly produce inconsistent results, a fix must be attempted in the Fireworks provider.

@raghotham @franciscojavierarceo I recommend reverting and proceeding with a fix in the Fireworks provider.
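A minimal sketch of what fixing this in the provider could look like, assuming the provider converts the raw upstream JSON into its own response object. `convert_upstream_response`, `ChatCompletion`, and `Usage` are hypothetical names, not the actual llama-stack Fireworks adapter, and the `model_id` parameter is an assumption for illustration. The point is only that the `usage` block returned by api.fireworks.ai is copied through rather than dropped, so telemetry downstream can read it.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


@dataclass
class ChatCompletion:
    # Hypothetical provider-side response type.
    id: str
    model: str
    content: str
    usage: Optional[Usage] = None


def convert_upstream_response(raw: dict[str, Any], model_id: str) -> ChatCompletion:
    """Build the provider's response from upstream JSON, keeping usage if present."""
    usage_raw = raw.get("usage")
    usage = None
    if usage_raw is not None:
        # Propagate the upstream token counts instead of discarding them.
        usage = Usage(
            prompt_tokens=usage_raw["prompt_tokens"],
            completion_tokens=usage_raw["completion_tokens"],
            total_tokens=usage_raw["total_tokens"],
        )
    return ChatCompletion(
        id=raw["id"],
        model=model_id,
        content=raw["choices"][0]["message"]["content"],
        usage=usage,
    )
```

Fed the upstream response shown above, the converted object would carry `Usage(prompt_tokens=23, completion_tokens=704, total_tokens=727)`; a response without `usage` still converts, with `usage=None`.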

franciscojavierarceo (Collaborator) commented

@mattf I've created a revert PR here: #3402

slekkala1 (Contributor, Author) commented Sep 10, 2025

@mattf Thanks for the suggestion!

Well, it didn't return usage in my test with the Fireworks provider. (Maybe I missed something in the provider implementation; I can take a second look.)

Is it OK for the API to break just because telemetry depends on the response having certain fields?

mattf (Collaborator) commented Sep 11, 2025

api.fireworks.ai returns the usage info; if it doesn't get propagated, check the Fireworks provider.

iamemilio pushed a commit to iamemilio/llama-stack that referenced this pull request Sep 24, 2025
# What does this PR do?
Fix Fireworks chat completion, which was broken because telemetry expected
response.usage
Closes llamastack#3391

## Test Plan
1. `uv run --with llama-stack llama stack build --distro starter --image-type venv --run`
Try:

```
curl -X POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct",
      "messages": [{"role": "user", "content": "Hello!"}]
    }'
```
```
{"id":"chatcmpl-ee922a08-0df0-4974-b0d3-b322113e8bc0","choices":[{"message":{"role":"assistant","content":"Hello! How can I assist you today?","name":null,"tool_calls":null},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","created":1757456375,"model":"fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct"}
```

Without the fix, the request fails as described in
llamastack#3391.

Co-authored-by: Francisco Arceo <[email protected]>
iamemilio pushed a commit to iamemilio/llama-stack that referenced this pull request Sep 24, 2025
Labels: CLA Signed
Projects: None yet
Linked issue: Fireworks model chat completion broken with telemetry
Participants: 4