
Conversation

@slekkala1 (Contributor) commented Sep 11, 2025

What does this PR do?

An earlier fix was attempted; however, since the Fireworks API already returns usage in its response, propagating that field through gives the end-to-end response the usage data.
Context in #3392
Closes #3391
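
For context, the change amounts to carrying the provider's usage block through when the upstream response is converted, rather than dropping it. A minimal sketch of the idea (names below are illustrative, not the actual openai_compat.py code):

```python
# Illustrative sketch only: propagate `usage` from the upstream provider
# response into the response returned to the client, instead of discarding it.
from typing import Any


def convert_chat_completion_response(upstream: dict[str, Any]) -> dict[str, Any]:
    response: dict[str, Any] = {
        "id": upstream["id"],
        "choices": upstream["choices"],
        "object": "chat.completion",
        "created": upstream["created"],
        "model": upstream["model"],
    }
    # The fix: Fireworks already returns token counts, so pass them through.
    usage = upstream.get("usage")
    if usage is not None:
        response["usage"] = {
            "prompt_tokens": usage["prompt_tokens"],
            "completion_tokens": usage["completion_tokens"],
            "total_tokens": usage["total_tokens"],
        }
    return response
```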

Test Plan

Build and run the server, then test with curl:

(llama-stack) (base) swapna942@swapna942-mac llama-stack % curl -X POST http://localhost:8321/v1/openai/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "X-LlamaStack-Provider-Data: {\"fireworks_api_key\": \"$FIREWORKS_API_KEY\"}" \
      -d '{
        "model": "fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct",
        "messages": [{"role": "user", "content": "Hello!"}]
      }' | jq
{
  "metrics": [
    {
      "metric": "prompt_tokens",
      "value": 35,
      "unit": "tokens"
    },
    {
      "metric": "completion_tokens",
      "value": 10,
      "unit": "tokens"
    },
    {
      "metric": "total_tokens",
      "value": 45,
      "unit": "tokens"
    }
  ],
  "id": "chatcmpl-0646bc72-cf4a-4afd-8fa0-030275b452f6",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?",
        "name": null,
        "tool_calls": null
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null
    }
  ],
  "object": "chat.completion",
  "created": 1757632496,
  "model": "fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct",
  "usage": {
    "prompt_tokens": 35,
    "completion_tokens": 10,
    "total_tokens": 45
  }
} 
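
The same check can be run with the openai Python client instead of curl (a quick sketch; assumes the stack server on localhost:8321 and the same provider-data header):

```python
# Verify usage propagation with the openai client instead of curl.
import json
import os

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8321/v1/openai/v1",
    api_key="none",  # auth goes through the provider-data header below
    default_headers={
        "X-LlamaStack-Provider-Data": json.dumps(
            {"fireworks_api_key": os.environ["FIREWORKS_API_KEY"]}
        )
    },
)

resp = client.chat.completions.create(
    model="fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
# With the fix, usage should be populated rather than None.
print(resp.usage)
```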

meta-cla bot added the "CLA Signed" label on Sep 11, 2025
@mattf (Collaborator) left a review comment (quoted in the reply below)

@mattf (Collaborator) commented Sep 15, 2025

@slekkala1 we're moving all the adapters we can to the OpenAIMixin; will you convert the fireworks adapter to use OpenAIMixin and fix usage for everyone?

@slekkala1 (Contributor, Author) commented:

> this needs to also revert https://github.com/llamastack/llama-stack/pull/3392/files

We reverted this already @mattf, in #3402.

@slekkala1 (Contributor, Author) commented Sep 15, 2025

> @slekkala1 we're moving all the adapters we can to the OpenAIMixin; will you convert the fireworks adapter to use OpenAIMixin and fix usage for everyone?

@mattf Yes, I can do that; let me look into that instead. Please link any related tasks you have.

I see another similar bug was reported in #3420 for response.usage.

@mattf (Collaborator) commented Sep 15, 2025

> > @slekkala1 we're moving all the adapters we can to the OpenAIMixin; will you convert the fireworks adapter to use OpenAIMixin and fix usage for everyone?
>
> @mattf Yes, I can do that; let me look into that instead. Please link any related tasks you have.
>
> I see another similar bug was reported in #3420 for response.usage.

#3387 has a list of other provider updates.

I expect Fireworks can end up looking like https://github.com/llamastack/llama-stack/blob/main/llama_stack/providers/remote/inference/gemini/gemini.py

However, don't bother updating the embedding/completion/chat_completion methods; they're on their way out (#2365).
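
For reference, a mixin-based adapter along the lines of the gemini.py example can be quite small; a rough sketch (the import path, config class, and attribute names are assumptions, not the actual llama-stack code):

```python
# Rough sketch of a Fireworks adapter built on OpenAIMixin, modeled on the
# gemini.py adapter linked above. Import path and config fields are assumed.
from llama_stack.providers.utils.inference.openai_mixin import OpenAIMixin  # assumed path

from .config import FireworksImplConfig  # assumed config class


class FireworksInferenceAdapter(OpenAIMixin):
    def __init__(self, config: FireworksImplConfig) -> None:
        self.config = config

    def get_api_key(self) -> str:
        # Assumed: the key comes from config or the per-request provider data.
        return self.config.api_key

    def get_base_url(self) -> str:
        # Fireworks' OpenAI-compatible endpoint.
        return "https://api.fireworks.ai/inference/v1"
```

The mixin would then supply the OpenAI-compatible chat-completion path, so usage propagation comes for free for every adapter built on it.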

@mattf (Collaborator) commented Sep 16, 2025

this may help for testing -

diff --git a/tests/integration/suites.py b/tests/integration/suites.py
index bacd7ef5..4e653605 100644
--- a/tests/integration/suites.py
+++ b/tests/integration/suites.py
@@ -90,6 +90,15 @@ SETUP_DEFINITIONS: dict[str, Setup] = {
             "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
         },
     ),
+    "fireworks": Setup(
+        name="fireworks",
+        description="Fireworks provider with a text model",
+        defaults={
+            "text_model": "accounts/fireworks/models/llama-v3p2-3b-instruct",
+            "vision_model": "accounts/fireworks/models/llama-v3p2-11b-vision-instruct",
+            "embedding_model": "accounts/fireworks/models/qwen3-embedding-8b",
+        },
+    ),
 }
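
With that setup registered, the integration suite can presumably be pointed at Fireworks with something like the following (the flag name is an assumption inferred from the Setup registry above; check the repo's test docs):

```sh
# Assumed invocation; verify the exact flags against the integration-test docs.
FIREWORKS_API_KEY=... pytest tests/integration --setup=fireworks
```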

@slekkala1 (Contributor, Author) commented:

> this may help for testing - (suites.py diff quoted above)

Thanks for this; yes, looking into some test failures for Fireworks. Running into rate limit exceeded errors.

Thinking of having a separate diff once I fix some of these tests. Will close this diff soon.

@slekkala1 (Contributor, Author) commented:

@mattf Closing this diff; opened a new draft at #3480 with tests and the OpenAIMixin refactoring.
However, since the Fireworks provider uses OpenAIChatCompletionToLlamaStackMixin for llama_models, we'll continue to fix this response.usage bug in openai_compat.py.

Merging this pull request may close: Fireworks model chat completion broken with telemetry