
Fireworks model chat completion broken with telemetry #3391

@slekkala1

Description


System Info

(llama-stack) (base) swapna942@swapna942-mac llama-stack % python -m "torch.utils.collect_env"
:128: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
Collecting environment information...
PyTorch version: 2.8.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 15.6.1 (arm64)
GCC version: Could not collect
Clang version: 17.0.0 (clang-1700.0.13.5)
CMake version: version 4.0.3
Libc version: N/A

Python version: 3.13.7 (main, Aug 14 2025, 11:12:11) [Clang 17.0.0 (clang-1700.0.13.3)] (64-bit runtime)
Python platform: macOS-15.6.1-arm64-arm-64bit-Mach-O
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M4 Max

Versions of relevant libraries:
[pip3] Could not collect
[conda] numpy 2.3.1 pypi_0 pypi
[conda] torch 2.7.1 pypi_0 pypi

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

Steps to reproduce:

  1. uv run --with llama-stack llama stack build --distro starter --image-type venv --run
  2. Send a chat completion request:

     curl -X POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \
       -H "Content-Type: application/json" \
       -d '{"model": "fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct", "messages": [{"role": "user", "content": "Hello!"}]}'

     The request fails with HTTP 500: {"detail":"Internal server error: An unexpected error occurred."}
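For anyone reproducing from Python instead of curl, a minimal equivalent sketch (assumes the starter distro from step 1 is serving on localhost:8321; with the bug present the request comes back as HTTP 500):

```python
import json
import urllib.error
import urllib.request

# Same request body as the curl reproduction above.
payload = {
    "model": "fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
}
req = urllib.request.Request(
    "http://127.0.0.1:8321/v1/openai/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(resp.status, resp.read().decode())
except urllib.error.HTTPError as e:
    # With the bug present, this prints 500 and the generic error body.
    print(e.code, e.read().decode())
except urllib.error.URLError as e:
    print("server not reachable:", e.reason)
```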

Error logs

In server side logs we see

INFO     2025-09-09 15:14:25,353 console_span_processor:28 telemetry: 22:14:25.353 [START] /v1/openai/v1/chat/completions                             
INFO     2025-09-09 15:14:25,359 console_span_processor:39 telemetry: 22:14:25.355 [END] ModelsRoutingTable.get_model [StatusCode.OK] (0.22ms)        
INFO     2025-09-09 15:14:25,360 console_span_processor:48 telemetry:     output: {'identifier':                                                      
         'fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct', 'provider_resource_id': 'accounts/fireworks/models/llama-v3p1-8b-instruct',    
         'provider_id': 'fireworks', 'type': 'model', 'owner': None, 'source': 'listed_from_provider', 'metadata': {}, 'model_type': 'llm'}           
INFO     2025-09-09 15:14:25,362 console_span_processor:39 telemetry: 22:14:25.361 [END] ModelsRoutingTable.get_provider_impl [StatusCode.OK] (0.20ms)
INFO     2025-09-09 15:14:25,362 console_span_processor:48 telemetry:     output:                                                                     
         <llama_stack.providers.remote.inference.fireworks.fireworks.FireworksInferenceAdapter object at 0x1143e56a0>                                 
INFO     2025-09-09 15:14:25,364 console_span_processor:39 telemetry: 22:14:25.363 [END] ModelsRoutingTable.get_model [StatusCode.OK] (0.21ms)        
INFO     2025-09-09 15:14:25,365 console_span_processor:48 telemetry:     output: {'identifier':                                                      
         'fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct', 'provider_resource_id': 'accounts/fireworks/models/llama-v3p1-8b-instruct',    
         'provider_id': 'fireworks', 'type': 'model', 'owner': None, 'source': 'listed_from_provider', 'metadata': {}, 'model_type': 'llm'}           
INFO     2025-09-09 15:14:25,367 console_span_processor:39 telemetry: 22:14:25.366 [END] ModelsRoutingTable.get_model [StatusCode.OK] (0.17ms)        
INFO     2025-09-09 15:14:25,367 console_span_processor:48 telemetry:     output: {'identifier':                                                      
         'fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct', 'provider_resource_id': 'accounts/fireworks/models/llama-v3p1-8b-instruct',    
         'provider_id': 'fireworks', 'type': 'model', 'owner': None, 'source': 'listed_from_provider', 'metadata': {}, 'model_type': 'llm'}           
ERROR    2025-09-09 15:14:25,634 __main__:257 core::server: Error executing endpoint route='/v1/openai/v1/chat/completions' method='post':            
         'OpenAIChatCompletion' object has no attribute 'usage'                                                                                       
INFO     2025-09-09 15:14:25,635 uvicorn.access:473 uncategorized: 127.0.0.1:65526 - "POST /v1/openai/v1/chat/completions HTTP/1.1" 500               
INFO     2025-09-09 15:14:25,639 console_span_processor:39 telemetry: 22:14:25.636 [END] FireworksInferenceAdapter.chat_completion [StatusCode.OK]    
         (270.81ms)                                                                                                                                   
INFO     2025-09-09 15:14:25,640 console_span_processor:48 telemetry:     output: {'metrics': None, 'completion_message': {'role': 'assistant',       
         'content': 'Hello! How can I assist you today?', 'stop_reason': 'end_of_turn', 'tool_calls': []}, 'logprobs': None}                          
INFO     2025-09-09 15:14:25,642 console_span_processor:39 telemetry: 22:14:25.641 [END] FireworksInferenceAdapter.openai_chat_completion             
         [StatusCode.OK] (277.90ms)                                                                                                                   
INFO     2025-09-09 15:14:25,643 console_span_processor:48 telemetry:     output: {'id': 'chatcmpl-8bfeb3b1-9a09-468f-9347-d55f1debe3b7', 'choices':  
         [{'message': {'role': 'assistant', 'content': 'Hello! How can I assist you today?', 'name': None, 'tool_calls': None}, 'finish_reason':      
         'stop', 'index': 0, 'logprobs': None}], 'object': 'chat.completion', 'created': 1757456065, 'model':                                         
         'fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct'}                                                                                
INFO     2025-09-09 15:14:25,645 console_span_processor:39 telemetry: 22:14:25.643 [END] InferenceRouter.openai_chat_completion [StatusCode.OK]       
         (289.42ms)                                                                                                                                   
INFO     2025-09-09 15:14:25,646 console_span_processor:48 telemetry:     error: 'OpenAIChatCompletion' object has no attribute 'usage'               
INFO     2025-09-09 15:14:25,648 console_span_processor:39 telemetry: 22:14:25.647 [END] /v1/openai/v1/chat/completions [StatusCode.OK] (293.99ms)    
INFO     2025-09-09 15:14:25,649 console_span_processor:48 telemetry:     raw_path: /v1/openai/v1/chat/completions       
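The server error and the failed `InferenceRouter.openai_chat_completion` span point at telemetry code reading `response.usage` on an `OpenAIChatCompletion` that was built without a `usage` field. A minimal sketch of the failure mode and a defensive lookup (the class and helper below are illustrative stand-ins, not the actual llama-stack internals):

```python
from dataclasses import dataclass, field

@dataclass
class OpenAIChatCompletion:
    """Stand-in for the response object in the logs above: note there is
    no `usage` attribute, so `response.usage` raises AttributeError."""
    id: str
    choices: list = field(default_factory=list)

def extract_usage_tokens(response):
    """Return (prompt_tokens, completion_tokens), or None when the
    provider response carries no usage info."""
    usage = getattr(response, "usage", None)
    if usage is None:
        return None
    return (usage.prompt_tokens, usage.completion_tokens)

resp = OpenAIChatCompletion(id="chatcmpl-8bfeb3b1")
print(extract_usage_tokens(resp))  # None rather than an AttributeError
```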

Expected behavior

The chat completion should not fail: the request should succeed even if telemetry cannot record its metrics. Telemetry being unavailable is acceptable; breaking the request path is not.

Metadata

Labels: bug (Something isn't working)