System Info
(llama-stack) (base) swapna942@swapna942-mac llama-stack % python -m "torch.utils.collect_env"
:128: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
Collecting environment information...
PyTorch version: 2.8.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 15.6.1 (arm64)
GCC version: Could not collect
Clang version: 17.0.0 (clang-1700.0.13.5)
CMake version: version 4.0.3
Libc version: N/A
Python version: 3.13.7 (main, Aug 14 2025, 11:12:11) [Clang 17.0.0 (clang-1700.0.13.3)] (64-bit runtime)
Python platform: macOS-15.6.1-arm64-arm-64bit-Mach-O
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Apple M4 Max
Versions of relevant libraries:
[pip3] Could not collect
[conda] numpy 2.3.1 pypi_0 pypi
[conda] torch 2.7.1 pypi_0 pypi
Information
- The official example scripts
- My own modified scripts
🐛 Describe the bug
Steps to reproduce:
- Start the server:
  uv run --with llama-stack llama stack build --distro starter --image-type venv --run
- Send a chat completion request:
  curl -X POST http://0.0.0.0:8321/v1/openai/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct",
      "messages": [{"role": "user", "content": "Hello!"}]
    }'
- The request fails with a 500:
  {"detail":"Internal server error: An unexpected error occurred."}
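For reference, the same 500 should be reproducible with the openai Python client pointed at the local server. This is a sketch: the base URL and model ID are taken from the curl call above, and the api_key value is a placeholder (the local server does not validate it here).

  from openai import OpenAI

  # Point the OpenAI client at the Llama Stack OpenAI-compatible endpoint.
  client = OpenAI(base_url="http://0.0.0.0:8321/v1/openai/v1", api_key="none")

  # Expected to trigger the same internal server error as the curl call above.
  resp = client.chat.completions.create(
      model="fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct",
      messages=[{"role": "user", "content": "Hello!"}],
  )
  print(resp.choices[0].message.content)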
Error logs
In the server-side logs we see:
INFO 2025-09-09 15:14:25,353 console_span_processor:28 telemetry: 22:14:25.353 [START] /v1/openai/v1/chat/completions
INFO 2025-09-09 15:14:25,359 console_span_processor:39 telemetry: 22:14:25.355 [END] ModelsRoutingTable.get_model [StatusCode.OK] (0.22ms)
INFO 2025-09-09 15:14:25,360 console_span_processor:48 telemetry: output: {'identifier':
'fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct', 'provider_resource_id': 'accounts/fireworks/models/llama-v3p1-8b-instruct',
'provider_id': 'fireworks', 'type': 'model', 'owner': None, 'source': 'listed_from_provider', 'metadata': {}, 'model_type': 'llm'}
INFO 2025-09-09 15:14:25,362 console_span_processor:39 telemetry: 22:14:25.361 [END] ModelsRoutingTable.get_provider_impl [StatusCode.OK] (0.20ms)
INFO 2025-09-09 15:14:25,362 console_span_processor:48 telemetry: output:
<llama_stack.providers.remote.inference.fireworks.fireworks.FireworksInferenceAdapter object at 0x1143e56a0>
INFO 2025-09-09 15:14:25,364 console_span_processor:39 telemetry: 22:14:25.363 [END] ModelsRoutingTable.get_model [StatusCode.OK] (0.21ms)
INFO 2025-09-09 15:14:25,365 console_span_processor:48 telemetry: output: {'identifier':
'fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct', 'provider_resource_id': 'accounts/fireworks/models/llama-v3p1-8b-instruct',
'provider_id': 'fireworks', 'type': 'model', 'owner': None, 'source': 'listed_from_provider', 'metadata': {}, 'model_type': 'llm'}
INFO 2025-09-09 15:14:25,367 console_span_processor:39 telemetry: 22:14:25.366 [END] ModelsRoutingTable.get_model [StatusCode.OK] (0.17ms)
INFO 2025-09-09 15:14:25,367 console_span_processor:48 telemetry: output: {'identifier':
'fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct', 'provider_resource_id': 'accounts/fireworks/models/llama-v3p1-8b-instruct',
'provider_id': 'fireworks', 'type': 'model', 'owner': None, 'source': 'listed_from_provider', 'metadata': {}, 'model_type': 'llm'}
ERROR 2025-09-09 15:14:25,634 __main__:257 core::server: Error executing endpoint route='/v1/openai/v1/chat/completions' method='post':
'OpenAIChatCompletion' object has no attribute 'usage'
INFO 2025-09-09 15:14:25,635 uvicorn.access:473 uncategorized: 127.0.0.1:65526 - "POST /v1/openai/v1/chat/completions HTTP/1.1" 500
INFO 2025-09-09 15:14:25,639 console_span_processor:39 telemetry: 22:14:25.636 [END] FireworksInferenceAdapter.chat_completion [StatusCode.OK]
(270.81ms)
INFO 2025-09-09 15:14:25,640 console_span_processor:48 telemetry: output: {'metrics': None, 'completion_message': {'role': 'assistant',
'content': 'Hello! How can I assist you today?', 'stop_reason': 'end_of_turn', 'tool_calls': []}, 'logprobs': None}
INFO 2025-09-09 15:14:25,642 console_span_processor:39 telemetry: 22:14:25.641 [END] FireworksInferenceAdapter.openai_chat_completion
[StatusCode.OK] (277.90ms)
INFO 2025-09-09 15:14:25,643 console_span_processor:48 telemetry: output: {'id': 'chatcmpl-8bfeb3b1-9a09-468f-9347-d55f1debe3b7', 'choices':
[{'message': {'role': 'assistant', 'content': 'Hello! How can I assist you today?', 'name': None, 'tool_calls': None}, 'finish_reason':
'stop', 'index': 0, 'logprobs': None}], 'object': 'chat.completion', 'created': 1757456065, 'model':
'fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct'}
INFO 2025-09-09 15:14:25,645 console_span_processor:39 telemetry: 22:14:25.643 [END] InferenceRouter.openai_chat_completion [StatusCode.OK]
(289.42ms)
INFO 2025-09-09 15:14:25,646 console_span_processor:48 telemetry: error: 'OpenAIChatCompletion' object has no attribute 'usage'
INFO 2025-09-09 15:14:25,648 console_span_processor:39 telemetry: 22:14:25.647 [END] /v1/openai/v1/chat/completions [StatusCode.OK] (293.99ms)
INFO 2025-09-09 15:14:25,649 console_span_processor:48 telemetry: raw_path: /v1/openai/v1/chat/completions
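Note the ordering in the spans above: FireworksInferenceAdapter.openai_chat_completion completes with StatusCode.OK, and the AttributeError is only recorded at InferenceRouter.openai_chat_completion, so the failure is presumably in the router's usage/metrics handling after the provider call succeeds. Below is a minimal sketch of the failure mode and a defensive guard; the class is an illustrative stand-in (a bare Pydantic model with no usage field, mirroring the logged response, which contains no usage key), not the actual llama-stack definition.

  from pydantic import BaseModel

  class OpenAIChatCompletion(BaseModel):
      # Illustrative stand-in: no `usage` field is declared, matching the
      # response dict in the telemetry output above.
      id: str
      model: str

  completion = OpenAIChatCompletion(id="chatcmpl-123", model="llama-v3p1-8b-instruct")

  # Unconditional access reproduces the logged error:
  #   AttributeError: 'OpenAIChatCompletion' object has no attribute 'usage'
  # tokens = completion.usage.total_tokens

  # A defensive guard would let the request succeed without usage metrics:
  usage = getattr(completion, "usage", None)
  total_tokens = usage.total_tokens if usage is not None else None
  print(total_tokens)  # -> None instead of a 500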
Expected behavior
The request should not fail: chat completion needs to succeed even if telemetry/usage metrics are unavailable.