System Info
- LlamaStack Version: 0.2.18 (distribution-starter)
Information
- The official example scripts
- My own modified scripts
🐛 Describe the bug
Summary
LlamaStack's responses API fails with 'ModelResponseStream' object has no attribute 'usage' when using Gemini models, making the /v1/openai/v1/responses endpoint unusable with those models.
Environment
- LlamaStack Version: 0.2.18 (distribution-starter)
- Affected Models: All Gemini models (tested with gemini-2.0-flash, gemini-2.5-flash, gemini-2.5-pro)
- Provider: remote::vertexai
- API Endpoint: /v1/openai/v1/responses
- Deployment: Kubernetes/OpenShift
Steps to Reproduce
- Configure LlamaStack with a Gemini model using the vertexai provider
- Enable telemetry (default configuration)
- Make a request to the responses API:
```python
import openai

client = openai.OpenAI(base_url="http://llamastack:8321/v1/openai/v1")
response = client.responses.create(
    input=[{"role": "user", "content": "Hello", "type": "message"}],
    model="appeng-ai-quickstarts-vertexai/vertex_ai/gemini-2.0-flash",
    stream=False,
)
```
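With the bug present, this call never returns a response object. A hedged illustration of what the client sees (openai-python v1 raises `InternalServerError` for HTTP 500s; the error message comes from the server logs below):

```python
# Hedged illustration: the request above currently fails server-side with an
# HTTP 500, which openai-python v1 surfaces as InternalServerError.
try:
    client.responses.create(
        input=[{"role": "user", "content": "Hello", "type": "message"}],
        model="appeng-ai-quickstarts-vertexai/vertex_ai/gemini-2.0-flash",
    )
except openai.InternalServerError as exc:
    # Server log: 'ModelResponseStream' object has no attribute 'usage'
    print(exc)
```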
Expected Behavior
- Responses API should return a successful response object
- Telemetry should handle missing usage attributes gracefully
Actual Behavior
- HTTP 500 Internal Server Error
- Server logs show: 'ModelResponseStream' object has no attribute 'usage'
- API request fails completely
Root Cause Analysis
The issue occurs in llama_stack/core/routers/inference.py in the openai_chat_completion method (lines ~532-536):
```python
if self.telemetry:
    metrics = self._construct_metrics(
        prompt_tokens=response.usage.prompt_tokens,  # ← FAILS HERE
        completion_tokens=response.usage.completion_tokens,
        total_tokens=response.usage.total_tokens,
        model=model_obj,
    )
```
Problem: The code unconditionally accesses response.usage attributes for telemetry logging, but the ModelResponseStream chunks that LiteLLM returns for Gemini models do not carry a usage attribute.
Additional locations with the same issue (a guarded sketch of the streaming case follows this list):
- Line ~428-430 in openai_completion method
- Line ~801-805 in streaming section
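For the streaming section in particular, at most the final chunk carries usage. A minimal sketch of a guarded version, assuming `_construct_metrics` and `model_obj` match the router code; `response_stream` and everything else here are illustrative, not the actual implementation:

```python
# Hedged sketch of the streaming path: remember usage if a chunk carries it,
# and emit metrics once at the end instead of assuming every chunk has it.
final_usage = None
async for chunk in response_stream:
    if getattr(chunk, "usage", None) is not None:
        final_usage = chunk.usage
    yield chunk

if self.telemetry and final_usage is not None:
    metrics = self._construct_metrics(
        prompt_tokens=final_usage.prompt_tokens,
        completion_tokens=final_usage.completion_tokens,
        total_tokens=final_usage.total_tokens,
        model=model_obj,
    )
```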
Impact
- Severity: High - Completely blocks responses API usage with Gemini models
- Scope: Affects all Gemini model variants when telemetry is enabled
- Workaround: Disable telemetry entirely (removes observability)
Proposed Fix
Add defensive checks before accessing usage attributes:
```python
if self.telemetry and hasattr(response, 'usage') and response.usage is not None:
    metrics = self._construct_metrics(
        prompt_tokens=response.usage.prompt_tokens,
        completion_tokens=response.usage.completion_tokens,
        total_tokens=response.usage.total_tokens,
        model=model_obj,
    )
    # ... rest of telemetry logic
```
Apply this pattern to all locations that access response.usage or chunk.usage.
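To avoid repeating the guard at every call site, one option is a small helper. This is a sketch only: `safe_usage_metrics` is a hypothetical name, and only `_construct_metrics` is from the router code.

```python
# Hypothetical helper (name and placement are illustrative): centralizes the
# defensive check so every call site treats a missing or None usage the same way.
def safe_usage_metrics(self, response_or_chunk, model_obj):
    usage = getattr(response_or_chunk, "usage", None)
    if usage is None:
        return None
    return self._construct_metrics(
        prompt_tokens=usage.prompt_tokens,
        completion_tokens=usage.completion_tokens,
        total_tokens=usage.total_tokens,
        model=model_obj,
    )
```

Call sites would then reduce to `metrics = self.safe_usage_metrics(response, model_obj)` guarded by `if self.telemetry and metrics is not None:`.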
Additional Context
- OpenAI chat completions API works fine with the same Gemini models
- Issue is specific to responses API internal implementation
- Both LlamaStack and OpenAI clients hit the same server-side error
- Telemetry disabling workaround confirmed - removing telemetry from config resolves the issue
Test Case
```python
# This should work without throwing AttributeError
response = client.responses.create(
    input=[{"role": "user", "content": "test", "type": "message"}],
    model="appeng-ai-quickstarts-vertexai/vertex_ai/gemini-2.0-flash",
)
assert response.status == "completed"
```
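Since the failure surfaces in the server's streaming helper (see the logs below), a streaming variant is worth covering too. A hedged sketch: the `response.completed` event type is from the OpenAI responses API and is assumed, not verified against LlamaStack's emitted events.

```python
# Hedged streaming variant: exercises the server-side streaming path directly.
stream = client.responses.create(
    input=[{"role": "user", "content": "test", "type": "message"}],
    model="appeng-ai-quickstarts-vertexai/vertex_ai/gemini-2.0-flash",
    stream=True,
)
events = list(stream)
assert any(event.type == "response.completed" for event in events)
```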
Files to Modify
- llama_stack/core/routers/inference.py (primary fix location)
- Any other files that unconditionally access .usage attributes
Error logs
```
INFO 2025-09-11 16:52:02,421 console_span_processor:62 telemetry: 16:52:02.365 [INFO] LiteLLM completion() model= gemini-2.0-flash; provider = vertex_ai
INFO 2025-09-11 16:52:02,429 console_span_processor:39 telemetry: 16:52:02.422 [END] InferenceRouter.openai_chat_completion [StatusCode.OK] (55.76ms)
INFO 2025-09-11 16:52:02,430 console_span_processor:48 telemetry: output: <async_generator object InferenceRouter.stream_tokens_and_compute_metrics_openai_chat at 0x7f0c3c29f4c0>
ERROR 2025-09-11 16:52:02,991 __main__:253 server: Error executing endpoint route='/v1/openai/v1/responses' method='post': 'ModelResponseStream' object has no attribute 'usage'
INFO 2025-09-11 16:52:02,992 uvicorn.access:473 uncategorized: 10.131.0.115:50470 - "POST /v1/openai/v1/responses HTTP/1.1" 500
INFO 2025-09-11 16:52:03,001 console_span_processor:39 telemetry: 16:52:02.994 [END] InferenceRouter.stream_tokens_and_compute_metrics_openai_chat [StatusCode.OK] (561.84ms)
INFO 2025-09-11 16:52:03,002 console_span_processor:48 telemetry: chunk_count: 4
INFO 2025-09-11 16:52:03,009 console_span_processor:39 telemetry: 16:52:03.004 [END] /v1/openai/v1/responses [StatusCode.OK] (689.96ms)
INFO 2025-09-11 16:52:03,010 console_span_processor:48 telemetry: raw_path: /v1/openai/v1/responses
INFO 2025-09-11 16:52:03,011 console_span_processor:62 telemetry: 16:52:02.992 [ERROR] Error executing endpoint route='/v1/openai/v1/responses' method='post': 'ModelResponseStream' object has no attribute 'usage'
INFO 2025-09-11 16:52:03,012 console_span_processor:62 telemetry: 16:52:02.993 [INFO] 10.131.0.115:50470 - "POST /v1/openai/v1/responses HTTP/1.1" 500
```