Conversation

dinmukhamedm
Collaborator

@dinmukhamedm dinmukhamedm commented Aug 11, 2025

Fixes #3242

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Important

Dynamically import types for OpenAI 1.99 compatibility and update tests and dependencies accordingly.

  • Behavior:
    • Dynamically import ChatCompletionMessageFunctionToolCall and FunctionCall in chat_wrappers.py to support OpenAI 1.99 and older versions.
    • Update _parse_tool_calls() in chat_wrappers.py to use pydantic.BaseModel for compatibility.
    • Modify responses_get_or_create_wrapper() and async_responses_get_or_create_wrapper() in responses_wrappers.py to handle output_text extraction.
  • Dependencies:
    • Update openai version to 1.99.7 in pyproject.toml.
  • Tests:
    • Update tests in test_chat.py and test_responses.py to reflect changes in type handling and ensure compatibility with OpenAI 1.99.

This description was created by Ellipsis for b920f6d.

Summary by CodeRabbit

  • New Features
    • Broader compatibility with newer and older OpenAI SDKs, including support for pydantic-based tool calls.
  • Bug Fixes
    • Prevented crashes when output_text is missing by safely extracting text from nested response content.
    • Improved streaming error handling to ensure spans are properly ended and metrics recorded.
  • Refactor
    • More robust, thread-safe streaming cleanup and consistent metrics/attributes across streaming and non-streaming paths.
  • Chores
    • Pinned OpenAI dependency to version 1.99.7 for consistent behavior.
  • Tests
    • Updated tests to new tool-call types and response shapes; adjusted async execution strategy.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Caution

Changes requested ❌

Reviewed everything up to b920f6d in 2 minutes and 45 seconds.
  • Reviewed 297 lines of code in 5 files
  • Skipped 1 file when reviewing.
  • Skipped posting 7 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py:999
  • Draft comment:
    Consider narrowing the broad exception catch in _is_chat_message_function_tool_call to specific exceptions (e.g. ImportError, AttributeError) to avoid masking unexpected errors.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 20% vs. threshold = 50%). The broad exception handling here appears intentional and necessary. The code is dealing with version compatibility across different OpenAI SDK versions, where import paths and type structures may vary. The comments explicitly explain this is for version compatibility. Narrowing to specific exceptions could miss some edge cases or break compatibility with different versions. The suggestion to narrow exceptions is generally a good practice. Could there be version-specific errors we're not considering that should be caught explicitly? In this case, the broad exception handling is justified because we need to gracefully handle any issues with imports or type checking across multiple versions of the OpenAI SDK. The code's behavior (returning False) is appropriate for any exception. The broad exception handling should be kept as is since it's intentionally catching all possible errors for version compatibility, and the failure mode (returning False) is appropriate regardless of the specific exception type.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py:1018
  • Draft comment:
    Similarly, in _is_function_call, consider catching specific exceptions rather than a blanket 'except Exception'.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 20% vs. threshold = 50%). The broad Exception catch is actually intentional here as this is a compatibility layer. The function needs to handle any possible error that could occur when checking the type, not just AttributeError. The import could fail, the class could be missing, or other version compatibility issues could arise. The function's purpose is to safely check the type and return False for any failure case. The suggestion to catch AttributeError would make the code more precise, but what if the import fails with ImportError? What if there are other version compatibility issues? The broad Exception catch is appropriate here because this is a compatibility layer that needs to handle any possible failure gracefully. Making it more specific could cause version compatibility issues. The broad Exception catch should be kept as is since this is a compatibility layer that needs to handle all possible failure cases gracefully.
3. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:451
  • Draft comment:
    The extraction of output text using parsed_response.output[0].content[0].text is in a try/except that silently ignores exceptions. Consider logging a debug message if extraction fails to aid future troubleshooting.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 20% vs. threshold = 50%). While adding debug logging could help with troubleshooting, this appears to be an intentional fallback pattern where the code first tries one way to get the text, then another, and it's okay if both fail. The code continues execution either way and the output_text field is optional in the TracedData model. Silent failure seems to be the intended behavior here since this is part of telemetry/instrumentation code that shouldn't interfere with the main application flow. The comment identifies a potential debugging improvement, but may not fully understand that this is instrumentation code where silent failure is often preferable to avoid impacting the main application. Given this is instrumentation code and output_text is optional, the current silent failure approach is likely intentional and appropriate. Adding debug logging could add noise without providing significant value. The comment should be deleted as the current silent failure approach is appropriate for this instrumentation code context where output_text is optional.
4. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py:25
  • Draft comment:
    Tests have been updated to assert against the new nested structure (response.output[0].content[0].text). Ensure that this structure is consistent and documented in the updated OpenAI SDK so future changes do not break these assertions.
  • Reason this comment was not posted:
    Comment did not seem useful (confidence useful = 0% <= threshold 50%). The comment is asking the PR author to ensure that the structure is consistent and documented, which violates the rule against asking the author to ensure things. It also suggests documenting in the SDK, which is not a specific code suggestion or request for a test. Therefore, this comment should be removed.
5. packages/opentelemetry-instrumentation-openai/pyproject.toml:41
  • Draft comment:
    Pinning openai version to 1.99.7 ensures compatibility with new types, but be aware that future changes in the OpenAI SDK might require further adjustments.
  • Reason this comment was not posted:
    Comment did not seem useful (confidence useful = 0% <= threshold 50%). This comment is about a dependency change, specifically pinning the OpenAI version. The comment is purely informative and does not provide a specific suggestion or question about the code. It also does not ask for a test or a specific confirmation about the change. Therefore, it violates the rules about commenting on dependencies and making purely informative comments.
6. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:467
  • Draft comment:
    There appears to be a typographical error on line 467. The variable parsed_response_output_text is used, but likely it should be parsed_response.output_text (using a dot to access the attribute).
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
7. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:570
  • Draft comment:
    Typo: It appears that 'parsed_response_output_text' is missing a dot. Should this be 'parsed_response.output_text'?
  • Reason this comment was not posted:
    Comment looked like it was already resolved.

Workflow ID: wflow_75pk65H575cGs2Nm



coderabbitai bot commented Aug 11, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Implements compatibility with OpenAI SDK 1.99.7 by replacing static type imports/checks with dynamic detection, broadening tool-call parsing to accept pydantic models, and updating response text extraction. Refactors ChatStream for robust cleanup and error handling across sync/async iterations. Updates tests to new SDK types/structures and pins test dependency to 1.99.7.

Changes

Cohort / File(s) Summary
Type compatibility and tool-call parsing
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
Replaced direct OpenAI type imports with runtime detection helpers; expanded _parse_tool_calls to accept dict or pydantic.BaseModel; adjusted parsing logic for Function/ToolCall types across SDK versions; added comments about SDK variations.
Streaming lifecycle and metrics
packages/.../openai/shared/chat_wrappers.py
Introduced thread-safe cleanup in ChatStream (del, context managers, locks/flags); ensured spans/metrics are finalized on errors and partial data; standardized streaming attributes and backward-compatibility paths.
Responses wrappers output text extraction
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
Safely derive output_text via parsed_response.output_text or nested parsed_response.output[0].content[0].text with guards; applied to sync/async paths without other flow changes.
Tests updated for new SDK types/structures
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py, packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
Switched to ChatCompletionMessageFunctionToolCall with dynamic imports and Function fallback; adjusted assertions to nested response content; executed async calls via asyncio.run in sync tests.
Dependency pin for tests
packages/opentelemetry-instrumentation-openai/pyproject.toml
Pinned openai test dependency from >=1.66.0 to 1.99.7 (extras unchanged).

Sequence Diagram(s)

sequenceDiagram
  participant UserCode
  participant ChatStream
  participant OpenAI.SDK
  participant Tracing
  participant Metrics

  UserCode->>ChatStream: iterate()/async iterate()
  ChatStream->>OpenAI.SDK: request stream
  loop chunks
    OpenAI.SDK-->>ChatStream: chunk
    ChatStream->>Metrics: record partial usage/tokens
    ChatStream->>Tracing: update span attributes
    ChatStream-->>UserCode: yield chunk
  end
  OpenAI.SDK-->>ChatStream: end-of-stream
  ChatStream->>Metrics: record final usage
  ChatStream->>Tracing: set status, end span
  ChatStream-->>UserCode: stop iteration
  Note over ChatStream: On error: cleanup, set span status, end span

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Assessment against linked issues

Objective Addressed Explanation
Update imports and isinstance checks to handle new Annotated/Union SDK types (#3242)
Ensure instrumentation works after upgrading to openai 1.99.7 (init/instrument succeeds) (#3242)
Adjust tests to new SDK types and structures (tool calls, response shapes) (#3242)
Handle response text extraction under new response schema (no AttributeError) (#3242)

Assessment against linked issues: Out-of-scope changes

Code Change Explanation
Switching async tests from pytest.mark.asyncio to asyncio.run (packages/.../tests/traces/test_chat.py) Potentially test-infra style change; likely supportive, but not explicitly required by the issue’s objectives. Unclear if necessary strictly for SDK compatibility.

Possibly related PRs

Suggested reviewers

  • nirga

Poem

A rabbit taps keys with a vigilant ear,
Tool-calls now shapeshift—yet nothing to fear.
Streams clean their burrows, spans softly end,
Text found in tunnels that twist and bend.
With 1.99.7 the paths realign—
Hippity-hop, our traces still shine! 🐇✨


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)

377-384: Dynamic import handles version compatibility well

The dynamic import with fallback ensures tests work across different OpenAI SDK versions. While repetitive across three test functions, this approach is acceptable for test code.

🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-openai/pyproject.toml (1)

41-41: Consider using a version range instead of exact pinning

Pinning to exact version 1.99.7 in test dependencies could cause issues with security updates and bug fixes. Since the PR aims to support the 1.99 series, consider using ">=1.99.0,<2.0.0" or "~=1.99.0" to allow patch updates while maintaining compatibility.

-openai = { extras = ["datalib"], version = "1.99.7" }
+openai = { extras = ["datalib"], version = "~=1.99.0" }
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)

450-458: Extract duplicated output text extraction logic

The output text extraction logic is duplicated between sync (lines 450-458) and async (lines 552-560) functions. Consider extracting this into a helper function to improve maintainability.

Add a helper function before line 450:

def _extract_output_text(parsed_response):
    """Extract output text from parsed response, handling different response structures."""
    if hasattr(parsed_response, "output_text"):
        return parsed_response.output_text
    try:
        if (parsed_response.output 
            and len(parsed_response.output) > 0 
            and parsed_response.output[0].content 
            and len(parsed_response.output[0].content) > 0):
            return parsed_response.output[0].content[0].text
    except Exception:
        pass
    return None

Then simplify both occurrences:

-parsed_response_output_text = None
-if hasattr(parsed_response, "output_text"):
-    parsed_response_output_text = parsed_response.output_text
-else:
-    try:
-        parsed_response_output_text = parsed_response.output[0].content[0].text
-    except Exception:
-        pass
+parsed_response_output_text = _extract_output_text(parsed_response)

Also applies to: 552-560

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between feb570b and 0d569da.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-openai/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (4 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (4 hunks)
  • packages/opentelemetry-instrumentation-openai/pyproject.toml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (4 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (4 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py

454-457: Use contextlib.suppress(Exception) instead of try-except-pass

Replace with contextlib.suppress(Exception)

(SIM105)


556-559: Use contextlib.suppress(Exception) instead of try-except-pass

Replace with contextlib.suppress(Exception)

(SIM105)
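Ruff's SIM105 finding suggests replacing the try-except-pass with contextlib.suppress. Applied to the fallback in question, a sketch could look like this (the helper name is hypothetical; the attribute paths mirror the code under review):

```python
import contextlib


def extract_output_text(parsed_response):
    """Hypothetical helper: SIM105-style rewrite of the output_text fallback."""
    if hasattr(parsed_response, "output_text"):
        return parsed_response.output_text
    text = None
    # contextlib.suppress replaces try/except Exception: pass, with the same
    # swallow-everything semantics but less ceremony.
    with contextlib.suppress(Exception):
        text = parsed_response.output[0].content[0].text
    return text
```

Note that suppress(Exception) is exactly as broad as the original except clause; narrowing it to (IndexError, AttributeError) would address the separate concern about masking unrelated errors.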

🔇 Additional comments (8)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py (1)

26-26: LGTM! Test updates correctly reflect new response structure

The test assertions have been properly updated to access the response text via the new nested structure response.output[0].content[0].text, aligning with OpenAI SDK 1.99.7 changes.

Also applies to: 49-49, 72-72, 81-81

packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)

7-8: LGTM! Import updated for new SDK type

The import correctly uses ChatCompletionMessageFunctionToolCall which is the new type name in OpenAI SDK 1.99.7.

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py (6)

10-10: LGTM! Required import for type checking

The pydantic import is necessary for handling the new BaseModel-based tool call types in OpenAI SDK 1.99.7.


962-995: API change maintains backward compatibility

The _parse_tool_calls signature change to accept Union[dict, pydantic.BaseModel] maintains backward compatibility since dict is still supported. The implementation correctly handles both legacy dict format and new pydantic BaseModel representations.


998-1024: Well-designed type detection with version compatibility

The type detection helpers (_is_chat_message_function_tool_call and _is_function_call) are well-implemented with:

  • Dynamic imports that handle missing types gracefully
  • Clear comments explaining version compatibility
  • Proper fallback logic for different SDK versions

603-611: Robust cleanup implementation with proper safeguards

The cleanup state tracking and __del__ implementation include proper safeguards:

  • Thread-safe cleanup with _cleanup_lock
  • Defensive hasattr check in __del__ to handle interpreter shutdown scenarios
  • Prevention of duplicate cleanup operations
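A minimal sketch of that cleanup pattern, with all class, attribute, and method names assumed for illustration rather than taken from the PR:

```python
import threading


class ChatStreamSketch:
    """Sketch of idempotent, thread-safe stream cleanup (names assumed)."""

    def __init__(self):
        self._cleanup_lock = threading.Lock()
        self._cleanup_done = False

    def _ensure_cleanup(self):
        # Lock plus flag: cleanup runs at most once even under concurrency.
        with self._cleanup_lock:
            if self._cleanup_done:
                return
            self._cleanup_done = True
            # ... end span, record final metrics ...

    def __del__(self):
        # hasattr guards against partially-initialized objects and odd
        # attribute teardown ordering during interpreter shutdown.
        if hasattr(self, "_cleanup_lock"):
            self._ensure_cleanup()
```

The __del__ path is only a last resort; the normal exhaustion and error paths of the iterator should call _ensure_cleanup explicitly.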

644-674: Excellent error handling in stream iterators

The enhanced error handling in __next__ and __anext__ properly:

  • Distinguishes between normal completion (StopIteration) and errors
  • Ensures cleanup happens in all scenarios
  • Sets appropriate span status on errors
  • Re-raises exceptions after cleanup
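The iterator shape those bullets describe can be sketched as follows; the class and attribute names are assumptions, and span/metric calls are reduced to flags for illustration:

```python
class SafeStream:
    """Sketch: wrap an iterator so cleanup runs on both exhaustion and error."""

    def __init__(self, inner):
        self._inner = iter(inner)
        self.cleaned_up = False
        self.error = None

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._inner)
        except StopIteration:
            self._cleanup()   # normal end of stream
            raise
        except Exception as exc:
            self.error = exc  # stand-in for setting span error status
            self._cleanup()
            raise             # re-raise only after cleanup has run
        
    def _cleanup(self):
        self.cleaned_up = True  # stand-in for ending the span / metrics
```

Catching StopIteration before the general Exception clause is what keeps normal completion and genuine errors on distinct paths.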

784-815: Excellent addition for partial stream metrics

The _record_partial_metrics method ensures observability even when streams are partially consumed or interrupted. It safely records all available metrics without throwing exceptions, which is crucial for maintaining telemetry integrity.

Comment on lines +450 to 458
    parsed_response_output_text = None
    if hasattr(parsed_response, "output_text"):
        parsed_response_output_text = parsed_response.output_text
    else:
        try:
            parsed_response_output_text = parsed_response.output[0].content[0].text
        except Exception:
            pass
    traced_data = TracedData(

🛠️ Refactor suggestion

Add array bounds validation for safer nested access

The fallback logic parsed_response.output[0].content[0].text could raise IndexError if either array is empty. Consider adding bounds checking for more robust error handling.

 parsed_response_output_text = None
 if hasattr(parsed_response, "output_text"):
     parsed_response_output_text = parsed_response.output_text
 else:
     try:
-        parsed_response_output_text = parsed_response.output[0].content[0].text
+        if (parsed_response.output 
+            and len(parsed_response.output) > 0 
+            and parsed_response.output[0].content 
+            and len(parsed_response.output[0].content) > 0):
+            parsed_response_output_text = parsed_response.output[0].content[0].text
     except Exception:
         pass

🤖 Prompt for AI Agents
In
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
around lines 450 to 458, the fallback access
parsed_response.output[0].content[0].text can raise IndexError if output or
content arrays are empty; add explicit bounds checks (e.g. verify
parsed_response.output is a list-like and len(parsed_response.output) > 0 and
parsed_response.output[0].content is list-like and len(...) > 0) before
indexing, and narrow the exception handling to IndexError/AttributeError (or use
conditional attribute/getattr checks) so parsed_response_output_text safely
falls back to None without swallowing unrelated exceptions.

@nirga nirga merged commit 3a50021 into traceloop:main Aug 11, 2025
9 checks passed
@nirga nirga deleted the fix/openai-types branch August 11, 2025 18:32


Development

Successfully merging this pull request may close these issues.

🐛 Bug Report: OpenAI's new SDK types break most of the instrumentation
