fix(langchain): span attrs and metrics missing of langchain third party integration #3391
base: main
Conversation
Caution
Changes requested ❌
Reviewed everything up to 15aff50 in 1 minute and 28 seconds.
- Reviewed 1337 lines of code in 7 files
- Skipped 0 files when reviewing
- Skipped posting 6 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py:72
- Draft comment:
New histograms for TTFT and streaming time are added. Consider using a monotonic clock (e.g. time.monotonic()) for duration measurements to avoid issues on systems with clock adjustments, and add inline documentation for these metrics. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The suggestion about monotonic clock could be valid for precise timing measurements. However, we don't see the actual timing measurement code in this file - it's likely in the callback handler. The documentation suggestion is not needed since there are already clear descriptions in the create_histogram calls. Without seeing the actual timing code, we can't be certain the monotonic clock suggestion is relevant. I might be wrong about the monotonic clock suggestion - it could be a critical best practice for time measurements in distributed systems. The current descriptions might also be insufficient for other developers. While the monotonic clock suggestion might be valid, we can't verify its relevance without seeing the timing implementation. The existing metric descriptions are clear and sufficient. Delete the comment since we can't verify if the monotonic clock suggestion is applicable, and the documentation request is already satisfied by existing descriptions.
2. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:584
- Draft comment:
The on_llm_new_token method correctly tracks TTFT on the first token. Ensure that, in high-concurrency scenarios, potential race conditions are handled if multiple tokens are processed concurrently. - Reason this comment was not posted:
Comment did not seem useful. Confidence: useful = 0% <= threshold 50%.
The comment is asking the author to ensure that race conditions are handled in high-concurrency scenarios. This falls under asking the author to ensure behavior, which is against the rules. The comment does not provide a specific suggestion or point out a specific issue in the code.
3. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:668
- Draft comment:
The fallback logic for obtaining the model name in on_llm_end is robust. Consider adding a brief inline comment to clarify the priority order of model name extraction for future maintainability. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py:155
- Draft comment:
The model inference logic in _infer_model_from_class_name handles known third-party integrations. Consider documenting the expected format of the serialized input to improve maintainability. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% While documenting input formats can be helpful, in this case the function is already well-structured and has clear error handling. The serialized dict format is fairly simple - it just needs a potential "kwargs" field that may contain model fields. The code handles missing or malformed input gracefully by falling back to unknown model names. Additional documentation would be nice-to-have but not critical. The comment has merit since documenting expected input formats is generally good practice. However, the current code handles all edge cases gracefully and the format is fairly straightforward. While documentation could be improved, this seems more like a nice-to-have suggestion rather than a critical issue requiring immediate action. The code is defensive and handles malformed input well. Delete this comment as it suggests a nice-to-have documentation improvement rather than a critical code change. The code already handles inputs robustly.
5. packages/opentelemetry-instrumentation-langchain/tests/test_third_party_models.py:118
- Draft comment:
The tests for ChatDeepSeek integration effectively cover both standard extraction and fallback cases. Ensure that if model naming conventions change in the future, these tests are updated accordingly. - Reason this comment was not posted:
Comment did not seem useful. Confidence: useful = 0% <= threshold 50%.
The comment is purely informative and suggests ensuring future updates, which violates the rules. It doesn't provide a specific code suggestion or ask for a specific test to be written.
6. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py:231
- Draft comment:
There appears to be an extra closing parenthesis before the '-> None:' in the function signature. Please check if this is a typographical error. - Reason this comment was not posted:
Comment was on unchanged code.
Workflow ID: wflow_bfKB5E9v4FYJHRmv
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
self.handler.spans[run_id] = span_holder

# Simulate first token arrival after a small delay
time.sleep(0.1)
Tests simulate delays using time.sleep(), which can be fragile in CI environments. Consider using a time mocking approach to simulate passage of time in a more stable way.
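For example, a time-mocking sketch along these lines could replace the sleep (the patch target mirrors the module path already used by this PR's tests; the handler call is only indicated, not executed here):

from unittest.mock import patch

# Sketch: drive the clock explicitly instead of sleeping, so TTFT assertions
# do not depend on scheduler timing in CI.
with patch(
    "opentelemetry.instrumentation.langchain.callback_handler.time.time"
) as mock_time:
    mock_time.return_value = 1000.0    # span start
    # ... create the span holder for run_id here ...
    mock_time.return_value = 1000.1    # first token appears to arrive 0.1s later
    # handler.on_llm_new_token("Hello", run_id=run_id)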
✅ There are no secrets present in this pull request anymore. If these secrets were true positives and are still valid, we highly recommend you revoke them. 🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds GenAI incubating metrics (TTFT, streaming time, choices counter, exception counter) to LangChain instrumentation, wires them through the instrumentor and TraceloopCallbackHandler, enhances model-name extraction and span metadata handling, adds a DeepSeek dependency and streaming test cassette, and expands tests for streaming/third‑party models and metrics.

Changes
Sequence Diagram(s)

sequenceDiagram
autonumber
participant App
participant LangChain
participant Callback as TraceloopCallbackHandler
participant Span as Span/SpanHolder
participant OTel as OTel Metrics
App->>LangChain: invoke LLM/chat (streaming)
LangChain->>Callback: on_chat_model_start / on_llm_start(kwargs, serialized, metadata)
Callback->>Span: set_request_params(span_holder, kwargs, serialized, metadata)
Note right of Callback: resolve model_name via kwargs/serialized/metadata/class-name
loop stream tokens
LangChain->>Callback: on_llm_new_token(token, run_id)
alt first token
Callback->>Span: record first_token_time
Callback->>OTel: ttft_histogram.record(value, attributes)
else subsequent token
Callback-->>OTel: (no TTFT)
end
end
LangChain->>Callback: on_llm_end(result, run_id)
Callback->>OTel: duration_histogram.record(...)
Callback->>OTel: token_histogram.record(...)
opt streaming
Callback->>OTel: streaming_time_histogram.record(...)
Callback->>OTel: choices_counter.add(n, attributes)
end
Callback->>Span: end span & cleanup
sequenceDiagram
autonumber
participant LangChain
participant Callback
participant Span
participant OTel as OTel Metrics
LangChain->>Callback: on_llm_error(error, run_id)
Callback->>Span: resolve model_name & error type
Callback->>OTel: exception_counter.add(1, attributes: error.type, model, system, server.address)
Callback-->>LangChain: propagate/return
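As a rough sketch of the error path in the diagram above (illustrative only; the helper name and attribute keys are assumptions, not the PR's exact code; the counter name matches the one discussed in the review comments below):

from typing import Optional
from opentelemetry import metrics

meter = metrics.get_meter("langchain-instrumentation-sketch")
exception_counter = meter.create_counter(
    name="llm.langchain.completions.exceptions",
    description="Number of exceptions during LLM completions",
)

def record_llm_error(error: Exception, model_name: Optional[str]) -> None:
    # One count per failed completion, attributed to the error type and resolved model.
    exception_counter.add(
        1,
        attributes={
            "error.type": type(error).__name__,
            "gen_ai.response.model": model_name or "unknown",
            "gen_ai.system": "Langchain",
        },
    )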
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Tip: 👮 Agentic pre-merge checks are now available in preview! Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (1)
150-174: Harden cassette scrubbing: handle bytes/JSON, nested secrets, and response bodies; expand header/query filtering. Reduce risk of leaking keys in recorded cassettes and make body-type handling robust.
Apply:
@@ def vcr_config(): - def before_record_request(request): - if hasattr(request, "body") and request.body: - import json - - try: - if isinstance(request.body, (str, bytes)): - body_str = ( - request.body.decode("utf-8") - if isinstance(request.body, bytes) - else request.body - ) - body_data = json.loads(body_str) - if "api_key" in body_data: - body_data["api_key"] = "FILTERED" - request.body = json.dumps(body_data) - except (json.JSONDecodeError, UnicodeDecodeError, AttributeError): - pass - return request + def before_record_request(request): + import json, re + SENSITIVE_KEYS = {"api_key", "apikey", "key", "token", "access_token", "authorization", "password"} + + def _scrub(obj): + if isinstance(obj, dict): + for k, v in list(obj.items()): + if k.lower() in SENSITIVE_KEYS: + obj[k] = "FILTERED" + else: + _scrub(v) + elif isinstance(obj, list): + for i in obj: + _scrub(i) + + if hasattr(request, "body") and request.body: + body = request.body + try: + body_str = body.decode("utf-8") if isinstance(body, (bytes, bytearray)) else str(body) + try: + data = json.loads(body_str) + _scrub(data) + body_scrubbed = json.dumps(data) + except json.JSONDecodeError: + # Fallback: redact bearer/API keys in raw text + body_scrubbed = re.sub(r'(?i)(bearer\s+)[A-Za-z0-9._-]+', r'\1FILTERED', body_str) + body_scrubbed = re.sub(r'(?i)("?(api_?key|token|access_token)"?\s*:\s*)"[^"]+"', r'\1"FILTERED"', body_scrubbed) + request.body = body_scrubbed.encode("utf-8") if isinstance(body, (bytes, bytearray)) else body_scrubbed + except (UnicodeDecodeError, AttributeError, TypeError): + pass + return request + + def before_record_response(response): + # VCRpy response is a dict-like structure + import json, re + try: + body = response.get("body", {}).get("string") + if not body: + return response + body_str = body.decode("utf-8") if isinstance(body, (bytes, bytearray)) else str(body) + try: + data = json.loads(body_str) + # Best-effort scrub + def _scrub(obj): + if isinstance(obj, dict): + for k in list(obj.keys()): + if str(k).lower() in {"api_key", "apikey", "key", "token", "access_token", "authorization", "password"}: + obj[k] = "FILTERED" + else: + _scrub(obj[k]) + elif isinstance(obj, list): + for i in obj: + _scrub(i) + _scrub(data) + body_scrubbed = json.dumps(data) + except json.JSONDecodeError: + body_scrubbed = re.sub(r'(?i)(bearer\s+)[A-Za-z0-9._-]+', r'\1FILTERED', body_str) + response["body"]["string"] = body_scrubbed.encode("utf-8") if isinstance(body, (bytes, bytearray)) else body_scrubbed + except Exception: + pass + return response @@ - return { - "filter_headers": ["authorization", "x-api-key"], - "match_on": ["method", "scheme", "host", "port", "path", "query"], - "before_record_request": before_record_request, - } + return { + "filter_headers": ["authorization", "Authorization", "x-api-key", "X-API-KEY"], + "filter_query_parameters": ["api_key", "key", "token", "access_token"], + "match_on": ["method", "scheme", "host", "port", "path", "query"], + "before_record_request": before_record_request, + "before_record_response": before_record_response, + }packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1)
334-341: Incorrect type check: use isinstance instead of identity comparison. Comparison to the str type is always False; causes JSON-quoted strings in completions.

- if generation.message.content is str:
+ if isinstance(generation.message.content, str):
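A quick standalone illustration of why the identity check misbehaves (not taken from the PR):

content = "Hello world"
print(content is str)            # False: an instance is never identical to the type object str
print(isinstance(content, str))  # True: the intended check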
🧹 Nitpick comments (11)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (5)
93-95: Uninstrument in reverse order of instrumentation. Reduces chance of teardown interdependencies.

- openai_instrumentor.uninstrument()
- langchain_instrumentor.uninstrument()
- bedrock_instrumentor.uninstrument()
+ bedrock_instrumentor.uninstrument()
+ langchain_instrumentor.uninstrument()
+ openai_instrumentor.uninstrument()
88-90: Pass meter_provider to Bedrock instrumentor too. Keeps metrics pipeline consistent across instrumentors.

- bedrock_instrumentor = BedrockInstrumentor()
- bedrock_instrumentor.instrument(tracer_provider=tracer_provider)
+ bedrock_instrumentor = BedrockInstrumentor()
+ bedrock_instrumentor.instrument(tracer_provider=tracer_provider, meter_provider=meter_provider)
68-73: Set explicit service.name on the MeterProvider resource. Helps disambiguate telemetry from parallel test runs.

- resource = Resource.create()
+ resource = Resource.create({"service.name": "langchain-instrumentation-tests"})
137-147: Add DEEPSEEK_API_KEY default for third-party tests. Prevents accidental skipping/failure if DeepSeek-dependent tests are added.

  if not os.environ.get("TAVILY_API_KEY"):
      os.environ["TAVILY_API_KEY"] = "test"
+ if not os.environ.get("DEEPSEEK_API_KEY"):
+     os.environ["DEEPSEEK_API_KEY"] = "test"
106-109: Remove unused local variable in fixtures. Minor cleanup; yield the session instrumentor directly.

- instrumentor = instrument_legacy
-
- yield instrumentor
+ yield instrument_legacy

Also applies to: 123-126
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py (1)
72-99: New streaming and completion metrics properly configured. The new metrics (TTFT histogram, streaming time histogram, choices counter, and exception counter) are correctly configured with appropriate names, units, and descriptions. The metric names align with OpenTelemetry semantic conventions.

One minor inconsistency: the exception counter uses a custom name "llm.langchain.completions.exceptions" instead of following the pattern from existing Meters constants. Consider using a constant from the Meters class for consistency:

- exception_counter = meter.create_counter(
-     name="llm.langchain.completions.exceptions",
+ exception_counter = meter.create_counter(
+     name="llm.langchain.completions.exceptions",  # Consider adding to Meters class

packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (1)
412-450: Exception metrics testing handles error scenarios appropriately. The test properly validates that exception metrics are recorded when LLM calls fail, including correct error type attribution.

However, consider the static analysis suggestion about blind exception handling: the test could be improved by catching a more specific exception type instead of the generic Exception:

- except Exception:
-     pass  # Expected to fail
+ except Exception as e:
+     # Expected to fail - verify it's the mocked exception
+     assert "API Error" in str(e)

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)
583-622: New token callback enables TTFT tracking. The on_llm_new_token method properly implements TTFT tracking by recording the time when the first token arrives. The model name resolution with fallbacks ensures reliable attribution. However, there are several unused method arguments flagged by static analysis.

Consider using underscore prefixes for intentionally unused parameters:

  def on_llm_new_token(
      self,
-     token: str,
+     _token: str,
      *,
-     chunk: Optional[Union[GenerationChunk, ChatGenerationChunk]] = None,
+     chunk: Optional[Union[GenerationChunk, ChatGenerationChunk]] = None,  # Keep for future use
      run_id: UUID,
-     parent_run_id: Optional[UUID] = None,
+     _parent_run_id: Optional[UUID] = None,
-     **kwargs: Any,
+     **_kwargs: Any,
  ) -> Any:
90-93: Remove unused span_holder argument from _extract_model_name_from_request. It isn't used. Trim the param and update the call to reduce coupling and silence ARG001.

-def _extract_model_name_from_request(
-    kwargs, span_holder: SpanHolder, serialized: Optional[dict] = None, metadata: Optional[dict] = None
-) -> str:
+def _extract_model_name_from_request(
+    kwargs, serialized: Optional[dict] = None, metadata: Optional[dict] = None
+) -> str:
@@
-    model = _extract_model_name_from_request(kwargs, span_holder, serialized, metadata)
+    model = _extract_model_name_from_request(kwargs, serialized, metadata)

Also applies to: 181-183
129-134: Avoid bare except/quiet pass around association_properties. Narrow the exception; silent broad catches hinder debugging and trigger BLE001/S110.

- except Exception:
-     pass
+ except (ImportError, AttributeError):
+     # Best-effort lookup; ignore missing context API or attribute
+     pass
136-147: Clean up dict key checks (RUF019) and simplify access. Use dict.get to reduce branching and improve clarity.

- if serialized:
-     if "kwargs" in serialized:
-         ser_kwargs = serialized["kwargs"]
-         for model_tag in ("model", "model_id", "model_name"):
-             if (model := ser_kwargs.get(model_tag)) is not None:
-                 return model
+ if serialized:
+     ser_kwargs = serialized.get("kwargs", {})
+     for model_tag in ("model", "model_id", "model_name"):
+         if (model := ser_kwargs.get(model_tag)) is not None:
+             return model
@@
- if "id" in serialized and serialized["id"]:
-     class_name = serialized["id"][-1] if isinstance(serialized["id"], list) else str(serialized["id"])
+ sid = serialized.get("id")
+ if sid:
+     class_name = sid[-1] if isinstance(sid, list) else str(sid)
      return _infer_model_from_class_name(class_name, serialized)

- if "ChatDeepSeek" in class_name:
-     # Check if serialized contains the actual model name
-     if "kwargs" in serialized:
-         ser_kwargs = serialized["kwargs"]
+ if "ChatDeepSeek" in class_name:
+     # Check if serialized contains the actual model name
+     ser_kwargs = serialized.get("kwargs", {})
      # ChatDeepSeek might store model in different fields
      for model_field in ("model", "_model", "model_name"):
          if model_field in ser_kwargs and ser_kwargs[model_field]:
              return ser_kwargs[model_field]

Also applies to: 148-151, 161-166
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
packages/opentelemetry-instrumentation-langchain/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py (2 hunks)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (8 hunks)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (5 hunks)
- packages/opentelemetry-instrumentation-langchain/pyproject.toml (1 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/conftest.py (1 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/metrics/cassettes/test_langchain_metrics/test_streaming_with_ttft_and_generation_time_metrics.yaml (1 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (3 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_model_extraction.py (1 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_streaming_metrics.py (1 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_third_party_models.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py
: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-langchain/tests/conftest.py
packages/opentelemetry-instrumentation-langchain/tests/test_model_extraction.py
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py
packages/opentelemetry-instrumentation-langchain/tests/test_streaming_metrics.py
packages/opentelemetry-instrumentation-langchain/tests/test_third_party_models.py
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
**/cassettes/**/*.{yaml,yml,json}
📄 CodeRabbit inference engine (CLAUDE.md)
Never commit secrets or PII in VCR cassettes; scrub sensitive data
Files:
packages/opentelemetry-instrumentation-langchain/tests/metrics/cassettes/test_langchain_metrics/test_streaming_with_ttft_and_generation_time_metrics.yaml
🧬 Code graph analysis (6)
packages/opentelemetry-instrumentation-langchain/tests/test_model_extraction.py (1)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (4): _extract_model_name_from_request (90-152), _infer_model_from_class_name (155-172), extract_model_name_from_response_metadata (474-505), SpanHolder (27-37)

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py (2)
- packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1): Meters (36-61)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1): TraceloopCallbackHandler (147-1031)

packages/opentelemetry-instrumentation-langchain/tests/test_streaming_metrics.py (3)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (4): TraceloopCallbackHandler (147-1031), on_llm_new_token (584-621), on_llm_end (624-748), on_llm_error (881-890)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1): SpanHolder (27-37)
- packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1): SpanAttributes (64-261)

packages/opentelemetry-instrumentation-langchain/tests/test_third_party_models.py (3)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (3): TraceloopCallbackHandler (147-1031), on_chat_model_start (522-550), on_llm_end (624-748)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (2): SpanHolder (27-37), _extract_model_name_from_request (90-152)
- packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1): SpanAttributes (64-261)

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (2)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (7): SpanHolder (27-37), set_request_params (175-222), set_llm_request (225-245), _get_unified_unknown_model (58-87), _set_span_attribute (53-55), set_chat_response_usage (387-471), set_chat_response (313-384)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/utils.py (1): dont_throw (55-77)

packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (3)
- packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (2): Meters (36-61), SpanAttributes (64-261)
- packages/opentelemetry-instrumentation-langchain/tests/conftest.py (1): instrument_legacy (77-95)
- packages/opentelemetry-instrumentation-milvus/tests/conftest.py (1): reader (37-41)
🪛 Ruff (0.13.1)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
- 171-171: PEP 484 prohibits implicit Optional. Convert to Optional[T] (RUF013)
- 193-194: try-except-pass detected, consider logging the exception (S110)
- 193-193: Do not catch blind exception: Exception (BLE001)
- 586-586: Unused method argument: token (ARG002)
- 588-588: Unused method argument: chunk (ARG002)
- 590-590: Unused method argument: parent_run_id (ARG002)
- 591-591: Unused method argument: kwargs (ARG002)

packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py
- 309-309: Unused function argument: instrument_legacy (ARG001)
- 412-412: Unused function argument: instrument_legacy (ARG001)
- 429-430: try-except-pass detected, consider logging the exception (S110)
- 429-429: Do not catch blind exception: Exception (BLE001)

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
- 58-58: PEP 484 prohibits implicit Optional. Convert to Optional[T] (RUF013)
- 58-58: PEP 484 prohibits implicit Optional. Convert to Optional[T] (RUF013)
- 91-91: Unused function argument: span_holder (ARG001)
- 133-134: try-except-pass detected, consider logging the exception (S110)
- 133-133: Do not catch blind exception: Exception (BLE001)
- 148-148: Unnecessary key check before dictionary access. Replace with dict.get (RUF019)
- 165-165: Unnecessary key check before dictionary access. Replace with dict.get (RUF019)
🔇 Additional comments (27)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (1)
29-29: pytest-recording declared in test deps — no action required. Confirmed pytest-recording = "^0.13.2" in packages/opentelemetry-instrumentation-langchain/pyproject.toml and pytest_plugins = ["pytest_recording"] in packages/opentelemetry-instrumentation-langchain/tests/conftest.py.
packages/opentelemetry-instrumentation-langchain/pyproject.toml (2)
57-57: LGTM! The addition of langchain-deepseek ^0.1.4 as a test dependency is appropriate for supporting the third-party model integration tests in this PR.

57-57: Keep dependency as-is — langchain-deepseek ^0.1.4 is available on PyPI. Verified via pip index (available versions include 0.1.4); no change required.

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py (2)
20-20: Import added for new AI metrics. The import of GenAIMetrics from the incubating metrics module enables access to standardized metric names for TTFT tracking.

107-109: New metrics properly wired into callback handler. The expanded constructor call correctly passes all six metric instruments to TraceloopCallbackHandler, enabling the streaming TTFT and completion metrics functionality.

packages/opentelemetry-instrumentation-langchain/tests/test_third_party_models.py (4)
12-30: Setup method properly initializes all required mock metrics. The test setup correctly mocks all six metric instruments that are now required by the updated TraceloopCallbackHandler constructor.

31-66: ChatDeepSeek model extraction test is comprehensive. The test properly validates model extraction from serialized kwargs, verifies span attributes are set correctly, and confirms the request_model is captured in the SpanHolder.

74-93: Fallback behavior correctly tested. The test confirms that when no model is provided in serialized kwargs, the system falls back to "deepseek-unknown" as expected.

108-117: Token metrics validation ensures correct model attribution. The test properly verifies that token usage metrics use the correct model name from the request fallback, ensuring consistent attribution across the instrumentation.
packages/opentelemetry-instrumentation-langchain/tests/metrics/test_langchain_metrics.py (3)
8-9: New imports support extended metrics validation. The imports of ERROR_TYPE and GenAIMetrics enable testing of the new streaming metrics and exception handling.

126-161: Streaming metrics validation is thorough. The test properly validates that generation choices, TTFT, and streaming time metrics are recorded with correct attributes and values during streaming operations.

308-410: ChatDeepSeek streaming test validates third-party integration fixes. The test comprehensively validates that the ChatDeepSeek integration properly records all streaming metrics (TTFT, generation time, choices) with correct model name attribution. The test confirms the fixes for third-party model handling work as expected.
packages/opentelemetry-instrumentation-langchain/tests/test_model_extraction.py (3)
15-27: Test setup properly initializes SpanHolder. The setup method correctly creates a SpanHolder instance for testing model extraction functions.

46-73: DeepSeek model extraction tests cover key scenarios. The tests properly validate both successful model extraction from serialized kwargs and fallback behavior when model information is missing.

127-147: Priority order testing ensures correct precedence. The test confirms that model extraction follows the correct priority order (kwargs > invocation_params > nested structures), which is crucial for reliable model identification.
packages/opentelemetry-instrumentation-langchain/tests/test_streaming_metrics.py (4)
14-33: Test setup correctly mocks all required metrics instruments. The setup properly initializes all six metric mocks needed for the updated callback handler.

54-56: Time-based testing approach with potential CI fragility. The test uses time.sleep(0.1) to simulate delays, which could be unreliable in CI environments with varying system load. Consider using time mocking for more stable tests:

from unittest.mock import patch
import time

@patch('time.time')
def test_ttft_metric_recorded_on_first_token(self, mock_time):
    # Control time progression explicitly
    mock_time.side_effect = [start_time, start_time + 0.1, start_time + 0.15]
    # ... rest of test

34-67: TTFT metric test properly validates first token timing. The test correctly verifies that the TTFT metric is recorded when the first token arrives and includes proper attribute validation.

131-168: Streaming time metric test validates end-to-end timing. The test properly simulates the complete streaming flow from first token to completion and validates that streaming time is calculated correctly.
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (5)
57-57: Counter import added for new metrics. The import enables usage of Counter metrics alongside existing Histogram metrics.

148-157: Constructor signature expanded for new metrics. The constructor correctly accepts all six metric instruments needed for comprehensive LLM observability, including TTFT, streaming time, choices, and exception metrics.

663-676: Enhanced model name resolution with comprehensive fallbacks. The improved model name resolution ensures that a valid model name is always available for metric attribution, preventing None values from breaking downstream processing.

706-747: Metrics recording uses shared attributes for consistency. The updated metrics recording in on_llm_end properly uses the shared attributes helper to ensure consistent attribution across all metrics. The streaming time calculation and choices counting are correctly implemented.

861-877: Exception metrics properly track error events. The enhanced error handling in _handle_error correctly records exception metrics with proper model attribution and error type information.

packages/opentelemetry-instrumentation-langchain/tests/metrics/cassettes/test_langchain_metrics/test_streaming_with_ttft_and_generation_time_metrics.yaml (1)
1-220: Test cassette correctly captures DeepSeek streaming — no secrets found. grep matched only 'token' occurrences in the response "usage" fields (prompt_tokens/completion_tokens); no api_key, secret, authorization/bearer header, or other credentials found in headers or body.
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (2)
37-37: New SpanHolder.first_token_time: confirm monotonic clock source. For TTFT correctness, ensure producers set this using a monotonic clock (time.perf_counter() or time.monotonic()), and that start_time uses the same clock when computing deltas.
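A minimal sketch of that pairing (illustrative only; variable names are assumptions, not the PR's actual fields):

import time

# Use one monotonic clock for both timestamps so the TTFT delta is immune
# to wall-clock adjustments (NTP slews, manual changes).
start_time = time.perf_counter()
# ... later, when the first streamed token arrives ...
first_token_time = time.perf_counter()
ttft_seconds = first_token_time - start_time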
54-55: Attribute guard looks good. Rejecting None/empty-string values before setting span attributes is correct.
"""Enhanced model name extraction from response metadata with third-party support.""" | ||
|
||
# Standard extraction from response metadata | ||
for generations in response.generations: | ||
for generation in generations: | ||
if ( | ||
getattr(generation, "message", None) | ||
and getattr(generation.message, "response_metadata", None) | ||
and (model_name := generation.message.response_metadata.get("model_name")) | ||
): | ||
metadata = generation.message.response_metadata | ||
# Try multiple possible model name fields | ||
for model_field in ("model_name", "model", "model_id"): | ||
if (model_name := metadata.get(model_field)): | ||
return model_name | ||
|
||
# Enhanced extraction for third-party models | ||
# Check if llm_output contains model information | ||
if response.llm_output: | ||
for model_field in ("model", "model_name", "model_id"): | ||
if (model_name := response.llm_output.get(model_field)): | ||
return model_name | ||
|
||
# Check generation_info for model information | ||
for generations in response.generations: | ||
for generation in generations: | ||
if hasattr(generation, "generation_info") and generation.generation_info: | ||
for model_field in ("model", "model_name", "model_id"): | ||
if (model_name := generation.generation_info.get(model_field)): | ||
return model_name | ||
|
||
return None | ||
|
🛠️ Refactor suggestion
🧩 Analysis chain
Type mismatch: function returns None but annotated as str
Change return type to Optional[str] to reflect behavior and prevent downstream assumptions.
-def extract_model_name_from_response_metadata(response: LLMResult) -> str:
+def extract_model_name_from_response_metadata(response: LLMResult) -> Optional[str]:
Run to check callsites expecting str:
🏁 Script executed:
#!/bin/bash
rg -nP -C2 '\bextract_model_name_from_response_metadata\s*\('
Length of output: 3287
Change return type to Optional[str]
Function can return None when no model field is found — update the signature to:
def extract_model_name_from_response_metadata(response: LLMResult) -> Optional[str]
and add "from typing import Optional".
Call sites found: packages/opentelemetry-instrumentation-langchain/tests/test_model_extraction.py (asserts returned string) and packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (already checks for None). No other callers found.
🤖 Prompt for AI Agents
In
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
around lines 475 to 506, the function can return None but the signature
currently does not reflect that; update the signature to def
extract_model_name_from_response_metadata(response: LLMResult) -> Optional[str],
add from typing import Optional at the top of the file, and run/update the test
at
packages/opentelemetry-instrumentation-langchain/tests/test_model_extraction.py
to accept an Optional[str] return (adjust assertions accordingly) so callers
that already handle None continue to work.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (2)
677-715: Guard token_usage math against None to avoid TypeError. If a field exists with value 0, the current "or"-chain can yield None and break addition.
- if token_usage is not None:
-     prompt_tokens = (
-         token_usage.get("prompt_tokens")
-         or token_usage.get("input_token_count")
-         or token_usage.get("input_tokens")
-     )
-     completion_tokens = (
-         token_usage.get("completion_tokens")
-         or token_usage.get("generated_token_count")
-         or token_usage.get("output_tokens")
-     )
-     total_tokens = token_usage.get("total_tokens") or (
-         prompt_tokens + completion_tokens
-     )
+ if token_usage is not None:
+     def _first_present(d, keys):
+         for k in keys:
+             v = d.get(k)
+             if v is not None:
+                 return v
+         return 0
+     prompt_tokens = int(_first_present(token_usage, ("prompt_tokens", "input_token_count", "input_tokens")))
+     completion_tokens = int(_first_present(token_usage, ("completion_tokens", "generated_token_count", "output_tokens")))
+     total_tokens = int(token_usage.get("total_tokens") or (prompt_tokens + completion_tokens))
270-288: Avoid private _RUNTIME_CONTEXT; use public detach and suppress expected errors. Private API may change across OTel versions.
- try:
-     # Use the runtime context directly to avoid error logging from context_api.detach()
-     from opentelemetry.context import _RUNTIME_CONTEXT
-
-     _RUNTIME_CONTEXT.detach(token)
- except Exception:
-     # Context detach can fail in async scenarios when tokens are created in different contexts
-     # This includes ValueError, RuntimeError, and other context-related exceptions
-     # This is expected behavior and doesn't affect the correct span hierarchy
-     #
-     # Common scenarios where this happens:
-     # 1. Token created in one async task/thread, detached in another
-     # 2. Context was already detached by another process
-     # 3. Token became invalid due to context switching
-     # 4. Race conditions in highly concurrent scenarios
-     #
-     # This is safe to ignore as the span itself was properly ended
-     # and the tracing data is correctly captured.
-     pass
+ from contextlib import suppress
+ with suppress(Exception):
+     context_api.detach(token)

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1)
315-373
: Bug: type check usesis str
; should use isinstance. Also guard generation_info access.Prevents misclassification and AttributeError.
- if generation.message.content is str: + if isinstance(generation.message.content, str): @@ - if generation.generation_info.get("finish_reason"): + if getattr(generation, "generation_info", {}) and generation.generation_info.get("finish_reason"): _set_span_attribute(
🧹 Nitpick comments (6)
packages/opentelemetry-instrumentation-langchain/tests/test_streaming_metrics.py (1)
72-97
: Optional: mock time here too for consistency.This test only checks call count, but mocking time like other tests keeps timing fully deterministic across the suite.
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (3)
170-207
: Type: avoid implicit Optional and add light debug on metadata lookup failures.operation_type should be Optional[str]; keep narrow excepts but don’t silently swallow everything forever.
-def _create_shared_attributes( - self, span, model_name: str, operation_type: str = None, is_streaming: bool = False - ) -> dict: +def _create_shared_attributes( + self, span, model_name: str, operation_type: Optional[str] = None, is_streaming: bool = False + ) -> dict: @@ - except (AttributeError, KeyError, TypeError): - pass + except (AttributeError, KeyError, TypeError): + # Expected when context/association_properties is unavailable + server_address = None
583-622
: Silence unused-arg warnings without breaking LangChain’s callback API; move import to module scope.Keep parameter names to retain compatibility, but mark them as intentionally unused and avoid per-call import.
-from opentelemetry.instrumentation.langchain.span_utils import _get_unified_unknown_model +from opentelemetry.instrumentation.langchain.span_utils import _get_unified_unknown_model @@ - def on_llm_new_token( + def on_llm_new_token( self, - token: str, + token: str, # noqa: ARG002 *, - chunk: Optional[Union[GenerationChunk, ChatGenerationChunk]] = None, + chunk: Optional[Union[GenerationChunk, ChatGenerationChunk]] = None, # noqa: ARG002 run_id: UUID, - parent_run_id: Optional[UUID] = None, - **kwargs: Any, + parent_run_id: Optional[UUID] = None, # noqa: ARG002 + **kwargs: Any, # noqa: ARG002 ) -> Any: @@ - from opentelemetry.instrumentation.langchain.span_utils import _get_unified_unknown_model - model_name = (
637-676
: Include 'model' in the first llm_output extraction to avoid unnecessary fallbacks.Saves a pass when providers set llm_output["model"].
- if response.llm_output is not None: - model_name = response.llm_output.get( - "model_name" - ) or response.llm_output.get("model_id") + if response.llm_output is not None: + model_name = ( + response.llm_output.get("model_name") + or response.llm_output.get("model_id") + or response.llm_output.get("model") + )packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (2)
58-90
: Case-insensitive class-name heuristics.Lowercase class_name to avoid missing vendor due to casing differences.
- # Fallback to class name-based inference - if class_name: - if "ChatDeepSeek" in class_name: + # Fallback to class name-based inference + if class_name: + class_lower = class_name.lower() + if "chatdeepseek" in class_lower: return "deepseek-unknown" - elif "ChatOpenAI" in class_name: + elif "chatopenai" in class_lower: return "gpt-unknown" - elif "ChatAnthropic" in class_name: + elif "chatanthropic" in class_lower: return "claude-unknown" - elif "ChatCohere" in class_name: + elif "chatcohere" in class_lower: return "command-unknown" - elif "ChatOllama" in class_name: + elif "chatollama" in class_lower: return "ollama-unknown"
92-155
: _extract_model_name_from_request: remove unused param and simplify dict access.span_holder isn’t used; replace key checks with .get to satisfy Ruff and readability.
-def _extract_model_name_from_request( - kwargs, span_holder: SpanHolder, serialized: Optional[dict] = None, metadata: Optional[dict] = None -) -> str: +def _extract_model_name_from_request( + kwargs, serialized: Optional[dict] = None, metadata: Optional[dict] = None +) -> str: @@ - if "kwargs" in kwargs: - nested_kwargs = kwargs["kwargs"] + nested_kwargs = kwargs.get("kwargs") or {} for model_tag in ("model", "model_id", "model_name"): - if (model := nested_kwargs.get(model_tag)) is not None: + if (model := nested_kwargs.get(model_tag)) is not None: return model @@ - if serialized: - if "kwargs" in serialized: - ser_kwargs = serialized["kwargs"] + if serialized: + ser_kwargs = serialized.get("kwargs") or {} for model_tag in ("model", "model_id", "model_name"): - if (model := ser_kwargs.get(model_tag)) is not None: + if (model := ser_kwargs.get(model_tag)) is not None: return model @@ - if "id" in serialized and serialized["id"]: - class_name = serialized["id"][-1] if isinstance(serialized["id"], list) else str(serialized["id"]) + if serialized.get("id"): + class_name = serialized["id"][-1] if isinstance(serialized["id"], list) else str(serialized["id"]) return _infer_model_from_class_name(class_name, serialized)And update callers:
- model = _extract_model_name_from_request(kwargs, span_holder, serialized, metadata) + model = _extract_model_name_from_request(kwargs, serialized, metadata)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (9 hunks)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (5 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_model_extraction.py (1 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_streaming_metrics.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/opentelemetry-instrumentation-langchain/tests/test_model_extraction.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py
: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-langchain/tests/test_streaming_metrics.py
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
🧬 Code graph analysis (2)
packages/opentelemetry-instrumentation-langchain/tests/test_streaming_metrics.py (2)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (4): TraceloopCallbackHandler (147-1030), on_llm_new_token (584-621), on_llm_end (624-747), on_llm_error (880-889)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1): SpanHolder (27-37)

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (3)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (4): SpanHolder (27-37), set_request_params (177-224), _get_unified_unknown_model (58-89), _set_span_attribute (53-55)
- packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1): SpanAttributes (64-261)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/utils.py (2): should_emit_events (80-87), dont_throw (55-77)
🪛 Ruff (0.13.1)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
- 171-171: PEP 484 prohibits implicit Optional. Convert to Optional[T] (RUF013)
- 586-586: Unused method argument: token (ARG002)
- 588-588: Unused method argument: chunk (ARG002)
- 590-590: Unused method argument: parent_run_id (ARG002)
- 591-591: Unused method argument: kwargs (ARG002)

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
- 93-93: Unused function argument: span_holder (ARG001)
- 150-150: Unnecessary key check before dictionary access. Replace with dict.get (RUF019)
- 165-165: Unnecessary key check before dictionary access. Replace with dict.get (RUF019)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Lint
🔇 Additional comments (10)
packages/opentelemetry-instrumentation-langchain/tests/test_streaming_metrics.py (4)
34-71: Good: time is mocked; resolves prior flakiness concern. Mocking time in both callback_handler and span_utils modules stabilizes TTFT assertions. This addresses the earlier CI fragility raised about time.sleep().

98-134: Choices metric coverage looks good. Checks count and core attrs; no issues.

183-213: Exception metric assertion is correct. Covers error.type and model fallback.

215-242: Good negative test for streaming metric. Ensures no record without first token.
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (3)
546-547: Fix validated: set_request_params now receives real kwargs. This resolves the earlier bug where serialized['kwargs'] dropped invocation_params.
Also applies to: 576-577
846-876: Exception metric path: solid. Includes vendor/model attrs and error.type; aligns with tests.
147-166
: Constructor surface expansion — call sites updated.
All TraceloopCallbackHandler(...) instantiations pass the seven parameters in the correct order: tracer, duration_histogram, token_histogram, ttft_histogram, streaming_time_histogram, choices_counter, exception_counter.
Verified locations:
- packages/opentelemetry-instrumentation-langchain/tests/test_third_party_models.py:21–23
- packages/opentelemetry-instrumentation-langchain/tests/test_streaming_metrics.py:24–26
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py:106–108
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (3)
37-37: SpanHolder: first_token_time addition LGTM. Enables TTFT/streaming metrics.

476-508: Return type corrected to Optional[str]. Matches behavior and downstream checks.

510-514: Association metadata extractor: concise and correct. No issues.
def test_streaming_time_to_generate_metric(self):
    """Test that streaming time to generate metric is recorded."""
    run_id = uuid4()
    mock_span = Mock(spec=Span)
    mock_span.attributes = {SpanAttributes.LLM_SYSTEM: "Langchain"}

    with patch('opentelemetry.instrumentation.langchain.callback_handler.time.time') as mock_time, \
            patch('opentelemetry.instrumentation.langchain.span_utils.time.time') as mock_span_time:

        start_time = 1000.0
        mock_time.return_value = start_time
        mock_span_time.return_value = start_time

        span_holder = SpanHolder(
            span=mock_span,
            token=None,
            context=None,
            children=[],
            workflow_name="test",
            entity_name="test",
            entity_path="test",
            start_time=start_time
        )
        self.handler.spans[run_id] = span_holder

        first_token_time = start_time + 0.05
        mock_time.return_value = first_token_time
        mock_span_time.return_value = first_token_time
        self.handler.on_llm_new_token("Hello", run_id=run_id)

        completion_time = first_token_time + 0.05
        mock_time.return_value = completion_time
        mock_span_time.return_value = completion_time
        llm_result = LLMResult(
            generations=[[Generation(text="Hello world")]],
            llm_output={"model_name": "test-model"}
        )

        self.handler.on_llm_end(llm_result, run_id=run_id)

        self.streaming_time_histogram.record.assert_called_once()
        args = self.streaming_time_histogram.record.call_args
        streaming_time = args[0][0]
        assert abs(streaming_time - 0.05) < 0.001, (
            f"Streaming time should be approximately 0.05 seconds, "
            f"got {streaming_time}"
        )
🧹 Nitpick
Also assert streaming flag in metric attributes.
Verify attributes include stream=True to ensure streaming path is exercised.
Apply this minimal assertion:
@@
- self.streaming_time_histogram.record.assert_called_once()
- args = self.streaming_time_histogram.record.call_args
+ self.streaming_time_histogram.record.assert_called_once()
+ args = self.streaming_time_histogram.record.call_args
streaming_time = args[0][0]
@@
- )
+ )
+ assert args[1]["attributes"].get("stream") is True
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def test_streaming_time_to_generate_metric(self):
    """Test that streaming time to generate metric is recorded."""
    run_id = uuid4()
    mock_span = Mock(spec=Span)
    mock_span.attributes = {SpanAttributes.LLM_SYSTEM: "Langchain"}

    with patch('opentelemetry.instrumentation.langchain.callback_handler.time.time') as mock_time, \
            patch('opentelemetry.instrumentation.langchain.span_utils.time.time') as mock_span_time:

        start_time = 1000.0
        mock_time.return_value = start_time
        mock_span_time.return_value = start_time

        span_holder = SpanHolder(
            span=mock_span,
            token=None,
            context=None,
            children=[],
            workflow_name="test",
            entity_name="test",
            entity_path="test",
            start_time=start_time
        )
        self.handler.spans[run_id] = span_holder

        first_token_time = start_time + 0.05
        mock_time.return_value = first_token_time
        mock_span_time.return_value = first_token_time
        self.handler.on_llm_new_token("Hello", run_id=run_id)

        completion_time = first_token_time + 0.05
        mock_time.return_value = completion_time
        mock_span_time.return_value = completion_time
        llm_result = LLMResult(
            generations=[[Generation(text="Hello world")]],
            llm_output={"model_name": "test-model"}
        )

        self.handler.on_llm_end(llm_result, run_id=run_id)

        self.streaming_time_histogram.record.assert_called_once()
        args = self.streaming_time_histogram.record.call_args
        streaming_time = args[0][0]
        assert abs(streaming_time - 0.05) < 0.001, (
            f"Streaming time should be approximately 0.05 seconds, "
            f"got {streaming_time}"
        )
        assert args[1]["attributes"].get("stream") is True
🤖 Prompt for AI Agents
In
packages/opentelemetry-instrumentation-langchain/tests/test_streaming_metrics.py
around lines 135 to 181, the test verifies the recorded streaming time but does
not assert that the metric attributes indicate streaming; add a minimal
assertion after retrieving the histogram.record call args to check that the
recorded metric's attributes include stream=True (i.e., extract the attributes
from the record call kwargs and assert attributes.get("stream") is True) so the
test verifies the streaming path is exercised.
@nirga Please review this PR related to the LangChain instrumentation bugs.
This PR mainly fixes two issues related to the langchain third-party integration, specifically the ChatDeepSeek API: missing span attributes/model name detection, and missing TTFT and streaming generation time metrics.

Reproduce code:
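(The original reproduction snippet is not preserved here; the following is a minimal sketch of the scenario, assuming the langchain-deepseek package and a valid DEEPSEEK_API_KEY in the environment.)

from langchain_deepseek import ChatDeepSeek

# Stream a completion through ChatDeepSeek with LangChain instrumentation enabled,
# then check the exported span attributes and metrics (request model, TTFT,
# streaming generation time) for the missing values described above.
llm = ChatDeepSeek(model="deepseek-chat")
for chunk in llm.stream("Say hello in one short sentence."):
    print(chunk.content, end="", flush=True)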
Important

Fixes model name detection and adds missing metrics for Langchain's ChatDeepSeek integration, including TTFT and streaming generation time.

- TTFT and streaming generation time for ChatDeepSeek in callback_handler.py.
- _create_shared_attributes() in TraceloopCallbackHandler to create shared attributes for metrics.
- on_llm_new_token() and on_llm_end() to track TTFT and streaming metrics.
- Histograms for TTFT and streaming time, counters for generation choices and exceptions in __init__.py.
- set_request_params() in span_utils.py to enhance model extraction.
- New/updated tests: test_model_extraction.py, test_streaming_metrics.py, test_third_party_models.py, and test_langchain_metrics.py.

This description was created by Ellipsis for 15aff50. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit