fix(langchain): report token usage histogram #3059
Conversation
...emetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (review thread outdated, resolved)
Important
Looks good to me! 👍
Reviewed everything up to bed4136 in 1 minute and 52 seconds.
- Reviewed 72 lines of code in 1 file.
- Skipped 0 files when reviewing.
- Skipped posting 4 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves, and don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:207
- Draft comment: In `_extract_model_name_from_response_metadata`, consider assigning the result of `generation.message.response_metadata.get('model_name')` to a variable to avoid calling `get()` twice.
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 20% vs. threshold = 50%). The suggestion is valid: calling `get()` twice is slightly inefficient. However, this is a very minor optimization with negligible performance impact, the code is more readable in its current form since it clearly shows what is being checked and returned, and the change would introduce a new variable just to save one dictionary lookup. While micro-optimizations can matter at scale, this code is not in a hot path, and the overhead of an extra dictionary lookup is trivial compared to the network/API calls involved in LLM operations. The comment should be deleted: while technically correct, the suggested change would make the code more verbose for a negligible performance benefit.
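For reference, a minimal sketch of the single-lookup version the draft comment suggests; the function shape and attribute access are assumptions based on the names mentioned above, not the actual code in the diff:

```python
# Hypothetical sketch of the draft comment's suggestion: read the value
# once instead of calling .get() twice. The response/generation shapes
# are assumed from the names in the comment, not copied from the PR.
def _extract_model_name_from_response_metadata(response):
    for generations in getattr(response, "generations", []):
        for generation in generations:
            metadata = getattr(generation.message, "response_metadata", None) or {}
            model_name = metadata.get("model_name")  # single dictionary lookup
            if model_name:
                return model_name
    return None
```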
2. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:219
- Draft comment: Document the new parameters (`token_histogram`, `record_token_usage`, `model_name`) in `_set_chat_response` for clarity and future maintenance.
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 10% vs. threshold = 50%). Documentation can be helpful, but the parameters have fairly clear purposes from their names and usage: `token_histogram` is for recording metrics, `record_token_usage` is a flag for whether to record tokens, and `model_name` is used as a label. The function is internal (it starts with an underscore), which typically means detailed docs aren't required, and the parameter names are self-explanatory. The relationship between `record_token_usage` and `token_usage` being None is a bit subtle, but the names and usage are clear enough for an internal function, so adding documentation would not substantially improve code quality or maintainability. The comment should be deleted: the parameters are sufficiently self-documenting for an internal function, and requesting documentation does not meet our bar for required code changes.
3. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:704
- Draft comment: When checking for `token_usage`, an empty dict (which is falsy) might be passed through. Consider normalizing it (e.g., using `token_usage = token_usage or None`) so that the fallback mechanism (`record_token_usage`) triggers correctly if `token_usage` is empty.
- Reason this comment was not posted: Comment was on unchanged code.
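A tiny sketch of the normalization that draft comment describes, under the assumption that the fallback path is gated on `token_usage` being `None`; the helper name is hypothetical:

```python
def normalize_token_usage(token_usage):
    """Treat an empty token_usage dict the same as a missing one.

    Hypothetical helper illustrating the draft comment's suggestion;
    it is not part of the instrumentation's actual API.
    """
    return token_usage or None  # {} -> None, so the fallback can trigger


assert normalize_token_usage({}) is None
assert normalize_token_usage(None) is None
assert normalize_token_usage({"total_tokens": 42}) == {"total_tokens": 42}
```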
4. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:755
- Draft comment: Update the `_set_chat_response` call to ensure fallback token usage recording is applied only when appropriate. Verify that passing `token_usage is None` correctly handles cases where `token_usage` might be empty rather than None.
- Reason this comment was not posted: Comment did not seem useful (usefulness confidence = 0% vs. threshold = 50%). The comment asks the author to verify the handling of `token_usage is None`, which is not allowed per the rules. It does not make a specific code suggestion or ask for a test to be written; it is a request for confirmation, which is not permitted.
Workflow ID: wflow_rITvd9hvpQKqpGrB
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
LGTM
Thanks @LakshmiPriyaSujith! Can you add a test for this? (similar to metrics tests we have in other packages)
Hi @nirga, I have added the test. Could you please review?
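(For illustration only: a rough sketch of what a metrics test along these lines could look like, using the OpenTelemetry SDK's in-memory metric reader. The instrument name and the manual `record` call stand in for the instrumented LangChain run and are assumptions, not the repository's actual test fixtures.)

```python
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import InMemoryMetricReader

# Collect metrics in memory so the test can assert on histogram data points.
reader = InMemoryMetricReader()
provider = MeterProvider(metric_readers=[reader])
meter = provider.get_meter("test")

# In a real test the instrumented chain would record this; here we simulate it.
histogram = meter.create_histogram("gen_ai.client.token.usage")  # assumed name
histogram.record(42, attributes={"gen_ai.response.model": "gpt-3.5-turbo"})

data = reader.get_metrics_data()
points = [
    point
    for resource_metrics in data.resource_metrics
    for scope_metrics in resource_metrics.scope_metrics
    for metric in scope_metrics.metrics
    if metric.name == "gen_ai.client.token.usage"
    for point in metric.data.data_points
]
assert points and points[0].sum == 42
```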
Thanks @LakshmiPriyaSujith and sorry for the delay here!
@LakshmiPriyaSujith can you rebase this?
Hi @nirga I have updated the branch. Could you please help merge the PR?
Walkthrough
The changes introduce a fallback mechanism for extracting the model name from LLM responses, update the metric recording logic to include additional attributes and conditions, and add comprehensive tests, including a new test cassette, to ensure metrics are correctly reported even when the LLM output is missing.
Sequence Diagram(s)

sequenceDiagram
    participant User
    participant LLMChain
    participant CallbackHandler
    participant SpanUtils
    User->>LLMChain: Run chain
    LLMChain->>CallbackHandler: on_llm_end(response)
    CallbackHandler->>SpanUtils: set_chat_response_usage(span, response, histogram, record_token_usage, model_name)
    SpanUtils->>SpanUtils: Extract token counts & model name (with fallback)
    SpanUtils->>SpanUtils: Record metrics if applicable
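To make the "Record metrics if applicable" step above concrete, here is a hedged sketch of how token counts might be recorded on the histogram with model attributes; the attribute keys follow OpenTelemetry GenAI conventions and the function and parameter names are assumptions, not the code from this PR:

```python
# Hypothetical recording step. Attribute keys and parameter names are
# assumed for illustration; the PR's actual implementation may differ.
def record_token_usage_metrics(token_histogram, model_name,
                               prompt_tokens=None, completion_tokens=None):
    common = {"gen_ai.response.model": model_name or "unknown"}
    if prompt_tokens is not None:
        token_histogram.record(
            prompt_tokens, attributes={**common, "gen_ai.token.type": "input"}
        )
    if completion_tokens is not None:
        token_histogram.record(
            completion_tokens, attributes={**common, "gen_ai.token.type": "output"}
        )
```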
📜 Recent review details
- Configuration used: CodeRabbit UI
- 📒 Files selected for processing: 4
- 🧬 Code graph analysis (2): packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py, packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py
- ⏰ Checks skipped due to timeout of 90000 ms: 1 (the timeout can be increased in the CodeRabbit configuration to a maximum of 15 minutes, i.e. 900000 ms)
- 🔇 Additional comments: 11
Co-authored-by: Nir Gazit <[email protected]>
feat(instrumentation): ... or fix(instrumentation): ...

This PR introduces an alternative approach for calculating token usage with Langchain. The current instrumentation processes LLM responses differently when setting span attributes than when recording token usage metrics via a histogram. We encountered cases where the llm_output attribute, which is used to derive the token usage metrics, was missing from the LLM response. The new implementation includes a fallback mechanism that computes token usage following the same logic used to populate the span attributes for token usage.

Important
Introduces a fallback mechanism for token usage calculation and model name extraction in Langchain instrumentation when llm_output is missing.
- _set_chat_response() now calculates token usage when llm_output is missing.
- Token usage is recorded via token_histogram in _set_chat_response().
- The model name is extracted with _extract_model_name_from_response_metadata() if not present in llm_output.
- Adds _extract_model_name_from_response_metadata() to extract the model name from response metadata.
- Updates _set_chat_response() to take token_histogram, record_token_usage, and model_name parameters.
- Updates on_llm_end() to use the new token usage calculation and model name extraction logic.

This description was created by Ellipsis for bed4136. You can customize this summary. It will automatically update as commits are pushed.
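As a non-authoritative illustration of the fallback described above, the sketch below derives token usage from each generation's usage metadata when llm_output carries no token counts. The field names mirror LangChain's usage_metadata shape but are assumptions, not the PR's actual code:

```python
# Hypothetical fallback: sum token counts from each generation's message
# usage_metadata when response.llm_output has no token_usage. Field names
# (input_tokens/output_tokens) are assumed from LangChain conventions.
def fallback_token_usage(response):
    prompt_tokens = completion_tokens = 0
    found = False
    for generations in getattr(response, "generations", []):
        for generation in generations:
            message = getattr(generation, "message", None)
            usage = getattr(message, "usage_metadata", None)
            if usage:
                found = True
                prompt_tokens += usage.get("input_tokens", 0)
                completion_tokens += usage.get("output_tokens", 0)
    if not found:
        return None
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }
```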
Summary by CodeRabbit
- New Features
- Bug Fixes
- Tests