Skip to content

Conversation

grs
Copy link
Contributor

@grs grs commented Oct 7, 2025

What does this PR do?

Adds traces around tool execution and mcp tool listing for better observability.

Closes #3108

Test Plan

Manually examined traces in jaeger to verify the added information was available.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 7, 2025
Copy link
Contributor

@cdoern cdoern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should hold off on this given we might be deprecating the entire telemetry API cc @iamemilio @ashwinb

@grs
Copy link
Contributor Author

grs commented Oct 7, 2025

I think we should hold off on this given we might be deprecating the entire telemetry API cc @iamemilio @ashwinb

I believe this is independent of the telemetry API. Allowing tracing to be collected from Llama Stack seems always likely to be a requirement. I believe most users will use the otel standard to collect them and not use Llama Stack's own telemetry API. I do believe they will want tracing though.

This PR adds more detail to the tracing from the responses API as described. This is similar to what was emitted by the old agents API.

@cdoern
Copy link
Contributor

cdoern commented Oct 7, 2025

makes sense @grs , I just had to refactor the agents code though for this reason (#3705) -- the tracing utils depend on some of the telemetry provider functionality/code. I'd be wary of merging something depending on that code if we know we are going to delete it all and just depend on an OTEL middleware.

I could be wrong here though, so I will let others chime in.

@cdoern
Copy link
Contributor

cdoern commented Oct 7, 2025

especially with pieces like #3723 for example

@grs
Copy link
Contributor Author

grs commented Oct 7, 2025

@cdoern the additions here only use tracing.span(). I believe this is safe whether telemetry is enabled or not. Unlike tracing.get_current_span(), span() will always return an object, and if telemetry is not configured, the context manager will simply not do anything.

Of course if tracing.span() is removed in favour of some other call, then the additions in this PR would also need to be swapped over to the new function. To me that seems like something that would be done more generally across the codebase at the point a change is made. Migrating the additions from this PR will be simple as they all use only the span() function, simpler than the agent_instance.py which uses more parts of the current tracing package.

If there is an imminent change to tracing about to merge, then merging that and requiring this to be rebased and fixed would certainly make sense. It doesn't seem that there is though, #3723 is not yet complete, and so to my way of thinking, the very small addition from this current PR does not make a subsequent change any harder.

@iamemilio
Copy link
Contributor

Yes, sorry. The old telemetry API is going to be deprecated and I am going to replace manual instrumentation with the OTEL python SDK

@iamemilio
Copy link
Contributor

That said, if you want to merge this, I plan to replace all instances of manual instrumentation with open telemetry native code, so I can replace this as well.

Copy link
Contributor

@ashwinb ashwinb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep this is good. when the telemetry rewrite arrives this can be migrated to use direct OTEL. we cannot let new observability additions stop

@ashwinb ashwinb merged commit 26fd5db into llamastack:main Oct 9, 2025
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

add traces for tool calling when using responses API

4 participants