fix: remove broken tracing middleware #3723
base: main
OK, we want to remove the broken tracing middleware. Can you clarify what we should replace it with? Can you also explain how you intend to split your work and the PRs that will follow?
Thanks!
Yeah, I'm trying to find a way to make this change that makes sense, but it's kind of a headache. The middleware we have right now interferes with other tracing. Would you prefer that I just replace all the tracing at once?

I did a lot of testing and discovered that we can use OTel auto-instrumentation, but we need to apply it programmatically because of a known quirk of using OTel with uvicorn. That means telemetry would need to be installed and enabled by default, though it can be disabled with environment variables. I'm beginning to stage those WIP changes here: #3733. How do we feel about this design pattern? I also raised this in the community Discord and am happy to link you to it.

My goal with this PR is to make the telemetry we have work well enough. Then we can migrate services to the new pattern one service at a time, and once that is done, deprecate the telemetry API. After this merges and I finish implementing what is in the next PR, I can file tickets upstream for each place we capture custom instrumentation and let you all help me with the migration. It's also an opportunity to scrutinize what we capture, to make sure the custom info makes sense and isn't duplicated elsewhere.
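A minimal sketch of the "installed and on by default, switched off by an environment variable" pattern described above. It assumes the switch is the standard `OTEL_SDK_DISABLED` variable from the OpenTelemetry specification and that instrumentation is applied in-process with `FastAPIInstrumentor.instrument_app` rather than through the `opentelemetry-instrument` launcher. The helper names `telemetry_enabled` and `instrument_app` are hypothetical; the actual design is being staged in #3733 and may differ.

```python
import os
from typing import Mapping, Optional


def telemetry_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Telemetry is on unless OTEL_SDK_DISABLED is "true" (case-insensitive)."""
    env = os.environ if env is None else env
    return env.get("OTEL_SDK_DISABLED", "false").strip().lower() != "true"


def instrument_app(app, env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: instrument a FastAPI app programmatically.

    Returns True if instrumentation was applied, False if it was disabled.
    """
    if not telemetry_enabled(env):
        return False
    # Imported lazily so the server still starts cleanly when telemetry is off.
    from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

    FastAPIInstrumentor.instrument_app(app)
    return True
```

Calling `instrument_app(app)` wherever the FastAPI app object is built means instrumentation runs inside the server process itself, which is one way to avoid depending on how uvicorn is launched.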
```python
# 2. If it has no parent (implicit root span from FastAPI instrumentation)
is_root_span = span.attributes.get(LOCAL_ROOT_SPAN_MARKER) or parent_span_id is None
root_span_id_value = span_id if is_root_span else None
```
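The root-span check above is the part being flagged for review. A minimal, self-contained sketch of the same rule, assuming a simplified span record: `SpanRecord`, `root_span_id_for`, and the string value given to `LOCAL_ROOT_SPAN_MARKER` are hypothetical stand-ins, since the surrounding span processor code isn't shown in this thread.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Hypothetical value; the real constant's value is not shown in this thread.
LOCAL_ROOT_SPAN_MARKER = "local_root_span"


@dataclass
class SpanRecord:
    """Hypothetical stand-in for the span data the processor sees."""
    span_id: str
    parent_span_id: Optional[str]
    attributes: Dict[str, Any] = field(default_factory=dict)


def root_span_id_for(span: SpanRecord) -> Optional[str]:
    # A span counts as a root if it carries the explicit marker attribute,
    # or if it has no parent (e.g. the implicit server span from FastAPI).
    is_root_span = (
        bool(span.attributes.get(LOCAL_ROOT_SPAN_MARKER))
        or span.parent_span_id is None
    )
    return span.span_id if is_root_span else None
```

One case worth checking against the real processor: when an incoming request carries a `traceparent` header, the FastAPI server span has a remote parent, so `parent_span_id` is not `None` even though it is the first span created in this process. In the OTel Python SDK, the parent's `SpanContext.is_remote` flag distinguishes that case.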
@ehhuang take a look at this. I was able to get the integration test to work by doing this, but I am not 100% sure it's right. I'd appreciate it if you took a look and confirmed.
I'm not sure either. Can we just kill this SQLite span processor altogether and add tests analogous to those in test_*_telemetry, but against OTel?
This is looking good, but +1 on leaving out unrelated changes.
What does this PR do?
Removes the broken tracing middleware from llama stack core. This middleware duplicates what OpenTelemetry (OTel) already does for FastAPI by default, and it breaks tracing by mishandling W3C trace context headers.
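The PR does not spell out exactly how the middleware mishandled the headers. For reference, a minimal sketch of validating a version-00 W3C `traceparent` header (`version-trace_id-parent_id-flags`) under the W3C Trace Context rules; `parse_traceparent` and `TraceParent` are hypothetical names. In practice OTel's default propagator, `TraceContextTextMapPropagator`, already handles this for FastAPI, which is the duplication this PR removes.

```python
import re
from typing import NamedTuple, Optional

_LOWER_HEX = re.compile(r"[0-9a-f]+")


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Parse a version-00 traceparent header; return None if it is invalid."""
    parts = header.strip().split("-")
    if len(parts) != 4:
        return None
    version, trace_id, parent_id, flags = parts
    if version != "00":  # this sketch only handles version 00
        return None
    if len(trace_id) != 32 or len(parent_id) != 16 or len(flags) != 2:
        return None
    # Every field must be lowercase hex.
    if not all(_LOWER_HEX.fullmatch(p) for p in (trace_id, parent_id, flags)):
        return None
    # All-zero IDs are invalid and must be ignored.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    # Bit 0 of trace-flags is the "sampled" flag.
    sampled = bool(int(flags, 16) & 0x01)
    return TraceParent(trace_id, parent_id, sampled)
```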
Test Plan
Telemetry is currently not working. I attempted to test this change manually, but removing the middleware is not the only change needed to make telemetry work in this project, so traces did not show up. Follow-up changes will address this.