
feat: add additional tracing on websockets #13332


Open: wants to merge 11 commits into main


@quinna-h (Contributor) commented May 5, 2025

Add websocket tracing according to the RFC.
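For readers new to tracing websockets at the ASGI layer, the sketch below shows the general shape of the idea. It is not ddtrace's implementation: WebSocketRecvRecorder, run_demo and the toy app are hypothetical, and where a tracer would open a span the sketch just appends a dict to a list. It wraps the ASGI receive callable and records one entry per websocket.receive message, reusing the span name, the "websocket /ws" resource style and the websocket.message.length metric that appear in this PR's snapshots.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Message = Dict[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]
ASGIApp = Callable[[Dict[str, Any], Receive, Send], Awaitable[None]]


class WebSocketRecvRecorder:
    """Hypothetical ASGI middleware recording one entry per received websocket message.

    A real tracer would start a span inside traced_receive; this sketch appends dicts instead.
    """

    def __init__(self, app: ASGIApp) -> None:
        self.app = app
        self.records: List[Dict[str, Any]] = []

    async def __call__(self, scope: Dict[str, Any], receive: Receive, send: Send) -> None:
        # HTTP and lifespan scopes pass through untouched.
        if scope.get("type") != "websocket":
            await self.app(scope, receive, send)
            return

        async def traced_receive() -> Message:
            message = await receive()
            if message.get("type") == "websocket.receive":
                # An ASGI websocket.receive message carries either "text" (str) or "bytes".
                text = message.get("text")
                payload = text.encode("utf-8") if text is not None else (message.get("bytes") or b"")
                self.records.append(
                    {
                        "name": "websocket.receive",
                        "resource": "websocket " + scope.get("path", ""),
                        "websocket.message.length": len(payload),  # length in bytes
                    }
                )
            return message

        await self.app(scope, traced_receive, send)


async def _consume_until_disconnect(scope, receive, send):
    """Toy websocket app that just consumes messages until the client disconnects."""
    while True:
        message = await receive()
        if message["type"] == "websocket.disconnect":
            return


def run_demo() -> List[Dict[str, Any]]:
    """Drive the middleware with a scripted sequence of ASGI messages and return its records."""
    incoming = [
        {"type": "websocket.connect"},
        {"type": "websocket.receive", "text": "hello"},
        {"type": "websocket.receive", "bytes": b"\x00\x01"},
        {"type": "websocket.disconnect", "code": 1000},
    ]

    async def receive() -> Message:
        return incoming.pop(0)

    async def send(message: Message) -> None:
        pass

    middleware = WebSocketRecvRecorder(_consume_until_disconnect)
    asyncio.run(middleware({"type": "websocket", "path": "/ws"}, receive, send))
    return middleware.records
```

Wrapping receive rather than the app itself keeps the per-message work next to the point where each message arrives; connect and disconnect messages flow through without producing a record.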

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

@quinna-h quinna-h changed the title add tracing on websockets add additional tracing on websockets May 5, 2025
Contributor

github-actions bot commented May 5, 2025

CODEOWNERS have been resolved as:

tests/snapshots/tests.contrib.fastapi.test_fastapi.test_long_running_websocket_session.json  @DataDog/apm-python
tests/snapshots/tests.contrib.fastapi.test_fastapi.test_websocket_not_separate_traces.json  @DataDog/apm-python
tests/snapshots/tests.contrib.fastapi.test_fastapi.test_websocket_sampling_not_inherited.json  @DataDog/apm-python
ddtrace/contrib/internal/asgi/middleware.py                             @DataDog/apm-core-python @DataDog/apm-idm-python
ddtrace/contrib/internal/django/patch.py                                @DataDog/apm-core-python @DataDog/apm-idm-python
ddtrace/contrib/internal/fastapi/patch.py                               @DataDog/apm-core-python @DataDog/apm-idm-python
ddtrace/contrib/internal/starlette/patch.py                             @DataDog/apm-core-python @DataDog/apm-idm-python
tests/contrib/fastapi/app.py                                            @DataDog/apm-core-python @DataDog/apm-idm-python
tests/contrib/fastapi/test_fastapi.py                                   @DataDog/apm-core-python @DataDog/apm-idm-python
tests/snapshots/tests.contrib.fastapi.test_fastapi.test_traced_websocket.json  @DataDog/apm-python

Contributor

github-actions bot commented May 5, 2025

Bootstrap import analysis

Comparison of import times between this PR and base.

Summary

The average import time from this PR is: 276 ± 4 ms.

The average import time from base is: 282 ± 5 ms.

The import time difference between this PR and base is: -6.0 ± 0.2 ms.

Import time breakdown

The following import paths have shrunk:

ddtrace.auto 2.828 ms (1.03%)
ddtrace.bootstrap.sitecustomize 1.894 ms (0.69%)
ddtrace.bootstrap.preload 1.841 ms (0.67%)
ddtrace.internal.remoteconfig.client 0.744 ms (0.27%)
multiprocessing.sharedctypes 0.105 ms (0.04%)
multiprocessing.heap 0.105 ms (0.04%)
mmap 0.105 ms (0.04%)
ddtrace.internal.products 0.099 ms (0.04%)
importlib.metadata 0.099 ms (0.04%)
importlib.abc 0.099 ms (0.04%)
importlib.resources 0.099 ms (0.04%)
ddtrace._trace.trace_handlers 0.054 ms (0.02%)
ddtrace 0.934 ms (0.34%)
ddtrace.settings._config 0.146 ms (0.05%)
ddtrace.internal._file_queue 0.146 ms (0.05%)
secrets 0.146 ms (0.05%)
hmac 0.146 ms (0.05%)
_hashlib 0.146 ms (0.05%)
ddtrace._logger 0.105 ms (0.04%)
ddtrace.internal.telemetry 0.105 ms (0.04%)
ddtrace.internal.telemetry.writer 0.105 ms (0.04%)
http.client 0.105 ms (0.04%)
email.parser 0.105 ms (0.04%)
email.feedparser 0.105 ms (0.04%)
email._policybase 0.105 ms (0.04%)
email.utils 0.105 ms (0.04%)
datetime 0.105 ms (0.04%)
_datetime 0.105 ms (0.04%)
ddtrace.internal._unpatched 0.024 ms (0.01%)


pr-commenter bot commented May 5, 2025

Benchmarks

Benchmark execution time: 2025-06-04 18:10:21

Comparing candidate commit c8daade in PR branch quinna.halim/websockets-tracing-asgi with baseline commit 71dd7f3 in branch main.

Found 6 performance improvements and 4 performance regressions! Performance is the same for 502 metrics, 4 unstable metrics.

scenario:djangosimple-profiler

  • 🟩 execution_time [-2.165ms; -2.049ms] or [-12.264%; -11.608%]

scenario:djangosimple-tracer-and-profiler

  • 🟩 execution_time [-2.758ms; -2.578ms] or [-11.181%; -10.452%]

scenario:flasksimple-profiler

  • 🟩 execution_time [-179.894µs; -172.154µs] or [-8.353%; -7.993%]

scenario:iastdjangostartup-appsec

  • 🟩 execution_time [-1.095s; -0.998s] or [-55.575%; -50.646%]

scenario:iastdjangostartup-iast

  • 🟩 execution_time [-1.476s; -1.316s] or [-61.990%; -55.262%]

scenario:iastdjangostartup-tracer

  • 🟩 execution_time [-875.659ms; -774.581ms] or [-49.762%; -44.018%]

scenario:startup-ddtrace_run

  • 🟥 execution_time [+72.520ms; +76.225ms] or [+11.912%; +12.520%]

scenario:startup-import_ddtrace_auto

  • 🟥 execution_time [+61.566ms; +64.871ms] or [+16.155%; +17.022%]

scenario:startup-import_ddtrace_auto_django

  • 🟥 execution_time [+88.114ms; +91.989ms] or [+17.974%; +18.765%]

scenario:startup-import_ddtrace_auto_flask

  • 🟥 execution_time [+81.519ms; +85.992ms] or [+15.056%; +15.882%]

ws_span.set_metric("_dd.dm.inherited", 1)
parent_span = self.tracer.current_root_span()
if parent_span is not None:
    ws_span.set_tag_str("_dd.dm.service", parent_span.service)
Contributor Author


Should be the handshake span's service and resource.
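The comment suggests taking the inherited service (and resource) from the handshake span rather than from tracer.current_root_span(). A minimal sketch of that idea, assuming a hypothetical StubSpan class in place of ddtrace's Span; the "_dd.dm.resource" tag name is an assumption modeled on "_dd.dm.service", not something this PR confirms.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class StubSpan:
    """Hypothetical stand-in for a tracer span; not ddtrace's Span class."""

    name: str
    service: Optional[str] = None
    resource: Optional[str] = None
    meta: Dict[str, str] = field(default_factory=dict)
    metrics: Dict[str, Union[int, float]] = field(default_factory=dict)

    def set_tag_str(self, key: str, value: str) -> None:
        self.meta[key] = value

    def set_metric(self, key: str, value: Union[int, float]) -> None:
        self.metrics[key] = value


def mark_inherited_from_handshake(ws_span: StubSpan, handshake_span: Optional[StubSpan]) -> None:
    """Flag ws_span as inheriting its sampling decision and tag it with the handshake's identity."""
    ws_span.set_metric("_dd.dm.inherited", 1)
    if handshake_span is None:
        return
    # Read service/resource from the handshake span itself, not from whatever
    # span happens to be the current root when the message arrives.
    if handshake_span.service:
        ws_span.set_tag_str("_dd.dm.service", handshake_span.service)
    if handshake_span.resource:
        # Assumed tag name, mirroring "_dd.dm.service".
        ws_span.set_tag_str("_dd.dm.resource", handshake_span.resource)
```

For example, a handshake span named fastapi.request with service tests.contrib.fastapi and resource "websocket /ws" would pass both values onto each websocket.receive span.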

@quinna-h quinna-h force-pushed the quinna.halim/websockets-tracing-asgi branch from 47131b0 to 2cb44a0 Compare May 8, 2025 18:58
try:
    ipaddress.ip_address(client_ip)  # validate ip address
    span.set_tag_str("network.client.ip", client_ip)
except ValueError:
    pass
Contributor


🔴 Code Quality Violation

silent exception

Using the pass statement in an exception block ignores the exception. Exceptions should never be ignored; instead, add code that reports that an exception occurred and attempts to handle it or recover from it.

The exception to this rule is the use of StopIteration or StopAsyncIteration when implementing a custom iterator (as those errors are used to acknowledge the end of a successful iteration).

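The bot's rule can be satisfied without changing behavior: an invalid address is still not tagged, but the reason is logged. A minimal sketch, assuming a hypothetical TagRecorder stand-in for the span and a module-level logger; it also narrows the try block so that only the validation call is guarded.

```python
import ipaddress
import logging

log = logging.getLogger(__name__)


class TagRecorder:
    """Hypothetical minimal span stand-in that only records string tags."""

    def __init__(self):
        self.meta = {}

    def set_tag_str(self, key, value):
        self.meta[key] = value


def set_client_ip_tag(span, client_ip):
    """Tag span with network.client.ip only if client_ip is a valid IPv4/IPv6 address.

    Returns True if the tag was set, False otherwise.
    """
    try:
        ipaddress.ip_address(client_ip)  # raises ValueError for anything that is not an IP
    except ValueError:
        # Record why the tag is missing instead of swallowing the error silently.
        log.debug("Not tagging network.client.ip: invalid address %r", client_ip)
        return False
    span.set_tag_str("network.client.ip", client_ip)
    return True
```

Logging at debug level is a judgment call: client-supplied headers can carry arbitrary values, so a louder level could flood application logs.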


@quinna-h quinna-h changed the title add additional tracing on websockets feat: add increased tracing on websockets Jun 2, 2025
@quinna-h quinna-h changed the title feat: add increased tracing on websockets feat: add additional tracing on websockets Jun 2, 2025
@quinna-h quinna-h marked this pull request as ready for review June 2, 2025 20:49
@quinna-h quinna-h requested review from a team as code owners June 2, 2025 20:49
@quinna-h quinna-h requested review from wantsui, Yun-Kim and juanjux June 2, 2025 20:49
if (
    self.integration_config._websocket_messages_separate
    and self.integration_config._asgi_websockets_inherit_sampling
):
    recv_span.set_metric("_dd.dm.inherited", 1)
@amarziali commented Jun 3, 2025

There is also the need, in this case, to lock the decision maker of the recv_span to that of the handshake span, which means giving both spans the same _dd.p.dm tag. I see that this is partly handled here:

if not span.context._meta.get(SAMPLING_DECISION_TRACE_TAG_KEY):

Ideally we should also set the same sampling mechanism and priority, so that the two spans do not end up inconsistent. The quick fix is just to copy that meta: recv_span._meta["_dd.p.dm"] = global_root_span._meta["_dd.p.dm"]
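The review asks for the receive span to carry the handshake's sampling priority and decision maker, not just one of them. A minimal sketch of that copy, assuming hypothetical StubContext and StubSpan classes rather than ddtrace's Context and Span (ddtrace keeps this tag in span.context._meta under SAMPLING_DECISION_TRACE_TAG_KEY, which the stub does not model).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Decision-maker tag quoted in the review comment.
DECISION_MAKER_KEY = "_dd.p.dm"


@dataclass
class StubContext:
    """Hypothetical stand-in for a span context; not ddtrace's Context class."""

    sampling_priority: Optional[int] = None
    meta: Dict[str, str] = field(default_factory=dict)


@dataclass
class StubSpan:
    """Hypothetical span holding only a name and a context."""

    name: str
    context: StubContext = field(default_factory=StubContext)


def inherit_sampling_decision(recv_span: StubSpan, handshake_span: StubSpan) -> None:
    """Copy the handshake's sampling priority and decision maker onto recv_span.

    Copying both keeps the two spans from disagreeing about who made the decision.
    Values the handshake does not have are left untouched on recv_span.
    """
    src, dst = handshake_span.context, recv_span.context
    if src.sampling_priority is not None:
        dst.sampling_priority = src.sampling_priority
    decision_maker = src.meta.get(DECISION_MAKER_KEY)
    if decision_maker is not None:
        dst.meta[DECISION_MAKER_KEY] = decision_maker
```

Copying values rather than sharing the context object keeps the separate receive trace independent while still reporting the same decision.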


"resource": "websocket /ws",
"trace_id": 2,
"span_id": 1,
"parent_id": 0,
"type": "web",
"error": 0,
"meta": {
"_dd.base_service": "tests.contrib.fastapi",
"_dd.p.dm": "-0",
Contributor Author


Should the fastapi.request span also have the sampling decision maker set here?


0 means default (unset). Why does it differ from the value on the websocket.receive span above? If you want it set, either keep it manually in the test or set a rule for it.
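The reply reads the 0 in the snapshot's "_dd.p.dm": "-0" as the default mechanism. A small helper that extracts the mechanism number can make that kind of assertion explicit in a test; parse_decision_maker is a hypothetical name, and the sketch assumes only that the value ends in "-<integer>", as "-0" does. It does not map other mechanism numbers to names.

```python
from typing import Optional

# Per the review reply: "0 means default (unset)".
DEFAULT_MECHANISM = 0


def parse_decision_maker(value: Optional[str]) -> Optional[int]:
    """Return the mechanism number from a _dd.p.dm value such as "-0".

    Returns None if the value is missing or does not end in "-<integer>".
    Anything before the last "-" is ignored.
    """
    if not value or "-" not in value:
        return None
    _, _, mechanism = value.rpartition("-")
    return int(mechanism) if mechanism.isdigit() else None
```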

@quinna-h quinna-h force-pushed the quinna.halim/websockets-tracing-asgi branch from d02b4a1 to ccdb2bf Compare June 4, 2025 17:14

@pytest.mark.subprocess(env=dict(DD_TRACE_WEBSOCKET_MESSAGES_ENABLED="true"))
@snapshot(ignores=["meta._dd.span_links", "metrics.websocket.message.length"])
# TODO: look into why one message is only 26 chars
def test_traced_websocket(test_spans, snapshot_app):
Contributor Author


TODO: consolidate these next three tests using pytest.mark.parametrize
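pytest.mark.parametrize takes a comma-separated string of argument names and a list of value tuples, and generates one test per tuple. A minimal sketch of the consolidation, assuming a hypothetical helper expected_recv_span_shape that encodes what each configuration should produce. It mirrors the PR's condition that "_dd.dm.inherited" is set only when _websocket_messages_separate and _asgi_websockets_inherit_sampling are both on, and it assumes "separate" means each received message starts its own trace. The ids are loosely modeled on the snapshot names.

```python
import pytest


def expected_recv_span_shape(messages_separate: bool, inherit_sampling: bool) -> dict:
    """Hypothetical helper: expected properties of a websocket.receive span for one config.

    Assumes "separate" means each received message starts its own trace.
    """
    return {
        "same_trace_as_handshake": not messages_separate,
        "dm_inherited": messages_separate and inherit_sampling,
    }


@pytest.mark.parametrize(
    "messages_separate, inherit_sampling, expected",
    [
        (True, True, {"same_trace_as_handshake": False, "dm_inherited": True}),
        (False, True, {"same_trace_as_handshake": True, "dm_inherited": False}),
        (True, False, {"same_trace_as_handshake": False, "dm_inherited": False}),
    ],
    ids=["separate-inherited", "not-separate", "sampling-not-inherited"],
)
def test_websocket_recv_span_shape(messages_separate, inherit_sampling, expected):
    assert expected_recv_span_shape(messages_separate, inherit_sampling) == expected
```

In the real suite each case would also need its own subprocess environment and snapshot, which this sketch does not attempt to reproduce.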

Labels
None yet
Projects
None yet

2 participants