
Conversation

@rithikanarayan (Contributor) commented Sep 12, 2025

What does this PR do?

This PR adds logic to extract the trace context from an event that reaches a Python Lambda through AppSync from a traced service. When a Lambda is invoked by an AppSync API that was called from a RUM-instrumented front end, the Datadog trace context is located under event["request"]["headers"] rather than in the locations used by other event types. The extraction logic is placed directly in the extract_dd_trace_context function in tracing.py: it indexes into the event's request key, after which extract_context_from_http_event_or_context reads the headers field and extracts the Datadog context properly. A rough sketch of this branching follows below.
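A minimal, self-contained sketch of that branching, assuming the helpers named above keep their existing signatures in datadog_lambda/tracing.py; the wrapper function _extract_appsync_context is hypothetical and only stands in for the relevant portion of extract_dd_trace_context, not the exact diff.

```python
from datadog_lambda.tracing import (
    extract_context_from_http_event_or_context,
    extract_context_from_lambda_context,
)


def _extract_appsync_context(event, lambda_context, event_source):
    # Hypothetical stand-in for the new branch inside extract_dd_trace_context.
    request = event.get("request") if isinstance(event, dict) else None
    if isinstance(request, dict) and "headers" in request:
        # AppSync forwards the caller's HTTP headers (including the Datadog
        # trace headers injected by RUM) under event["request"]["headers"], so
        # the existing HTTP-event extraction can run on that sub-dict.
        return extract_context_from_http_event_or_context(
            request,
            lambda_context,
            event_source,
            decode_authorizer_context=False,
        )
    # Fall back to the Lambda context when the event is not shaped as expected.
    return extract_context_from_lambda_context(lambda_context)
```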

Motivation

Currently, if a customer has a setup where they use RUM to start a trace that goes through AppSync and triggers a Lambda function, we ask them to write a custom function to extract the trace context from the AppSync event that is passed to the Lambda. See an example of such a function here. It would be simpler for customers and for ourselves to extract the trace context in the Lambda tracer layers, as we already do for invocations from other sources (e.g. SQS, API Gateway). This PR does so for the Python tracer.

Based on this ticket, this change will reduce the amount of code a customer has to write in order to connect traces between RUM and a Lambda function when an AWS AppSync API sits between them.

Testing Guidelines

Unit tested in test_tracing.py by adding events in tests/event_samples and including the tests in _test_extract_dd_trace_context. The added events (rum-appsync.json, rum-appsync-no-headers.json, and rum-appsync-request-not-dict.json) are based on a sample request from the Datadog APM page for a trace that followed RUM -> AppSync -> Lambda. Some of the sample events are intentionally malformed or formatted differently than expected, to ensure that no exceptions are raised when we encounter an event with an unanticipated format. I ran a coverage check with pytest-cov to confirm that all new lines of code from this PR are exercised by the tests.
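For reference, a rough sketch of the shape of the rum-appsync.json sample; the header values below are placeholders, and the real sample carries the full AppSync resolver payload captured from APM.

```python
rum_appsync_event = {
    "request": {
        "headers": {
            # Datadog propagation headers injected by the RUM SDK on the client.
            "x-datadog-trace-id": "12345",
            "x-datadog-parent-id": "67890",
            "x-datadog-sampling-priority": "1",
        }
    },
    # rum-appsync-no-headers.json and rum-appsync-request-not-dict.json drop or
    # mangle the "request"/"headers" fields to exercise the malformed paths.
}
```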

Ran integration tests using scripts/run_integration_tests.sh. Added a new input event called appsync.json and updated the snapshot so that integration tests also cover this new supported case.

Uploaded my changes as a layer to AWS and verified that a trace following RUM -> AppSync -> Lambda was shown as connected in the Datadog UI without needing a custom extractor, which is the goal of this PR. A successfully connected trace can be found here. The ARN for the testing version of the Python Lambda layer is arn:aws:lambda:us-east-1:425362996713:layer:Python39-RITHIKA:3. I also used this layer to check distributed tracing in an API Gateway -> Lambda -> SQS -> Lambda setup to ensure that other tracing functionality was not broken by this change.

Additional Notes

Types of Changes

  • Bug fix
  • New feature
  • Breaking change
  • Misc (docs, refactoring, dependency upgrade, etc.)

Check all that apply

  • This PR's description is comprehensive
  • This PR contains breaking changes that are documented in the description
  • This PR introduces new APIs or parameters that are documented and unlikely to change in the foreseeable future
  • This PR impacts documentation, and it has been updated (or a ticket has been logged)
  • This PR's changes are covered by the automated tests
  • This PR collects user input/sensitive content into Datadog
  • This PR passes the integration tests (ask a Datadog member to run the tests)

@rithikanarayan marked this pull request as ready for review September 24, 2025 18:19
@rithikanarayan requested review from a team as code owners September 24, 2025 18:19
span_id=67890,
sampling_priority=1,
),
),
Contributor
There's a lot more testing that we're going to need. When thinking about test coverage, I think about two things.

  1. Coverage. If I were to run a coverage report on our tests, would I see that we've hit every line of code? In the current case, the answer is no.

  2. Logic. If I were to make a change to any line of your code, would a test fail? For example, if I changed an "in" to a "not in", I would expect to see a failing test. In the current case, this could mean testing the case where request is a dict but headers is not in it, and vice versa.

We also would need a test that makes sure that the authorizer context is never decoded.
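For illustration, a rough sketch of the kinds of tests being asked for here; the fake Lambda context, test names, and the call shape of extract_dd_trace_context below are assumptions, not the suite's real fixtures.

```python
from unittest.mock import MagicMock, patch

from datadog_lambda.tracing import extract_dd_trace_context

# Assumed stand-in for the real fixture; no Datadog data in the client context.
fake_lambda_context = MagicMock(client_context=None)


def test_request_is_dict_but_headers_missing_does_not_raise():
    # "request" is a dict, but there is no "headers" key inside it.
    extract_dd_trace_context({"request": {"not_headers": {}}}, fake_lambda_context)


def test_appsync_extraction_never_decodes_authorizer_context():
    event = {"request": {"headers": {"x-datadog-trace-id": "1"}}}
    with patch(
        "datadog_lambda.tracing.extract_context_from_http_event_or_context"
    ) as extract_http:
        extract_dd_trace_context(event, fake_lambda_context)
        # The AppSync branch should always pass decode_authorizer_context=False.
        assert extract_http.call_args.kwargs.get("decode_authorizer_context") is False
```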

decode_authorizer_context=False,
)
else:
context = extract_context_from_lambda_context(lambda_context)
Contributor
Looks like we still need a test to cover this portion of the logic. We know that the function does not error when the request is not a dict or when there are no headers in it, but we have not confirmed that we'll instead attempt to extract from the Lambda context.
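A hedged sketch of what such a fallback assertion could look like; again, the fixture and the call shape of extract_dd_trace_context are assumptions.

```python
from unittest.mock import MagicMock, patch

from datadog_lambda.tracing import extract_dd_trace_context


def test_malformed_request_falls_back_to_lambda_context():
    fake_lambda_context = MagicMock(client_context=None)
    with patch(
        "datadog_lambda.tracing.extract_context_from_lambda_context"
    ) as extract_from_ctx:
        extract_dd_trace_context({"request": "not-a-dict"}, fake_lambda_context)
        # The else branch shown above routes extraction to the Lambda context.
        extract_from_ctx.assert_called_once_with(fake_lambda_context)
```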
