Description
Problem Statement
When using the Sentry Crons feature, errors produced during execution of a monitored task should be associated with the monitor ID. We should also associate errors and transactions with the specific checkin that is reported.
What exists now?
Currently monitors support linking to errors that occur during execution via the `monitor.id` tag. This tag is promoted from the `MonitorContext`.
Right now we are manually setting up this context in a few different places.
- When a checkin fails, it sets the context on the created “checkin failed” error event.
  🤔 Note that it also manually sets `monitor.id` in the tags. I believe this may be a mistake, since the context promotes the `id` to a tag.
- The sentry monitor utils that are used in our celery tasks set the monitor context to associate errors with the monitor.
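To make the tag promotion concrete, here is a minimal sketch of what "the context promotes the `id` to a tag" could look like on a plain event dictionary. The function name `promote_monitor_id` and the event shape are assumptions for illustration only; this is not Sentry's actual ingestion code.

```python
from typing import Any, Dict


def promote_monitor_id(event: Dict[str, Any]) -> Dict[str, Any]:
    """Copy contexts["monitor"]["id"] into tags["monitor.id"] when present.

    Hypothetical helper: it only illustrates why setting the tag by hand
    is redundant once the monitor context is attached to the event.
    """
    monitor = event.get("contexts", {}).get("monitor") or {}
    monitor_id = monitor.get("id")
    if monitor_id is not None:
        event.setdefault("tags", {})["monitor.id"] = str(monitor_id)
    return event


# An event that carries only the monitor context, with no manual tag.
event = {"message": "checkin failed", "contexts": {"monitor": {"id": "abc123"}}}
promoted = promote_monitor_id(event)
```

If promotion works along these lines, an event that carries the monitor context ends up with the `monitor.id` tag without anyone setting it by hand.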
So what’s the problem?
This works, but it has a few issues:
- Checkin failure events do not track which particular checkin triggered the failure.
- Users have to manually set the monitor context to associate an error with a particular monitor.
- Even when the `monitor.id` context is manually set, there is no association of error to checkin.
Proposed checkin association strategy
We should use our Trace ID to associate checkins with errors and transactions. Realistically, this means doing the following:
- A new `trace_id` UUID column should be added to the `MonitorCheckin` table. This should be indexed, as we will use it to look up associated checkins.
- The `monitor` context should continue to exist on errors so that we can easily query for errors that we know are part of a monitor.
- The `trace_id` should be provided as part of the checkin API request. It will not be required.
  🤔 Open Question: Should we generate a trace ID if they do not pass it, and if so, should we return that as part of the resulting API response?
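The association strategy above can be sketched with an in-memory stand-in for the `MonitorCheckin` table. Everything here except the `trace_id` and `MonitorCheckin` names is hypothetical (the `CheckinStore` class, its methods, and the payload shape), and the sketch deliberately leaves `trace_id` empty when the request omits it, since the open question above is unresolved.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class MonitorCheckin:
    monitor_id: uuid.UUID
    status: str
    trace_id: Optional[uuid.UUID] = None  # optional: the request need not send it


class CheckinStore:
    """In-memory stand-in for the table plus its trace_id index."""

    def __init__(self) -> None:
        self._rows: List[MonitorCheckin] = []
        self._by_trace: Dict[uuid.UUID, List[MonitorCheckin]] = {}

    def create_from_request(
        self, monitor_id: uuid.UUID, payload: Dict[str, Any]
    ) -> MonitorCheckin:
        raw = payload.get("trace_id")
        trace_id = uuid.UUID(raw) if raw else None
        checkin = MonitorCheckin(monitor_id, payload["status"], trace_id)
        self._rows.append(checkin)
        if trace_id is not None:
            # Plays the role of the database index on trace_id.
            self._by_trace.setdefault(trace_id, []).append(checkin)
        return checkin

    def find_by_trace_id(self, trace_id: uuid.UUID) -> List[MonitorCheckin]:
        return list(self._by_trace.get(trace_id, []))


store = CheckinStore()
monitor_id = uuid.uuid4()
trace_id = uuid.uuid4()
with_trace = store.create_from_request(
    monitor_id, {"status": "error", "trace_id": trace_id.hex}
)
without_trace = store.create_from_request(monitor_id, {"status": "ok"})
```

Given an error or transaction that carries a trace ID, `find_by_trace_id` is the lookup that would resolve it back to the checkin that reported the run.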
Proposed integration with `sentry-cli` and SDKs
Both the checkin and the SDKs producing errors and traces must be aware of the monitor context and the Trace ID being used for the monitor run. Here’s what needs to happen:
- The `sentry-cli` should generate a Trace ID and send that with the checkins.
- The `sentry-cli monitor run <monitor_id> -- <command>` should set two environment variables during execution of `<command>`:
  - `SENTRY_TRACE_ID`: The Trace ID generated by the CLI
    feat(monitors): Pass SENTRY_TRACE_ID down execution path sentry-cli#1441
  - `SENTRY_MONITOR_ID`: The Monitor UUID
    feat(monitors): Pass SENTRY_MONITOR_ID to executing process sentry-cli#1438
- Sentry SDKs should be updated to understand both of these environment variables:
  - `SENTRY_TRACE_ID` should be used in place of the SDK generating its own trace ID
  - `SENTRY_MONITOR_ID` should be used to set up the monitor context of any events that occur during execution
    feat(crons): Add CronMonitorIntegration sentry-python#1866
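A rough Python sketch of the handoff described above: a wrapper process generates a trace ID, exports both variables to the child command, and the child side reads them back. Only the two environment variable names come from this proposal; `build_child_env`, `read_monitor_env`, and `run_monitored` are hypothetical helpers, and the real `sentry-cli` and SDK implementations may differ.

```python
import os
import subprocess
import sys
import uuid
from typing import Dict, List, Mapping, Optional, Tuple


def build_child_env(
    monitor_id: str, trace_id: str, base: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Wrapper side: copy the environment and add the two monitor variables."""
    env = dict(os.environ if base is None else base)
    env["SENTRY_TRACE_ID"] = trace_id
    env["SENTRY_MONITOR_ID"] = monitor_id
    return env


def read_monitor_env(environ: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """SDK side: pick up the values if the wrapper provided them."""
    return {
        "trace_id": environ.get("SENTRY_TRACE_ID"),
        "monitor_id": environ.get("SENTRY_MONITOR_ID"),
    }


def run_monitored(
    monitor_id: str, command: List[str]
) -> Tuple[str, "subprocess.CompletedProcess[str]"]:
    """Generate a trace ID and run the command with both variables set."""
    trace_id = uuid.uuid4().hex
    env = build_child_env(monitor_id, trace_id)
    return trace_id, subprocess.run(command, env=env, capture_output=True, text=True)


monitor_id = str(uuid.uuid4())
# The child just echoes what it received, standing in for an SDK reading the env.
child = [
    sys.executable,
    "-c",
    "import os; print(os.environ['SENTRY_TRACE_ID'], os.environ['SENTRY_MONITOR_ID'])",
]
trace_id, result = run_monitored(monitor_id, child)
```

An SDK following this proposal would call something like `read_monitor_env(os.environ)` at startup, fall back to generating its own trace ID when `trace_id` is `None`, and set the monitor context only when `monitor_id` is present.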
Updates
- At the moment, propagating a `SENTRY_TRACE_ID` is more involved than the scope of this ticket; see Crons: Linking monitors to errors #43647 (comment)