Crons: Linking monitors to errors

## Problem Statement

When using the Sentry Crons feature, errors produced during execution of a monitored task should be associated with the monitor ID. We should also associate errors and transactions with the **specific** check-in that is reported.

## What exists now?

Currently monitors support linking to errors that occur during execution via the **`monitor.id`** tag.

This tag is promoted from the [`MonitorContext`](https://github.com/getsentry/sentry/blob/0e919bd4d1f6fb583f574fdd600cf90762f201d2/src/sentry/interfaces/contexts.py#L129-L132).

Right now we are **manually** setting up this context in a few different places:

- When a check-in fails, it [sets the context](https://github.com/getsentry/sentry/blob/0e919bd4d1f6fb583f574fdd600cf90762f201d2/src/sentry/models/monitor.py#L199) on the created “checkin failed” error event.

    🤔 Note that it also manually sets `monitor.id` in the tags. I believe this may be a mistake, since the context already promotes the `id` to a tag.

- The Sentry monitor utils used in our Celery tasks [set the monitor context](https://github.com/getsentry/sentry/blob/f0577e3ade2a07bf469736ac400f89bbd609aa80/src/sentry/utils/monitors.py#L70-L72) to associate errors with the monitor.
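
The `monitor.id` tag exists because the `id` field of the monitor context is promoted to a tag when the event is processed. The sketch below mimics that promotion with a hypothetical `promote_context_tags` helper and an assumed field mapping modelled on `MonitorContext`; it is not Sentry's implementation, but it illustrates why also writing `monitor.id` into the tags by hand should be redundant.

```python
from typing import Any, Dict, List

# Assumed mapping (modelled on MonitorContext, not copied from Sentry):
# which fields of each context become "<context>.<field>" tags.
CONTEXT_TO_TAG_FIELDS: Dict[str, List[str]] = {
    "monitor": ["id"],
}


def promote_context_tags(event: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical helper: derive tags from an event's contexts."""
    tags: Dict[str, str] = {}
    for alias, context in event.get("contexts", {}).items():
        for field in CONTEXT_TO_TAG_FIELDS.get(alias, []):
            if field in context:
                tags[f"{alias}.{field}"] = str(context[field])
    return tags


# Only the monitor context is set; no tags are written by hand.
event = {
    "message": "checkin failed",
    "contexts": {
        "monitor": {"id": "00000000-0000-0000-0000-000000000001", "name": "nightly-job"},
    },
}
tags = promote_context_tags(event)  # contains "monitor.id"; "name" is not promoted
```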

## So what’s the problem?

This works, but it has a few issues:

1. Check-in failure events do not track which particular check-in triggered the failure.
2. Users have to manually set the monitor context to associate an error with a particular monitor.
3. Even when the monitor context (and therefore the `monitor.id` tag) is manually set, errors are not associated with a specific check-in.

### Proposed check-in association strategy

We should use the trace ID to associate check-ins with errors and transactions. Concretely, this means doing the following:

1. A new `trace_id` UUID column should be added to the `MonitorCheckin` table. It should be indexed, since we will use it to look up associated check-ins.
2. The `monitor` context should continue to exist on errors so that we can easily query for errors that we know are part of a monitor.
3. The `trace_id` should be provided as part of the check-in API request. It will not be required.

    🤔 **Open Question:** Should we generate a trace ID if the client does not pass one, and if so, should we return it as part of the API response?
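
A minimal sketch of the storage side, assuming a plain SQLite table in place of Sentry's Django `MonitorCheckin` model. The table and index names, the `create_checkin` and `checkins_for_trace` helpers, and the choice to generate a trace ID when none is supplied (one possible answer to the open question) are all illustrative assumptions.

```python
import sqlite3
import uuid
from typing import List, Optional, Tuple

# Hypothetical schema for illustration only; the real model has more columns.
SCHEMA = """
CREATE TABLE monitor_checkin (
    id INTEGER PRIMARY KEY,
    monitor_id TEXT NOT NULL,
    status TEXT NOT NULL,
    trace_id TEXT  -- nullable: the API will not require it
);
CREATE INDEX monitor_checkin_trace_id_idx ON monitor_checkin (trace_id);
"""


def create_checkin(
    conn: sqlite3.Connection, monitor_id: str, status: str, trace_id: Optional[str] = None
) -> str:
    """Store a check-in and return its trace ID (hypothetical helper).

    Assumes the open question is answered "yes": if the client omits a trace ID,
    one is generated and returned so the API response could echo it.
    """
    if trace_id is None:
        trace_id = uuid.uuid4().hex  # 32 hex chars, the usual trace ID shape
    conn.execute(
        "INSERT INTO monitor_checkin (monitor_id, status, trace_id) VALUES (?, ?, ?)",
        (monitor_id, status, trace_id),
    )
    return trace_id


def checkins_for_trace(conn: sqlite3.Connection, trace_id: str) -> List[Tuple[int, str, str]]:
    """Find the check-in(s) an error or transaction with this trace ID belongs to."""
    cursor = conn.execute(
        "SELECT id, monitor_id, status FROM monitor_checkin WHERE trace_id = ? ORDER BY id",
        (trace_id,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
generated = create_checkin(conn, "monitor-a", "error")  # no trace ID supplied
create_checkin(conn, "monitor-b", "ok", trace_id="f" * 32)  # client-supplied trace ID
```

The index on `trace_id` is what keeps the error-to-check-in lookup a point query rather than a table scan.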

### Proposed integration with `sentry-cli` and SDKs

Both the check-in and the SDKs producing errors and traces must be aware of the monitor context and the trace ID being used for the monitor run. Here’s what needs to happen:

1. `sentry-cli` should generate a trace ID and send it with the check-ins.
2. `sentry-cli monitor run <monitor_id> -- <command>` should set two environment variables during execution of `<command>`:
    1. `SENTRY_TRACE_ID`: the trace ID generated by the CLI (https://github.com/getsentry/sentry-cli/pull/1441)
    2. `SENTRY_MONITOR_ID`: the monitor UUID (https://github.com/getsentry/sentry-cli/pull/1438)
3. Sentry SDKs should be updated to understand both of these environment variables:
    1. `SENTRY_TRACE_ID` should be used in place of the SDK generating its own trace ID.
    2. `SENTRY_MONITOR_ID` should be used to set up the monitor context of any events that occur during execution (https://github.com/getsentry/sentry-python/pull/1866).
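
The environment-variable contract above can be sketched in a few lines of Python. `monitor_run` is a hypothetical stand-in for `sentry-cli monitor run` (it sends no check-ins), and the child script only illustrates how an SDK might turn `SENTRY_TRACE_ID` and `SENTRY_MONITOR_ID` into event contexts; it is not the sentry-cli or sentry-python implementation.

```python
import json
import os
import subprocess
import sys
import uuid
from typing import List, Tuple

# Child-side sketch: what an SDK *might* do at startup with the two variables.
# Building a dict stands in for configuring the SDK's scope.
CHILD_CODE = """
import json, os
contexts = {}
trace_id = os.environ.get("SENTRY_TRACE_ID")
monitor_id = os.environ.get("SENTRY_MONITOR_ID")
if trace_id:
    contexts["trace"] = {"trace_id": trace_id}
if monitor_id:
    contexts["monitor"] = {"id": monitor_id}
print(json.dumps(contexts))
"""


def monitor_run(monitor_id: str, command: List[str]) -> Tuple[str, subprocess.CompletedProcess]:
    """Hypothetical stand-in for `sentry-cli monitor run <monitor_id> -- <command>`."""
    trace_id = uuid.uuid4().hex  # one trace ID per run, also sent with the check-ins
    env = dict(os.environ, SENTRY_TRACE_ID=trace_id, SENTRY_MONITOR_ID=monitor_id)
    # The real CLI would send the in-progress and final check-ins around this call.
    result = subprocess.run(command, env=env, capture_output=True, text=True, check=False)
    return trace_id, result


trace_id, result = monitor_run("monitor-uuid-123", [sys.executable, "-c", CHILD_CODE])
contexts = json.loads(result.stdout)
```

Passing both values through the environment means the wrapped command needs no Sentry-specific arguments; any SDK that reads the two variables at startup picks them up.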

### Updates

1. At the moment, propagating `SENTRY_TRACE_ID` is more involved than the scope of this ticket allows; see https://github.com/getsentry/sentry/issues/43647#issuecomment-1411099364
