docs/slo-policy.md
9 additions & 10 deletions
@@ -4,7 +4,7 @@ toc_max_heading_level: 2
 
 # Service Level Objective (SLO) Policy
 
-We are committed to providing reliable, high-quality services to our customers. This Service Level Objective (SLO) outlines our availability commitments, incident response procedures, and the transparency measures we employ to keep you informed about the health of our services.
+We are committed to providing reliable, high-quality services to our customers.
 
 ## Incident Classification & Response Times
 
@@ -15,14 +15,13 @@ We are committed to providing reliable, high-quality services to our customers.
 | **Severity 3 (Medium)** | Minor service degradation or non-critical functionality unavailable | 4 hours | 24 hours | As needed |
 | **Severity 4 (Low)** | Cosmetic issues or minor bugs with workarounds available | 1 business day | Best effort | As needed |
 
-## How We Catch Problems Fast
+## Incident Detection Procedures
 
-We've set up several systems to catch issues before you even notice them:
+We've set up several systems to identify incidents:
 
-- **Real-time Monitoring**: Our systems watch critical endpoints 24/7 and alert us the moment something goes wrong
-- **Automated Testing**: We regularly test authentication and pipeline workflows to catch issues before they affect you
-- **Error Tracking**: We use tools like Sentry to get instant notifications when errors occur
-- **Support Monitoring**: Our team watches support channels during business hours to catch issues you report
+- **Real-time Monitoring**: We have observability and monitoring on our core infrastructure.
+- **Error Tracking**: We use tools like Sentry to aggregate and produce notifications of errors.
+- **Support Monitoring**: Our team watches support channels during business hours to catch issues you report.
 
 ## Communication & Transparency
 
@@ -34,7 +33,7 @@ Here's what you can expect from us during an incident:
 - We've found the problem and are working on it
 - How bad it is and who's affected
 - When you'll hear from us next
-2. **Regular Updates** (every 2 hours for critical issues)
+2. **Regular Updates**
 - What's happening right now
 - What we're doing to fix it
 - Updated timeline if things change
@@ -45,7 +44,7 @@ Here's what you can expect from us during an incident:
 
 ### After We Fix It
 
-For serious incidents (Severity 1 & 2), we'll publish a full report within 5 business days that includes:
+For serious incidents (Severity 1 & 2), we'll create a Root Cause Analysis that, upon request, will be shared with customers, including:
 
 - **What Happened**: Step-by-step timeline of the incident
 - **Who Was Affected**: How many customers and what services were impacted
@@ -58,7 +57,7 @@ For serious incidents (Severity 1 & 2), we'll publish a full report within 5 bus
 This SLO doesn't apply to:
 
 - Beta or preview features (they're still experimental)
-- Scheduled maintenance (we'll give you 72 hours notice)
+- Scheduled maintenance
 - Issues outside our control (internet outages, AWS problems, etc.)
 - Problems you caused (wrong configuration, hitting rate limits, etc.)