 toc_max_heading_level: 2
 ---
 
-# Service Level Agreement (SLA) Policy
+# Service Level Objective (SLO) Policy
 
-We are committed to providing reliable, high-quality services to our customers. This Service Level Agreement (SLA) outlines our availability commitments, incident response procedures, and the transparency measures we employ to keep you informed about the health of our services.
-
-## Service Availability Targets
-
-### Production API Services
-
-- **Monthly Uptime Target**: 99.9% (allows for 43 minutes of downtime per month)
-- **Measured Services**:
-  - Primary API endpoints ([api.prod.app.gruntwork.io](https://api.prod.app.gruntwork.io))
-  - Authentication services
-
-### How We Calculate Uptime
-
-- **Simple Math**: (Total minutes in month - Downtime minutes) ÷ Total minutes in month × 100
-- **What Counts as Downtime**: Service is completely down or failing for more than 5% of requests
-- **What Doesn't Count**: Scheduled maintenance (we'll tell you 72 hours ahead of time)
-
-## Customer Remedies
-
-While we strive to meet our SLA targets, we recognize that outages impact your business. For paying customers:
-
-### Service Credits
-
-| Monthly Uptime | Service Credit |
-|----------------|----------------|
-| 99.0% - 99.5% | 2.5% |
-| 95.0% - 99.0% | 5% |
-| < 95.0% | 10% |
-
-### Credit Request Process
-
-1. Submit request within 30 days of incident
-2. Include affected services and timeframe
-3. Credits applied to next billing cycle
-4. Maximum credit per month: 10% of monthly service fees
+We are committed to providing reliable, high-quality services to our customers. This Service Level Objective (SLO) outlines our availability commitments, incident response procedures, and the transparency measures we employ to keep you informed about the health of our services.
 
 ## Incident Classification & Response Times
 
 | Severity | Definition | Response Time | Resolution Time | Communication |
 |----------|------------|---------------|-----------------|---------------|
-| **Severity 1 (Critical)** | Complete service outage or critical functionality unavailable affecting multiple customers | 30 minutes | 4 hours | Immediate notification via status page and email |
-| **Severity 2 (High)** | Significant degradation of service or critical functionality unavailable for a subset of customers | 1 hour | 8 hours | Status page update within 2 hours |
-| **Severity 3 (Medium)** | Minor service degradation or non-critical functionality unavailable | 4 hours | 24 hours | Status page update within 4 hours |
+| **Severity 1 (Critical)** | Complete service outage or critical functionality unavailable affecting multiple customers | 30 minutes | 4 hours | Immediate notification via email |
+| **Severity 2 (High)** | Significant degradation of service or critical functionality unavailable for a subset of customers | 1 hour | 8 hours | Notification via email upon resolution |
+| **Severity 3 (Medium)** | Minor service degradation or non-critical functionality unavailable | 4 hours | 24 hours | As needed |
 | **Severity 4 (Low)** | Cosmetic issues or minor bugs with workarounds available | 1 business day | Best effort | As needed |
 
 ## How We Catch Problems Fast
@@ -87,16 +53,9 @@ For serious incidents (Severity 1 & 2), we'll publish a full report within 5 bus
 - **How We'll Prevent It**: Specific steps we're taking to avoid this happening again
 - **Lessons Learned**: What worked well and what we'll do better next time
 
-### Our Status Page
-
-- **Check Status**: [status.gruntwork.io](https://status.gruntwork.io/)
-- **Live Updates**: Real-time health indicators for all our services
-- **Incident History**: 90 days of past incidents and resolutions
-- **Get Notified**: Subscribe to email or SMS alerts for outages
-
 ## What's Not Covered
 
-This SLA doesn't apply to:
+This SLO doesn't apply to:
 
 - Beta or preview features (they're still experimental)
 - Scheduled maintenance (we'll give you 72 hours notice)