Commit c90085e

Make updates per live chat.
1 parent c0ab2bb commit c90085e

1 file changed: +6 -47 lines changed

docs/sla-policy.md

Lines changed: 6 additions & 47 deletions
@@ -2,51 +2,17 @@
 toc_max_heading_level: 2
 ---

-# Service Level Agreement (SLA) Policy
+# Service Level Objective (SLO) Policy

-We are committed to providing reliable, high-quality services to our customers. This Service Level Agreement (SLA) outlines our availability commitments, incident response procedures, and the transparency measures we employ to keep you informed about the health of our services.
-
-## Service Availability Targets
-
-### Production API Services
-
-- **Monthly Uptime Target**: 99.9% (allows for 43 minutes of downtime per month)
-- **Measured Services**:
-- Primary API endpoints ([api.prod.app.gruntwork.io](https://api.prod.app.gruntwork.io))
-- Authentication services
-
-### How We Calculate Uptime
-
-- **Simple Math**: (Total minutes in month - Downtime minutes) ÷ Total minutes in month × 100
-- **What Counts as Downtime**: Service is completely down or failing for more than 5% of requests
-- **What Doesn't Count**: Scheduled maintenance (we'll tell you 72 hours ahead of time)
-
-## Customer Remedies
-
-While we strive to meet our SLA targets, we recognize that outages impact your business. For paying customers:
-
-### Service Credits
-
-| Monthly Uptime | Service Credit |
-|----------------|----------------|
-| 99.0% - 99.5% | 2.5% |
-| 95.0% - 99.0% | 5% |
-| < 95.0% | 10% |
-
-### Credit Request Process
-
-1. Submit request within 30 days of incident
-2. Include affected services and timeframe
-3. Credits applied to next billing cycle
-4. Maximum credit per month: 10% of monthly service fees
+We are committed to providing reliable, high-quality services to our customers. This Service Level Objective (SLO) outlines our availability commitments, incident response procedures, and the transparency measures we employ to keep you informed about the health of our services.

 ## Incident Classification & Response Times

 | Severity | Definition | Response Time | Resolution Time | Communication |
 |----------|------------|---------------|-----------------|---------------|
-| **Severity 1 (Critical)** | Complete service outage or critical functionality unavailable affecting multiple customers | 30 minutes | 4 hours | Immediate notification via status page and email |
-| **Severity 2 (High)** | Significant degradation of service or critical functionality unavailable for a subset of customers | 1 hour | 8 hours | Status page update within 2 hours |
-| **Severity 3 (Medium)** | Minor service degradation or non-critical functionality unavailable | 4 hours | 24 hours | Status page update within 4 hours |
+| **Severity 1 (Critical)** | Complete service outage or critical functionality unavailable affecting multiple customers | 30 minutes | 4 hours | Immediate notification via email |
+| **Severity 2 (High)** | Significant degradation of service or critical functionality unavailable for a subset of customers | 1 hour | 8 hours | Notification via email upon resolution |
+| **Severity 3 (Medium)** | Minor service degradation or non-critical functionality unavailable | 4 hours | 24 hours | As needed |
 | **Severity 4 (Low)** | Cosmetic issues or minor bugs with workarounds available | 1 business day | Best effort | As needed |

 ## How We Catch Problems Fast
@@ -87,16 +53,9 @@ For serious incidents (Severity 1 & 2), we'll publish a full report within 5 bus
 - **How We'll Prevent It**: Specific steps we're taking to avoid this happening again
 - **Lessons Learned**: What worked well and what we'll do better next time

-### Our Status Page
-
-- **Check Status**: [status.gruntwork.io](https://status.gruntwork.io/)
-- **Live Updates**: Real-time health indicators for all our services
-- **Incident History**: 90 days of past incidents and resolutions
-- **Get Notified**: Subscribe to email or SMS alerts for outages
-
 ## What's Not Covered

-This SLA doesn't apply to:
+This SLO doesn't apply to:

 - Beta or preview features (they're still experimental)
 - Scheduled maintenance (we'll give you 72 hours notice)
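
For reference, the uptime formula in the removed "How We Calculate Uptime" section can be sketched in Python. This is only an illustration of the arithmetic: the function names and the 30-day month are assumptions, not part of the policy or of any Gruntwork tooling.

```python
# Illustrative sketch of the removed uptime formula:
# (total minutes in month - downtime minutes) / total minutes in month * 100
# All names here are hypothetical and chosen for this example only.

MINUTES_IN_30_DAY_MONTH = 30 * 24 * 60  # assumption: a 30-day month (43,200 minutes)


def uptime_percent(total_minutes: float, downtime_minutes: float) -> float:
    """Return uptime as a percentage of the measurement window."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return (total_minutes - downtime_minutes) / total_minutes * 100


def downtime_budget_minutes(total_minutes: float, target_percent: float) -> float:
    """Return the downtime, in minutes, that a given uptime target allows."""
    return total_minutes * (1 - target_percent / 100)


if __name__ == "__main__":
    budget = downtime_budget_minutes(MINUTES_IN_30_DAY_MONTH, 99.9)
    print(round(budget, 1))  # prints 43.2
```

Under the 30-day assumption, a 99.9% target allows roughly 43.2 minutes of downtime, which matches the "43 minutes" figure in the removed text. A 31-day month would give a slightly larger budget.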
