Add SLA policy #2741
Walkthrough: Adds a new SLO Policy documentation page and links it in the docs sidebar immediately after Support. No functional code changes; tsconfig.json only changes the trailing newline.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed)
Actionable comments posted: 0
🧹 Nitpick comments (2)
docs/sla-policy.md (2)
112-112: Fix bare URL formatting issue. The static analysis tool flagged the bare URL on line 112. According to Markdown best practices, URLs should be properly formatted as links.
-- **Support Portal**: [support.gruntwork.io](https://support.gruntwork.io) - Submit tickets and track issues
+- **Support Portal**: [support.gruntwork.io](https://support.gruntwork.io) - Submit tickets and track issues
13-13: Verify the uptime calculation is accurate.
docs/sla-policy.md, line 13: 99.9% uptime allows 43.2 minutes of downtime per 30‑day month; the range is 40.32 min (28‑day) → 44.64 min (31‑day). The current "43 minutes" is a rounded 30‑day value; change it to "≈43.2 minutes per 30‑day month" or add "(approx.)".
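The downtime figures in this comment can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function name `downtime_budget_minutes` is hypothetical and does not appear in the PR.

```python
def downtime_budget_minutes(uptime_target: float, days_in_month: int) -> float:
    """Return the downtime allowed, in minutes, for a month of the given length."""
    total_minutes = days_in_month * 24 * 60  # minutes in the month
    return total_minutes * (1 - uptime_target)


# Compare the shortest, the nominal 30-day, and the longest month at 99.9% uptime.
for days in (28, 30, 31):
    print(days, round(downtime_budget_minutes(0.999, days), 2))
```

Because the budget scales with month length, a single "43 minutes" figure is only exact to the nearest minute for a 30-day month.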
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
docs/sla-policy.md (1 hunks)
sidebars/docs.js (1 hunks)
tsconfig.json (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/sla-policy.md
112-112: Bare URL used (MD034, no-bare-urls)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Validate generated content
🔇 Additional comments (9)
tsconfig.json (1)
8-8: Minor formatting adjustment looks good. The removal of the trailing newline maintains consistency in file formatting.
docs/sla-policy.md (7)
1-3: LGTM - Proper Docusaurus front matter configuration. The YAML front matter correctly limits the table of contents depth to 2 levels, which is appropriate for the document structure.
5-7: Strong opening that sets clear expectations. The introduction effectively communicates Gruntwork's commitment to reliability and transparency.
30-34: Service credit structure is clear and fair. The tiered service credit structure provides reasonable compensation for different levels of service degradation.
45-50: Well-structured incident response matrix. The incident classification and response time matrix provides clear expectations for different severity levels.
101-105: Appropriate SLA exclusions that protect against external factors. The exclusions are reasonable and align with industry standards for what should and shouldn't be covered under an SLA.
92-95: Verify status page content & subscription features. status.gruntwork.io is reachable (HTTP 200) but the initial HTML lacks obvious status keywords (first 20 lines show BetterStack/client-side assets). Manually open the URL in a browser (or use a JS-capable renderer) and confirm it displays Live Updates, Incident History (90 days) and Subscribe (email/SMS); if any are missing, update docs/sla-policy.md (lines 92–95) with the correct URL or wording.
111-113: Support portal reachable; confirm email & Slack (docs/sla-policy.md:111-113)
- support.gruntwork.io responds; final URL redirects to HubSpot login: https://support.gruntwork.io/_hcms/mem/login?redirect_url=https%3A%2F%2Fsupport.gruntwork.io%2Ftickets.
- [email protected] could not be verified by the script; confirm the mailbox is active and monitored.
- Confirm customer-specific Slack channels are configured and documented for customers.
sidebars/docs.js (1)
84-88: Perfect sidebar integration for the SLA Policy. The new sidebar entry is properly placed after the Support section and correctly references the "sla-policy" document ID, which matches the new markdown file.
docs/sla-policy.md (Outdated)
### Production API Services

- **Monthly Uptime Target**: 99.9% (allows for 43 minutes of downtime per month)
-- **Monthly Uptime Target**: 99.9% (allows for 43 minutes of downtime per month)
+- **Monthly Uptime Target**: 99.9%
docs/sla-policy.md (Outdated)
While we strive to meet our SLA targets, we recognize that outages impact your business. For paying customers:

### Service Credits
I suggest we remove service credits. This isn't in our terms today, and I think it's something we should move to only if we have to down the road.
docs/sla-policy.md (Outdated)
| 95.0% - 99.0% | 5% |
| < 95.0% | 10% |

### Credit Request Process
Remove as well
docs/sla-policy.md (Outdated)
| **Severity 3 (Medium)** | Minor service degradation or non-critical functionality unavailable | 4 hours | 24 hours | Status page update within 4 hours |
| **Severity 4 (Low)** | Cosmetic issues or minor bugs with workarounds available | 1 business day | Best effort | As needed |

## How We Catch Problems Fast
I don't think this adds value; we can and should remove it.
- Quick summary of what happened
- We'll do a full review and share lessons learned

### After We Fix It
I think we remove these commitments
docs/sla-policy.md (Outdated)
- **How We'll Prevent It**: Specific steps we're taking to avoid this happening again
- **Lessons Learned**: What worked well and what we'll do better next time

### Our Status Page
We don't have a status page today
Why docs instead of legal?
Actionable comments posted: 1
🧹 Nitpick comments (8)
docs/sla-policy.md (8)
11-16: Resolve inconsistencies between the table and the later “When Things Go Wrong” section. Sev2 says “Notification via email upon resolution,” but below you promise first/regular updates. Recommend consistent comms via a status page + email for material changes. Also prefer “Target to restore service” over strict “Resolution Time.”
-| Severity | Definition | Response Time | Resolution Time | Communication |
+| Severity | Definition | Response Time | Target to Restore Service | Communication |
@@
-| **Severity 2 (High)** | Significant degradation of service or critical functionality unavailable for a subset of customers | 1 hour | 8 hours | Notification via email upon resolution |
+| **Severity 2 (High)** | Significant degradation of service or critical functionality unavailable for a subset of customers | 1 hour | 8 hours | Status page update within response time; email on material changes and upon resolution |
Also scope the “Regular Updates” cadence below to Sev1/Sev2 to match the table (see suggestion on Lines 37-38).
22-26: Avoid vendor callouts (“Sentry”) in public policy docs. Keep tooling generic to reduce external dependencies and future-proof the doc.
-- **Error Tracking**: We use tools like Sentry to get instant notifications when errors occur
+- **Error Tracking**: We use error-tracking tools to get instant notifications when errors occur
33-41: Clarify update cadence scope. Call out which severities get regular updates to match the table and avoid over-promising.
-2. **Regular Updates** (every 2 hours for critical issues)
+2. **Regular Updates** (every 2 hours for Severity 1; periodic updates for Severity 2)
46-55: Soften the hard commitment on post-incident reports. If you intend SLOs (not contractual SLAs), use “aim to” language.
-For serious incidents (Severity 1 & 2), we'll publish a full report within 5 business days that includes:
+For serious incidents (Severity 1 & 2), we aim to publish a full report within 5 business days that includes:
60-62: Minor clarity: “at least” 72 hours and timezone. Make maintenance notice explicit and unambiguous.
-- Scheduled maintenance (we'll give you 72 hours notice)
+- Scheduled maintenance (we'll give you at least 72 hours’ notice; all times in [your primary timezone, e.g., PT])
68-73: Add the public status page and make the email a mailto link. This both advertises the new status page and resolves the markdownlint bare-URL warning by linking the email.
 Here's how to reach us:
+- **Status Page**: [status.gruntwork.io](https://status.gruntwork.io) - Real-time service status and incident updates
 - **Support Portal**: [support.gruntwork.io](https://support.gruntwork.io) - Submit tickets and track issues
-- **Email**: [email protected] - Direct email support
+- **Email**: [[email protected]](mailto:[email protected]) - Direct email support
 - **Slack**: Customer-specific channels (where available) - Real-time chat with our team
Please confirm the status page is live and owned by us before linking.
9-17: Confirm practicality of “resolution” timelines. Strict 4h/8h “Resolution Time” SLAs are often unrealistic. If these are SLOs, ensure they’re achievable with current staffing/on-call. Consider “target to restore service or implement workaround.”
27-33: Define “business hours” and timezone. You mention “business hours” earlier; add hours and timezone once to avoid ambiguity.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/sla-policy.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/sla-policy.md
71-71: Bare URL used (MD034, no-bare-urls)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Validate generated content
- GitHub Check: validate_build
- GitHub Check: Pull Request has non-contributor approval
# Service Level Objective (SLO) Policy

We are committed to providing reliable, high-quality services to our customers. This Service Level Objective (SLO) outlines our availability commitments, incident response procedures, and the transparency measures we employ to keep you informed about the health of our services.
🛠️ Refactor suggestion
❓ Verification inconclusive
Add a clear non-contractual disclaimer and align terminology (SLA vs SLO).
This content reads like commitments. Without a legal review, publish as “SLOs” and add a non‑contractual disclaimer. Also ensure the sidebar/nav and PR title don’t call it “SLA” if the document is “SLO”.
Apply this diff to add a disclaimer under the intro:
We are committed to providing reliable, high-quality services to our customers. This Service Level Objective (SLO) outlines our availability commitments, incident response procedures, and the transparency measures we employ to keep you informed about the health of our services.
+
+> Note: This page describes our current operational Service Level Objectives (SLOs). It is informational and non‑contractual and does not modify any customer agreement or MSA. Any binding SLAs, if applicable, are defined exclusively in your contract.
And verify the nav label matches (“SLO Policy” vs “SLA Policy”). If needed, I can propose the sidebar change.
Add non-contractual disclaimer and align terminology to SLO
Insert immediately after the introductory paragraph in docs/sla-policy.md:
We are committed to providing reliable, high-quality services to our customers. This Service Level Objective (SLO) outlines our availability commitments, incident response procedures, and the transparency measures we employ to keep you informed about the health of our services.
+> Note: This page describes our current operational Service Level Objectives (SLOs). It is informational and non-contractual and does not modify any customer agreement or MSA. Any binding SLAs, if applicable, are defined exclusively in your contract.
Update the sidebar/nav label and PR title from “SLA Policy” to “SLO Policy” to match the document.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-# Service Level Objective (SLO) Policy
-We are committed to providing reliable, high-quality services to our customers. This Service Level Objective (SLO) outlines our availability commitments, incident response procedures, and the transparency measures we employ to keep you informed about the health of our services.
+# Service Level Objective (SLO) Policy
+We are committed to providing reliable, high-quality services to our customers. This Service Level Objective (SLO) outlines our availability commitments, incident response procedures, and the transparency measures we employ to keep you informed about the health of our services.
+> Note: This page describes our current operational Service Level Objectives (SLOs). It is informational and non-contractual and does not modify any customer agreement or MSA. Any binding SLAs, if applicable, are defined exclusively in your contract.
🤖 Prompt for AI Agents
In docs/sla-policy.md around lines 5 to 8, insert a non-contractual disclaimer
sentence immediately after the existing introductory paragraph clarifying that
the SLOs are targets and not legally binding commitments, and update the
document terminology by renaming the file title and all in-file occurrences from
“SLA Policy” to “SLO Policy”; additionally update the site navigation/sidebar
label and the PR title to “SLO Policy” so labels are consistent across docs and
UI.
Actionable comments posted: 2
🧹 Nitpick comments (2)
docs/slo-policy.md (2)
14-14: Align Severity 2 communication with the “First Update” section. The Sev 2 row says “email upon resolution,” which conflicts with the “First Update” promise. Clarify initial + regular updates.
-| **Severity 2 (High)** | Significant degradation of service or critical functionality unavailable for a subset of customers | 1 hour | 8 hours | Notification via email upon resolution |
+| **Severity 2 (High)** | Significant degradation of service or critical functionality unavailable for a subset of customers | 1 hour | 8 hours | Initial update within 1 hour; regular updates |
59-64: Tighten exclusions; remove “etc.” for precision. Avoid vague “etc.” in policy docs.
-- Scheduled maintenance
-- Issues outside our control (internet outages, AWS problems, etc.)
-- Problems you caused (wrong configuration, hitting rate limits, etc.)
+- Scheduled maintenance
+- Issues outside our control (e.g., internet outages or AWS provider incidents)
+- Problems you caused (e.g., wrong configuration or hitting rate limits)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/slo-policy.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/slo-policy.md
70-70: Bare URL used (MD034, no-bare-urls)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: validate_build
- GitHub Check: Validate generated content
- GitHub Check: Pull Request has non-contributor approval
🔇 Additional comments (4)
docs/slo-policy.md (4)
5-5: Good structure and SLO framing; verify labels elsewhere match “SLO.” Doc title/file use “SLO” correctly. Please confirm the sidebar label and PR title also say “SLO Policy,” not “SLA.”
24-24: Define “business hours” (timezone and hours) or link to a canonical page. “During business hours” is ambiguous; specify hours/timezone or reference your Support page where this is defined.
7-8: Add a non‑contractual disclaimer to avoid creating enforceable commitments. Insert a brief disclaimer after the intro to clarify these are operational targets, not contractual SLAs.
 We are committed to providing reliable, high-quality services to our customers.
+
+> Note: This page describes our current operational Service Level Objectives (SLOs). It is informational and non‑contractual and does not modify any customer agreement or MSA. Any binding SLAs, if applicable, are defined exclusively in your contract.
+
47-54: Soften RCA language to avoid hard commitments while preserving intent. Reflect “aim/summary/may include” phrasing.
-For serious incidents (Severity 1 & 2), we'll create a Root Cause Analysis that, upon request, will be shared with customers, including:
+For serious incidents (Severity 1 & 2), we aim to create a Root Cause Analysis and, upon request, can share a summary with customers. This may include:
| Severity | Definition | Response Time | Resolution Time | Communication |
|----------|------------|---------------|-----------------|---------------|
| **Severity 1 (Critical)** | Complete service outage or critical functionality unavailable affecting multiple customers | 30 minutes | 4 hours | Immediate notification via email |
| **Severity 2 (High)** | Significant degradation of service or critical functionality unavailable for a subset of customers | 1 hour | 8 hours | Notification via email upon resolution |
| **Severity 3 (Medium)** | Minor service degradation or non-critical functionality unavailable | 4 hours | 24 hours | As needed |
| **Severity 4 (Low)** | Cosmetic issues or minor bugs with workarounds available | 1 business day | Best effort | As needed |
Clarify that response/resolution times are targets during support hours (not contractual).
Add a short note below the table to reduce legal risk and set expectations.
| **Severity 4 (Low)** | Cosmetic issues or minor bugs with workarounds available | 1 business day | Best effort | As needed |
+Note: Response and resolution times are targets (not contractual guarantees) and apply during our support hours. We will communicate updates per the “Communication & Transparency” section.
+
📝 Committable suggestion
-| Severity | Definition | Response Time | Resolution Time | Communication |
-|----------|------------|---------------|-----------------|---------------|
-| **Severity 1 (Critical)** | Complete service outage or critical functionality unavailable affecting multiple customers | 30 minutes | 4 hours | Immediate notification via email |
-| **Severity 2 (High)** | Significant degradation of service or critical functionality unavailable for a subset of customers | 1 hour | 8 hours | Notification via email upon resolution |
-| **Severity 3 (Medium)** | Minor service degradation or non-critical functionality unavailable | 4 hours | 24 hours | As needed |
-| **Severity 4 (Low)** | Cosmetic issues or minor bugs with workarounds available | 1 business day | Best effort | As needed |
+| Severity | Definition | Response Time | Resolution Time | Communication |
+|----------|------------|---------------|-----------------|---------------|
+| **Severity 1 (Critical)** | Complete service outage or critical functionality unavailable affecting multiple customers | 30 minutes | 4 hours | Immediate notification via email |
+| **Severity 2 (High)** | Significant degradation of service or critical functionality unavailable for a subset of customers | 1 hour | 8 hours | Notification via email upon resolution |
+| **Severity 3 (Medium)** | Minor service degradation or non-critical functionality unavailable | 4 hours | 24 hours | As needed |
+| **Severity 4 (Low)** | Cosmetic issues or minor bugs with workarounds available | 1 business day | Best effort | As needed |
+Note: Response and resolution times are targets (not contractual guarantees) and apply during our support hours. We will communicate updates per the “Communication & Transparency” section.
🤖 Prompt for AI Agents
In docs/slo-policy.md around lines 11 to 17, the response and resolution times
in the SLO table need a clarifying note stating they are target goals applicable
during defined support hours and are not contractual guarantees; add a single
short sentence immediately below the table (e.g., "These response and resolution
times are target service-level goals applicable during our published support
hours and do not constitute contractual commitments.") and ensure it references
where support hours are defined or link to the support-hours section.
Here's how to reach us:

- **Support Portal**: [support.gruntwork.io](https://support.gruntwork.io) - Submit tickets and track issues
- **Email**: [email protected] - Direct email support
Fix markdownlint MD034 (bare URL) for the email address.
Wrap the email in a mailto link (or angle brackets).
-- **Email**: [email protected] - Direct email support
+- **Email**: [[email protected]](mailto:[email protected]) - Direct email support
📝 Committable suggestion
-- **Email**: [email protected] - Direct email support
+- **Email**: [[email protected]](mailto:[email protected]) - Direct email support
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
70-70: Bare URL used (MD034, no-bare-urls)
🤖 Prompt for AI Agents
In docs/slo-policy.md around line 70 the email is written as a bare address
("**Email**: [email protected] - Direct email support") which triggers
markdownlint MD034; update the line to wrap the address as a mailto link or
enclose it in angle brackets, e.g. use <[email protected]> or
[[email protected]](mailto:[email protected]) so the email is not a bare
URL and the linter warning is resolved.
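The same fix could be applied mechanically. The Python sketch below wraps bare email addresses in mailto links; the function name `link_bare_emails`, the regular expression, and the address `support@example.com` are all illustrative assumptions, since the real address is redacted on this page. It is not part of markdownlint.

```python
import re

# Match an email address that is not already inside a Markdown link,
# angle brackets, or a mailto: URL. The pattern is simplified for illustration.
BARE_EMAIL = re.compile(
    r"(?<![\w.+\-\[<(:])"                # not preceded by link/autolink syntax
    r"([\w.+-]+@[\w-]+(?:\.[\w-]+)+)"    # local part @ domain
    r"(?![\w\]>)])"                      # not followed by closing link syntax
)


def link_bare_emails(line: str) -> str:
    """Wrap each bare email address in a Markdown mailto link."""
    return BARE_EMAIL.sub(r"[\1](mailto:\1)", line)


print(link_bare_emails("- **Email**: support@example.com - Direct email support"))
```

Running the function on an already linked line leaves it unchanged, so it can be applied repeatedly without nesting links.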
This LGTM, but looks like you need someone other than me to approve so you can merge.
Added new SLO policy document (docs/sla-policy.md). Covers:
Fixed minor formatting (tsconfig.json):
Summary by CodeRabbit
Documentation
Chores