@josh-padnick josh-padnick commented Sep 19, 2025

Added new SLO policy document (docs/sla-policy.md). Covers:

  • Customer remedies including service credit structure
  • Incident classification and response time commitments
  • Communication procedures and transparency measures
  • Clear definitions of what is and isn't covered by the SLO

Fixed minor formatting (tsconfig.json):

  • Removed trailing newline to maintain consistent file formatting

Summary by CodeRabbit

  • Documentation

    • Added a new “SLO Policy” page detailing incident severities, response and resolution targets, communication cadence, detection processes, post-incident reviews, scope exclusions, and support contact options.
    • Updated the docs navigation to include “SLO Policy” after “Support” for easier access.
  • Chores

    • Minor configuration file formatting updated with no user-facing impact.


vercel bot commented Sep 19, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
|---------|------------|---------|----------|---------------|
| docs | Ready | Preview | Comment | Sep 24, 2025 8:22pm |


coderabbitai bot commented Sep 19, 2025

Walkthrough

Adds a new SLO Policy documentation page and links it in the docs sidebar immediately after Support. No functional code changes; tsconfig.json only changes the trailing newline.

Changes

| Cohort / File(s) | Summary |
|------------------|---------|
| **Documentation: SLO Policy** docs/slo-policy.md | New doc detailing SLO policy: severity levels, response/resolution times, incident detection, communication cadence, post-incident RCA, exclusions, and contact channels. |
| **Sidebar Configuration** sidebars/docs.js | Inserts a new doc node { label: "SLO Policy", type: "doc", id: "slo-policy" } after the "Support" entry in the exported sidebar. |
| **Config (No semantic change)** tsconfig.json | No content changes; only removes trailing newline at EOF. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

A policy penned in careful light,
SLOs aligned, response in sight.
Sidebar whispers, “This way, friend,”
To docs that guide from start to end.
Incidents calm, like storms made small—
Clear words, quick hands, serving all.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Title Check | ⚠️ Warning | The pull request title “Add SLA policy” inaccurately describes the change, which actually introduces a “SLO Policy” document and sidebar entry rather than an SLA. This mislabeling can mislead reviewers and future readers about the nature of the update. Ensuring consistency between the title and the content is important for clear project history. | Please rename the title to accurately reflect the addition of the SLO policy, for example “Add SLO policy documentation,” so it aligns with the file names and content in the pull request. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
docs/sla-policy.md (2)

112-112: Fix bare URL formatting issue.

The static analysis tool flagged the bare URL on line 112. According to Markdown best practices, URLs should be properly formatted as links.

-- **Support Portal**: [support.gruntwork.io](https://support.gruntwork.io) - Submit tickets and track issues
+- **Support Portal**: [support.gruntwork.io](https://support.gruntwork.io) - Submit tickets and track issues

13-13: Verify the uptime calculation is accurate.
docs/sla-policy.md: line 13 — 99.9% uptime = 43.2 minutes/month (30‑day); range: 40.32 min (28‑day) → 44.64 min (31‑day). Current "43 minutes" is a rounded 30‑day value; change to "≈43.2 minutes per 30‑day month" or add "(approx.)".
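
The arithmetic behind these figures can be checked with a minimal sketch. The function name `downtime_budget_minutes` is hypothetical, and the calculation assumes downtime is measured against every minute of the calendar month:

```python
def downtime_budget_minutes(uptime_target: float, days_in_month: int) -> float:
    """Return the minutes of downtime a month allows at the given uptime target."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1.0 - uptime_target)


if __name__ == "__main__":
    # Compare the shortest, the nominal 30-day, and the longest month.
    for days in (28, 30, 31):
        budget = downtime_budget_minutes(0.999, days)
        print(f"{days}-day month: {budget:.2f} minutes")
    # 28-day month: 40.32 minutes
    # 30-day month: 43.20 minutes
    # 31-day month: 44.64 minutes
```

The spread of about four minutes between the shortest and longest month is why a flat "43 minutes" is only an approximation.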

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1d41f5a and c0ab2bb.

📒 Files selected for processing (3)
  • docs/sla-policy.md (1 hunks)
  • sidebars/docs.js (1 hunks)
  • tsconfig.json (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/sla-policy.md

112-112: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Validate generated content
🔇 Additional comments (9)
tsconfig.json (1)

8-8: Minor formatting adjustment looks good.

The removal of the trailing newline maintains consistency in file formatting.

docs/sla-policy.md (7)

1-3: LGTM - Proper Docusaurus front matter configuration.

The YAML front matter correctly limits the table of contents depth to 2 levels, which is appropriate for the document structure.


5-7: Strong opening that sets clear expectations.

The introduction effectively communicates Gruntwork's commitment to reliability and transparency.


30-34: Service credit structure is clear and fair.

The tiered service credit structure provides reasonable compensation for different levels of service degradation.


45-50: Well-structured incident response matrix.

The incident classification and response time matrix provides clear expectations for different severity levels.


101-105: Appropriate SLA exclusions that protect against external factors.

The exclusions are reasonable and align with industry standards for what should and shouldn't be covered under an SLA.


92-95: Verify status page content & subscription features

status.gruntwork.io is reachable (HTTP 200) but the initial HTML lacks obvious status keywords (first 20 lines show BetterStack/client-side assets). Manually open the URL in a browser (or use a JS-capable renderer) and confirm it displays Live Updates, Incident History (90 days) and Subscribe (email/SMS); if any are missing, update docs/sla-policy.md (lines 92–95) with the correct URL or wording.


111-113: Support portal reachable — confirm email & Slack (docs/sla-policy.md:111-113)

sidebars/docs.js (1)

84-88: Perfect sidebar integration for the SLA Policy.

The new sidebar entry is properly placed after the Support section and correctly references the "sla-policy" document ID, which matches the new markdown file.


### Production API Services

- **Monthly Uptime Target**: 99.9% (allows for 43 minutes of downtime per month)

Suggested change
-- **Monthly Uptime Target**: 99.9% (allows for 43 minutes of downtime per month)
+- **Monthly Uptime Target**: 99.9%


While we strive to meet our SLA targets, we recognize that outages impact your business. For paying customers:

### Service Credits

I suggest we remove service credits. This isn't in our terms today and I think is something we move to only if we have to down the road.

| 95.0% - 99.0% | 5% |
| < 95.0% | 10% |

### Credit Request Process

Remove as well

| **Severity 3 (Medium)** | Minor service degradation or non-critical functionality unavailable | 4 hours | 24 hours | Status page update within 4 hours |
| **Severity 4 (Low)** | Cosmetic issues or minor bugs with workarounds available | 1 business day | Best effort | As needed |

## How We Catch Problems Fast

I don't think this adds value, we can/should remove.

- Quick summary of what happened
- We'll do a full review and share lessons learned

### After We Fix It

I think we remove these commitments

- **How We'll Prevent It**: Specific steps we're taking to avoid this happening again
- **Lessons Learned**: What worked well and what we'll do better next time

### Our Status Page

We don't have a status page today

@ZachGoldberg

Why docs instead of legal?


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (8)
docs/sla-policy.md (8)

11-16: Resolve inconsistencies between the table and the later “When Things Go Wrong” section.

Sev2 says “Notification via email upon resolution,” but below you promise first/regular updates. Recommend consistent comms via a status page + email for material changes. Also prefer “Target to restore service” over strict “Resolution Time.”

-| Severity | Definition | Response Time | Resolution Time | Communication |
+| Severity | Definition | Response Time | Target to Restore Service | Communication |
@@
-| **Severity 2 (High)** | Significant degradation of service or critical functionality unavailable for a subset of customers | 1 hour | 8 hours | Notification via email upon resolution |
+| **Severity 2 (High)** | Significant degradation of service or critical functionality unavailable for a subset of customers | 1 hour | 8 hours | Status page update within response time; email on material changes and upon resolution |

Also scope the “Regular Updates” cadence below to Sev1/Sev2 to match the table (see suggestion on Lines 37-38).


22-26: Avoid vendor callouts (“Sentry”) in public policy docs.

Keep tooling generic to reduce external dependencies and future-proof the doc.

-- **Error Tracking**: We use tools like Sentry to get instant notifications when errors occur
+- **Error Tracking**: We use error-tracking tools to get instant notifications when errors occur

33-41: Clarify update cadence scope.

Call out which severities get regular updates to match the table and avoid over-promising.

-2. **Regular Updates** (every 2 hours for critical issues)
+2. **Regular Updates** (every 2 hours for Severity 1; periodic updates for Severity 2)

46-55: Soften the hard commitment on post-incident reports.

If you intend SLOs (not contractual SLAs), use “aim to” language.

-For serious incidents (Severity 1 & 2), we'll publish a full report within 5 business days that includes:
+For serious incidents (Severity 1 & 2), we aim to publish a full report within 5 business days that includes:

60-62: Minor clarity: “at least” 72 hours and timezone.

Make maintenance notice explicit and unambiguous.

-- Scheduled maintenance (we'll give you 72 hours notice)
+- Scheduled maintenance (we'll give you at least 72 hours’ notice; all times in [your primary timezone, e.g., PT])

68-73: Add the public status page and make the email a mailto link.

This both advertises the new status page and resolves the markdownlint bare-URL warning by linking the email.

 Here's how to reach us:
 
+- **Status Page**: [status.gruntwork.io](https://status.gruntwork.io) - Real-time service status and incident updates
 - **Support Portal**: [support.gruntwork.io](https://support.gruntwork.io) - Submit tickets and track issues
-- **Email**: [email protected] - Direct email support
+- **Email**: [[email protected]](mailto:[email protected]) - Direct email support
 - **Slack**: Customer-specific channels (where available) - Real-time chat with our team

Please confirm the status page is live and owned by us before linking.


9-17: Confirm practicality of “resolution” timelines.

Strict 4h/8h “Resolution Time” SLAs are often unrealistic. If these are SLOs, ensure they’re achievable with current staffing/on-call. Consider “target to restore service or implement workaround.”


27-33: Define “business hours” and timezone.

You mention “business hours” earlier; add hours and timezone once to avoid ambiguity.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between c0ab2bb and c90085e.

📒 Files selected for processing (1)
  • docs/sla-policy.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/sla-policy.md

71-71: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Validate generated content
  • GitHub Check: validate_build
  • GitHub Check: Pull Request has non-contributor approval

Comment on lines 5 to 8
# Service Level Objective (SLO) Policy

We are committed to providing reliable, high-quality services to our customers. This Service Level Objective (SLO) outlines our availability commitments, incident response procedures, and the transparency measures we employ to keep you informed about the health of our services.


🛠️ Refactor suggestion

❓ Verification inconclusive

Add a clear non-contractual disclaimer and align terminology (SLA vs SLO).

This content reads like commitments. Without a legal review, publish as “SLOs” and add a non‑contractual disclaimer. Also ensure the sidebar/nav and PR title don’t call it “SLA” if the document is “SLO”.

Apply this diff to add a disclaimer under the intro:

 We are committed to providing reliable, high-quality services to our customers. This Service Level Objective (SLO) outlines our availability commitments, incident response procedures, and the transparency measures we employ to keep you informed about the health of our services.
+
+> Note: This page describes our current operational Service Level Objectives (SLOs). It is informational and non‑contractual and does not modify any customer agreement or MSA. Any binding SLAs, if applicable, are defined exclusively in your contract.

And verify the nav label matches (“SLO Policy” vs “SLA Policy”). If needed, I can propose the sidebar change.


Add non-contractual disclaimer and align terminology to SLO
Insert immediately after the introductory paragraph in docs/sla-policy.md:

 We are committed to providing reliable, high-quality services to our customers. This Service Level Objective (SLO) outlines our availability commitments, incident response procedures, and the transparency measures we employ to keep you informed about the health of our services.
+> Note: This page describes our current operational Service Level Objectives (SLOs). It is informational and non-contractual and does not modify any customer agreement or MSA. Any binding SLAs, if applicable, are defined exclusively in your contract.

Update the sidebar/nav label and PR title from “SLA Policy” to “SLO Policy” to match the document.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 # Service Level Objective (SLO) Policy
 We are committed to providing reliable, high-quality services to our customers. This Service Level Objective (SLO) outlines our availability commitments, incident response procedures, and the transparency measures we employ to keep you informed about the health of our services.
+> Note: This page describes our current operational Service Level Objectives (SLOs). It is informational and non-contractual and does not modify any customer agreement or MSA. Any binding SLAs, if applicable, are defined exclusively in your contract.
🤖 Prompt for AI Agents
In docs/sla-policy.md around lines 5 to 8, insert a non-contractual disclaimer
sentence immediately after the existing introductory paragraph clarifying that
the SLOs are targets and not legally binding commitments, and update the
document terminology by renaming the file title and all in-file occurrences from
“SLA Policy” to “SLO Policy”; additionally update the site navigation/sidebar
label and the PR title to “SLO Policy” so labels are consistent across docs and
UI.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
docs/slo-policy.md (2)

14-14: Align Severity 2 communication with the “First Update” section.

The Sev 2 row says “email upon resolution,” which conflicts with the “First Update” promise. Clarify initial + regular updates.

-| **Severity 2 (High)** | Significant degradation of service or critical functionality unavailable for a subset of customers | 1 hour | 8 hours | Notification via email upon resolution |
+| **Severity 2 (High)** | Significant degradation of service or critical functionality unavailable for a subset of customers | 1 hour | 8 hours | Initial update within 1 hour; regular updates |

59-64: Tighten exclusions; remove “etc.” for precision.

Avoid vague “etc.” in policy docs.

-- Scheduled maintenance 
-- Issues outside our control (internet outages, AWS problems, etc.)
-- Problems you caused (wrong configuration, hitting rate limits, etc.)
+- Scheduled maintenance
+- Issues outside our control (e.g., internet outages or AWS provider incidents)
+- Problems you caused (e.g., wrong configuration or hitting rate limits)
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 13f9c9e and 4c823e0.

📒 Files selected for processing (1)
  • docs/slo-policy.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/slo-policy.md

70-70: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: validate_build
  • GitHub Check: Validate generated content
  • GitHub Check: Pull Request has non-contributor approval
🔇 Additional comments (4)
docs/slo-policy.md (4)

5-5: Good structure and SLO framing; verify labels elsewhere match “SLO.”

Doc title/file use “SLO” correctly. Please confirm the sidebar label and PR title also say “SLO Policy,” not “SLA.”


24-24: Define “business hours” (timezone and hours) or link to a canonical page.

“During business hours” is ambiguous; specify hours/timezone or reference your Support page where this is defined.


7-8: Add a non‑contractual disclaimer to avoid creating enforceable commitments.

Insert a brief disclaimer after the intro to clarify these are operational targets, not contractual SLAs.

 We are committed to providing reliable, high-quality services to our customers. 
+
+> Note: This page describes our current operational Service Level Objectives (SLOs). It is informational and non‑contractual and does not modify any customer agreement or MSA. Any binding SLAs, if applicable, are defined exclusively in your contract.
+

47-54: Soften RCA language to avoid hard commitments while preserving intent.

Reflect “aim/summary/may include” phrasing.

-For serious incidents (Severity 1 & 2), we'll create a Root Cause Analysis that, upon request, will be shared with customers, including:
+For serious incidents (Severity 1 & 2), we aim to create a Root Cause Analysis and, upon request, can share a summary with customers. This may include:

Comment on lines +11 to +17
| Severity | Definition | Response Time | Resolution Time | Communication |
|----------|------------|---------------|-----------------|---------------|
| **Severity 1 (Critical)** | Complete service outage or critical functionality unavailable affecting multiple customers | 30 minutes | 4 hours | Immediate notification via email |
| **Severity 2 (High)** | Significant degradation of service or critical functionality unavailable for a subset of customers | 1 hour | 8 hours | Notification via email upon resolution |
| **Severity 3 (Medium)** | Minor service degradation or non-critical functionality unavailable | 4 hours | 24 hours | As needed |
| **Severity 4 (Low)** | Cosmetic issues or minor bugs with workarounds available | 1 business day | Best effort | As needed |


⚠️ Potential issue

Clarify that response/resolution times are targets during support hours (not contractual).

Add a short note below the table to reduce legal risk and set expectations.

 | **Severity 4 (Low)** | Cosmetic issues or minor bugs with workarounds available | 1 business day | Best effort | As needed |
 
+Note: Response and resolution times are targets (not contractual guarantees) and apply during our support hours. We will communicate updates per the “Communication & Transparency” section.
+
📝 Committable suggestion


Suggested change
 | Severity | Definition | Response Time | Resolution Time | Communication |
 |----------|------------|---------------|-----------------|---------------|
 | **Severity 1 (Critical)** | Complete service outage or critical functionality unavailable affecting multiple customers | 30 minutes | 4 hours | Immediate notification via email |
 | **Severity 2 (High)** | Significant degradation of service or critical functionality unavailable for a subset of customers | 1 hour | 8 hours | Notification via email upon resolution |
 | **Severity 3 (Medium)** | Minor service degradation or non-critical functionality unavailable | 4 hours | 24 hours | As needed |
 | **Severity 4 (Low)** | Cosmetic issues or minor bugs with workarounds available | 1 business day | Best effort | As needed |
+Note: Response and resolution times are targets (not contractual guarantees) and apply during our support hours. We will communicate updates per the “Communication & Transparency” section.
🤖 Prompt for AI Agents
In docs/slo-policy.md around lines 11 to 17, the response and resolution times
in the SLO table need a clarifying note stating they are target goals applicable
during defined support hours and are not contractual guarantees; add a single
short sentence immediately below the table (e.g., "These response and resolution
times are target service-level goals applicable during our published support
hours and do not constitute contractual commitments.") and ensure it references
where support hours are defined or link to the support-hours section.

Here's how to reach us:

- **Support Portal**: [support.gruntwork.io](https://support.gruntwork.io) - Submit tickets and track issues
- **Email**: [email protected] - Direct email support

⚠️ Potential issue

Fix markdownlint MD034 (bare URL) for the email address.

Wrap the email in a mailto link (or angle brackets).

-- **Email**: [email protected] - Direct email support
+- **Email**: [[email protected]](mailto:[email protected]) - Direct email support
📝 Committable suggestion


Suggested change
-- **Email**: [email protected] - Direct email support
+- **Email**: [[email protected]](mailto:[email protected]) - Direct email support
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

70-70: Bare URL used

(MD034, no-bare-urls)

🤖 Prompt for AI Agents
In docs/slo-policy.md around line 70 the email is written as a bare address
("**Email**: [email protected] - Direct email support") which triggers
markdownlint MD034; update the line to wrap the address as a mailto link or
enclose it in angle brackets, e.g. use <[email protected]> or
[[email protected]](mailto:[email protected]) so the email is not a bare
URL and the linter warning is resolved.
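
As a rough illustration of the fix described above, a small helper could wrap bare email addresses in mailto links. This is a sketch only: `link_bare_emails` and the address `support@example.com` are hypothetical placeholders, and the pattern is a simplified matcher, not a full RFC-compliant email parser:

```python
import re

# A bare address: not already preceded by "[", "<", ":" or "(" (which would
# indicate link text, an autolink, or a mailto target) and not followed by a
# closing "]", ">" or ")".
BARE_EMAIL = re.compile(
    r"(?<![\[<:(\w.+-])([\w.+-]+@[\w-]+(?:\.[\w-]+)+)(?![\]>)\w])"
)


def link_bare_emails(line: str) -> str:
    """Wrap each bare email address in a Markdown mailto link."""
    return BARE_EMAIL.sub(r"[\1](mailto:\1)", line)


if __name__ == "__main__":
    # support@example.com is a placeholder, not the address from the policy doc.
    before = "- **Email**: support@example.com - Direct email support"
    print(link_bare_emails(before))
```

Running the helper a second time leaves an already linked line unchanged, so it is safe to apply repeatedly.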

@josh-padnick
Contributor Author

This LGTM, but looks like you need someone other than me to approve so you can merge.

@ZachGoldberg ZachGoldberg merged commit 99cad2d into main Sep 25, 2025
6 of 7 checks passed