This repository was archived by the owner on Apr 28, 2025. It is now read-only.

Fix CortexCompactorHasNotSuccessfullyRunCompaction to avoid false positives #294

Merged
merged 1 commit into from
Apr 21, 2021

Conversation

pracucci
Collaborator

What this PR does:
Yesterday we merged #292, but I just realised it suffers from false positives. The reason is that a compaction run is marked as done only once there's no more work left for the compactor to do. Because of this, a single run could take a long time (several hours) even while the compactor is working flawlessly. Moreover, after startup the value of the metric cortex_compactor_last_successful_run_timestamp_seconds will be 0 until the first compaction run has completed (which, again, could take hours).

In this PR:

  1. Increase threshold to 24h
  2. Cover the case of the metric == 0 right after startup (while the 1st compaction run is on-going)
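
To make the two changes above concrete, here is a minimal Python sketch of the decision logic, not the actual alert rule from this PR. The function name, the `seconds_since_startup` parameter, and the choice to tolerate a zero metric for the same 24h window are all assumptions made for illustration; only the metric name and the 24h threshold come from the description.

```python
DAY_SECONDS = 24 * 60 * 60  # the 24h threshold mentioned in this PR


def compactor_alert_should_fire(
    last_success_ts: float,
    now: float,
    seconds_since_startup: float,
    threshold: float = DAY_SECONDS,
) -> bool:
    """Hypothetical model of the alert condition.

    last_success_ts stands in for the value of
    cortex_compactor_last_successful_run_timestamp_seconds.
    """
    if last_success_ts == 0:
        # No run has completed since startup. Assumption: tolerate this
        # for the same window, so a long first run does not page anyone.
        return seconds_since_startup > threshold
    # At least one run completed: fire only if it was too long ago.
    return now - last_success_ts > threshold


now = 1_700_000_000.0
# Freshly started compactor, first run still in progress: no alert.
print(compactor_alert_should_fire(0, now, seconds_since_startup=3600))
# Last successful run 25h ago: alert.
print(compactor_alert_should_fire(now - 25 * 3600, now, seconds_since_startup=48 * 3600))
```

Treating the zero value as a separate branch is what prevents the false positive: a plain "now minus timestamp" check would see a zero timestamp as a run that finished decades ago.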

Which issue(s) this PR fixes:
N/A

Checklist

  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@pracucci pracucci requested a review from a team as a code owner April 21, 2021 07:43
@gouthamve
Member

How long of a compaction run is considered bad? Is a compaction run taking 24hrs bad? Is a run taking 6hrs bad?

@pracucci
Collaborator Author

> How long of a compaction run is considered bad? Is a compaction run taking 24hrs bad? Is a run taking 6hrs bad?

There's no easy answer. A compactor run loops as long as there's work to do. It may run forever while still catching up with the work, because as soon as it finishes compacting a block there's more work to do. Realistically, we've observed that 24h should be a long enough threshold. If the compactor doesn't fully catch up in 24h, then you need to scale it up (vertically or horizontally).
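
As a rough illustration of why a run may never finish, here is a toy Python model. It is a sketch under invented assumptions, not how the compactor is implemented: the function name, the hourly step, and all rates are hypothetical.

```python
from typing import Optional


def hours_until_run_completes(
    backlog: int,
    compacted_per_hour: int,
    arriving_per_hour: int,
    max_hours: int = 24,
) -> Optional[int]:
    """Toy model: a run ends only when the backlog reaches zero.

    Returns the number of hours the run takes, or None if it is still
    looping after max_hours.
    """
    for hour in range(1, max_hours + 1):
        # New work arrives while the compactor drains the backlog.
        backlog = max(0, backlog + arriving_per_hour - compacted_per_hour)
        if backlog == 0:
            return hour
    return None


# Compactor outpaces incoming work: the run completes.
print(hours_until_run_completes(10, compacted_per_hour=5, arriving_per_hour=1))
# Work arrives as fast as it is compacted: the run never completes in 24h.
print(hours_until_run_completes(10, compacted_per_hour=2, arriving_per_hour=2))
```

In the second case the compactor is busy and healthy the whole time, yet no run is ever marked as done, which is the situation where scaling up is the fix rather than a shorter alert threshold.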

Member

@gouthamve gouthamve left a comment

LGTM

@pracucci pracucci merged commit 410f9fe into main Apr 21, 2021
@pracucci pracucci deleted the fix-compactor-alert branch April 21, 2021 09:13
simonswine pushed a commit to grafana/mimir that referenced this pull request Oct 18, 2021
…tor-alert

Fix CortexCompactorHasNotSuccessfullyRunCompaction to avoid false positives