Add compactor auto-forget from ring on unhealthy #6563
Signed-off-by: Essam Eldaly <[email protected]>
Signed-off-by: Essam Eldaly <[email protected]>
Signed-off-by: Essam Eldaly <[email protected]>
Signed-off-by: Ben Ye <[email protected]>
Signed-off-by: Essam Eldaly <[email protected]>
pkg/ring/lifecycler.go
Outdated
type LifecyclerDelegate interface {
	// OnRingInstanceHeartbeat is called while the instance is updating its heartbeat
	// in the ring.
	OnRingInstanceHeartbeat(lifecycler *Lifecycler, ctx context.Context)
I'd love to see the same function parameters reused. The current interface is a bit odd, and we are not using the lifecycler at all in the compactor implementation.
By the way, I think you need to rebase. This PR contains some commits from your previous changes.
pkg/compactor/lifecycle.go
Outdated
"github.com/cortexproject/cortex/pkg/ring" | ||
)

func (c *Compactor) OnRingInstanceHeartbeat(_ *ring.Lifecycler, ctx context.Context) {
I would do something similar to the basicLifecycler:
create a generic delegate for the ring and make the compactor use it.
…LifecyclerDelegate Signed-off-by: Essam Eldaly <[email protected]>
Signed-off-by: Essam Eldaly <[email protected]>
Signed-off-by: Essam Eldaly <[email protected]>
pkg/ring/lifecycler.go
Outdated
instanceDesc.Zone = i.Zone
instanceDesc.RegisteredTimestamp = i.getRegisteredAt().Unix()
ringDesc.Ingesters[i.ID] = instanceDesc
i.delegate.OnRingInstanceHeartbeat(i, ringDesc)
We should move this line outside the else block. We should call OnRingInstanceHeartbeat
whether or not the instance already exists in the ring, which matches the behavior of UpdateInstance
in the basic lifecycler.
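As a rough sketch of the control flow being asked for (not the actual lifecycler code), the heartbeat update could add or refresh the instance in either branch and then invoke the delegate once, after the branching. The types, `updateHeartbeat`, and `countingDelegate` below are hypothetical and simplified.

```go
package main

import "fmt"

// InstanceDesc and Desc are simplified stand-ins for the ring descriptors.
type InstanceDesc struct {
	Addr      string
	Timestamp int64
}

type Desc struct {
	Ingesters map[string]InstanceDesc
}

// heartbeatDelegate is a hypothetical hook called on every heartbeat.
type heartbeatDelegate interface {
	OnRingInstanceHeartbeat(ringDesc *Desc)
}

// countingDelegate records how many times the hook was invoked.
type countingDelegate struct{ calls int }

func (c *countingDelegate) OnRingInstanceHeartbeat(_ *Desc) { c.calls++ }

// updateHeartbeat registers the instance if it is missing, refreshes its
// timestamp otherwise, and calls the delegate in both cases.
func updateHeartbeat(ringDesc *Desc, id, addr string, now int64, d heartbeatDelegate) {
	if ringDesc.Ingesters == nil {
		ringDesc.Ingesters = map[string]InstanceDesc{}
	}
	inst, ok := ringDesc.Ingesters[id]
	if !ok {
		// Instance not in the ring yet: create a fresh descriptor.
		inst = InstanceDesc{Addr: addr}
	}
	inst.Timestamp = now
	ringDesc.Ingesters[id] = inst

	// Called outside the if/else so it runs on every heartbeat.
	d.OnRingInstanceHeartbeat(ringDesc)
}

func main() {
	ringDesc := &Desc{}
	d := &countingDelegate{}
	updateHeartbeat(ringDesc, "compactor-1", "10.0.0.1:9095", 100, d) // instance missing
	updateHeartbeat(ringDesc, "compactor-1", "10.0.0.1:9095", 105, d) // instance present
	fmt.Println("delegate calls:", d.calls)
}
```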
Can you add a test in compactor for it?
pkg/compactor/compactor.go
Outdated
f.BoolVar(&cfg.SkipBlocksWithOutOfOrderChunksEnabled, "compactor.skip-blocks-with-out-of-order-chunks-enabled", false, "When enabled, mark blocks containing index with out-of-order chunks for no compact instead of halting the compaction.")
f.IntVar(&cfg.BlockFilesConcurrency, "compactor.block-files-concurrency", 10, "Number of goroutines to use when fetching/uploading block files from object storage.")
f.IntVar(&cfg.BlocksFetchConcurrency, "compactor.blocks-fetch-concurrency", 3, "Number of goroutines to use when fetching blocks from object storage when compacting.")
f.DurationVar(&cfg.AutoForgetDelay, "compactor.auto-forget-delay", 15*time.Minute, "Time since last heartbeat before compactor will be removed from ring. 0 to disable")
In the ruler, alertmanager and store gateway, we don't have a config for this. We usually just use 2x the heartbeat timeout.
I am fine with adding a new config, but let's set the default to 2*heartbeat_timeout.
Signed-off-by: Essam Eldaly <[email protected]>
Signed-off-by: Essam Eldaly <[email protected]>
Signed-off-by: Essam Eldaly <[email protected]>
Signed-off-by: Essam Eldaly <[email protected]>
Signed-off-by: Essam Eldaly <[email protected]>
Signed-off-by: Essam Eldaly <[email protected]>
Signed-off-by: Essam Eldaly <[email protected]>
Test failure seems related.
Signed-off-by: Essam Eldaly <[email protected]>
Signed-off-by: Essam Eldaly <[email protected]>
Signed-off-by: Essam Eldaly <[email protected]>
LGTM
What this PR does:
auto_forget_delay for ring to auto forget unhealthy compactors.
OnRingInstanceHeartbeat
Which issue(s) this PR fixes:
Fixes #6533
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]