@eeldaly eeldaly commented Jan 28, 2025

What this PR does:

  • Add auto_forget_delay to the ring so that unhealthy compactors are automatically forgotten.
  • Add a LifecyclerDelegate interface, with an OnRingInstanceHeartbeat hook, to the normal Lifecycler.

Which issue(s) this PR fixes:
Fixes #6533

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

type LifecyclerDelegate interface {
// OnRingInstanceHeartbeat is called while the instance is updating its heartbeat
// in the ring.
OnRingInstanceHeartbeat(lifecycler *Lifecycler, ctx context.Context)

I'd love to see the same function parameters reused. The current interface is a bit odd, and we are not using the lifecycler at all in the compactor implementation.


@yeya24 yeya24 left a comment

By the way, I think you need a rebase. This PR contains some commits from your previous changes.

"github.com/cortexproject/cortex/pkg/ring"
)

func (c *Compactor) OnRingInstanceHeartbeat(_ *ring.Lifecycler, ctx context.Context) {

I would do something similar to the basicLifecycler.

Create a generic delegate for the ring and make the compactor use it.

Signed-off-by: Essam Eldaly <[email protected]>
instanceDesc.Zone = i.Zone
instanceDesc.RegisteredTimestamp = i.getRegisteredAt().Unix()
ringDesc.Ingesters[i.ID] = instanceDesc
i.delegate.OnRingInstanceHeartbeat(i, ringDesc)

@yeya24 yeya24 Jan 30, 2025

We should move this line outside the else block. OnRingInstanceHeartbeat should be called whether or not the instance already exists in the ring, which is the same behavior as UpdateInstance in the basic lifecycler.
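The placement being requested can be sketched as follows. The types and the `updateRing` function are hypothetical simplifications (timestamps are plain Unix seconds, and the delegate only counts calls); the sketch only illustrates that a hook placed after the if/else runs on both the "register" and the "refresh" path.

```go
package main

import "fmt"

// instance and ringDesc are simplified, hypothetical stand-ins for the ring's
// descriptor types; timestamps are plain Unix seconds.
type instance struct {
	RegisteredAt int64
	Timestamp    int64
}

type ringDesc struct {
	Instances map[string]instance
}

// countingDelegate is a hypothetical delegate that only counts hook calls.
type countingDelegate struct{ calls int }

func (d *countingDelegate) OnRingInstanceHeartbeat(desc *ringDesc) { d.calls++ }

// updateRing registers the instance if it is missing, or refreshes its
// heartbeat timestamp if it is already present.
func updateRing(desc *ringDesc, id string, now int64, d *countingDelegate) {
	inst, ok := desc.Instances[id]
	if !ok {
		// Instance not in the ring yet: register it.
		inst = instance{RegisteredAt: now, Timestamp: now}
	} else {
		// Instance already present: only refresh the heartbeat.
		inst.Timestamp = now
	}
	desc.Instances[id] = inst

	// The hook sits after the if/else, so it runs on both paths.
	d.OnRingInstanceHeartbeat(desc)
}

func main() {
	desc := &ringDesc{Instances: map[string]instance{}}
	d := &countingDelegate{}
	updateRing(desc, "compactor-1", 100, d) // first call takes the "register" path
	updateRing(desc, "compactor-1", 105, d) // second call takes the "refresh" path
	fmt.Println(d.calls, desc.Instances["compactor-1"].RegisteredAt)
}
```

Had the hook been inside the else branch, the first call would not have triggered it, so stale instances would only be cleaned up from the second heartbeat onward.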


@danielblando danielblando left a comment

Can you add a test in compactor for it?

f.BoolVar(&cfg.SkipBlocksWithOutOfOrderChunksEnabled, "compactor.skip-blocks-with-out-of-order-chunks-enabled", false, "When enabled, mark blocks containing index with out-of-order chunks for no compact instead of halting the compaction.")
f.IntVar(&cfg.BlockFilesConcurrency, "compactor.block-files-concurrency", 10, "Number of goroutines to use when fetching/uploading block files from object storage.")
f.IntVar(&cfg.BlocksFetchConcurrency, "compactor.blocks-fetch-concurrency", 3, "Number of goroutines to use when fetching blocks from object storage when compacting.")
f.DurationVar(&cfg.AutoForgetDelay, "compactor.auto-forget-delay", 15*time.Minute, "Time since last heartbeat before compactor will be removed from ring. 0 to disable")

In the ruler, alertmanager, and store gateway, we don't have a config for this. We usually just use 2x the heartbeat timeout.

I am fine with adding a new config, but let's make the default 2*heartbeat_timeout.


yeya24 commented Feb 13, 2025

Test failure seems related.


@danielblando danielblando left a comment

LGTM

@yeya24 yeya24 merged commit 2a6dd6b into cortexproject:master Feb 13, 2025
17 checks passed
Successfully merging this pull request may close these issues.

Compactor becomes unhealthy in the ring when stopped during startup