Clock Skew can cause Ingesters to be unable to join the ring or be forgotten. #4461

Closed
@danielblando

Describe the bug
This problem happens because memberlist ignores heartbeat timestamps that are older than the timestamp it already holds in memory.
This logic is implemented in the Merge function:

cortex/pkg/ring/model.go

Lines 189 to 201 in 23aed55

for name, oing := range otherIngesterMap {
	ting := thisIngesterMap[name]
	// ting.Timestamp will be 0, if there was no such ingester in our version
	if oing.Timestamp > ting.Timestamp {
		oing.Tokens = append([]uint32(nil), oing.Tokens...) // make a copy of tokens
		thisIngesterMap[name] = oing
		updated = append(updated, name)
	} else if oing.Timestamp == ting.Timestamp && ting.State != LEFT && oing.State == LEFT {
		// we accept LEFT even if timestamp hasn't changed
		thisIngesterMap[name] = oing // has no tokens already
		updated = append(updated, name)
	}
}

The logic itself is correct, but when an ingester has clock skew and sends information timestamped in the future (e.g. 1 day ahead), that information stays valid until that timestamp is reached. The "Forget" action in the "Cortex Ring Status" UI does not work either, because the other members ignore the information.
The ingester also cannot come back to the ring until that timestamp is reached. If we remove the ingester, fix its clock, and add it back, the new information is ignored and the ring does not change.
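
The effect can be illustrated with a small standalone simulation of the acceptance rule quoted above. This is only a sketch: the `instance` type, the string states, and the `merge` helper are hypothetical simplifications and not the Cortex types, and it assumes that a "Forget" reaches other members as a LEFT entry stamped with the sender's current time.

```go
package main

import "fmt"

// instance is a hypothetical, simplified stand-in for a ring entry.
type instance struct {
	Timestamp int64  // heartbeat time, Unix seconds
	State     string // "ACTIVE" or "LEFT"
}

// merge mirrors the acceptance rule from the excerpt: an incoming entry wins
// only if its timestamp is strictly newer, or if the timestamps are equal and
// the incoming state is LEFT. It returns the names that were updated.
func merge(local, incoming map[string]instance) []string {
	var updated []string
	for name, in := range incoming {
		cur := local[name] // zero value (Timestamp 0) if the name is unknown
		if in.Timestamp > cur.Timestamp {
			local[name] = in
			updated = append(updated, name)
		} else if in.Timestamp == cur.Timestamp && cur.State != "LEFT" && in.State == "LEFT" {
			local[name] = in
			updated = append(updated, name)
		}
	}
	return updated
}

func main() {
	const now, day = int64(1_600_000_000), int64(86_400)
	ring := map[string]instance{}

	// Heartbeat from an ingester whose clock is one day ahead: accepted.
	fmt.Println(merge(ring, map[string]instance{"ingester-1": {now + day, "ACTIVE"}}))
	// A LEFT update stamped with the correct time (assumed model of "Forget"): ignored.
	fmt.Println(merge(ring, map[string]instance{"ingester-1": {now, "LEFT"}}))
	// A heartbeat from the same ingester after its clock is fixed: ignored.
	fmt.Println(merge(ring, map[string]instance{"ingester-1": {now + 60, "ACTIVE"}}))
	// The skewed entry is still the one held in memory.
	fmt.Println(ring["ingester-1"])
}
```

In this model only the first call reports an update; the two later calls change nothing until the local timestamps catch up with the skewed one.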

To Reproduce
Steps to reproduce the behavior:

  1. Start Cortex (master@25726a168a9b5a9394c9e7c7ff67cdffed66a347)
  2. Start a new ingester in a new instance with a clock in the future (complicated to reproduce)
  3. Check your Cortex ring status to confirm that the new ingester entry has a timestamp in the future
  4. Kill the new ingester
  5. Check that the new ingester remains in the ring
  6. Even trying to forget it does not work, as the existing timestamp is fresher

Expected behavior
I would expect the ring to handle clock skew. It could either accept the information but provide another way to forcefully remove the bad ingester, or reject information that is timestamped far in the future.

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: helm

Storage Engine

  • Blocks
  • Chunks

Additional Context
