Skip to content

disallow instance with older timestamp to update instance with newer … #5480

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 31, 2023

Conversation

wenxu1024
Copy link
Contributor

What this PR does:

  1. Fix the FindDifference function bug in multi KV.

Background:
When we tried to switch from memberlist as primary and DDB as secondary KVStore, to DDB as primary and memberlist as secondary KV Store. We had encountered two issues.

  1. New store-gatway got stuck in Leaving state
  2. New store-gateway get into Active state, but in the ring page, it was displaying the old pod's IP not the IP from the new pod.

We figured out, it was because

  1. when old pods (pod-1) got terminated, it updates memberlist KV, and updated DDB KV as secondary
  2. When new pods (pod-1) get created, it updated DDB KV, but it failed to update memberlist KV as secondary
  3. Now when older version pod (pod-2) whose is still using memberlist as primary, will revert the change in DDB when it tries to update DDB as secondary

Now two things can happen, if the revert is before the next heartbeat of pod-1, pod-1 will be waiting for the instance to be in JOINING state forever, in this case, the pod get stuck in Leaving

If the revert happens after the next heartbeat of pod-1, when store-gateway finished its initial sync, it will set its state to ACTIVE. In this case, from Leaving to Active. IP will stay reverted in DDB.

To fix this issue, we check the timestamp during the update process, if the timestamp of the instance we tried to update is smaller than current timestamp of the instance, we skip the update. This will fix the issue.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Comment on lines 751 to 753
if !tokensEqual(ing.Tokens, oing.Tokens) {
tokensChanged = true
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to consider now that the tokens might not change if the timestamp is in the past. We should mark as tokensChanged only if we add the instance to toUpdate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, I agree. That will save us some resolveconflict executions. I'll make the change.

Copy link
Contributor

@danielblando danielblando left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@yeya24 yeya24 merged commit 31065a7 into cortexproject:master Jul 31, 2023
rajagopalanand pushed a commit to rajagopalanand/rajagopalanand-cortex that referenced this pull request Aug 3, 2023
cortexproject#5480)

* disallow instance with older timestamp to update instance with newer timestamp

Signed-off-by: Wen Xu <[email protected]>

* add statement in changelog

Signed-off-by: Wen Xu <[email protected]>

* just update after resolve conflict in the token, no need to do checks again

Signed-off-by: Wen Xu <[email protected]>

* fix lint

Signed-off-by: Wen Xu <[email protected]>

* only try to resolve conflict tokens when there is update

Signed-off-by: Wen Xu <[email protected]>

---------

Signed-off-by: Wen Xu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants