-
Notifications
You must be signed in to change notification settings - Fork 820
Use most recent time to remove from ring #4501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Daniel Blando <[email protected]>
25ab99b
to
5e7f104
Compare
// If timestamp on the ring is in "the future" we need to use the timestamp of the ring | ||
// otherwise the gossip message will be ignored by the peers. | ||
// see https://github.com/cortexproject/cortex/blob/d3f46c53616dcdf5c670f3e3a4d371e44b10683e/pkg/ring/model.go#L192-L200 | ||
if ting.Timestamp < now.Unix() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure this fixes your issue. We must update the timestamp, otherwise the change is not propagated via gossip, so I think that what you want is something more like this:
if ting.Timestamp < now.Unix() {
ting.Timestamp = now.Unix()
} else {
ting.Timestamp += 1
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pracucci thanks for looking into this.
We do add the LEFT change right after this line in the updates list.
Line 221 in 5e7f104
updated = append(updated, name) |
which will go onto the changes returned
Lines 236 to 241 in 5e7f104
// Let's build a "change" for returning | |
out := NewDesc() | |
for _, u := range updated { | |
ing := thisIngesterMap[u] | |
out.Ingesters[u] = ing | |
} |
Form what i can tell we don't depend on timestamp
I think what we need to make sure is that the misconfigured ingester be shut down before forgetting it, which will make sure it does not rejoin the ring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it's best to put a newer timestamp on newer data.
That way everyone in the gossip will treat it as new, not be uncertain whether they should use the other info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! This change makes sense to me: When we "forget" entry that has time in the future, we don't need to update the time (as it's already , but set state to LEFT. The change will be propagated via gossip, and other nodes will accept it unless they have even more recent entry.
* [BUGFIX] Memberlist: Remove entry from the ring even if the timestamp is in the future #4501 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This entry should be moved to master / unreleased section now.
One problem I can see with this change is that if ingester (or other component) is restarted, and it uses same instance ID but no longer has clock-skew, it will be unable to join the ring, if there is still LEFT entry with future time :( This seems bit dangerous. |
Although this assumes skew more than just a few seconds... hopefully that's not the case! |
Hey @pstibrany, thanks for the time. Good catch, in our case the clock skew was 3 days which would made very long to rejoin even with the change. This would make this change not much helpful. We could try to add a flag for this scenarios, but seems just creating more code for issues. I think in this case the best approach is to refuse the bad info from the beginning and not let that in the ring. I was trying to do that in this draft, but the problem is how to add a configuration in Merge implementation. To add in the context we would need to insert a new info on the Desc which would make this spread in the protocol. Lines 10 to 13 in e7e05b2
Do you have any opinion on this approach? |
One problem is how to pass that information to Desc, that would be solveable. What is more difficult is how to make that work properly with requirements on Merge method. I don't see any easy answers there. By breaking Merge properties we would introduce more problems into how gossip works :( I think the best approach would be to make sure that clocks are reasonably synced in the first place, if possible. |
That is a little trick, no? I agree that these are rare scenarios, but we can't assume it will not happen. In my case, I had it running in EC2 with AWS NTP and still happened. I think the biggest problem is that we don't have a way out of these situation right now. The best solution to get the ingester back to the ring would be restarting the ring.
Agree that it can be solvable, it is just a matter on how we decide to introduce that information. I understand your point of breaking the "Commutativity" requirement, but denying the information would make an error pop up and leave the wrong configured ingester without update. The information would not be gossip forward. From the error we could act on it to fix the issue. |
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions. |
Signed-off-by: Daniel Blando [email protected]
What this PR does:
We previously did a change for forget instance from the ring on 4603. In some rare cases the timestamp in the ring can be more recent than our current time (eg. clock skew into the future).
This change make sure we can remove the instance even if these cases occcurs
Which issue(s) this PR fixes:
Fixes #4461
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]