Stop early-return after majority success in distributor writes #730
Doesn't that mean one misbehaving ingester - being really slow or never responding - would slow down everything?
Probably. The default distributor timeout is 2.5 seconds. But, thinking about it some more, in the current code one ingester being a bit slower than all the others would result in it having most of its calls cancelled, hence it would be missing lots of samples. It then looks like a replica but isn't really one.
We were curious about this, so we made a PR and are currently testing it out in a staging instance: #732
Sorry, been doing other things for a few days. |
To clarify my last point, suppose you have ingesters A, B, C. The distributor sends a sample X to A and B, then early-returns and cancels the call to C before C receives sample X. Shortly after, B crashes and restarts. Now a querier requests data from A, B, C; it receives results from B and C and returns them. The result is missing sample X, because C never got it and B crashed and lost its memory. This demonstrates that we cannot withstand a single failure (B).
#100 talks about only writing to two ingesters.
I saw #100 a couple of days ago, but I think it would exacerbate the use case you explain above. And it is very confusing (and concerning to me) that, despite a replication factor of 3, any piece of data will only ever live on 2 ingesters.
Having read that part of the Jeff Dean paper, I don't think it is talking about the case where the data never goes further than the memory of the servers you send it to. Also, on the write side, why do we care that the sending Prometheus sees a 200ms response instead of 100ms? It will create "shards" to make multiple calls in parallel and get the same throughput. cc @tomwilkie for interest
We went with letting the calls complete in the background. Fixed by #736.
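For reference, a minimal sketch of what "letting the calls complete in the background" can look like, under the assumption that each ingester push is a simple function call; the names pushFn and pushWithBackgroundCompletion are illustrative, not the actual change in #736:

```go
package distributor

import (
	"context"
	"errors"
	"time"
)

// pushFn stands in for a single ingester Push RPC; purely illustrative.
type pushFn func(ctx context.Context) error

// pushWithBackgroundCompletion returns to the caller as soon as a quorum of
// ingesters has acknowledged the write, but it does NOT cancel the remaining
// calls: they run against a context independent of the caller's, bounded
// only by remoteTimeout.
func pushWithBackgroundCompletion(calls []pushFn, remoteTimeout time.Duration) error {
	// Deliberately not derived from the incoming request context, so an
	// early return cannot cancel a call that is still in flight.
	bgCtx, cancel := context.WithTimeout(context.Background(), remoteTimeout)

	results := make(chan error, len(calls))
	for _, call := range calls {
		go func(call pushFn) { results <- call(bgCtx) }(call)
	}

	quorum := len(calls)/2 + 1
	maxFailures := len(calls) - quorum
	succeeded, failed := 0, 0
	for succeeded < quorum && failed <= maxFailures {
		if err := <-results; err != nil {
			failed++
		} else {
			succeeded++
		}
	}

	// Drain the remaining results in the background and only then release
	// the timeout context, so slow ingesters still receive the data.
	seen := succeeded + failed
	go func() {
		for i := seen; i < len(calls); i++ {
			<-results
		}
		cancel()
	}()

	if succeeded < quorum {
		return errors.New("too many failed ingester pushes")
	}
	return nil
}
```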
Suppose we have a replication factor of 3; then the writing process is:

- send the data to 3 ingesters in parallel,
- return success as soon as 2 of them respond,
- cancel the outstanding call, typically before the third ingester has received the data.
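A minimal sketch of that early-return behaviour (illustrative names only, not the actual distributor code), assuming the same hypothetical pushFn shape as elsewhere in this thread:

```go
package distributor

import (
	"context"
	"errors"
)

// pushFn stands in for a single ingester Push RPC; purely illustrative.
type pushFn func(ctx context.Context) error

// pushWithEarlyReturn returns as soon as a quorum of calls has succeeded.
// All calls share a context derived from the incoming request, so the
// deferred cancel fires on return and typically aborts the outstanding
// call before the third ingester has stored the sample.
func pushWithEarlyReturn(ctx context.Context, calls []pushFn) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	results := make(chan error, len(calls))
	for _, call := range calls {
		go func(call pushFn) { results <- call(ctx) }(call)
	}

	quorum := len(calls)/2 + 1
	succeeded := 0
	for i := 0; i < len(calls); i++ {
		if err := <-results; err == nil {
			succeeded++
		}
		if succeeded == quorum {
			return nil // early return: the remaining call gets cancelled
		}
	}
	return errors.New("quorum not reached")
}
```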
I already think this causes #670; the idea that each replica may be missing some samples makes me more queasy.
In a typical Push() call containing 100 samples, the distributor will be writing to all ingesters (in different combinations for each metric), which makes the missing-samples issue rare. And until #673 there was no timeout on ruler updates, so that context was never cancelled.

Let's just stop it: wait for all calls to return, or a timeout.
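A minimal sketch of what that proposal could look like, again with a purely illustrative pushFn standing in for the ingester call: block until every call has finished, with the whole Push bounded by a timeout.

```go
package distributor

import (
	"context"
	"errors"
	"sync"
	"time"
)

// pushFn stands in for a single ingester Push RPC; purely illustrative.
type pushFn func(ctx context.Context) error

// pushWaitForAll does not return until every ingester call has finished or
// the shared timeout has expired; it still requires a quorum of successes.
func pushWaitForAll(ctx context.Context, calls []pushFn, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	var (
		wg        sync.WaitGroup
		mu        sync.Mutex
		succeeded int
	)
	for _, call := range calls {
		wg.Add(1)
		go func(call pushFn) {
			defer wg.Done()
			if err := call(ctx); err == nil {
				mu.Lock()
				succeeded++
				mu.Unlock()
			}
		}(call)
	}
	// Block until all calls return; the timeout on ctx keeps a dead or very
	// slow ingester from holding the Push open indefinitely.
	wg.Wait()

	if succeeded < len(calls)/2+1 {
		return errors.New("quorum not reached")
	}
	return nil
}
```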