Stop early-return after majority success in distributor writes #730

Closed

bboreham opened this issue Mar 2, 2018 · 10 comments

@bboreham
Contributor

bboreham commented Mar 2, 2018

Suppose we have a replication factor of 3, then the writing process is:

  • The distributor replicates the data three times and fires up three goroutines to deliver it.
  • Once two of the calls have returned from ingesters, the distributor returns success.
  • The context is cancelled (in the distributor, because the call ended; in the ruler, to avoid leaking timers).
  • The third call may or may not have updated its ingester.

I think this already causes #670; the idea that each replica may be missing some samples makes me even more queasy.

In a typical Push() call containing 100 samples, the distributor will be writing to all ingesters (in different combinations for each metric), which will make the missing samples issue rare. And until #673 there was no timeout on ruler updates, so that context was never cancelled.

Let's just stop it - wait for all calls to return, or a timeout.
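
For concreteness, here is a minimal Go sketch of the two behaviours. This is not the actual Cortex code: the `sample` and `ingester` types, the hard-coded 2-of-3 quorum, and the `push` helper are toy stand-ins.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Toy stand-ins for the real distributor types and the gRPC Push call.
type sample struct{ value float64 }

type ingester struct {
	name    string
	latency time.Duration
}

// push fails if the distributor cancels the context before this ingester
// has finished applying the write.
func (i ingester) push(ctx context.Context, _ []sample) error {
	select {
	case <-time.After(i.latency):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// replicate sends the samples to all ingesters. With waitForAll=false it
// mimics the current behaviour: return (and cancel) as soon as 2 of 3 calls
// succeed. With waitForAll=true it mimics the proposal: only return once
// every call has finished or the caller's own timeout fires.
func replicate(ctx context.Context, ings []ingester, samples []sample, waitForAll bool) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // in the early-return case this aborts the slowest call

	results := make(chan error, len(ings))
	for _, ing := range ings {
		go func(ing ingester) { results <- ing.push(ctx, samples) }(ing)
	}

	succeeded := 0
	for range ings {
		if err := <-results; err == nil {
			succeeded++
		}
		if !waitForAll && succeeded >= 2 {
			return nil // majority reached: early return, third call gets cancelled
		}
	}
	if succeeded >= 2 {
		return nil
	}
	return errors.New("quorum not reached")
}

func main() {
	ings := []ingester{
		{"A", 5 * time.Millisecond},
		{"B", 5 * time.Millisecond},
		{"C", 50 * time.Millisecond}, // the slightly slower replica
	}
	ctx, cancel := context.WithTimeout(context.Background(), 2500*time.Millisecond)
	defer cancel()
	fmt.Println(replicate(ctx, ings, []sample{{1}}, false)) // C is usually cancelled
	fmt.Println(replicate(ctx, ings, []sample{{1}}, true))  // C gets to finish
}
```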

@rade

rade commented Mar 2, 2018

Let's just stop it - wait for all calls to return, or a timeout.

Doesn't that mean one misbehaving ingester - being really slow or never responding - would slow down everything?

@bboreham
Contributor Author

bboreham commented Mar 2, 2018

Doesn't that mean one misbehaving ingester - being really slow or never responding - would slow down everything?

Probably. The default distributor timeout is 2.5 seconds.

But, thinking about it some more, in the current code one ingester being a bit slower than all the others would result in it having most of its calls cancelled, hence it will be missing lots of samples. Now it looks like a replica but isn't really.

bboreham changed the title from "Stop early-return after majority success in distributor" to "Stop early-return after majority success in distributor writes" on Mar 2, 2018
@csmarchbanks
Contributor

We were curious about this, so we made a PR and are currently testing it out in a staging instance: #732

@limscoder

@bboreham Did you see the PR for this? #732

@bboreham
Contributor Author

bboreham commented Mar 6, 2018

Sorry, been doing other things for a few days.

@bboreham
Contributor Author

bboreham commented Mar 7, 2018

To clarify my last point, suppose you have ingesters A, B, C. The distributor sends a sample X to A and B, then early-returns and cancels the call to C before C receives sample X.

Shortly after, B crashes and restarts.

Now a querier requests data from A, B, C; receives results from B and C and returns them. The result is missing sample X because C never got it and B crashed, losing its memory.

This demonstrates that we cannot withstand a single failure (B).
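
A self-contained toy of that sequence, with in-memory maps standing in for ingesters (illustration only, not Cortex code):

```go
package main

import "fmt"

func main() {
	// Three "ingesters" holding samples in memory, keyed by sample name.
	a, b, c := map[string]bool{}, map[string]bool{}, map[string]bool{}

	// The distributor sends sample X to A and B; the early return cancels
	// the call to C before C receives it.
	a["X"], b["X"] = true, true

	// B crashes and restarts, losing its in-memory state.
	b = map[string]bool{}

	// A querier asks A, B and C and returns once two of them have answered;
	// here the two that answer are B and C.
	merged := false
	for _, resp := range []map[string]bool{b, c} {
		merged = merged || resp["X"]
	}
	fmt.Println("sample X visible to the query:", merged) // false: lost after a single failure
}
```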

@bboreham
Contributor Author

bboreham commented Mar 9, 2018

#100 talks about only writing to two ingesters

@csmarchbanks
Contributor

I saw #100 a couple of days ago, but I think it would exacerbate the use case you describe above. And it is very confusing (and concerning to me) that despite a replication factor of 3, any piece of data will only ever live on 2 ingesters.

@bboreham
Contributor Author

bboreham commented Mar 9, 2018

Having read that part of the Jeff Dean paper, I don't think it is talking about the case where the data never goes further than the memory of the servers you send it to.

Also, on the write side, why do we care that the sending Prometheus sees a 200ms response instead of 100ms? It will create "shards" to do multiple calls in parallel and get the same throughput.

cc @tomwilkie for interest

@bboreham
Contributor Author

bboreham commented Mar 9, 2018

We went with letting the calls complete in the background. Fixed by #736
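
For reference, a rough sketch of what "complete in the background" can look like (an assumed shape, not the actual #736 diff; it reuses the toy `ingester` and `sample` types from the sketch in the opening comment): the caller still gets its answer as soon as the quorum succeeds, but the remaining call keeps running on a context that is only cancelled once everything has finished or timed out.

```go
// Sketch only: return to the caller after a 2-of-3 quorum, but let the
// remaining call finish in the background instead of cancelling it.
func replicateInBackground(ings []ingester, samples []sample) error {
	// Detached from the incoming request's context, but bounded by its own
	// timeout, so a dead ingester cannot hold resources forever.
	ctx, cancel := context.WithTimeout(context.Background(), 2500*time.Millisecond)

	results := make(chan error, len(ings))
	for _, ing := range ings {
		go func(ing ingester) { results <- ing.push(ctx, samples) }(ing)
	}

	quorum := make(chan error, 1)
	go func() {
		defer cancel() // only cancel once every call has returned or timed out
		succeeded := 0
		for range ings {
			if err := <-results; err == nil {
				succeeded++
				if succeeded == 2 {
					quorum <- nil // majority reached: unblock the caller
				}
			}
		}
		if succeeded < 2 {
			quorum <- errors.New("quorum not reached")
		}
	}()

	// The caller returns as soon as the quorum is reached; the goroutine
	// above keeps the slowest call alive until it completes on its own.
	return <-quorum
}
```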

bboreham closed this as completed Mar 9, 2018