Closed
Description
Suppose we have a replication factor of 3, then the writing process is:
- Distributor replicates the data three times and fires up three goroutines to deliver the data.
- Once two of the calls have returned from ingesters, distributor returns success.
- Context is cancelled (in distributor, because the call ended; in ruler, to avoid leaking timers)
- Third call may or may not have updated its ingester.
I already think this causes #670; the idea that each replica may be missing some samples makes me more queasy.
In a typical Push()
call containing 100 samples, the distributor will be writing to all ingesters (in different combinations for each metric), which will make the missing samples issue rare. And until #673 there was no timeout on ruler updates, so that context was never cancelled.
Let's just stop it - wait for all calls to return, or a timeout.
Metadata
Metadata
Assignees
Labels
No labels