Description
For some instances we see "sample timestamp out of order for series" in our logs, with a gap of 15 or 30 seconds between the previous and the new timestamp.
If the reordering happened in the sending Prometheus, we would see the same error on all ingester replicas. We do not: the errors are reported sporadically, on one ingester at a time. From this I deduce that the reordering is happening inside Cortex.
Here is my best theory: suppose some client Prometheus has hundreds of samples queued up for remote write, then the following can happen:
- Prometheus sends 100 samples to distributor.
- Distributor replicates the data three times and fires up three goroutines to deliver the data.
- Once two of the calls have returned from ingesters, distributor returns success to Prometheus.
- Third call continues, on its goroutine.
- Prometheus sends the next 100 samples; distributor (likely on another node) fires up another 3 goroutines.
- One of those goroutines can overtake the third one from the previous call.
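
To make the sequence above concrete, here is a minimal sketch of the suspected race. It is not Cortex code: `ingester`, `distribute` and `simulateRace` are hypothetical names invented for illustration, each ingester tracks a single series, and a channel (`gate`) stands in for a slow call to the third replica so that the interleaving is deterministic rather than sporadic.

```go
package main

import (
	"fmt"
	"sync"
)

// ingester is a hypothetical stand-in for a real ingester: it remembers the
// latest timestamp of one series and rejects anything that is not newer.
type ingester struct {
	mu     sync.Mutex
	lastTS int64
}

func (i *ingester) push(ts int64) error {
	i.mu.Lock()
	defer i.mu.Unlock()
	if ts <= i.lastTS {
		return fmt.Errorf("sample timestamp out of order for series: prev=%d new=%d", i.lastTS, ts)
	}
	i.lastTS = ts
	return nil
}

// distribute sends one sample to every replica on its own goroutine and
// returns as soon as a quorum (2 of 3) has answered. If gate is non-nil, the
// call to the last replica is held back until gate is closed, which models a
// slow delivery. The returned channel yields the results not yet consumed.
func distribute(replicas []*ingester, ts int64, gate <-chan struct{}) <-chan error {
	results := make(chan error, len(replicas))
	for idx, ing := range replicas {
		go func(idx int, ing *ingester) {
			if gate != nil && idx == len(replicas)-1 {
				<-gate // assumption: only the third call is slow
			}
			results <- ing.push(ts)
		}(idx, ing)
	}
	quorum := len(replicas)/2 + 1
	for n := 0; n < quorum; n++ {
		<-results // success is reported upstream after two answers
	}
	return results
}

// simulateRace replays the bullet list: the second batch reaches the third
// replica before the straggler from the first batch does.
func simulateRace() error {
	replicas := []*ingester{{}, {}, {}}
	gate := make(chan struct{})

	first := distribute(replicas, 15, gate) // third call still pending
	second := distribute(replicas, 30, nil) // next batch, 15 seconds later
	<-second                                // wait until replica 3 has stored ts=30

	close(gate)    // the straggler from the first batch finally arrives
	return <-first // its result is the out-of-order error
}

func main() {
	fmt.Println(simulateRace())
}
```

In this sketch only the third replica reports the error, and the other two accept both samples, which matches the observation that the errors show up on one ingester at a time.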