Skip to content

Conversation

alanprot
Copy link
Member

@alanprot alanprot commented Sep 16, 2022

What this PR does:

We are always sorting the response even when its not needed.

We still need to sort the series if we are querying multiples backends (ex: ingesters + store-gateways) as the result needs to be sorted in order to be merged here.

Why we don't need to sort when querying only one backend?

  • Each backend should be responsible to not returning duplicated timeseries:
    • When querying ingesters the dedup happens here
    • When querying store gateways this flag is ignored as the series are already sorted on tsdb

Why sorting matters?

This sort can add considerable latency when fetching millions of timeseries as it have to sort all those using those tags.

Prometheus itself does similar thing on the remote_read code:

https://github.com/prometheus/prometheus/blob/d56d0a9d52c8614fdf1c5c652061929ede562f3e/storage/remote/read.go#L174

https://github.com/prometheus/prometheus/blob/d56d0a9d52c8614fdf1c5c652061929ede562f3e/storage/remote/codec.go#L148-L164

Im also removing this extra sort as it seems to not be needed as both ingesters and store-gateways should return the labels already sorted as we already make this assumption in other places like here and here

This change will also benefit GetSeries when update prometheus: See prometheus/prometheus#11311

before/after:

Screen Shot 2022-09-16 at 4 13 55 PM
Screen Shot 2022-09-16 at 4 14 08 PM

Which issue(s) this PR fixes:
Fixes #

Checklist

  • [] Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Alan Protasio <[email protected]>
@pull-request-size pull-request-size bot added size/L and removed size/M labels Sep 17, 2022
@alanprot alanprot force-pushed the sortSeries branch 2 times, most recently from 1e9510d to 40e51ed Compare September 20, 2022 00:17
Signed-off-by: Alan Protasio <[email protected]>
Signed-off-by: Alan Protasio <[email protected]>
Signed-off-by: Alan Protasio <[email protected]>
@alanprot alanprot marked this pull request as ready for review September 20, 2022 01:00
Signed-off-by: Alan Protasio <[email protected]>
sets := make(chan storage.SeriesSet, len(q.queriers))
for _, querier := range q.queriers {
go func(querier storage.Querier) {
// We should always select sorted here as we will need to merge the series
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is a bit misleading because I think what you meant that is "if we need to query multiple backend, then we need to merge results which depends on the result from each querier being sorted. We are not sure if each result is sorted default, so it's safer to force sort to ensure the merge can work properly"

@alvinlin123 alvinlin123 merged commit 01dc235 into cortexproject:master Sep 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants