Implement max_fetched_data_bytes_per_query limit #4854
@@ -658,12 +658,16 @@ func (q *blocksStoreQuerier) fetchSeriesFromStores(
 			}
 		}
 		chunksSize := countChunkBytes(s)
+		dataSize := countDataBytes(s)
 		if chunkBytesLimitErr := queryLimiter.AddChunkBytes(chunksSize); chunkBytesLimitErr != nil {
 			return validation.LimitError(chunkBytesLimitErr.Error())
 		}
 		if chunkLimitErr := queryLimiter.AddChunks(len(s.Chunks)); chunkLimitErr != nil {
 			return validation.LimitError(chunkLimitErr.Error())
 		}
+		if dataBytesLimitErr := queryLimiter.AddDataBytes(dataSize); dataBytesLimitErr != nil {
+			return validation.LimitError(dataBytesLimitErr.Error())
+		}
 	}

 		if w := resp.GetWarning(); w != "" {

Review comment: One question here. The query limiter has the same lifecycle as the query. Say we first query a store-gateway and account some data bytes against the limit, and then, due to a network issue or a store-gateway restart, the stream errors and we retry another store-gateway. But it is probably also okay to hit the limit in that case, since the query-frontend will retry. WDYT?

Reply: This is a valid point. I also want to note that we are not doing this for other existing limits, such as the fetched chunks and fetched series limits. The query-frontend only retries 5XXs; in this case we'll be returning a 422 error, which will not be retried.

Reply: We could do something like:

If others agree with this approach, I can implement this as an enhancement in another PR.

Reply: SGTM

Reply: That's a good point. We should follow up with another PR to fix this, I think!
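The per-query lifecycle discussed in the thread above can be illustrated with a small sketch. This is a hypothetical stand-in, not the actual Cortex `QueryLimiter` API: `queryLimiterSketch`, `newQueryLimiterSketch`, and the error message are invented names. The point is that because the counter lives for the whole query and is never reset between attempts, bytes accounted before a failed store-gateway call still count against a retry.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// queryLimiterSketch mirrors the lifecycle discussed above: one instance per
// query, shared by every fetch attempt, including retries against a different
// store-gateway. Bytes from a failed attempt are never released.
type queryLimiterSketch struct {
	maxDataBytes uint64
	dataBytes    uint64 // accessed atomically; safe for concurrent fetches
}

func newQueryLimiterSketch(maxDataBytes uint64) *queryLimiterSketch {
	return &queryLimiterSketch{maxDataBytes: maxDataBytes}
}

// AddDataBytes adds size to the running total and returns an error once the
// total crosses the limit (0 means unlimited).
func (l *queryLimiterSketch) AddDataBytes(size int) error {
	if l.maxDataBytes == 0 {
		return nil
	}
	if atomic.AddUint64(&l.dataBytes, uint64(size)) > l.maxDataBytes {
		return fmt.Errorf("the query exceeded the maximum data size limit (limit: %d bytes)", l.maxDataBytes)
	}
	return nil
}

func main() {
	l := newQueryLimiterSketch(1000)
	fmt.Println(l.AddDataBytes(600)) // first attempt: under the limit, prints <nil>
	fmt.Println(l.AddDataBytes(600)) // a retry re-fetching the data pushes the total to 1200 and errors
}
```

Under this sketch, a retried query can hit the limit earlier than a clean one, which is exactly the behavior the thread proposes to fix in a follow-up PR.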
@@ -687,14 +691,17 @@ func (q *blocksStoreQuerier) fetchSeriesFromStores(

 			numSeries := len(mySeries)
 			chunkBytes := countChunkBytes(mySeries...)
+			dataBytes := countDataBytes(mySeries...)

 			reqStats.AddFetchedSeries(uint64(numSeries))
 			reqStats.AddFetchedChunkBytes(uint64(chunkBytes))
+			reqStats.AddFetchedDataBytes(uint64(dataBytes))

 			level.Debug(spanLog).Log("msg", "received series from store-gateway",
 				"instance", c.RemoteAddress(),
 				"fetched series", numSeries,
 				"fetched chunk bytes", chunkBytes,
+				"fetched data bytes", dataBytes,
 				"requested blocks", strings.Join(convertULIDsToString(blockIDs), " "),
 				"queried blocks", strings.Join(convertULIDsToString(myQueriedBlocks), " "))
@@ -1000,6 +1007,15 @@ func countChunkBytes(series ...*storepb.Series) (count int) {
 	return count
 }

+// countDataBytes returns the combined size of all the series
+func countDataBytes(series ...*storepb.Series) (count int) {
+	for _, s := range series {
+		count += s.Size()
+	}
+
+	return count
+}
+// only retry connection issues
+func isRetryableError(err error) bool {
+	switch status.Code(err) {
Review comment: Does this limit apply only to the Select call? I don't see us limiting label names and label values. If that's the case, we can probably mention it in the doc or give the flag a better name.

Review comment: That's the only question I have now. Otherwise LGTM.

Reply: The naming of the limit is consistent with other similar limits. I think we can implement this limit on LabelNames and LabelValues as well. For now I've updated the doc to say that it is only applied to the query, query_range and series APIs.

Review comment: NIT: "when running Cortex with blocks storage". Currently we only have blocks storage, since chunks storage was removed.

Reply: All the other limits have this line. Should we remove it everywhere?

Reply: I think we should...

Reply: I'll also address this in the PR to handle retryable failures.