Enforce query response size limit after decompression in query-frontend #6607
Conversation
Force-pushed 4b1de34 to 967ef4f
If we apply the limit only in the instant query and range query tripperware, then we have no way to protect against large responses from the GetSeries and GetLabels APIs.
Do we plan to do something there, or rely on the existing gRPC message size limit?
Can we add an integration test?
Thanks for the PR.
I believe the intent is to protect QFEs from OOMs. Could you test this and validate that it works as intended? Maybe reproduce a case where QFEs are getting OOMed and check whether the limit prevents this from happening.
I will reproduce a QFE OOM and test the new limit, and add an integration test. Thanks for the feedback!
Addressed the previous feedback. I think the PR is in a good place for a second review now.
```go
responseSizeLimiter := limiter.ResponseSizeLimiterFromContextWithFallback(ctx)
```

```go
if strings.EqualFold(r.Header.Get("Content-Encoding"), "gzip") && len(buf.Bytes()) >= 4 {
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Extract the logic to estimate the uncompressed size into its own function.
What happens when it overflows? For example, if the uncompressed size is 5 GB, what will these last 4 bytes be?
It is a 32-bit unsigned integer, so on overflow it wraps around: it will show 1 GB instead of 5 GB, never a negative value.
The only risk in that case is not enforcing the limit when we should have.
Overall great work! Left some optional nit comments.
```diff
-func BodyBuffer(res *http.Response, logger log.Logger) ([]byte, error) {
+func BodyBytes(res *http.Response, responseSizeLimiter *limiter.ResponseSizeLimiter, logger log.Logger) ([]byte, error) {
```
nit: Does this method need to be renamed?
The function does not return a body buffer; it returns a byte slice. I made that distinction when I split it into two functions, but even as one function it didn't make sense to me that it was called `BodyBuffer()` when it does not return a body buffer.
I am okay with keeping the name or changing it to something else, but what is the reason for preferring `BodyBuffer()`?
Thanks for a great feature! I just have a few minor comments.
```diff
-} else if strings.EqualFold(res.Header.Get("Content-Encoding"), "snappy") {
-	sReader := snappy.NewReader(buf)
-	return io.ReadAll(sReader)
```
How do we double check if this is safe to be removed?
We set the compression headers for instant and range query requests here: https://github.com/afhassan/cortex/blob/be0fc7f216589a3e9cb66af85ee25cf192fd533b/pkg/querier/tripperware/query.go#L762
We currently only support gzip compression (the default) or no compression; we never set any other `Accept-Encoding` header:

```
# Use compression for metrics query API or instant and range query APIs.
# Supports 'gzip' and '' (disable compression)
# CLI flag: -querier.response-compression
[response_compression: | default = "gzip"]
```

Snappy was added because we planned to potentially support it as well, but I think that can be left for a future PR.
Signed-off-by: Ahmed Hassan <[email protected]>
Force-pushed 99f8630 to 295d4b7
Rebased the changes and force pushed.
Nice! Thanks a lot!
Thanks and great work. Most of the comments are nits I think
```go
var (
	responseLimiterCtxKey = &responseSizeLimiterCtxKey{}
	ErrMaxResponseSizeHit = "the query response size exceeds limit (limit: %d bytes)"
)
```
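To make the shared-across-shards behavior concrete, here is a minimal sketch. The type and method names are hypothetical; the real limiter lives in Cortex's `limiter` package and is carried in the request context so that all shards of a query count against one total:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// responseSizeLimiter is a hypothetical sketch of the limiter discussed
// in this PR: one instance is shared by every shard of a query, so the
// limit applies to the total decompressed response size, not to each
// shard's response individually.
type responseSizeLimiter struct {
	limit int64
	total atomic.Int64
}

// AddResponseBytes accumulates n bytes and returns an error once the
// running total exceeds the configured limit. A limit of 0 disables
// the check.
func (l *responseSizeLimiter) AddResponseBytes(n int) error {
	if l.limit <= 0 {
		return nil
	}
	if l.total.Add(int64(n)) > l.limit {
		return fmt.Errorf("the query response size exceeds limit (limit: %d bytes)", l.limit)
	}
	return nil
}

func main() {
	l := &responseSizeLimiter{limit: 100}
	fmt.Println(l.AddResponseBytes(60)) // <nil>
	fmt.Println(l.AddResponseBytes(60)) // error: total 120 exceeds the 100-byte limit
}
```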
Is the error message clear enough to tell it is the total response size for sharded queries?
Maybe `the total received query response size exceeds limit (limit: %d bytes)`? I don't know if we should explicitly mention shards since it is an internal detail.
I am a bit worried that `the query response size exceeds limit` might be misleading to users, as it reads as the size of the final query result. People will be confused about how their query exceeds several GBs even though the final result is only several MBs.
If we don't have a better error message we can go with what you have now.
What this PR does:
Adds a new config `max_query_response_size` that limits the total response size of instant and range queries after decompression. The limit is applied to the total response size of all query shards.

Which issue(s) this PR fixes:
Fixes #

Checklist
- `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`