Conversation

afhassan
Contributor

@afhassan afhassan commented Feb 25, 2025

What this PR does:
Add a new config, max_query_response_size, that limits the total size of instant and range query responses after decompression. The limit applies to the combined response size of all query shards.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Contributor

@yeya24 yeya24 left a comment

If we apply the limit only in the instant query and range query tripperware, we have no protection against large responses from the GetSeries and GetLabels APIs.

Do we plan to do something there, or rely on the existing gRPC message size limit?

Contributor

@yeya24 yeya24 left a comment

Can we add an integration test?

Contributor

@harry671003 harry671003 left a comment

Thanks for the PR.
I believe the intent is to protect query frontends (QFEs) from OOMs. Could you test this and validate that it works as intended? For example, reproduce a case where QFEs are getting OOM-killed and check whether the limit prevents it.

@afhassan
Contributor Author

Added a new config for max_query_response_size that tracks the total response size of all query shards.

I will reproduce a QFE OOM, test the new limit against it, and add an integration test.

Thanks for the feedback!
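
To make "tracks the total response size of all query shards" concrete, here is a minimal sketch of one way such a limiter could work: a single counter is stored in the request context, and every shard adds its decompressed byte count to it. The names sizeLimiter, withLimiter, limiterFromContext and add are hypothetical and are not the PR's API; only the error text is taken from this conversation.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
)

// sizeLimiter is a hypothetical stand-in for the PR's ResponseSizeLimiter.
// One instance is shared by all shards of a single query.
type sizeLimiter struct {
	maxBytes int64        // 0 disables the limit (an assumption of this sketch)
	total    atomic.Int64 // running total across all shards
}

type limiterKey struct{}

// withLimiter attaches the shared limiter to the request context.
func withLimiter(ctx context.Context, l *sizeLimiter) context.Context {
	return context.WithValue(ctx, limiterKey{}, l)
}

// limiterFromContext returns the shared limiter, or an unlimited one
// when the context carries none.
func limiterFromContext(ctx context.Context) *sizeLimiter {
	if l, ok := ctx.Value(limiterKey{}).(*sizeLimiter); ok {
		return l
	}
	return &sizeLimiter{}
}

// add records n more bytes and fails once the running total exceeds the limit.
func (l *sizeLimiter) add(n int) error {
	total := l.total.Add(int64(n))
	if l.maxBytes > 0 && total > l.maxBytes {
		return fmt.Errorf("the query response size exceeds limit (limit: %d bytes)", l.maxBytes)
	}
	return nil
}

func main() {
	ctx := withLimiter(context.Background(), &sizeLimiter{maxBytes: 250})

	var wg sync.WaitGroup
	var rejected atomic.Int32
	for shard := 0; shard < 3; shard++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each shard reports a 100-byte response to the shared counter.
			if err := limiterFromContext(ctx).add(100); err != nil {
				rejected.Add(1)
			}
		}()
	}
	wg.Wait()
	fmt.Println("shards rejected:", rejected.Load())
}
```

Because the counter is shared, no single 100-byte shard response exceeds the 250-byte limit, yet the shard that pushes the total to 300 bytes is rejected.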

@afhassan
Contributor Author

Addressed the previous feedback.

I think the PR is in a good place for a second review now.


responseSizeLimiter := limiter.ResponseSizeLimiterFromContextWithFallback(ctx)

if strings.EqualFold(r.Header.Get("Content-Encoding"), "gzip") && len(buf.Bytes()) >= 4 {
Contributor

nit: Extract the logic to estimate the uncompressed size into its own function.
What happens when it overflows? For example, if the uncompressed size is 5 GB, what will these last 4 bytes be?

Contributor Author

It is a 32-bit unsigned integer, so when it overflows it wraps around (modulo 2^32): a 5 GB response would be reported as roughly 1 GB. It will never be a negative value.

The only risk in that case is that we fail to enforce the limit when we should have.
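
The wraparound can be checked with a small standalone sketch. Per the gzip format (RFC 1952), the last 4 bytes of a gzip member hold ISIZE, the uncompressed size modulo 2^32, in little-endian order. The helper names below (estimatedUncompressedSize, wrappedSize, gzipBytes) are illustrative and are not the functions in this PR.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/binary"
	"fmt"
)

// estimatedUncompressedSize reads the ISIZE trailer from the last 4 bytes of
// a gzip stream. The boolean is false when the input is too short to hold it.
func estimatedUncompressedSize(gz []byte) (uint32, bool) {
	if len(gz) < 4 {
		return 0, false
	}
	return binary.LittleEndian.Uint32(gz[len(gz)-4:]), true
}

// wrappedSize shows what ISIZE would contain for a given true size:
// the truncating conversion is equivalent to trueSize mod 2^32.
func wrappedSize(trueSize uint64) uint32 {
	return uint32(trueSize)
}

// gzipBytes compresses payload into a single-member gzip stream.
func gzipBytes(payload []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(payload); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	gz, err := gzipBytes(bytes.Repeat([]byte("a"), 1000))
	if err != nil {
		panic(err)
	}
	size, ok := estimatedUncompressedSize(gz)
	fmt.Println(size, ok) // 1000 true

	const GiB = 1 << 30
	// A 5 GiB payload wraps to 1 GiB, so the estimate undercounts by 4 GiB.
	fmt.Println(wrappedSize(5*GiB) / GiB) // 1
}
```

The estimate is therefore only a lower bound for responses of 4 GiB or more, which matches the risk described above: the limit may go unenforced, but it is never triggered spuriously by the wraparound.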

Contributor

@harry671003 harry671003 left a comment

Overall great work! Left some optional nit comments.

}

-func BodyBuffer(res *http.Response, logger log.Logger) ([]byte, error) {
+func BodyBytes(res *http.Response, responseSizeLimiter *limiter.ResponseSizeLimiter, logger log.Logger) ([]byte, error) {
Contributor

nit: Does this method need to be renamed?

Contributor Author

The function does not return a body buffer; it returns the body as a byte slice. I made that distinction when I split it into two functions, but even as one function it didn't make sense to me that it was called BodyBuffer() when it does not return a body buffer.

I am okay with keeping the name or changing it to something else, but what is the reason for using BodyBuffer()?

Contributor

@justinjung04 justinjung04 left a comment

Thanks for a great feature! I just have a few minor comments.

Comment on lines -478 to -480
} else if strings.EqualFold(res.Header.Get("Content-Encoding"), "snappy") {
sReader := snappy.NewReader(buf)
return io.ReadAll(sReader)
Contributor

How do we double-check that this is safe to remove?

Contributor Author

We set the compression headers for instant and range query requests here https://github.com/afhassan/cortex/blob/be0fc7f216589a3e9cb66af85ee25cf192fd533b/pkg/querier/tripperware/query.go#L762

We currently only support gzip compression (default) or no compression. We never set any other Accept-Encoding header.

Use compression for metrics query API or instant and range query APIs.
Supports 'gzip' and '' (disable compression)
CLI flag: -querier.response-compression
[response_compression: <string> | default = "gzip"]

Snappy handling was added because we planned to potentially support it as well, but I think that can be left to a future PR.
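
As an illustration of the two supported settings, the sketch below decodes a response body for "gzip" and for "" (no compression). The function decodeBody and the helper gzipBytes are hypothetical names, and rejecting every other encoding is a choice made for this sketch; it is not a claim about how the merged code treats unknown encodings.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// decodeBody returns the decoded payload for the two encodings discussed
// here: "gzip" and "" (no compression). Anything else is rejected.
func decodeBody(contentEncoding string, body []byte) ([]byte, error) {
	switch {
	case contentEncoding == "":
		return body, nil
	case strings.EqualFold(contentEncoding, "gzip"):
		zr, err := gzip.NewReader(bytes.NewReader(body))
		if err != nil {
			return nil, err
		}
		defer zr.Close()
		return io.ReadAll(zr)
	default:
		return nil, fmt.Errorf("unsupported Content-Encoding %q", contentEncoding)
	}
}

// gzipBytes compresses payload so the example has something to decode.
func gzipBytes(payload []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(payload); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	gz, err := gzipBytes([]byte("hello"))
	if err != nil {
		panic(err)
	}
	out, err := decodeBody("gzip", gz)
	fmt.Println(string(out), err)

	// An encoding that is never requested is reported instead of being decoded.
	_, err = decodeBody("snappy", []byte("x"))
	fmt.Println(err)
}
```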

afhassan added 16 commits March 25, 2025 12:32
Signed-off-by: Ahmed Hassan <[email protected]>
@afhassan
Contributor Author

Rebased the changes and force-pushed.

Contributor

@justinjung04 justinjung04 left a comment

Nice! Thanks a lot!

Contributor

@yeya24 yeya24 left a comment

Thanks, and great work. Most of the comments are nits, I think.


var (
responseLimiterCtxKey = &responseSizeLimiterCtxKey{}
ErrMaxResponseSizeHit = "the query response size exceeds limit (limit: %d bytes)"
Contributor

Is the error message clear enough to tell it is the total response size for sharded queries?

Contributor Author

Maybe "the total received query response size exceeds limit (limit: %d bytes)"? I don't know if we should explicitly mention shards, since sharding is an internal detail.

Contributor

I am a bit worried that "the query response size exceeds limit" might mislead users, since it reads as the size of the response in their final result. People will be confused about how their query can exceed several GB when the final result is only several MB.

If we don't have a better error message, we can go with what you have now.

@yeya24 yeya24 merged commit f97ec76 into cortexproject:master Mar 27, 2025
17 checks passed
justinjung04 pushed a commit to justinjung04/cortex that referenced this pull request Mar 27, 2025

4 participants