Skip to content

resource based throttling: reject only adhoc queries #6947

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

erlan-z
Copy link
Contributor

@erlan-z erlan-z commented Aug 6, 2025

What this PR does:
Exclude rules queries from resource-based query rejection

Description:
This change improves the resource-based query rejection mechanism by ensuring that only ad-hoc API queries are rejected when resource thresholds are breached. Queries originating from rule evaluations are now excluded from rejection.

Introduced RequestSource metadata to distinguish between API and ruler queries.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@dosubot dosubot bot added the component/rules Bits & bobs todo with rules and alerts: the ruler, config service etc. label Aug 6, 2025
@erlan-z erlan-z force-pushed the resource-based-throttling-reject-only-addhoc-queries branch from 5e39a8b to 53cedc7 Compare August 6, 2025 16:22
@erlan-z erlan-z changed the title resource based throttling: reject only addhock queries resource based throttling: reject only adhoc queries Aug 6, 2025
@erlan-z erlan-z force-pushed the resource-based-throttling-reject-only-addhoc-queries branch from 53cedc7 to 47da956 Compare August 6, 2025 16:25
Copy link
Contributor

@justinjung04 justinjung04 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@@ -2262,7 +2263,7 @@ func (i *Ingester) trackInflightQueryRequest() (func(), error) {

i.maxInflightQueryRequests.Track(i.inflightQueryRequests.Inc())

if i.resourceBasedLimiter != nil {
if i.resourceBasedLimiter != nil && requestmeta.RequestSourceFromContext(ctx) == requestmeta.SourceApi {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: what do you think of adding helper functions in requestmeta itself? ie. requestmeta.RequestFromApi(ctx) and requestmeta.RequestFromRuler(ctx)

@erlan-z erlan-z force-pushed the resource-based-throttling-reject-only-addhoc-queries branch from 47da956 to 8bb3d4f Compare August 8, 2025 18:19
Copy link
Contributor

@justinjung04 justinjung04 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you

Comment on lines 7 to 10
const (
SourceApi = "api"
SourceRuler = "ruler"
)
Copy link
Member

@SungJin1212 SungJin1212 Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about reusing tripperware.SourceAPI and tripperware.SourceRuler?

@pull-request-size pull-request-size bot added size/L and removed size/M labels Aug 15, 2025
@erlan-z erlan-z force-pushed the resource-based-throttling-reject-only-addhoc-queries branch from bfdbdd3 to 7cfa79a Compare August 15, 2025 20:21
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a feature that intends to protect storage layer, I am not sure if it is the best way to hardcode it by differentiating source of requests and only rejecting one type but not others. I understand the use case behind this change. But if we go with this, it opens a door that we may need to extend and support throttling based on other types of source such as user agent, dashboard ID, etc.

I think a better approach is to go with #6922 and throttle heavy queries no matter its source.

@erlan-z
Copy link
Contributor Author

erlan-z commented Aug 16, 2025

As a feature that intends to protect storage layer, I am not sure if it is the best way to hardcode it by differentiating source of requests and only rejecting one type but not others. I understand the use case behind this change. But if we go with this, it opens a door that we may need to extend and support throttling based on other types of source such as user agent, dashboard ID, etc.

I think a better approach is to go with #6922 and throttle heavy queries no matter its source.

Thanks for the review. I agree #6922 is the right direction to avoid hard-coding on user attributes like User-Agent or dashboard ID. That said, I see ruler vs. ad-hoc queries as a different category: it’s a system-level distinction, whereas the others are user-driven. So I don’t think this opens the door to extending based on those other characteristics.

I also recognize that ruler queries can sometimes be heavy, but that’s a rare corner case and not really the problem this feature is trying to solve. Because #6922 is more of a best-effort mechanism, I’d even suggest carrying this system-level distinction into that work as well, to avoid ruler queries being rejected in cases where precision isn’t perfect.

What do you think?

@yeya24
Copy link
Contributor

yeya24 commented Aug 17, 2025

That said, I see ruler vs. ad-hoc queries as a different category: it’s a system-level distinction, whereas the others are user-driven

I don't think we can easily identify queries coming from API as ad-hoc queries. There are different usecases that might be even more important than rules and it is all up to the user to decide the priority. Even rules can come from the query API endpoint from some remote evaluators like Thanos Ruler.

@erlan-z
Copy link
Contributor Author

erlan-z commented Aug 17, 2025

That said, I see ruler vs. ad-hoc queries as a different category: it’s a system-level distinction, whereas the others are user-driven

I don't think we can easily identify queries coming from API as ad-hoc queries. There are different usecases that might be even more important than rules and it is all up to the user to decide the priority. Even rules can come from the query API endpoint from some remote evaluators like Thanos Ruler.

Makes sense, thanks. I’m aligned with #6922. I realize it’s not always clear which queries are higher priority, and there can certainly be important API queries too, but since we can’t reliably identify those, it might help to at least avoid rejecting ruler queries, which are usually the priority ones we can identify in practice. Happy to leave this out here and revisit if needed.

…replace request source implementation on qfe when ruler calls qfe to unify experience across services

Signed-off-by: Erlan Zholdubai uulu <[email protected]>
@erlan-z erlan-z force-pushed the resource-based-throttling-reject-only-addhoc-queries branch from 7cfa79a to c4cfec4 Compare August 22, 2025 17:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/rules Bits & bobs todo with rules and alerts: the ruler, config service etc. size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants