Skip to content

Added /ready endpoint to queriers #1934

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

pracucci
Copy link
Contributor

@pracucci pracucci commented Dec 23, 2019

What this PR does:
The querier - when running with the TSDB experimental storage - does the initial sync of all TSDB blocks' index header, which may take some time (order of minutes). Until successfully completed, it will not be ready to serve queriers.

In this PR:

  • Add /ready endpoint to the querier, which always return 204, once the initialization has been done
  • Start the query frontend worker only after the store and querier have been successfully initialized

Which issue(s) this PR fixes:
N/A

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@pracucci
Copy link
Contributor Author

The failing lint looks unrelated.

Copy link
Contributor

@thorfour thorfour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit otherwise good change


// Once the execution reaches this point, all synchronous initialization has been
// done and the querier is ready to serve queries, so we're just returning a 202.
t.server.HTTP.Path("/ready").Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit I think I'd rather see it return something like a 503 before its ready, instead of timing out. And then switch to the 204 once it gets here. But I don't think it's a big deal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. The reason why I didn't do it, is because the HTTP server is not opened until the cortex.Run() is executed, which occurs after all the modules have been initialized.

The cleaner solution would be not having any synchronous initialization in the modules initialization (so that modules are fast to initialize), while making sure we do start the querier worker only after the asynchronous initial sync is completed, but this would require some significative refactoring which I'm not sure it's worth at this stage.

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not worth it right now. This change is good

@pracucci
Copy link
Contributor Author

The failed lint looks unrelated:

go: finding sigs.k8s.io/structured-merge-diff v0.0.0-20190525122527-15d366b2352e
github.com/apache/[email protected]: stream error: stream ID 609; INTERNAL_ERROR
Makefile:140: recipe for target 'mod-check' failed
make: *** [mod-check] Error 1

Exited with code exit status 2

CHANGELOG.md Outdated
@@ -15,6 +15,8 @@
* [FEATURE] The distributor can now drop labels from samples (similar to the removal of the replica label for HA ingestion) per user via the `distributor.drop-label` flag. #1726
* [FEATURE] Added `global` ingestion rate limiter strategy. Deprecated `-distributor.limiter-reload-period` flag. #1766
* [FEATURE] Added support for Microsoft Azure blob storage to be used for storing chunk data. #1913
* [FEATURE] Added readiness probe endpoint`/ready` to queriers. #1934
* [ENHANCEMENT] Start the query frontend worker - in the querier target - after the querier has been successfully initialized. #1934
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the internals and has no operator impact. Shoudn't be in the changelog.

@pracucci pracucci force-pushed the add-readiness-probe-to-querier branch from 2549295 to 5846404 Compare December 30, 2019 10:21
@pracucci pracucci force-pushed the add-readiness-probe-to-querier branch from 9275efc to ba144b2 Compare January 7, 2020 08:10
@gouthamve gouthamve merged commit 729f308 into cortexproject:master Jan 7, 2020
@pracucci pracucci deleted the add-readiness-probe-to-querier branch January 7, 2020 08:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants