-
Notifications
You must be signed in to change notification settings - Fork 816
Modules can now start additional startup tasks, while server is starting #2119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! This looks like a great qol change thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
It might be worth noting the new startup behavior in Blocks documentation. If I'm not mistaken, the user must be aware that the startup function needs complete before a transfer can take place and tune accordingly?
pkg/cortex/cortex.go
Outdated
func (t *Cortex) start(target moduleName) error { | ||
deps := orderedDeps(target) | ||
|
||
wg := sync.WaitGroup{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you are only returning errors you should consider using a sync.errgroup
instead of the waitgroup and channel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks exactly like what I need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Withdrawing approval til integration tests are sorted
Before approving this, I will check return values from gRPC methods to make sure we send the correct errors and status codes back. |
Integration tests found a real bug ( |
…startup. Start functions are called concurrently, and only after all modules are initialized. Server's HTTP service starts while modules are still starting, so modules must be able to handle that. This is currently used by blocks-based ingester and querier. Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
…d yet. codes.Unavailable seems to be the correct code, and client is expected to retry later (with some backoff). Signed-off-by: Peter Štibraný <[email protected]>
After some reading, I believe returning I've updated the code accordingly. |
Of course, ingester doesn't even register itself into the Ring, until it is ready. With blocks store situation is more complicated... both querier and ruler that use it are started immediately, and don't wait for "block store" to be ready first. All requests to the block store will however fail until it is. This may cause troubles and I will investigate more how to solve it. Ideally they should observe it's component for readiness before using it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exactly what I was looking for atm. Perfect timing!
@@ -67,7 +67,9 @@ require ( | |||
github.com/weaveworks/common v0.0.0-20200201141823-27e183090ab1 | |||
go.etcd.io/bbolt v1.3.3 | |||
go.etcd.io/etcd v0.0.0-20190709142735-eb7dd97135a5 | |||
go.uber.org/atomic v1.5.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this additional dependency, can we use https://golang.org/pkg/sync/atomic/ for atomic loads and stores?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have this dependency, although under old name (github.com/uber-go/atomic). I've removed old version in PR #2127.
@pstibrany Is the querier and ruler using the ingester only a problem in single binary mode? Can we not register the ingester in the ring (or mark it pending) until it starts up? |
Ruler isn't using ingester. Querier is, but only indirectly, via gRPC calls. Problem that this PR is solving for QuerierChunkStore is that blocks-based QuerierChunkStore needs to fetch blocks metadata on startup, which can take a while. However, Ruler is using QuerierChunkStore and has no way to see whether QuerierChunkStore has finished its fetching and is now ready to use, or not. I'd like to solve this in a general way, and will send new PR showing how to do that. I'd suggest to keep this PR open for now, so that we can compare both approaches and see which one works better. (I don't really like approach in this PR, to be honest) |
I have sent alternative PR to this one, #2166. It revamps services module and supports everything that we need to correctly implement in this PR, in a much better way. If you have time, please take a look. |
Closing this in favour of #2166 |
Modules can now have start function for starting additional tasks on startup.
Start functions are called concurrently, and only after all modules are initialized. Server's HTTP service starts while modules are still starting, so modules must be able to handle that. Both ingester and queriers will return error.
This is currently used by blocks-based ingester (to replay WAL for locally stored TSDBs) and querier (to do initial fetch of blocks).
This is alternative to #2104.
Note: this is not ideal, but is relatively simple change. I think we should overhaul the lifecycle of our modules, and have separate NEW (instantiated, but nothing started yet), STARTING (replaying WALs, fetching blocks, whatever needs to be done for service to be ready) and RUNNING (service is running) states and make those states observable by other modules (for dependencies and for global view on the server, plus make those states observable via metrics / simple HTML page), but this PR doesn't go that far. I'll write a design document for that first.
Signed-off-by: Peter Štibraný [email protected]
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]