-
Notifications
You must be signed in to change notification settings - Fork 816
[Experimental] Memberlist singleton and support for multiple codecs #2016
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Experimental] Memberlist singleton and support for multiple codecs #2016
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM besides some small nits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job @pstibrany! I left few minor comments.
Would you be up to add a integration test with single-binary Cortex and gossip to (hopefully) freeze the current implementation from future regressions?
Added integration test that verifies that Cortex in single-binary mode works using memberlist setup. |
Signed-off-by: Peter Štibraný <[email protected]>
This allows multiple clients to use the same memberlist.KV instance. This is another step towards supporting multiple codecs per KV. Signed-off-by: Peter Štibraný <[email protected]>
To support multiple codecs, we need to add a new field. As we're going to break the format anyway, we may as well switch to protobuf. Signed-off-by: Peter Štibraný <[email protected]>
Codec is now stored in the client, while KV has a registry of codecs that are used when receiving values. Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
It was introduced for memberlist Client, but it will not be needed anymore as memberlist.KV will be a top-level component with its own lifecycle, separated from the client. Signed-off-by: Peter Štibraný <[email protected]>
It is initialized if any component actually uses it, but not sooner. If initialized, Stop method is called when server shuts down. Signed-off-by: Peter Štibraný <[email protected]>
Client now only verifies that KV knows about coded that client wants to use, but doesn't register it anymore. Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
and all distributors can see all ingesters and their tokens. Signed-off-by: Peter Štibraný <[email protected]>
Added note on how to upgrade. Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I've found out this test is quite slow due to the table-manager
stop. Look at the timestamps (I've enabled info
log level):
cortex-1: level=info ts=2020-02-12T15:50:19.5615697Z caller=cortex.go:281 msg=stopping module=table-manager
cortex-1: level=info ts=2020-02-12T15:50:36.8263139Z caller=cortex.go:281 msg=stopping module=runtime-config
This is caused by:
cortex/pkg/chunk/table_manager.go
Line 180 in 58ef607
time.Sleep(time.Duration(rand.Int63n(int64(m.cfg.DynamoDBPollInterval)))) |
Would you mind fixing it in a separate PR, please? I should be just a matter of changing the table manager initial time.Sleep()
into a select
which returns if m.done
is closed.
This PR changes experimental memberlist implementation of KV store in following ways:
memberlist.Client
is now only a small wrapper aroundmemberlist.KV
. KV is a top-level cortex component (but cannot be run on its own), and is a singleton. KV is only initialized if any component actually wants to use memberlist as its KV store.These changes allow Cortex in single-binary mode to use memberlist as KV store. Previously, such attempts would fail with panic, since all components (distributor/ingester/querier) tried to create a memberlist instance, and it failed with metrics registration. Even if it didn't fail there, it wouldn't work because each instance would try to open same port for listening.
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]