From 2a3a9500a9b606f3ee4f0f9f787660fcea01667d Mon Sep 17 00:00:00 2001
From: Bryan Boreham
Date: Tue, 6 Aug 2019 14:48:06 +0000
Subject: [PATCH 1/2] Document the ingester hand-over process

Signed-off-by: Bryan Boreham
---
 docs/architecture.md      |  2 ++
 docs/ingester-handover.md | 44 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)
 create mode 100644 docs/ingester-handover.md

diff --git a/docs/architecture.md b/docs/architecture.md
index 71604ac4017..953c8eab5f2 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -67,6 +67,8 @@ Ingesters are semi-stateful in that they always retain the last 12 hours worth o
 
 As *semi*-stateful processes, ingesters are *not* designed to be long-term data stores. In Cortex, that role is played by the [chunk store](#chunk-store).
 
+A [hand-over process](ingester-handover.md) manages the state when ingesters are added, removed or replaced.
+
 #### Write de-amplification
 
 Ingesters store the last 12 hours worth of samples in order to perform **write de-amplification**, i.e. batching and compressing samples for the same series and flushing them out to the [chunk store](#chunk-store). Under normal operations, there should be *many* orders of magnitude fewer queries per second (QPS) worth of writes to the chunk store than to the ingesters.
diff --git a/docs/ingester-handover.md b/docs/ingester-handover.md
new file mode 100644
index 00000000000..2245cf1c52a
--- /dev/null
+++ b/docs/ingester-handover.md
@@ -0,0 +1,44 @@
+# Ingester Hand-over
+
+The [ingester](architecture.md#ingester) holds several hours of sample
+data in memory. When we want to shut down an ingester, either for a
+software version update or to drain a node for maintenance, this data
+must not be discarded.
+
+Each ingester goes through different states in its lifecycle. When
+working normally, the state is `ACTIVE`.
+
+On start-up, an ingester first goes into state `PENDING`. After a
+short time, if nothing happens, it adds itself to the ring and goes
+into state ACTIVE.
+
+A running ingester is notified to shut down by Unix signal
+`SIGINT`. On receipt of this signal it goes into state `LEAVING` and
+looks for an ingester in state `PENDING`. If it finds one, that
+ingester goes into state `JOINING` and the leaver transfers all its
+in-memory data over to the joiner. On successful transfer the leaver
+removes itself from the ring and exits and the joiner changes to
+`ACTIVE`, taking over ownership of the leaver's
+[ring tokens](architecture.md#hashing).
+
+If a leaving ingester does not find a pending ingester, it will flush
+all of its chunks to the backing database, then remove itself from the
+ring and exit. This may take tens of minutes to complete.
+
+During hand-over, neither the leaving nor joining ingesters will
+accept new samples. Distributors are aware of this, and "spill" the
+samples to the next ingester in the ring. This creates a set of extra
+"spilled" chunks which will idle out and flush after hand-over is
+complete. The sudden increase in flush queue can be alarming!
+
+The following metrics can be used to observe this process:
+
+ - `cortex_member_ring_tokens_owned` - how many tokens each ingester thinks it owns
+ - `cortex_ring_tokens_owned` - how many tokens each ingester is seen to own by other components
+ - `cortex_ring_member_ownership_percent` - same as `cortex_ring_tokens_owned` but expressed as a percentage
+ - `cortex_ring_members` - how many ingesters can be seen in each state, by other components
+ - `cortex_ingester_sent_chunks` - number of chunks sent by leaving ingester
+ - `cortex_ingester_received_chunks` - number of chunks received by joining ingester
+
+You can see the current state of the ring via HTTP browser request to
+`/ring` on a distributor.
From 45d191ac801f6d2393e46aefa5d638e18dcaec0d Mon Sep 17 00:00:00 2001
From: Bryan Boreham
Date: Thu, 8 Aug 2019 16:48:34 +0000
Subject: [PATCH 2/2] Document some command-line options for hand-over

Signed-off-by: Bryan Boreham
---
 docs/arguments.md         | 10 ++++++++++
 docs/ingester-handover.md | 11 ++++++-----
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/docs/arguments.md b/docs/arguments.md
index dc25fc181e4..f60ec343d46 100644
--- a/docs/arguments.md
+++ b/docs/arguments.md
@@ -1,5 +1,7 @@
 # Cortex Arguments Explained
 
+Duration arguments should be specified with a unit like `5s` or `3h`. Valid time units are "ms", "s", "m", "h".
+
 ## Querier
 
 - `-querier.max-concurrent`
@@ -155,6 +157,14 @@ It also talks to a KVStore and has it's own copies of the same flags used by the
 
 ## Ingester
 
+- `-ingester.join-after`
+
+   How long to wait in PENDING state during the [hand-over process](ingester-handover.md). (default 0s)
+
+- `-ingester.max-transfer-retries`
+
+   How many times a LEAVING ingester tries to find a PENDING ingester during the [hand-over process](ingester-handover.md). Each attempt takes a second or so. (default 10)
+
 - `-ingester.normalise-tokens`
 
    Write out "normalised" tokens to the ring. Normalised tokens consume less memory to encode and decode; as the ring is unmarshalled regularly, this significantly reduces memory usage of anything that watches the ring.
diff --git a/docs/ingester-handover.md b/docs/ingester-handover.md
index 2245cf1c52a..1cf8c7dda02 100644
--- a/docs/ingester-handover.md
+++ b/docs/ingester-handover.md
@@ -9,8 +9,8 @@ Each ingester goes through different states in its lifecycle. When
 working normally, the state is `ACTIVE`.
 
 On start-up, an ingester first goes into state `PENDING`. After a
-short time, if nothing happens, it adds itself to the ring and goes
-into state ACTIVE.
+[short time](arguments.md#ingester), if nothing happens, it adds
+itself to the ring and goes into state ACTIVE.
 
 A running ingester is notified to shut down by Unix signal
 `SIGINT`. On receipt of this signal it goes into state `LEAVING` and
@@ -21,9 +21,10 @@ removes itself from the ring and exits and the joiner changes to
 `ACTIVE`, taking over ownership of the leaver's
 [ring tokens](architecture.md#hashing).
 
-If a leaving ingester does not find a pending ingester, it will flush
-all of its chunks to the backing database, then remove itself from the
-ring and exit. This may take tens of minutes to complete.
+If a leaving ingester does not find a pending ingester after [several
+attempts](arguments.md#ingester), it will flush all of its chunks to
+the backing database, then remove itself from the ring and exit. This
+may take tens of minutes to complete.
 
 During hand-over, neither the leaving nor joining ingesters will
 accept new samples. Distributors are aware of this, and "spill" the
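
Reviewer note: the lifecycle documented in `docs/ingester-handover.md` can be read as a small state machine. The sketch below is a toy in-memory model of that description, not Cortex code. `Ingester`, `join_ring` and `shut_down` are hypothetical names invented for illustration, the ring is a plain dict, and the backing database is a plain list. The `max_transfer_retries` parameter mirrors the retry option documented in `docs/arguments.md`; the roughly one-second wait between attempts is left out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class State(Enum):
    PENDING = "PENDING"
    JOINING = "JOINING"
    ACTIVE = "ACTIVE"
    LEAVING = "LEAVING"


@dataclass
class Ingester:
    # Hypothetical stand-in for an ingester process; not a Cortex type.
    name: str
    state: State = State.PENDING
    tokens: List[int] = field(default_factory=list)
    chunks: List[str] = field(default_factory=list)


def join_ring(ingester: Ingester, ring: Dict[str, Ingester], tokens: List[int]) -> None:
    """PENDING -> ACTIVE: nothing was handed over while waiting, so claim tokens."""
    assert ingester.state is State.PENDING
    ingester.tokens = list(tokens)
    ingester.state = State.ACTIVE
    ring[ingester.name] = ingester


def shut_down(
    leaver: Ingester,
    candidates: List[Ingester],
    ring: Dict[str, Ingester],
    store: List[str],
    max_transfer_retries: int = 10,
) -> str:
    """Model the shutdown path; return "transferred" or "flushed"."""
    leaver.state = State.LEAVING
    for _ in range(max_transfer_retries):
        joiner = next((c for c in candidates if c.state is State.PENDING), None)
        if joiner is None:
            continue  # a real ingester would wait before looking again
        # Hand-over: the joiner receives the in-memory data and the ring tokens.
        joiner.state = State.JOINING
        joiner.chunks.extend(leaver.chunks)
        joiner.tokens = leaver.tokens
        leaver.chunks, leaver.tokens = [], []
        ring.pop(leaver.name, None)
        joiner.state = State.ACTIVE
        ring[joiner.name] = joiner
        return "transferred"
    # No PENDING ingester found: flush everything to the backing store instead.
    store.extend(leaver.chunks)
    leaver.chunks, leaver.tokens = [], []
    ring.pop(leaver.name, None)
    return "flushed"
```

In this model a hand-over leaves the backing store untouched and moves tokens and chunks to the joiner, while a shutdown with no `PENDING` candidate ends with every chunk in the store and the leaver gone from the ring. That matches the two exit paths the document describes.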