Skip to content

Commit 4be2876

Browse files
authored
Removed ingesters blocks transfer support (#2996)
* Removed ingesters blocks transfer support Signed-off-by: Marco Pracucci <[email protected]> * Updated CHANGELOG Signed-off-by: Marco Pracucci <[email protected]> * Small improvements based on code review feedback Signed-off-by: Marco Pracucci <[email protected]> * Improved documentation based on feedback receive Signed-off-by: Marco Pracucci <[email protected]> * Simplified explaination about WAL loading Signed-off-by: Marco Pracucci <[email protected]>
1 parent 5e5b96d commit 4be2876

27 files changed

+221
-1358
lines changed

CHANGELOG.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,11 @@
22

33
## master / unreleased
44

5+
* [CHANGE] Experimental blocks storage: removed the support to transfer blocks between ingesters on shutdown. When running the Cortex blocks storage, ingesters are expected to run with a persistent disk. The following metrics have been removed: #2996
6+
* `cortex_ingester_sent_files`
7+
* `cortex_ingester_received_files`
8+
* `cortex_ingester_received_bytes_total`
9+
* `cortex_ingester_sent_bytes_total`
510
* [ENHANCEMENT] Query-tee: added a small tolerance to floating point sample values comparison. #2994
611
* [BUGFIX] Query-frontend: Fixed rounding for incoming query timestamps, to be 100% Prometheus compatible. #2990
712

development/tsdb-blocks-storage-s3-single-binary/config/cortex.yaml

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -13,8 +13,6 @@ ingester_client:
1313
use_gzip_compression: true
1414

1515
ingester:
16-
max_transfer_retries: 1
17-
1816
lifecycler:
1917
# We want to start immediately.
2018
join_after: 0

development/tsdb-blocks-storage-s3/config/cortex.yaml

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -13,8 +13,6 @@ ingester_client:
1313
use_gzip_compression: true
1414

1515
ingester:
16-
max_transfer_retries: 1
17-
1816
lifecycler:
1917
# We want to start immediately.
2018
join_after: 0

docs/architecture.md

Lines changed: 12 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -153,17 +153,18 @@ Incoming series are not immediately written to the storage but kept in memory an
153153

154154
Ingesters contain a **lifecycler** which manages the lifecycle of an ingester and stores the **ingester state** in the [hash ring](#the-hash-ring). Each ingester could be in one of the following states:
155155

156-
1. `PENDING` is an ingester's state when it just started and is waiting for a hand-over from another ingester that is `LEAVING`. If no hand-over occurs within the configured timeout period ("auto-join timeout", configurable via `-ingester.join-after` option), the ingester will join the ring with a new set of random tokens (ie. during a scale up). When hand-over process starts, state changes to `JOINING`.
157-
158-
2. `JOINING` is an ingester's state in two situations. First, ingester will switch to a `JOINING` state from `PENDING` state after auto-join timeout. In this case, ingester will generate tokens, store them into the ring, optionally observe the ring for token conflicts and then move to `ACTIVE` state. Second, ingester will also switch into a `JOINING` state as a result of another `LEAVING` ingester initiating a hand-over process with `PENDING` (which then switches to `JOINING` state). `JOINING` ingester then receives series and tokens from `LEAVING` ingester, and if everything goes well, `JOINING` ingester switches to `ACTIVE` state. If hand-over process fails, `JOINING` ingester will move back to `PENDING` state and either wait for another hand-over or auto-join timeout.
159-
160-
3. `ACTIVE` is an ingester's state when it is fully initialized. It may receive both write and read requests for tokens it owns.
161-
162-
4. `LEAVING` is an ingester's state when it is shutting down. It cannot receive write requests anymore, while it could still receive read requests for series it has in memory. While in this state, the ingester may look for a `PENDING` ingester to start a hand-over process with, used to transfer the state from `LEAVING` ingester to the `PENDING` one, during a rolling update (`PENDING` ingester moves to `JOINING` state during hand-over process). If there is no new ingester to accept hand-over, ingester in `LEAVING` state will flush data to storage instead.
163-
164-
5. `UNHEALTHY` is an ingester's state when it has failed to heartbeat to the ring's KV Store. While in this state, distributors skip the ingester while building the replication set for incoming series and the ingester does not receive write or read requests.
165-
166-
For more information about the hand-over process, please check out the [Ingester hand-over](guides/ingester-handover.md) documentation.
156+
- **`PENDING`**<br />
157+
The ingester has just started. While in this state, the ingester doesn't receive neither write and read requests, and could be waiting for time series data transfer from another ingester if running the chunks storage and the [hand-over](guides/ingesters-rolling-updates.md#chunks-storage-with-wal-disabled-hand-over) is enabled.
158+
- **`JOINING`**<br />
159+
The ingester is starting up and joining the ring. While in this state the ingester doesn't receive neither write and read requests. The ingester will join the ring using tokens received by a leaving ingester as part of the [hand-over](guides/ingesters-rolling-updates.md#chunks-storage-with-wal-disabled-hand-over) process (if enabled), otherwise it could load tokens from disk (if `-ingester.tokens-file-path` is configured) or generate a set of new random ones. Finally, the ingester optionally observes the ring for tokens conflicts and then, once any conflict is resolved, will move to `ACTIVE` state.
160+
- **`ACTIVE`**<br />
161+
The ingester is up and running. While in this state the ingester can receive both write and read requests.
162+
- **`LEAVING`**<br />
163+
The ingester is shutting down and leaving the ring. While in this state the ingester doesn't receive write requests, while it could receive read requests.
164+
- **`UNHEALTHY`**<br />
165+
The ingester has failed to heartbeat to the ring's KV Store. While in this state, distributors skip the ingester while building the replication set for incoming series and the ingester does not receive write or read requests.
166+
167+
_The ingester states are interally used for different purposes, including the series hand-over process supported by the chunks storage. For more information about it, please check out the [Ingester hand-over](guides/ingesters-rolling-updates.md#chunks-storage-with-wal-disabled-hand-over) documentation._
167168

168169
Ingesters are **semi-stateful**.
169170

docs/blocks-storage/querier.md

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -475,9 +475,8 @@ blocks_storage:
475475
# CLI flag: -experimental.blocks-storage.tsdb.wal-compression-enabled
476476
[wal_compression_enabled: <boolean> | default = false]
477477
478-
# If true, and transfer of blocks on shutdown fails or is disabled,
479-
# incomplete blocks are flushed to storage instead. If false, incomplete
480-
# blocks will be reused after restart, and uploaded when finished.
478+
# True to flush blocks to storage on shutdown. If false, incomplete blocks
479+
# will be reused after restart.
481480
# CLI flag: -experimental.blocks-storage.tsdb.flush-blocks-on-shutdown
482481
[flush_blocks_on_shutdown: <boolean> | default = false]
483482

docs/blocks-storage/store-gateway.md

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -502,9 +502,8 @@ blocks_storage:
502502
# CLI flag: -experimental.blocks-storage.tsdb.wal-compression-enabled
503503
[wal_compression_enabled: <boolean> | default = false]
504504
505-
# If true, and transfer of blocks on shutdown fails or is disabled,
506-
# incomplete blocks are flushed to storage instead. If false, incomplete
507-
# blocks will be reused after restart, and uploaded when finished.
505+
# True to flush blocks to storage on shutdown. If false, incomplete blocks
506+
# will be reused after restart.
508507
# CLI flag: -experimental.blocks-storage.tsdb.flush-blocks-on-shutdown
509508
[flush_blocks_on_shutdown: <boolean> | default = false]
510509

docs/configuration/arguments.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -306,11 +306,11 @@ It also talks to a KVStore and has it's own copies of the same flags used by the
306306

307307
- `-ingester.join-after`
308308

309-
How long to wait in PENDING state during the [hand-over process](../guides/ingester-handover.md). (default 0s)
309+
How long to wait in PENDING state during the [hand-over process](../guides/ingesters-rolling-updates.md#chunks-storage-with-wal-disabled-hand-over) (supported only by the chunks storage). (default 0s)
310310

311311
- `-ingester.max-transfer-retries`
312312

313-
How many times a LEAVING ingester tries to find a PENDING ingester during the [hand-over process](../guides/ingester-handover.md). Each attempt takes a second or so. Negative value or zero disables hand-over process completely. (default 10)
313+
How many times a LEAVING ingester tries to find a PENDING ingester during the [hand-over process](../guides/ingesters-rolling-updates.md#chunks-storage-with-wal-disabled-hand-over) (supported only by the chunks storage). Negative value or zero disables hand-over process completely. (default 10)
314314

315315
- `-ingester.normalise-tokens`
316316

docs/configuration/config-file-reference.md

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -543,7 +543,8 @@ lifecycler:
543543
[availability_zone: <string> | default = ""]
544544
545545
# Number of times to try and transfer chunks before falling back to flushing.
546-
# Negative value or zero disables hand-over.
546+
# Negative value or zero disables hand-over. This feature is supported only by
547+
# the chunks storage.
547548
# CLI flag: -ingester.max-transfer-retries
548549
[max_transfer_retries: <int> | default = 10]
549550
@@ -3263,9 +3264,8 @@ tsdb:
32633264
# CLI flag: -experimental.blocks-storage.tsdb.wal-compression-enabled
32643265
[wal_compression_enabled: <boolean> | default = false]
32653266
3266-
# If true, and transfer of blocks on shutdown fails or is disabled, incomplete
3267-
# blocks are flushed to storage instead. If false, incomplete blocks will be
3268-
# reused after restart, and uploaded when finished.
3267+
# True to flush blocks to storage on shutdown. If false, incomplete blocks
3268+
# will be reused after restart.
32693269
# CLI flag: -experimental.blocks-storage.tsdb.flush-blocks-on-shutdown
32703270
[flush_blocks_on_shutdown: <boolean> | default = false]
32713271

docs/configuration/single-process-config-blocks-gossip-1.yaml

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -29,9 +29,6 @@ ingester_client:
2929
use_gzip_compression: true
3030

3131
ingester:
32-
# Disable blocks transfers on ingesters shutdown or rollout.
33-
max_transfer_retries: 0
34-
3532
lifecycler:
3633
# The address to advertise for this ingester. Will be autodiscovered by
3734
# looking up address on eth0 or en0; can be specified if this fails.

docs/configuration/single-process-config-blocks-gossip-2.yaml

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -29,9 +29,6 @@ ingester_client:
2929
use_gzip_compression: true
3030

3131
ingester:
32-
# Disable blocks transfers on ingesters shutdown or rollout.
33-
max_transfer_retries: 0
34-
3532
lifecycler:
3633
# The address to advertise for this ingester. Will be autodiscovered by
3734
# looking up address on eth0 or en0; can be specified if this fails.

docs/configuration/single-process-config-blocks-tls.yaml

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -37,9 +37,6 @@ ingester_client:
3737
tls_ca_path: "root.crt"
3838

3939
ingester:
40-
# Disable blocks transfers on ingesters shutdown or rollout.
41-
max_transfer_retries: 0
42-
4340
lifecycler:
4441
# The address to advertise for this ingester. Will be autodiscovered by
4542
# looking up address on eth0 or en0; can be specified if this fails.

docs/configuration/single-process-config-blocks.yaml

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -28,9 +28,6 @@ ingester_client:
2828
use_gzip_compression: true
2929

3030
ingester:
31-
# Disable blocks transfers on ingesters shutdown or rollout.
32-
max_transfer_retries: 0
33-
3431
lifecycler:
3532
# The address to advertise for this ingester. Will be autodiscovered by
3633
# looking up address on eth0 or en0; can be specified if this fails.

docs/guides/ingester-handover.md

Lines changed: 0 additions & 49 deletions
This file was deleted.
Lines changed: 90 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,90 @@
1+
---
2+
title: "Ingesters rolling updates"
3+
linkTitle: "Ingesters rolling updates"
4+
weight: 102
5+
slug: ingesters-rolling-updates
6+
---
7+
8+
Cortex [ingesters](architecture.md#ingester) are semi-stateful.
9+
A running ingester holds several hours of time series data in memory, before they're flushed to the long-term storage.
10+
When an ingester shutdowns, because of a rolling update or maintenance, the in-memory data must not be discarded in order to avoid any data loss.
11+
12+
In this document we describe the techniques employed to safely handle rolling updates, based on different setups:
13+
14+
- [Blocks storage](#blocks-storage)
15+
- [Chunks storage with WAL enabled](#chunks-storage-with-wal-enabled)
16+
- [Chunks storage with WAL disabled](#chunks-storage-with-wal-disabled-hand-over)
17+
18+
## Blocks storage
19+
20+
The Cortex [blocks storage](../blocks-storage/_index.md) requires ingesters to run with a persistent disk where the TSDB WAL and blocks are stored (eg. a StatefulSet when deployed on Kubernetes).
21+
22+
During a rolling update, the leaving ingester closes the open TSDBs, synchronize the data to disk (`fsync`) and releases the disk resources.
23+
The new ingester, which is expected to reuse the same disk of the leaving one, will replay the TSDB WAL on startup in order to load back in memory the time series that have not been compacted into a block yet.
24+
25+
_The blocks storage doesn't support the series [hand-over](#chunks-storage-with-wal-disabled-hand-over)._
26+
27+
## Chunks storage
28+
29+
The Cortex chunks storage optionally supports a write-ahead log (WAL).
30+
The rolling update procedure for a Cortex cluster running the chunks storage depends whether the WAL is enabled or not.
31+
32+
### Chunks storage with WAL enabled
33+
34+
Similarly to the blocks storage, when Cortex is running the chunks storage with WAL enabled, it requires ingesters to run with a persistent disk where the WAL is stored (eg. a StatefulSet when deployed on Kubernetes).
35+
36+
During a rolling update, the leaving ingester closes the WAL, synchronize the data to disk (`fsync`) and releases the disk resources.
37+
The new ingester, which is expected to reuse the same disk of the leaving one, will replay the WAL on startup in order to load back in memory the time series data.
38+
39+
_For more information about the WAL, please refer to [Ingesters with WAL](../production/ingesters-with-wal.md)._
40+
41+
### Chunks storage with WAL disabled (hand-over)
42+
43+
When Cortex is running the chunks storage with WAL disabled, Cortex supports on-the-fly series hand-over between a leaving ingester and a joining one.
44+
45+
The hand-over is based on the ingesters state stored in the ring. Each ingester could be in one of the following **states**:
46+
47+
- `PENDING`
48+
- `JOINING`
49+
- `ACTIVE`
50+
- `LEAVING`
51+
52+
On startup, an ingester goes into the **`PENDING`** state.
53+
In this state, the ingester is waiting for a hand-over from another ingester that is `LEAVING`.
54+
If no hand-over occurs within the configured timeout period ("auto-join timeout", configurable via `-ingester.join-after` option), the ingester will join the ring with a new set of random tokens (eg. during a scale up) and will switch its state to `ACTIVE`.
55+
56+
When a running ingester in the **`ACTIVE`** state is notified to shutdown via `SIGINT` or `SIGTERM` Unix signal, the ingester switches to `LEAVING` state. In this state it cannot receive write requests anymore, but it can still receive read requests for series it has in memory.
57+
58+
A **`LEAVING`** ingester looks for a `PENDING` ingester to start a hand-over process with.
59+
If it finds one, that ingester goes into the `JOINING` state and the leaver transfers all its in-memory data over to the joiner.
60+
On successful transfer the leaver removes itself from the ring and exits, while the joiner changes its state to `ACTIVE`, taking over ownership of the leaver's [ring tokens](../architecture.md#hashing). As soon as the joiner switches it state to `ACTIVE`, it will start receive both write requests from distributors and queries from queriers.
61+
62+
If the `LEAVING` ingester does not find a `PENDING` ingester after `-ingester.max-transfer-retries` retries, it will flush all of its chunks to the long-term storage, then removes itself from the ring and exits. The chunks flushing to the storage may take several minutes to complete.
63+
64+
#### Higher number of series / chunks during rolling updates
65+
66+
During hand-over, neither the leaving nor joining ingesters will
67+
accept new samples. Distributors are aware of this, and "spill" the
68+
samples to the next ingester in the ring. This creates a set of extra
69+
"spilled" series and chunks which will idle out and flush after hand-over is
70+
complete.
71+
72+
#### Observability
73+
74+
The following metrics can be used to observe this process:
75+
76+
- **`cortex_member_ring_tokens_owned`**<br />
77+
How many tokens each ingester thinks it owns.
78+
- **`cortex_ring_tokens_owned`**<br />
79+
How many tokens each ingester is seen to own by other components.
80+
- **`cortex_ring_member_ownership_percent`**<br />
81+
Same as `cortex_ring_tokens_owned` but expressed as a percentage.
82+
- **`cortex_ring_members`**<br />
83+
How many ingesters can be seen in each state, by other components.
84+
- **`cortex_ingester_sent_chunks`**<br />
85+
Number of chunks sent by leaving ingester.
86+
- **`cortex_ingester_received_chunks`**<br />
87+
Number of chunks received by joining ingester.
88+
89+
You can see the current state of the ring via http browser request to
90+
`/ring` on a distributor.

integration/ingester_hand_over_test.go

Lines changed: 1 addition & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -16,13 +16,6 @@ import (
1616
"github.com/cortexproject/cortex/integration/e2ecortex"
1717
)
1818

19-
func TestIngesterHandOverWithBlocksStorage(t *testing.T) {
20-
runIngesterHandOverTest(t, BlocksStorageFlags, func(t *testing.T, s *e2e.Scenario) {
21-
minio := e2edb.NewMinio(9000, BlocksStorageFlags["-experimental.blocks-storage.s3.bucket-name"])
22-
require.NoError(t, s.StartAndWaitReady(minio))
23-
})
24-
}
25-
2619
func TestIngesterHandOverWithChunksStorage(t *testing.T) {
2720
runIngesterHandOverTest(t, ChunksStorageFlags, func(t *testing.T, s *e2e.Scenario) {
2821
dynamo := e2edb.NewDynamoDB()
@@ -77,9 +70,7 @@ func runIngesterHandOverTest(t *testing.T, flags map[string]string, setup func(t
7770
assert.Equal(t, expectedVector, result.(model.Vector))
7871

7972
// Ensure 1st ingester metrics are tracked correctly.
80-
if flags["-store.engine"] != blocksStorageEngine {
81-
require.NoError(t, ingester1.WaitSumMetrics(e2e.Equals(1), "cortex_ingester_chunks_created_total"))
82-
}
73+
require.NoError(t, ingester1.WaitSumMetrics(e2e.Equals(1), "cortex_ingester_chunks_created_total"))
8374

8475
// Start ingester-2.
8576
ingester2 := e2ecortex.NewIngester("ingester-2", consul.NetworkHTTPEndpoint(), mergeFlags(flags, map[string]string{

0 commit comments

Comments
 (0)