Merged
Changes from 3 commits
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -3,7 +3,8 @@
## master / unreleased
* [CHANGE] Use cortex v1.17.1
* [CHANGE] Enable shuffle sharding in compactors
* [CHANGE] Remove chunks support for dashboards
* [CHANGE] Remove chunks support for dashboards
* [CHANGE] Target 3M memory series per ingester instead of 1.5M
* [CHANGE] Update jsonnet-libs to Fri Jul 19 12:51:49 2024 #57
* [ENHANCEMENT] Configure `-ingester.client.grpc-compression` to be `snappy-block`
* [ENHANCEMENT] Support Grafana 11 in Cortex Service Scaling Dashboard
6 changes: 3 additions & 3 deletions cortex-mixin/alerts/alerts.libsonnet
@@ -389,11 +389,11 @@
rules: [
{
alert: 'CortexProvisioningTooManyActiveSeries',
// We target each ingester to 1.5M in-memory series. This alert fires if the average
// number of series / ingester in a Cortex cluster is > 1.6M for 2h (we compact
// We target each ingester to 3.0M in-memory series. This alert fires if the average
// number of series / ingester in a Cortex cluster is > 3.2M for 2h (we compact
// the TSDB head every 2h).
expr: |||
avg by (%s) (cortex_ingester_memory_series) > 1.6e6
avg by (%s) (cortex_ingester_memory_series) > 3.2e6
||| % [$._config.alert_aggregation_labels],
'for': '2h',
labels: {
6 changes: 3 additions & 3 deletions cortex-mixin/docs/playbooks.md
@@ -555,13 +555,13 @@ How to **investigate**:

### CortexProvisioningTooManyActiveSeries

This alert fires if the average number of in-memory series per ingester is above our target (1.5M).
This alert fires if the average number of in-memory series per ingester is above our target (3.0M).

How to **fix**:
- Scale up ingesters
- To find out the Cortex clusters where ingesters should be scaled up and how many minimum replicas are expected:
```
ceil(sum by(cluster, namespace) (cortex_ingester_memory_series) / 1.5e6) >
ceil(sum by(cluster, namespace) (cortex_ingester_memory_series) / 3.0e6) >
count by(cluster, namespace) (cortex_ingester_memory_series)
```
- After the scale up, the in-memory series are expected to be reduced at the next TSDB head compaction (occurring every 2h)
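
The scale-up query above can be checked by hand with a small sketch. This is only an illustration of the arithmetic, assuming the 3.0M target from this change; the function names and the example series counts are hypothetical and not part of the mixin.

```python
import math

# Target in-memory series per ingester, as set by this change.
TARGET_SERIES_PER_INGESTER = 3.0e6


def min_ingester_replicas(total_series, target=TARGET_SERIES_PER_INGESTER):
    """Mirror of ceil(sum(cortex_ingester_memory_series) / target)."""
    return math.ceil(total_series / target)


def needs_scale_up(total_series, current_replicas, target=TARGET_SERIES_PER_INGESTER):
    """True when the minimum expected replicas exceed the running ingesters,
    i.e. when the playbook query would return a result for the cluster."""
    return min_ingester_replicas(total_series, target) > current_replicas


if __name__ == "__main__":
    # Hypothetical cluster: 40M in-memory series spread over 12 ingesters.
    print(min_ingester_replicas(40e6))         # minimum replicas at the 3.0M target
    print(min_ingester_replicas(40e6, 1.5e6))  # same cluster at the previous 1.5M target
    print(needs_scale_up(40e6, 12))
```

Doubling the target roughly halves the minimum replica count for the same total series, which is the practical effect of this change on the playbook query.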
@@ -595,7 +595,7 @@ How to **fix**:
kubectl -n <namespace> delete pod ingester-XXX
```
- Restarting an ingester typically reduces the memory allocated by mmap-ed files. After the restart, ingester may allocate this memory again over time, but it may give more time while working on a longer term solution
- Check the `Cortex / Writes Resources` dashboard to see if the number of series per ingester is above the target (1.5M). If so:
- Check the `Cortex / Writes Resources` dashboard to see if the number of series per ingester is above the target (3.0M). If so:
- Scale up ingesters
- Memory is expected to be reclaimed at the next TSDB head compaction (occurring every 2h)

6 changes: 3 additions & 3 deletions cortex-mixin/recording_rules.libsonnet
@@ -2,7 +2,7 @@ local utils = import 'mixin-utils/utils.libsonnet';

{
local _config = {
max_series_per_ingester: 1.5e6,
max_series_per_ingester: 3.0e6,
max_samples_per_sec_per_ingester: 80e3,
max_samples_per_sec_per_distributor: 240e3,
limit_utilisation_target: 0.6,
@@ -148,7 +148,7 @@ local utils = import 'mixin-utils/utils.libsonnet';
||| % _config,
},
{
// Ingester should have 1.5M series in memory
// Ingester should have 3.0M series in memory
record: 'cluster_namespace_deployment_reason:required_replicas:count',
labels: {
deployment: 'ingester',
@@ -167,7 +167,7 @@ local utils = import 'mixin-utils/utils.libsonnet';
},
{
// We should be able to cover 60% of our limits,
// and ingester can have 1.5M series in memory
// and ingester can have 3.0M series in memory
record: 'cluster_namespace_deployment_reason:required_replicas:count',
labels: {
deployment: 'ingester',
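
The recording-rule expressions themselves are outside the hunks shown here, so the following is only a rough sketch of how the two `_config` values in this diff (`max_series_per_ingester` and `limit_utilisation_target`) might combine for the limit-based rule. The function name, the `total_series_limit` input, and the omission of any replication factor are assumptions made for illustration, not the mixin's actual expression.

```python
import math

# Values taken from the _config block in this diff.
MAX_SERIES_PER_INGESTER = 3.0e6
LIMIT_UTILISATION_TARGET = 0.6


def replicas_for_limits(total_series_limit,
                        max_series=MAX_SERIES_PER_INGESTER,
                        utilisation=LIMIT_UTILISATION_TARGET):
    """Ingesters needed to hold `utilisation` of the configured series limits.

    `total_series_limit` is a hypothetical input: the sum of per-tenant
    series limits for one cluster/namespace. Replication is ignored here.
    """
    return math.ceil(total_series_limit * utilisation / max_series)


if __name__ == "__main__":
    # Hypothetical cluster whose tenants' series limits add up to 52M.
    print(replicas_for_limits(52e6))                    # with the new 3.0M value
    print(replicas_for_limits(52e6, max_series=1.5e6))  # with the previous 1.5M value
```

Under these assumptions, raising `max_series_per_ingester` lowers the replica count that the limit-based rule would report, so any autoscaling or alerting that consumes `cluster_namespace_deployment_reason:required_replicas:count` is worth reviewing alongside this change.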