
Commit 8f6da89

Allowing rule backup for rules API HA (#5782)
* Allowing ruler replication to be configurable
* Allow rules to be loaded to rulers as backup for List rules API HA
* Add integration test for rulers API with backup enabled
* Mark the entire feature as experimental and improve variable names
* Rename backUpRuleGroups to setRuleGroups to make the code less confusing
* Remove backup manager lock because it is not needed
* Improve code quality
  - Remove duplicate code and use better data structures
  - Make backup rule_group label match the prometheus rule_group label
  - Skip initialization when feature is not enabled
* Store rulespb.RuleGroupList in rules backup instead of promRules.Group
* Add GetReplicationSetForOperationWithNoQuorum ring method and use it in getShardedRules
* Refactor getLocalRules to make the method shorter
* Add new ring method to get all instances, and a new method in ruler to get a ReplicationSet without requiring quorum
* Fix flaky test due to sorting issue

Signed-off-by: Emmanuel Lodovice <[email protected]>
1 parent 44a5d25 · commit 8f6da89

18 files changed (+1720 -145 lines)

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@
 * [FEATURE] Ruler: Add `ruler.concurrent-evals-enabled` flag to enable concurrent evaluation within a single rule group for independent rules. Maximum concurrency can be configured via `ruler.max-concurrent-evals`. #5766
 * [FEATURE] Distributor Queryable: Experimental: Add config `zone_results_quorum_metadata`. When querying ingesters using metadata APIs such as label names and values, only results from quorum number of zones will be included and merged. #5779
 * [FEATURE] Storage Cache Clients: Add config `set_async_circuit_breaker_config` to utilize the circuit breaker pattern for dynamically thresholding asynchronous set operations. Implemented in both memcached and redis cache clients. #5789
+* [FEATURE] Ruler: Add experimental `experimental.ruler.api-deduplicate-rules` flag to remove duplicate rule groups from the Prometheus compatible rules API endpoint. Add experimental `ruler.ring.replication-factor` and `ruler.ring.zone-awareness-enabled` flags to configure rule group replication; only the first ruler in the replica set evaluates the rule group, the rest hold a copy as backup. Add experimental `experimental.ruler.api-enable-rules-backup` flag to configure rulers to send the rule group backups stored in the replica set, to handle events when a ruler is down during an API request to list rules. #5782
 * [ENHANCEMENT] Store Gateway: Added `-store-gateway.enabled-tenants` and `-store-gateway.disabled-tenants` to explicitly enable or disable store-gateway for specific tenants. #5638
 * [ENHANCEMENT] Compactor: Add new compactor metric `cortex_compactor_start_duration_seconds`. #5683
 * [ENHANCEMENT] Upgraded Docker base images to `alpine:3.18`. #5684

docs/configuration/config-file-reference.md

Lines changed: 25 additions & 0 deletions
@@ -4245,6 +4245,16 @@ ring:
 # CLI flag: -ruler.ring.heartbeat-timeout
 [heartbeat_timeout: <duration> | default = 1m]
 
+# EXPERIMENTAL: The replication factor to use when loading rule groups for API
+# HA.
+# CLI flag: -ruler.ring.replication-factor
+[replication_factor: <int> | default = 1]
+
+# EXPERIMENTAL: True to enable zone-awareness and load rule groups across
+# different availability zones for API HA.
+# CLI flag: -ruler.ring.zone-awareness-enabled
+[zone_awareness_enabled: <boolean> | default = false]
+
 # Name of network interface to read address from.
 # CLI flag: -ruler.ring.instance-interface-names
 [instance_interface_names: <list of string> | default = [eth0 en0]]

@@ -4266,6 +4276,21 @@
 # CLI flag: -experimental.ruler.enable-api
 [enable_api: <boolean> | default = false]
 
+# EXPERIMENTAL: Enable rulers to store a copy of rules owned by other rulers
+# with default state (state before any evaluation) and send this copy in list
+# API requests as backup in case the ruler who owns the rule fails to send its
+# rules. This allows the rules API to handle ruler outage by returning rules
+# with default state. Ring replication-factor needs to be set to 2 or more for
+# this to be useful.
+# CLI flag: -experimental.ruler.api-enable-rules-backup
+[api_enable_rules_backup: <boolean> | default = false]
+
+# EXPERIMENTAL: Remove duplicate rules in the prometheus rules and alerts API
+# response. If there are duplicate rules the rule with the latest evaluation
+# timestamp will be kept.
+# CLI flag: -experimental.ruler.api-deduplicate-rules
+[api_deduplicate_rules: <boolean> | default = false]
+
 # Comma separated list of tenants whose rules this ruler can evaluate. If
 # specified, only these tenants will be handled by ruler, otherwise this ruler
 # can process rules from all tenants. Subject to sharding.
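
Taken together, the feature is switched on through these CLI flags. Below is a minimal sketch of the flag overrides, written in the same Go flag-map style as the integration test later on this page (mergeFlags, BlocksStorageFlags, and RulerFlags are helpers from the Cortex e2e test harness; the flag names are the ones documented above):

// Sketch only: the helper functions come from integration/ruler_test.go.
rulerFlags := mergeFlags(
	BlocksStorageFlags(),
	RulerFlags(),
	map[string]string{
		// Keep three copies of each rule group: one ruler evaluates it,
		// the other two hold a default-state backup.
		"-ruler.ring.replication-factor": "3",
		// Spread those copies across availability zones.
		"-ruler.ring.zone-awareness-enabled": "true",
		// Serve the backup copies in list-rules API responses when the
		// owning ruler is down.
		"-experimental.ruler.api-enable-rules-backup": "true",
		// Drop duplicate groups from the merged response, keeping the one
		// with the latest evaluation timestamp.
		"-experimental.ruler.api-deduplicate-rules": "true",
	},
)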

integration/ruler_test.go

Lines changed: 27 additions & 9 deletions
@@ -395,6 +395,14 @@ func TestRulerSharding(t *testing.T) {
 }
 
 func TestRulerAPISharding(t *testing.T) {
+	testRulerAPIWithSharding(t, false)
+}
+
+func TestRulerAPIShardingWithAPIRulesBackupEnabled(t *testing.T) {
+	testRulerAPIWithSharding(t, true)
+}
+
+func testRulerAPIWithSharding(t *testing.T, enableAPIRulesBackup bool) {
 	const numRulesGroups = 100
 
 	random := rand.New(rand.NewSource(time.Now().UnixNano()))

@@ -444,24 +452,30 @@
 	require.NoError(t, s.StartAndWaitReady(consul, minio))
 
 	// Configure the ruler.
+	overrides := map[string]string{
+		// Since we're not going to run any rule, we don't need the
+		// store-gateway to be configured to a valid address.
+		"-querier.store-gateway-addresses": "localhost:12345",
+		// Enable the bucket index so we can skip the initial bucket scan.
+		"-blocks-storage.bucket-store.bucket-index.enabled": "true",
+	}
+	if enableAPIRulesBackup {
+		overrides["-ruler.ring.replication-factor"] = "3"
+		overrides["-experimental.ruler.api-enable-rules-backup"] = "true"
+	}
 	rulerFlags := mergeFlags(
 		BlocksStorageFlags(),
 		RulerFlags(),
 		RulerShardingFlags(consul.NetworkHTTPEndpoint()),
-		map[string]string{
-			// Since we're not going to run any rule, we don't need the
-			// store-gateway to be configured to a valid address.
-			"-querier.store-gateway-addresses": "localhost:12345",
-			// Enable the bucket index so we can skip the initial bucket scan.
-			"-blocks-storage.bucket-store.bucket-index.enabled": "true",
-		},
+		overrides,
 	)
 
 	// Start rulers.
 	ruler1 := e2ecortex.NewRuler("ruler-1", consul.NetworkHTTPEndpoint(), rulerFlags, "")
 	ruler2 := e2ecortex.NewRuler("ruler-2", consul.NetworkHTTPEndpoint(), rulerFlags, "")
-	rulers := e2ecortex.NewCompositeCortexService(ruler1, ruler2)
-	require.NoError(t, s.StartAndWaitReady(ruler1, ruler2))
+	ruler3 := e2ecortex.NewRuler("ruler-3", consul.NetworkHTTPEndpoint(), rulerFlags, "")
+	rulers := e2ecortex.NewCompositeCortexService(ruler1, ruler2, ruler3)
+	require.NoError(t, s.StartAndWaitReady(ruler1, ruler2, ruler3))
 
 	// Upload rule groups to one of the rulers.
 	c, err := e2ecortex.NewClient("", "", "", ruler1.HTTPEndpoint(), "user-1")

@@ -542,6 +556,10 @@
 		},
 	}
 	// For each test case, fetch the rules with configured filters, and ensure the results match.
+	if enableAPIRulesBackup {
+		err := ruler2.Kill() // if api-enable-rules-backup is enabled the APIs should be able to handle a ruler going down
+		require.NoError(t, err)
+	}
 	for name, tc := range testCases {
 		t.Run(name, func(t *testing.T) {
 			actualGroups, err := c.GetPrometheusRules(tc.filter)

pkg/compactor/shuffle_sharding_grouper_test.go

Lines changed: 5 additions & 0 deletions
@@ -766,6 +766,11 @@ func (r *RingMock) GetInstanceDescsForOperation(op ring.Operation) (map[string]r
 	return args.Get(0).(map[string]ring.InstanceDesc), args.Error(1)
 }
 
+func (r *RingMock) GetAllInstanceDescs(op ring.Operation) ([]ring.InstanceDesc, []ring.InstanceDesc, error) {
+	args := r.Called(op)
+	return args.Get(0).([]ring.InstanceDesc), make([]ring.InstanceDesc, 0), args.Error(1)
+}
+
 func (r *RingMock) GetReplicationSetForOperation(op ring.Operation) (ring.ReplicationSet, error) {
 	args := r.Called(op)
 	return args.Get(0).(ring.ReplicationSet), args.Error(1)

pkg/ring/ring.go

Lines changed: 25 additions & 0 deletions
@@ -49,6 +49,9 @@ type ReadRing interface {
 	// of unhealthy instances is greater than the tolerated max unavailable.
 	GetAllHealthy(op Operation) (ReplicationSet, error)
 
+	// GetAllInstanceDescs returns a slice of healthy and unhealthy InstanceDesc.
+	GetAllInstanceDescs(op Operation) ([]InstanceDesc, []InstanceDesc, error)
+
 	// GetInstanceDescsForOperation returns map of InstanceDesc with instance ID as the keys.
 	GetInstanceDescsForOperation(op Operation) (map[string]InstanceDesc, error)
 
@@ -463,6 +466,28 @@ func (r *Ring) GetAllHealthy(op Operation) (ReplicationSet, error) {
 	}, nil
 }
 
+// GetAllInstanceDescs implements ReadRing.
+func (r *Ring) GetAllInstanceDescs(op Operation) ([]InstanceDesc, []InstanceDesc, error) {
+	r.mtx.RLock()
+	defer r.mtx.RUnlock()
+
+	if r.ringDesc == nil || len(r.ringDesc.Ingesters) == 0 {
+		return nil, nil, ErrEmptyRing
+	}
+	healthyInstances := make([]InstanceDesc, 0, len(r.ringDesc.Ingesters))
+	unhealthyInstances := make([]InstanceDesc, 0, len(r.ringDesc.Ingesters))
+	storageLastUpdate := r.KVClient.LastUpdateTime(r.key)
+	for _, instance := range r.ringDesc.Ingesters {
+		if r.IsHealthy(&instance, op, storageLastUpdate) {
+			healthyInstances = append(healthyInstances, instance)
+		} else {
+			unhealthyInstances = append(unhealthyInstances, instance)
+		}
+	}
+
+	return healthyInstances, unhealthyInstances, nil
+}
+
 // GetInstanceDescsForOperation implements ReadRing.
 func (r *Ring) GetInstanceDescsForOperation(op Operation) (map[string]InstanceDesc, error) {
 	r.mtx.RLock()
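
The new method returns healthy and unhealthy instances separately, which lets a caller fan out to every ring member without a quorum requirement. A hedged sketch of such a caller follows; buildNoQuorumSet is hypothetical and not part of this commit, but per the commit message the ruler adds a similar no-quorum path for getShardedRules:

package example

import "github.com/cortexproject/cortex/pkg/ring"

// buildNoQuorumSet is a hypothetical helper: it packs every ring member into
// a ReplicationSet whose MaxErrors tolerates all currently-unhealthy
// instances, so a fan-out can still succeed while some rulers are down.
func buildNoQuorumSet(r ring.ReadRing, op ring.Operation) (ring.ReplicationSet, error) {
	healthy, unhealthy, err := r.GetAllInstanceDescs(op)
	if err != nil {
		return ring.ReplicationSet{}, err
	}
	return ring.ReplicationSet{
		Instances: append(healthy, unhealthy...),
		// Tolerate a failed request to every instance that already looks unhealthy.
		MaxErrors: len(unhealthy),
	}, nil
}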

pkg/ring/ring_test.go

Lines changed: 35 additions & 0 deletions
@@ -959,6 +959,41 @@ func TestRing_GetInstanceDescsForOperation(t *testing.T) {
 	}, instanceDescs)
 }
 
+func TestRing_GetAllInstanceDescs(t *testing.T) {
+	now := time.Now().Unix()
+	twoMinutesAgo := time.Now().Add(-2 * time.Minute).Unix()
+
+	ringDesc := &Desc{Ingesters: map[string]InstanceDesc{
+		"instance-1": {Addr: "127.0.0.1", Tokens: []uint32{1}, State: ACTIVE, Timestamp: now},
+		"instance-2": {Addr: "127.0.0.2", Tokens: []uint32{2}, State: LEAVING, Timestamp: now}, // not healthy state
+		"instance-3": {Addr: "127.0.0.3", Tokens: []uint32{3}, State: ACTIVE, Timestamp: twoMinutesAgo}, // heartbeat timed out
+	}}
+
+	ring := Ring{
+		cfg:                 Config{HeartbeatTimeout: time.Minute},
+		ringDesc:            ringDesc,
+		ringTokens:          ringDesc.GetTokens(),
+		ringTokensByZone:    ringDesc.getTokensByZone(),
+		ringInstanceByToken: ringDesc.getTokensInfo(),
+		ringZones:           getZones(ringDesc.getTokensByZone()),
+		strategy:            NewDefaultReplicationStrategy(),
+		KVClient:            &MockClient{},
+	}
+
+	testOp := NewOp([]InstanceState{ACTIVE}, nil)
+
+	healthyInstanceDescs, unhealthyInstanceDescs, err := ring.GetAllInstanceDescs(testOp)
+	require.NoError(t, err)
+	require.EqualValues(t, []InstanceDesc{
+		{Addr: "127.0.0.1", Tokens: []uint32{1}, State: ACTIVE, Timestamp: now},
+	}, healthyInstanceDescs)
+	sort.Slice(unhealthyInstanceDescs, func(i, j int) bool { return unhealthyInstanceDescs[i].Addr < unhealthyInstanceDescs[j].Addr })
+	require.EqualValues(t, []InstanceDesc{
+		{Addr: "127.0.0.2", Tokens: []uint32{2}, State: LEAVING, Timestamp: now},
+		{Addr: "127.0.0.3", Tokens: []uint32{3}, State: ACTIVE, Timestamp: twoMinutesAgo},
+	}, unhealthyInstanceDescs)
+}
+
 func TestRing_GetReplicationSetForOperation(t *testing.T) {
 	now := time.Now()
 	g := NewRandomTokenGenerator()

pkg/ring/util_test.go

Lines changed: 5 additions & 0 deletions
@@ -36,6 +36,11 @@ func (r *RingMock) GetInstanceDescsForOperation(op Operation) (map[string]Instan
 	return args.Get(0).(map[string]InstanceDesc), args.Error(1)
 }
 
+func (r *RingMock) GetAllInstanceDescs(op Operation) ([]InstanceDesc, []InstanceDesc, error) {
+	args := r.Called(op)
+	return args.Get(0).([]InstanceDesc), make([]InstanceDesc, 0), args.Error(1)
+}
+
 func (r *RingMock) GetReplicationSetForOperation(op Operation) (ReplicationSet, error) {
 	args := r.Called(op)
 	return args.Get(0).(ReplicationSet), args.Error(1)

pkg/ruler/api.go

Lines changed: 3 additions & 0 deletions
@@ -258,6 +258,9 @@ func (a *API) PrometheusRules(w http.ResponseWriter, req *http.Request) {
 
 	// keep data.groups are in order
 	sort.Slice(groups, func(i, j int) bool {
+		if groups[i].File == groups[j].File {
+			return groups[i].Name < groups[j].Name
+		}
 		return groups[i].File < groups[j].File
 	})
 
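
The added tie-break matters once backups are in play: several rulers can now report groups from the same rule file, and sort.Slice is not stable, so ordering by File alone would leave same-file groups in nondeterministic order between requests. A standalone sketch of the resulting two-level ordering (group is a local stand-in for the API's group type):

package main

import (
	"fmt"
	"sort"
)

type group struct{ File, Name string }

func main() {
	groups := []group{{"f1", "b"}, {"f1", "a"}, {"f0", "z"}}
	// Order by File, then by Name within a file, as the handler above does.
	sort.Slice(groups, func(i, j int) bool {
		if groups[i].File == groups[j].File {
			return groups[i].Name < groups[j].Name
		}
		return groups[i].File < groups[j].File
	})
	fmt.Println(groups) // [{f0 z} {f1 a} {f1 b}]
}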

pkg/ruler/manager.go

Lines changed: 23 additions & 3 deletions
@@ -44,6 +44,9 @@ type DefaultMultiTenantManager struct {
 	notifiers                 map[string]*rulerNotifier
 	notifiersDiscoveryMetrics map[string]discovery.DiscovererMetrics
 
+	// rules backup
+	rulesBackupManager *rulesBackupManager
+
 	managersTotal                 prometheus.Gauge
 	lastReloadSuccessful          *prometheus.GaugeVec
 	lastReloadSuccessfulTimestamp *prometheus.GaugeVec

@@ -79,7 +82,7 @@ func NewDefaultMultiTenantManager(cfg Config, managerFactory ManagerFactory, eva
 		os.Exit(1)
 	}
 
-	return &DefaultMultiTenantManager{
+	m := &DefaultMultiTenantManager{
 		cfg:            cfg,
 		notifierCfg:    ncfg,
 		managerFactory: managerFactory,

@@ -112,7 +115,11 @@
 		}, []string{"user"}),
 		registry: reg,
 		logger:   logger,
-	}, nil
+	}
+	if cfg.APIEnableRulesBackup {
+		m.rulesBackupManager = newRulesBackupManager(cfg, logger, reg)
+	}
+	return m, nil
 }
 
 func (r *DefaultMultiTenantManager) SyncRuleGroups(ctx context.Context, ruleGroups map[string]rulespb.RuleGroupList) {

@@ -161,8 +168,14 @@
 	delete(r.ruleCache, user)
 }
 
+func (r *DefaultMultiTenantManager) BackUpRuleGroups(ctx context.Context, ruleGroups map[string]rulespb.RuleGroupList) {
+	if r.rulesBackupManager != nil {
+		r.rulesBackupManager.setRuleGroups(ctx, ruleGroups)
+	}
+}
+
 // syncRulesToManager maps the rule files to disk, detects any changes and will create/update the
-// the users Prometheus Rules Manager.
+// users Prometheus Rules Manager.
 func (r *DefaultMultiTenantManager) syncRulesToManager(ctx context.Context, user string, groups rulespb.RuleGroupList) {
 	// Map the files to disk and return the file names to be passed to the users manager if they
 	// have been updated

@@ -333,6 +346,13 @@ func (r *DefaultMultiTenantManager) GetRules(userID string) []*promRules.Group {
 	return groups
 }
 
+func (r *DefaultMultiTenantManager) GetBackupRules(userID string) rulespb.RuleGroupList {
+	if r.rulesBackupManager != nil {
+		return r.rulesBackupManager.getRuleGroups(userID)
+	}
+	return nil
+}
+
 func (r *DefaultMultiTenantManager) Stop() {
 	r.notifiersMtx.Lock()
 	for _, n := range r.notifiers {
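
The rulesBackupManager itself lives in pkg/ruler/rules_backup_manager.go, which this page does not show. For orientation, here is a purely illustrative in-memory stand-in matching the calls above (setRuleGroups on sync, getRuleGroups on read); the real type also registers metrics, and per the commit message its internal lock was removed as unnecessary:

package ruler

import (
	"context"

	"github.com/cortexproject/cortex/pkg/ruler/rulespb"
)

// backupStoreSketch is a hypothetical stand-in for rulesBackupManager. It
// keeps the most recently synced rule groups per user so the list API can
// fall back to them when the owning ruler is unreachable.
type backupStoreSketch struct {
	backupRuleGroups map[string]rulespb.RuleGroupList
}

func (b *backupStoreSketch) setRuleGroups(_ context.Context, ruleGroups map[string]rulespb.RuleGroupList) {
	b.backupRuleGroups = ruleGroups
}

func (b *backupStoreSketch) getRuleGroups(userID string) rulespb.RuleGroupList {
	return b.backupRuleGroups[userID]
}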

pkg/ruler/manager_test.go

Lines changed: 41 additions & 0 deletions
@@ -253,6 +253,47 @@ func TestSyncRuleGroupsCleanUpPerUserMetrics(t *testing.T) {
 	require.NotContains(t, mfm["cortex_ruler_config_last_reload_successful"].String(), "value:\""+user+"\"")
 }
 
+func TestBackupRules(t *testing.T) {
+	dir := t.TempDir()
+	reg := prometheus.NewPedanticRegistry()
+	evalMetrics := NewRuleEvalMetrics(Config{RulePath: dir, EnableQueryStats: true}, reg)
+	waitDurations := []time.Duration{
+		1 * time.Millisecond,
+		1 * time.Millisecond,
+	}
+	ruleManagerFactory := RuleManagerFactory(nil, waitDurations)
+	m, err := NewDefaultMultiTenantManager(Config{RulePath: dir, APIEnableRulesBackup: true}, ruleManagerFactory, evalMetrics, reg, log.NewNopLogger())
+	require.NoError(t, err)
+
+	const user1 = "testUser"
+	const user2 = "testUser2"
+
+	require.Equal(t, 0, len(m.GetBackupRules(user1)))
+	require.Equal(t, 0, len(m.GetBackupRules(user2)))
+
+	userRules := map[string]rulespb.RuleGroupList{
+		user1: {
+			&rulespb.RuleGroupDesc{
+				Name:      "group1",
+				Namespace: "ns",
+				Interval:  1 * time.Minute,
+				User:      user1,
+			},
+		},
+		user2: {
+			&rulespb.RuleGroupDesc{
+				Name:      "group2",
+				Namespace: "ns",
+				Interval:  1 * time.Minute,
+				User:      user1,
+			},
+		},
+	}
+	m.BackUpRuleGroups(context.TODO(), userRules)
+	require.Equal(t, userRules[user1], m.GetBackupRules(user1))
+	require.Equal(t, userRules[user2], m.GetBackupRules(user2))
+}
+
 func getManager(m *DefaultMultiTenantManager, user string) RulesManager {
 	m.userManagerMtx.RLock()
 	defer m.userManagerMtx.RUnlock()

pkg/ruler/merger.go

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+package ruler
+
+import (
+	"time"
+
+	promRules "github.com/prometheus/prometheus/rules"
+)
+
+// mergeGroupStateDesc removes duplicates from the provided []*GroupStateDesc by keeping the GroupStateDesc with the
+// latest information. It uses the EvaluationTimestamp of the GroupStateDesc and the EvaluationTimestamp of the
+// ActiveRules in a GroupStateDesc to determine which GroupStateDesc has the latest information.
+func mergeGroupStateDesc(in []*GroupStateDesc) []*GroupStateDesc {
+	states := make(map[string]*GroupStateDesc)
+	rgTime := make(map[string]time.Time)
+	for _, state := range in {
+		latestTs := state.EvaluationTimestamp
+		for _, r := range state.ActiveRules {
+			if latestTs.Before(r.EvaluationTimestamp) {
+				latestTs = r.EvaluationTimestamp
+			}
+		}
+		key := promRules.GroupKey(state.Group.Namespace, state.Group.Name)
+		ts, ok := rgTime[key]
+		if !ok || ts.Before(latestTs) {
+			states[key] = state
+			rgTime[key] = latestTs
+		}
+	}
+	groups := make([]*GroupStateDesc, 0, len(states))
+	for _, state := range states {
+		groups = append(groups, state)
+	}
+	return groups
+}
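
A hedged sketch of the merge behavior as a test, assuming the GroupStateDesc fields used above (Group, ActiveRules, EvaluationTimestamp) and the rulespb package from pkg/ruler/rulespb: given two states for the same namespace and group name, the one whose latest evaluation is newer survives.

package ruler

import (
	"testing"
	"time"

	"github.com/stretchr/testify/require"

	"github.com/cortexproject/cortex/pkg/ruler/rulespb"
)

func TestMergeGroupStateDescSketch(t *testing.T) {
	g := &rulespb.RuleGroupDesc{Namespace: "ns", Name: "group1"}
	stale := &GroupStateDesc{Group: g, EvaluationTimestamp: time.Unix(10, 0)}
	fresh := &GroupStateDesc{Group: g, EvaluationTimestamp: time.Unix(20, 0)}

	merged := mergeGroupStateDesc([]*GroupStateDesc{stale, fresh})

	// Duplicates collapse to one entry; the older evaluation is dropped.
	require.Len(t, merged, 1)
	require.Equal(t, fresh, merged[0])
}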
