Skip to content

Federated ruler draft #4520

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from
Closed

Conversation

rdooley
Copy link

@rdooley rdooley commented Oct 7, 2021

What this PR does:
This PR introduces the features discussed in #4477 allowing the creation of federated rules which have explicit assignment of source and destination tenant for the rule, rather than assuming source and destination from rule ownership.

A quick example of this being used in a sandbox environment

In a multitenant sandbox where tenants 0, 1, thru 19 were fed data via avalanche I created two recording rules owned by tenant 15

curl -X GET -H "X-Scope-OrgID: 15" localhost:8080/prometheus/api/v1/rules | jq                                                                                 ✭
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   908  100   908    0     0   5710      0 --:--:-- --:--:-- --:--:--  5710
{
  "status": "success",
  "data": {
    "groups": [
      {
        "name": "no_dest",
        "file": "no_dest",
        "rules": [
          {
            "name": "no_dest",
            "query": "sum(avalanche_metric_mmmmm_1_0)",
            "srcTenants": "0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19",
            "labels": {},
            "health": "ok",
            "lastError": "",
            "type": "recording",
            "lastEvaluation": "2021-10-07T21:39:26.964807323Z",
            "evaluationTime": 0.003348246
          }
        ],
        "interval": 10,
        "lastEvaluation": "2021-10-07T21:39:26.96476279Z",
        "evaluationTime": 0.003397023
      },
      {
        "name": "record_rule_eg",
        "file": "record_rule",
        "rules": [
          {
            "name": "record_rule_eg",
            "query": "sum(avalanche_metric_mmmmm_1_0)",
            "srcTenants": "0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19",
            "destTenant": "17",
            "labels": {},
            "health": "ok",
            "lastError": "",
            "type": "recording",
            "lastEvaluation": "2021-10-07T21:39:25.197178261Z",
            "evaluationTime": 0.003211119
          }
        ],
        "interval": 10,
        "lastEvaluation": "2021-10-07T21:39:25.19710275Z",
        "evaluationTime": 0.003291214
      }
    ]
  },
  "errorType": "",
  "error": ""
}
// Alt view
curl -X GET -H "X-Scope-OrgID: 15" localhost:8080/api/v1/rules                                                                                                 ✭
no_dest:
    - name: no_dest
      interval: 10s
      rules:
        - record: no_dest
          expr: sum(avalanche_metric_mmmmm_1_0)
          src_tenants: 0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19
record_rule:
    - name: record_rule_eg
      interval: 10s
      rules:
        - record: record_rule_eg
          expr: sum(avalanche_metric_mmmmm_1_0)
          src_tenants: 0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19
          dest_tenant: "17"

Then viewing this in a grafana configured to query the composite tenant 0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19
Screenshot 2021-10-06 at 15-20-04 Explore - Cortex - Grafana

we can see that the record_rule_eg and no_dest metrics match the expected value of sum(avalanche_metric_mmmmm_1_0) while being stored in tenants 17 and 15 respectively.

As this is a new feature, there is extensive documentation needed to go alongside this code change. Because of that, I'm leaving this PR in draft for now, to be viewed alongside the proposal.

Which issue(s) this PR fixes:
Fixes #4403

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

andrejbranch and others added 4 commits October 4, 2021 14:15
…crease multitenant query max concurrency

Signed-off-by: Rees Dooley <[email protected]>
* Push ruler metrics to a random subtenant if rule is multitenant
* random seed is set based off labelset hash to ensure series always
  pushed to same subtenant
* add changelog message about multitenant ruler support

Signed-off-by: Rees Dooley <[email protected]>
* expand federated tenant alertmanager test case
* Add tenant-federation.max-concurrency config opt (docs need
  regenerating later)
* switch to hash mod instead of random for federated ruler subtenant
  selection

Signed-off-by: Rees Dooley <[email protected]>
* implements the rest of the proposal cortexproject#4477
* bring in vendored prometheus rules and rulefmt code
* add optional src and dest tenant fields
* Block the creation of these federated rules behind a feature flag in
  ruler api
* Original issue cortexproject#4403
* a large amount of documentation surrounding this expanded feature set
  is needed

Signed-off-by: Rees Dooley <[email protected]>
@rdooley rdooley mentioned this pull request Oct 7, 2021
3 tasks
@@ -72,7 +72,9 @@ func (a *PusherAppender) Commit() error {

// Since a.pusher is distributor, client.ReuseSlice will be called in a.pusher.Push.
// We shouldn't call client.ReuseSlice here.
_, err := a.pusher.Push(user.InjectOrgID(a.ctx, a.userID), cortexpb.ToWriteRequest(a.labels, a.samples, nil, cortexpb.RULE))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reinjection of the pusherappendable's context was blocking the ability to set the destination tenant context appropriately during the rule group eval

srcTenants := rule.GetSrcTenants()
if srcTenants != "" {
queryCtx = user.InjectOrgID(ctx, srcTenants)
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the key additions here, setting a specific query context user if srcTenants is set

i := int(rule.Labels().Copy().Hash()) % len(tenantIDs)
tenantID := tenantIDs[i]
appCtx = user.InjectOrgID(ctx, tenantID)
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the other key addition, setting a dest context based on the dest tenant if set, and if the dest tenant (or owning tenant) is composite, then consistently chosing a subtenant.

tenantIDs, err := tenant.TenantIDs(appCtx)
if err == nil && len(tenantIDs) > 1 {
// Mod the hash of the series so same series always goes to same subtenant
i := int(rule.Labels().Copy().Hash()) % len(tenantIDs)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to handle the hash here being negative producing a negative i

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably also needs a test to verify the expected behavior

@rdooley
Copy link
Author

rdooley commented Jan 6, 2022

This is going to have to change as the proposal was updated closing

@rdooley rdooley closed this Jan 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ruler] Allow Federated rules
2 participants