20190117 prom rules endpoints #1999

Merged
merged 29 commits into from
Feb 4, 2020
Commits
c630dce
add api to support prom rules and alerts to ruler
jtlisi Jan 17, 2020
c7fc483
clean up deps
jtlisi Jan 17, 2020
ed010fe
update changelog
jtlisi Jan 17, 2020
c5dfab9
update documentation
jtlisi Jan 18, 2020
142e5e1
register ruler routes in modules.go
jtlisi Jan 21, 2020
43a45a9
refactor to have proper struct tags and cleaner logic for rule retrieval
jtlisi Jan 21, 2020
48fec75
add Rules test
jtlisi Jan 22, 2020
6f124cf
update docs
jtlisi Jan 22, 2020
9f14236
explicitly pass registerer to the ruler
jtlisi Jan 22, 2020
3073c81
create respondError function
jtlisi Jan 22, 2020
79222b2
remove unused return and err vars
jtlisi Jan 22, 2020
912599c
reorder registry wrapping
jtlisi Jan 22, 2020
f5732e1
refactor based on pr suggestions
jtlisi Jan 24, 2020
4c6ec75
improve ruler tests for loading rules
jtlisi Jan 28, 2020
60e8d84
add logger to mapper tests
jtlisi Jan 28, 2020
80b28b2
add tests for ruler api calls
jtlisi Jan 28, 2020
79e0925
ensure all test rulers load rules before returning
jtlisi Jan 28, 2020
3eb7425
format api_test file
jtlisi Jan 28, 2020
43baec1
go format
jtlisi Jan 28, 2020
31b624c
clean mapper_test file
jtlisi Jan 28, 2020
1754949
add api documentation
jtlisi Jan 28, 2020
c1683c3
fix alert array instantiation
jtlisi Jan 28, 2020
1d34e66
refactor according to changes and comments on PR
jtlisi Jan 29, 2020
b00ec4b
make all time related fields nonnullable for rules protos
jtlisi Jan 29, 2020
abe4ab7
ensure ruler is registered as grpc service
jtlisi Jan 29, 2020
15de020
use noop querier for test cases
jtlisi Jan 29, 2020
53aadef
format alert value string identical to Prometheus
jtlisi Jan 29, 2020
c7fe74f
refactor per PR comments
jtlisi Jan 30, 2020
ecf1860
fix rebase changelog
jtlisi Feb 2, 2020
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -9,6 +9,7 @@
* `--querier.query-store-after` has been added in its place.
* [FEATURE] Added user sub rings to distribute users to a subset of ingesters. #1947
* `--experimental.distributor.user-subring-size`
* [FEATURE] Added flag `-experimental.ruler.enable-api` to enable the ruler api which implements the Prometheus API `/api/v1/rules` and `/api/v1/alerts` endpoints under the configured `-http.prefix`. #1999
* [ENHANCEMENT] Experimental TSDB: Export TSDB Syncer metrics from Compactor component, they are prefixed with `cortex_compactor_`. #2023
* [ENHANCEMENT] Experimental TSDB: Added dedicated flag `-experimental.tsdb.bucket-store.tenant-sync-concurrency` to configure the maximum number of concurrent tenants for which blocks are synched. #2026
* [ENHANCEMENT] Experimental TSDB: Expose metrics for objstore operations (prefixed with `cortex_<component>_thanos_objstore_`, component being one of `ingester`, `querier` and `compactor`). #2027
7 changes: 7 additions & 0 deletions docs/apis.md
@@ -17,6 +17,13 @@ APIs. The encoding is Protobuf over http.

Read is on `/api/prom/read` and write is on `/api/prom/push`.

## Alerts & Rules API

Cortex supports the Prometheus [alerts](https://prometheus.io/docs/prometheus/latest/querying/api/#alerts) and [rules](https://prometheus.io/docs/prometheus/latest/querying/api/#rules) API endpoints. They are served by the Ruler service and can be enabled with the `-experimental.ruler.enable-api` flag.

`GET /api/prom/api/v1/rules` - List of alerting and recording rules that are currently loaded

`GET /api/prom/api/v1/alerts` - List of all active alerts
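
A minimal sketch of calling the rules endpoint from Go, assuming the default `/api/prom` prefix and that the tenant is passed in the `X-Scope-OrgID` header. The stub server, the `fetchRuleGroups` helper, and the `tenant-1` ID are illustrative only and are not part of Cortex.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// rulesResponse mirrors only the fields of the Prometheus-style envelope
// that this example reads.
type rulesResponse struct {
	Status string `json:"status"`
	Data   struct {
		Groups []struct {
			Name string `json:"name"`
			File string `json:"file"`
		} `json:"groups"`
	} `json:"data"`
}

// fetchRuleGroups is a hypothetical helper: it issues
// GET <baseURL>/api/prom/api/v1/rules for the given tenant and decodes the body.
func fetchRuleGroups(baseURL, tenant string) (*rulesResponse, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/api/prom/api/v1/rules", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-Scope-OrgID", tenant)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	var out rulesResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return &out, nil
}

// newStubRuler stands in for a ruler with the API enabled (an assumption made
// so the example runs without a Cortex deployment). It returns a canned body.
func newStubRuler() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/prom/api/v1/rules", func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("X-Scope-OrgID") == "" {
			http.Error(w, "no org id", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"status":"success","data":{"groups":[{"name":"example","file":"ns","rules":[],"interval":60}]}}`)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newStubRuler()
	defer srv.Close()

	res, err := fetchRuleGroups(srv.URL, "tenant-1")
	if err != nil {
		panic(err)
	}
	fmt.Println(res.Status, len(res.Data.Groups))
}
```

Against a real deployment, `baseURL` would point at the ruler's HTTP address; the alerts endpoint can be queried the same way by changing the path.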

## Configs API

4 changes: 4 additions & 0 deletions docs/configuration/config-file-reference.md
@@ -736,6 +736,10 @@ ring:
# Period with which to attempt to flush rule groups.
# CLI flag: -ruler.flush-period
[flushcheckperiod: <duration> | default = 1m0s]

# Enable the ruler api
# CLI flag: -experimental.ruler.enable-api
[enable_api: <boolean> | default = false]
```
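
As a rough illustration of how a boolean option like this is typically bound to its CLI flag, here is a sketch using only the standard `flag` package. The trimmed-down `Config` struct and the `parseConfig` helper are assumptions for the example, not the PR's actual code.

```go
package main

import (
	"flag"
	"fmt"
)

// Config is a hypothetical, trimmed-down stand-in for the ruler configuration.
type Config struct {
	EnableAPI bool
}

// RegisterFlags binds the config fields to flags on the given FlagSet.
func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
	f.BoolVar(&cfg.EnableAPI, "experimental.ruler.enable-api", false, "Enable the ruler api")
}

// parseConfig parses args into a fresh Config using its own FlagSet.
func parseConfig(args []string) (Config, error) {
	var cfg Config
	fs := flag.NewFlagSet("ruler", flag.ContinueOnError)
	cfg.RegisterFlags(fs)
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	cfg, err := parseConfig([]string{"-experimental.ruler.enable-api=true"})
	if err != nil {
		panic(err)
	}
	fmt.Println("ruler API enabled:", cfg.EnableAPI)
}
```

With no arguments the API stays disabled, matching the documented default of `false`.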

## `alertmanager_config`
10 changes: 8 additions & 2 deletions pkg/cortex/modules.go
@@ -415,9 +415,15 @@ func (t *Cortex) initRuler(cfg *Config) (err error) {
 	cfg.Ruler.Ring.ListenPort = cfg.Server.GRPCListenPort
 	queryable, engine := querier.New(cfg.Querier, t.distributor, t.store)

-	t.ruler, err = ruler.NewRuler(cfg.Ruler, engine, queryable, t.distributor)
+	t.ruler, err = ruler.NewRuler(cfg.Ruler, engine, queryable, t.distributor, prometheus.DefaultRegisterer, util.Logger)
 	if err != nil {
-		return
+		return err
 	}
+
+	if cfg.Ruler.EnableAPI {
+		subrouter := t.server.HTTP.PathPrefix(cfg.HTTPPrefix).Subrouter()
+		t.ruler.RegisterRoutes(subrouter)
+		ruler.RegisterRulerServer(t.server.GRPC, t.ruler)
+	}

 	t.server.HTTP.Handle("/ruler_ring", t.ruler)
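
The change above registers the ruler routes only when the API is enabled, on a subrouter rooted at the configured HTTP prefix. A rough standard-library analogue of that pattern is sketched below. It uses `http.ServeMux` and `http.StripPrefix` rather than gorilla/mux, and `newServerMux` and `statusFor` are hypothetical names introduced for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newServerMux builds the top-level mux. The API routes are mounted under
// prefix only when enableAPI is true; /ruler_ring is always served.
func newServerMux(prefix string, enableAPI bool) *http.ServeMux {
	root := http.NewServeMux()

	if enableAPI {
		api := http.NewServeMux()
		api.HandleFunc("/api/v1/rules", func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprint(w, "rules")
		})
		api.HandleFunc("/api/v1/alerts", func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprint(w, "alerts")
		})
		// Requests under prefix are forwarded to the API mux with the prefix removed.
		root.Handle(prefix+"/", http.StripPrefix(prefix, api))
	}

	root.HandleFunc("/ruler_ring", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ring")
	})
	return root
}

// statusFor returns the status code the handler produces for a GET on path.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor(newServerMux("/api/prom", true), "/api/prom/api/v1/rules"))
	fmt.Println(statusFor(newServerMux("/api/prom", false), "/api/prom/api/v1/rules"))
}
```

The point of the sketch is the gating: with the flag off, the API paths are never registered and fall through to a 404, while unrelated routes keep working.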
252 changes: 252 additions & 0 deletions pkg/ruler/api.go
@@ -0,0 +1,252 @@
package ruler

import (
	"encoding/json"
	"net/http"
	"strconv"
	"time"

	"github.com/go-kit/kit/log"
	"github.com/go-kit/kit/log/level"
	"github.com/gorilla/mux"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
	"github.com/prometheus/prometheus/pkg/labels"
	"github.com/weaveworks/common/user"

	"github.com/cortexproject/cortex/pkg/ingester/client"
	"github.com/cortexproject/cortex/pkg/util"
)

// RegisterRoutes registers the ruler API HTTP routes with the provided Router.
func (r *Ruler) RegisterRoutes(router *mux.Router) {
	for _, route := range []struct {
		name, method, path string
		handler            http.HandlerFunc
	}{
		{"get_rules", "GET", "/api/v1/rules", r.rules},
		{"get_alerts", "GET", "/api/v1/alerts", r.alerts},
	} {
		level.Debug(util.Logger).Log("msg", "ruler: registering route", "name", route.name, "method", route.method, "path", route.path)
		router.Handle(route.path, route.handler).Methods(route.method).Name(route.name)
	}
}

// To reimplement the Prometheus rules API, a large amount of code was copied over.
// This is required because the Prometheus API implementation does not pass a context to
// the rule retrieval function.
// https://github.com/prometheus/prometheus/blob/2aacd807b3ec6ddd90ae55f3a42f4cffed561ea9/web/api/v1/api.go#L108
// https://github.com/prometheus/prometheus/pull/4999

type response struct {
	Status    string       `json:"status"`
	Data      interface{}  `json:"data,omitempty"`
	ErrorType v1.ErrorType `json:"errorType,omitempty"`
	Error     string       `json:"error,omitempty"`
}

// AlertDiscovery has info for all active alerts.
type AlertDiscovery struct {
	Alerts []*Alert `json:"alerts"`
}

// Alert has info for an alert.
type Alert struct {
	Labels      labels.Labels `json:"labels"`
	Annotations labels.Labels `json:"annotations"`
	State       string        `json:"state"`
	ActiveAt    *time.Time    `json:"activeAt,omitempty"`
	Value       string        `json:"value"`
}

// RuleDiscovery has info for all rules
type RuleDiscovery struct {
	RuleGroups []*RuleGroup `json:"groups"`
}

// RuleGroup has info for rules which are part of a group
type RuleGroup struct {
	Name string `json:"name"`
	File string `json:"file"`
	// In order to preserve rule ordering, while exposing type (alerting or recording)
	// specific properties, both alerting and recording rules are exposed in the
	// same array.
	Rules    []rule  `json:"rules"`
	Interval float64 `json:"interval"`
}

type rule interface{}

type alertingRule struct {
	// State can be "pending", "firing", "inactive".
	State       string        `json:"state"`
	Name        string        `json:"name"`
	Query       string        `json:"query"`
	Duration    float64       `json:"duration"`
	Labels      labels.Labels `json:"labels"`
	Annotations labels.Labels `json:"annotations"`
	Alerts      []*Alert      `json:"alerts"`
	Health      string        `json:"health"`
	LastError   string        `json:"lastError,omitempty"`
	Type        v1.RuleType   `json:"type"`
}

type recordingRule struct {
	Name      string        `json:"name"`
	Query     string        `json:"query"`
	Labels    labels.Labels `json:"labels,omitempty"`
	Health    string        `json:"health"`
	LastError string        `json:"lastError,omitempty"`
	Type      v1.RuleType   `json:"type"`
}

func respondError(logger log.Logger, w http.ResponseWriter, msg string) {
	b, err := json.Marshal(&response{
		Status:    "error",
		ErrorType: v1.ErrServer,
		Error:     msg,
		Data:      nil,
	})

	if err != nil {
		level.Error(logger).Log("msg", "error marshaling json response", "err", err)
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	w.WriteHeader(http.StatusInternalServerError)
	if n, err := w.Write(b); err != nil {
		level.Error(logger).Log("msg", "error writing response", "bytesWritten", n, "err", err)
	}
}

func (r *Ruler) rules(w http.ResponseWriter, req *http.Request) {
	logger := util.WithContext(req.Context(), util.Logger)
	userID, ctx, err := user.ExtractOrgIDFromHTTPRequest(req)
	if err != nil {
		level.Error(logger).Log("msg", "error extracting org id from context", "err", err)
		respondError(logger, w, "no valid org id found")
		return
	}

	w.Header().Set("Content-Type", "application/json")
	rgs, err := r.GetRules(ctx, userID)

	if err != nil {
		respondError(logger, w, err.Error())
		return
	}

	groups := make([]*RuleGroup, 0, len(rgs))

	for _, g := range rgs {
		grp := RuleGroup{
			Name:     g.Name,
			File:     g.Namespace,
			Interval: g.Interval.Seconds(),
			Rules:    make([]rule, len(g.Rules)),
		}

		for i, rl := range g.Rules {
			if g.Rules[i].Alert != "" {
				alerts := make([]*Alert, 0, len(rl.Alerts))
				for _, a := range rl.Alerts {
					alerts = append(alerts, &Alert{
						Labels:      client.FromLabelAdaptersToLabels(a.Labels),
						Annotations: client.FromLabelAdaptersToLabels(a.Annotations),
						State:       a.GetState(),
						ActiveAt:    &a.ActiveAt,
						Value:       strconv.FormatFloat(a.Value, 'e', -1, 64),
					})
				}
				grp.Rules[i] = alertingRule{
					State:       rl.GetState(),
					Name:        rl.GetAlert(),
					Query:       rl.GetExpr(),
					Duration:    rl.For.Seconds(),
Contributor:

Can't rl.For be nil?

Contributor Author (@jtlisi, Jan 29, 2020):

Technically yes, although I don't think it can occur as things stand. That said, the more I look at this, the more I think it would be appropriate to use a non-nullable approach to times in the rules/alerts protos. Zero values are considered unset by Prometheus, and Cortex should take the same approach. It should be safe to make this change right now. I will do some tests to verify.

Contributor Author:

I updated the protos to use non-nullable fields for timestamps and durations. This lets us avoid pointers for timestamps and durations except when we marshal them into JSON, similar to the Prometheus API.

Contributor:

As far as it's not a breaking change, LGTM.

					Labels:      client.FromLabelAdaptersToLabels(rl.Labels),
					Annotations: client.FromLabelAdaptersToLabels(rl.Annotations),
					Alerts:      alerts,
					Health:      rl.GetHealth(),
					LastError:   rl.GetLastError(),
					Type:        v1.RuleTypeAlerting,
				}
			} else {
				grp.Rules[i] = recordingRule{
					Name:      rl.GetRecord(),
					Query:     rl.GetExpr(),
					Labels:    client.FromLabelAdaptersToLabels(rl.Labels),
					Health:    rl.GetHealth(),
					LastError: rl.GetLastError(),
					Type:      v1.RuleTypeRecording,
				}
			}
		}
		groups = append(groups, &grp)
	}

	b, err := json.Marshal(&response{
		Status: "success",
		Data:   &RuleDiscovery{RuleGroups: groups},
	})
	if err != nil {
		level.Error(logger).Log("msg", "error marshaling json response", "err", err)
		respondError(logger, w, "unable to marshal the requested data")
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	if n, err := w.Write(b); err != nil {
		level.Error(logger).Log("msg", "error writing response", "bytesWritten", n, "err", err)
	}
}

func (r *Ruler) alerts(w http.ResponseWriter, req *http.Request) {
	logger := util.WithContext(req.Context(), util.Logger)
	userID, ctx, err := user.ExtractOrgIDFromHTTPRequest(req)
	if err != nil {
		level.Error(logger).Log("msg", "error extracting org id from context", "err", err)
		respondError(logger, w, "no valid org id found")
		return
	}

	w.Header().Set("Content-Type", "application/json")
	rgs, err := r.GetRules(ctx, userID)

	if err != nil {
		respondError(logger, w, err.Error())
		return
	}

	alerts := []*Alert{}

	for _, g := range rgs {
		for _, rl := range g.Rules {
			if rl.Alert != "" {
				for _, a := range rl.Alerts {
					alerts = append(alerts, &Alert{
						Labels:      client.FromLabelAdaptersToLabels(a.Labels),
						Annotations: client.FromLabelAdaptersToLabels(a.Annotations),
						State:       a.GetState(),
						ActiveAt:    &a.ActiveAt,
						Value:       strconv.FormatFloat(a.Value, 'e', -1, 64),
					})
				}
			}
		}
	}

	b, err := json.Marshal(&response{
		Status: "success",
		Data:   &AlertDiscovery{Alerts: alerts},
	})
	if err != nil {
		level.Error(logger).Log("msg", "error marshaling json response", "err", err)
		respondError(logger, w, "unable to marshal the requested data")
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	if n, err := w.Write(b); err != nil {
		level.Error(logger).Log("msg", "error writing response", "bytesWritten", n, "err", err)
	}
}
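
Both handlers render alert values with `strconv.FormatFloat(v, 'e', -1, 64)`, which the commit list describes as matching Prometheus' formatting. The short sketch below shows what that call produces for a few inputs. `formatAlertValue` is a hypothetical wrapper added for the example.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatAlertValue renders a sample value in exponent notation using the
// smallest number of digits that still round-trips the float64.
func formatAlertValue(v float64) string {
	return strconv.FormatFloat(v, 'e', -1, 64)
}

func main() {
	for _, v := range []float64{1, 0.5, 1234.5} {
		fmt.Println(formatAlertValue(v))
	}
	// Prints:
	// 1e+00
	// 5e-01
	// 1.2345e+03
}
```

Clients that parse the `value` field should therefore expect exponent notation even for small integers.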