Skip to content

Update prometheus to v2.16.0 #2088

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 29 commits into from
Feb 21, 2020
Merged

Update prometheus to v2.16.0 #2088

merged 29 commits into from
Feb 21, 2020

Conversation

pstibrany
Copy link
Contributor

@pstibrany pstibrany commented Feb 6, 2020

What this PR does: This PR updates Prometheus dependency to v2.16.0 (b90be6f32a) (was -rc.0 and -rc.1 before). This is required to finish work on issue #2003.

Recent Prometheus changes introduced some incompatibilities:

  • promql.EngineOpts struct had its MaxConcurrent field removed, and functionality replaced with active query tracker. Cortex' querier therefore enables the tracker to keep -querier.max-concurrent flag still working. Since tracker needs a directory to write files to, Cortex uses ./active-query-tracker by default. Update: If directory is explicitly set to empty string by using -querier.active-query-tracker-dir="", Active Query Tracker is disabled. (Previously, random temp directory was created for empty dir)

  • Changes in rulefmt package modified RuleNode type, which replaced previously used Rule. Unfortunately, while this provides better error reporting when doing deserialization, it makes it impossible to serialize this structure back to YAML because of problem in yaml.v3 package. Until we find a better solution, I've made a copy of rulefmt package before the change into Cortex (as "legacy_rulefmt"). It's a very tiny package, but admittedly, this is not a good solution.

  • Prometheus also added SelectSorted method to storage.Querier interface, but fortunately all our queriers already sorted their output in Select method anyway, so all implementations just call that.

  • promql.ParseErr has lost its Line and Pos fields and now only has PositionRange with start and end location. We use this in /api/prom/validate_expr API endpoint to report query syntax problems. This endpoint doesn't seem to be documented anywhere and was added by request from Grafana Observability team. It doesn't seem to be used as of now. If it's not used anywhere else, it might be possible to remove it completely (it's not provided by Prometheus). For now I've changed endpoint to return start and end position instead of line and col fields. /cc @davkal. Update: this functionality has been completely removed by Remove unused /validate_expr endpoint. #2152.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Copy link
Contributor

@jtlisi jtlisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Just update the logic for the temp directory in the ruler_test so it's called after the test is completed not the newTestRuler function.

@pstibrany pstibrany changed the title Update prometheus to v2.16.0-rc.0 Update prometheus to v2.16.0-rc.1 Feb 12, 2020
@pstibrany pstibrany changed the title Update prometheus to v2.16.0-rc.1 Update prometheus to v2.16.0 Feb 17, 2020
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job @pstibrany ! Changes look good to me. I left few comments I would be glad if you could take a look.

@pstibrany
Copy link
Contributor Author

I've rebased and addressed your feedback. Now instead of creating temp dir for Active Query Tracker, we disable it, if no dir is configured. Dir defaults to ./active-query-tracker.

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pstibrany for addressing my comments. I left a last comment in the code changes to fix the YAML field name for the query tracker.

I went through the vendor/github.com/prometheus changes and a think catched my attention. The diff in vendor/github.com/prometheus/prometheus/web/api/v1/api.go shows that two new fields have been added to the exported rules:

EvaluationTime float64   `json:"evaluationTime"`
LastEvaluation time.Time `json:"lastEvaluation"`

To keep Prometheus compatibility, I think we should add the support for these two new fields to the ruler API endpoints. @jtlisi ?

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Peter! Changes LGTM.

I went through the vendor/github.com/prometheus changes and a think catched my attention. The diff in vendor/github.com/prometheus/prometheus/web/api/v1/api.go shows that two new fields have been added to the exported rules.

About this comment, to me we can address it in a separate PR. It doesn't break the current logic, it's just a new "feature".

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Withdrawing my approval because we found out that the read path is failing with the error missing port in address.

@pstibrany
Copy link
Contributor Author

pstibrany commented Feb 18, 2020

Withdrawing my approval because we found out that the read path is failing with the error missing port in address.

There is now workaround for this, basically we fake the address in RemoteAddr field. Since we don't set any QueryLogger, this value is unused. (update: I mean parsed value from RemoteAddr)

I've also sent PR to Prometheus to be less strict about missing or invalid RemoteAddr value.

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pstibrany! I'm OK keeping the temporarily fakeRemoteAddr() until the fix won't be upstream.

Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
That change is currently incompatible with our serialization code,
so until that is fixed, I've copied previous state of rulefmt from
Prometheus (github.com/prometheus/prometheus/pkg/rulefmt).

It's a small package, and I've included tests as well.

Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Also explains why we use temp dir, if it's not configured.

Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Disable AQT if no directory is set.

Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Since our Select methods already return sorted results, it makes sense
to call them SelectSorted, and let Select simply call SelectSorted,
rather then the other way around.

Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Prometheus requires non-empty RemoteAddr, but only uses it when Engine
actually has QueryLogger set. Which it doesn't, in Cortex.

Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pstibrany for addressing the rebase! The changes after the rebase LGTM.

@pracucci pracucci merged commit fbe1c63 into cortexproject:master Feb 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants