Skip to content

Validate incoming labels for order and duplicate names. #1964

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jan 15, 2020
Merged

Validate incoming labels for order and duplicate names. #1964

merged 6 commits into from
Jan 15, 2020

Conversation

pstibrany
Copy link
Contributor

@pstibrany pstibrany commented Jan 8, 2020

Originally this PR added only check for sorted labels. This PR sorts unsorted labels in distributor, before using them for computing sharding token. It also adds check if there are duplicate label names, and reports them.

To properly format the input, it reimplements formatting function. Previously used function merged duplicate labels, and also sorted labels, so error message didn't include real input, but "normalized" version already. That's not very useful for debugging.

Checklist

  • Tests updated
  • CHANGELOG.md updated

Related to #1957 and #1961 which reverts it. This builds on #1957 and fixes it, while ignoring #1961.

@gouthamve
Copy link
Contributor

Given the remote_write API doesn't specify ordering and because Prometheus 1.x sends unordered labels, I think rejecting unordered labels is not the approach we want to take. Maybe we can sort them, but lets not reject them.

@pstibrany
Copy link
Contributor Author

pstibrany commented Jan 9, 2020

To summarize our Slack discussion:

Reason to make this PR was to make sure that sharding token is correct when sharding using all labels. Ordering issues or duplicate label names generate different tokens, eg. test{a="a",b="b"}, test{b="b",a="a"} and test{a="a",b="b",b="b"} have different token and end up on different ingesters, but we currently accept them as a same timeseries.

There is no question about that duplicate labels should be rejected. In order to detect them cheaply, I added check for ordering. I assumed that since the code was already there (although not working properly, hence #1957), that is what we wanted.

If we don't want to reject unordered labels, then I suggest that we simply order them in distributor (if needed). This will help us to 1) fix above-mentioned problem with token, while not resharding the world again (by changing the sharding computation), 2) still detect duplicate label names cheaply.

I'll update my PR to implement this change instead.

Copy link
Contributor

@khaines khaines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for adding this check and sorting of labels! LGTM now that we are sorting the labels vs rejecting

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job @pstibrany ! The changes LGTM, with a couple of nits.

Copy link
Contributor

@gouthamve gouthamve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you open this against the release-0.5.0 branch?

{__name__=""} will be shown formatted as {__name__=""} instead of just empty string.

Signed-off-by: Peter Štibraný <[email protected]>
@pstibrany pstibrany changed the base branch from master to release-0.5 January 14, 2020 10:52
Signed-off-by: Peter Štibraný <[email protected]>
@pstibrany
Copy link
Contributor Author

Can you open this against the release-0.5.0 branch?

Rebased on top of release-0.5 and changed target branch for this PR.

@gouthamve gouthamve merged commit 8b16675 into cortexproject:release-0.5 Jan 15, 2020
@pstibrany pstibrany deleted the label_validation branch January 15, 2020 11:23
gouthamve added a commit that referenced this pull request Jan 17, 2020
* Release 0.5.0-rc0

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Fix changelog to point to 0.5.0-rc.0

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Validate incoming labels for order and duplicate names. (#1964)

* Validate incoming labels for order and duplicate names.

Signed-off-by: Peter Štibraný <[email protected]>

* Document that cortex rejects requests with incorrectly ordered or duplicate labels.

Signed-off-by: Peter Štibraný <[email protected]>

* Ignore empty metric name.

{__name__=""} will be shown formatted as {__name__=""} instead of just empty string.

Signed-off-by: Peter Štibraný <[email protected]>

* As we rely on sorted labels, sort them before using them.

Signed-off-by: Peter Štibraný <[email protected]>

* Updated CHANGELOG.md to reflect latest changes to PR.

Signed-off-by: Peter Štibraný <[email protected]>

* Put back redundant aliases.

Signed-off-by: Peter Štibraný <[email protected]>

* wrap migration commands (#1980)

* wrap migration commands

Signed-off-by: Jacob Lisi <[email protected]>

* update changelog

Signed-off-by: Jacob Lisi <[email protected]>

* fix missing semicolon

Signed-off-by: Jacob Lisi <[email protected]>

* Fix typo to make lint pass.

Signed-off-by: Goutham Veeramachaneni <[email protected]>

Co-authored-by: Goutham Veeramachaneni <[email protected]>

* Call out breaking changes better.

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Label this version as -rc1

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* We're abandoning the `0.5.0` release.

Signed-off-by: Goutham Veeramachaneni <[email protected]>

Co-authored-by: Peter Štibraný <[email protected]>
Co-authored-by: Jacob Lisi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants