Add support to horizontally scale blocks compactor #2113

Merged: pracucci merged 8 commits into cortexproject:master from the horizontally-scale-compactor branch on Feb 12, 2020

Conversation

@pracucci (Contributor) opened this pull request:

What this PR does:
This PR introduces support for horizontally scaling the compactor when using the experimental blocks storage.

I've also taken the chance to remove ingester references from the ring HTTP page, given it's now used by multiple components (ingesters, ruler, compactor). The refactoring done here should be safe, given we've recently removed support for denormalised tokens.
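
As a rough, self-contained illustration of the sharding idea described above (not the actual Cortex ring code; `shardTenant`, `token` and the example token values are made up for this sketch), each compactor registers tokens in a shared ring and a tenant is compacted only by the instance that owns the token its ID hashes to:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// token is a position on the hash ring, registered by one compactor instance.
type token struct {
	value uint32
	owner string // instance ID that registered this token
}

// shardTenant hashes the tenant ID onto the ring and returns the instance
// owning the first token at or after the hash (wrapping around).
func shardTenant(tenantID string, ring []token) string {
	h := fnv.New32a()
	_, _ = h.Write([]byte(tenantID))
	key := h.Sum32()

	i := sort.Search(len(ring), func(i int) bool { return ring[i].value >= key })
	if i == len(ring) {
		i = 0 // wrap around the ring
	}
	return ring[i].owner
}

func main() {
	// Hypothetical ring with two compactor instances, each owning some tokens.
	ring := []token{
		{value: 1 << 30, owner: "compactor-1"},
		{value: 2 << 30, owner: "compactor-2"},
		{value: 3 << 30, owner: "compactor-1"},
	}
	sort.Slice(ring, func(i, j int) bool { return ring[i].value < ring[j].value })

	for _, tenant := range []string{"user-0", "user-1", "user-2"} {
		fmt.Printf("%s -> %s\n", tenant, shardTenant(tenant, ring))
	}
}
```

With consistent hashing like this, adding or removing a compactor instance only moves ownership of the tenants whose hashes land on the affected tokens.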

Which issue(s) this PR fixes:
N/A

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

pracucci force-pushed the horizontally-scale-compactor branch from 796fa14 to 8045fcb on February 11, 2020 at 15:07

@thorfour (Contributor) left a comment:

Great work!

I guess my only question is: during a topology change, is there not a way that two compactors can run on the same user? E.g. the first compactor comes up, adds itself to the ring and starts compacting user 0. The next compactor then starts up, adds itself to the ring, and now owns user 0, so it will start compaction on that user, resulting in two compactors working on the same user.

for {
	// Check if the ingester is ACTIVE in the ring and our ring client
	// has detected it.
	if rs, err := c.ring.GetAll(); err == nil {

Contributor:

Is this check necessary? Won't GetState only return ACTIVE if it's in the ring?

@pracucci (Contributor, author):

I'm not sure I'm getting your question. Here we need to check if the ring client (not the lifecycler) has detected the update (made by the lifecycler).

Contributor:

Disregard me, I need to go read the ring/lifecycler code again :)
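
For readers following along, the point being discussed is that the lifecycler writes the instance registration to the KV store while the ring client observes it asynchronously, so the compactor has to wait until its own registration becomes visible as ACTIVE. A minimal sketch of that kind of wait loop, using made-up interfaces rather than the actual Cortex ring/lifecycler API:

```go
package example

import (
	"context"
	"errors"
	"time"
)

// ringReader is an illustrative stand-in for the ring client: it reports the
// state it has observed so far for a given instance ID.
type ringReader interface {
	InstanceState(id string) (string, error)
}

// waitRingActive polls the ring client until it observes this instance as
// ACTIVE (i.e. the registration written by the lifecycler has propagated back
// through the ring client), or until the context is cancelled.
func waitRingActive(ctx context.Context, r ringReader, instanceID string) error {
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()

	for {
		if state, err := r.InstanceState(instanceID); err == nil && state == "ACTIVE" {
			return nil
		}
		select {
		case <-ctx.Done():
			return errors.New("gave up waiting for the instance to become ACTIVE in the ring")
		case <-ticker.C:
			// Re-check on the next tick.
		}
	}
}
```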

@khaines (Contributor) left a comment:

This is a great feature to add! The PR also LGTM

@pracucci (Contributor, author):

> I guess my only question is: during a topology change, is there not a way that two compactors can run on the same user? E.g. the first compactor comes up, adds itself to the ring and starts compacting user 0. The next compactor then starts up, adds itself to the ring, and now owns user 0, so it will start compaction on that user, resulting in two compactors working on the same user.

@thorfour Definitely. This can happen "by design". After looking at the bucket compactor logic, doing some manual experiments and talking to Bartek (back in December), this doesn't look to be a real issue: if the same blocks for the same tenant are simultaneously compacted by two instances, the next compactor run will get rid of the extra blocks created by the simultaneous runs (and at that point only one of the two compactors will run the next compaction).
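
To make the "next compactor run gets rid of the extra blocks" argument concrete, here is a rough sketch of the deduplication idea (not the actual Thanos/Cortex garbage-collection code; `blockMeta` and `findRedundant` are made-up names): blocks compacted from the same set of source blocks are redundant, so a later run can keep one and delete the others.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// blockMeta is a simplified stand-in for a TSDB block's metadata.
type blockMeta struct {
	ID      string
	Sources []string // IDs of the blocks this block was compacted from
}

// sourcesKey builds a canonical key from the source block IDs, so two blocks
// produced by compacting the same inputs map to the same key.
func sourcesKey(m blockMeta) string {
	s := append([]string(nil), m.Sources...)
	sort.Strings(s)
	return strings.Join(s, ",")
}

// findRedundant returns the IDs of blocks that duplicate another block's
// sources; a later compaction run can safely delete them.
func findRedundant(blocks []blockMeta) []string {
	seen := map[string]bool{}
	var redundant []string
	for _, b := range blocks {
		key := sourcesKey(b)
		if seen[key] {
			redundant = append(redundant, b.ID)
			continue
		}
		seen[key] = true
	}
	return redundant
}

func main() {
	// Two compactors both compacted source blocks A+B for the same tenant.
	blocks := []blockMeta{
		{ID: "block-from-compactor-1", Sources: []string{"A", "B"}},
		{ID: "block-from-compactor-2", Sources: []string{"B", "A"}},
	}
	fmt.Println("redundant:", findRedundant(blocks)) // [block-from-compactor-2]
}
```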

@thorfour (Contributor) left a comment:

LGTM, awesome work. Thanks for clarifying what happens when two compactors work on the same user 👍

@jtlisi (Contributor) left a comment:

LGTM

@pstibrany (Contributor) left a comment:

Overall the PR looks good to me, but I'm concerned about resharding. What if one compactor is working on a user and, at the same time, a new compactor is added, and suddenly it's the new compactor who owns user 1? Does the first compactor detect this and cancel the compaction? Or does it not upload compacted blocks in such a case?

Update: I see this has been addressed by other comments already.

	ShowTokens bool
}{
	Ingesters: ingesters,
	Now:       time.Now(),
	Ring:      ringDescString,

Contributor:

I have found it useful in the past to be able to see the original ring descriptor, as stored in the KV store. Can we still expose it, similarly to how we hide tokens by default?

[Moved comment to relevant line]

@pracucci (Contributor, author):

> Can we still expose it [...]?

I can even roll back to the previous logic, but it contained the "ingester" naming, which is quite confusing to users. Which information would you like to see? I may add it without marshalling the entire RingDesc.

Contributor:

Well, it was useful when I introduced a new ingester state which was unknown to previous versions: on this page I could see that it was actually received correctly. In general, I want to see all fields of the protobuf message exactly as they are stored in the KV store.

No need to roll back; your change is easier to read when one is only interested in tokens. My use case is probably very rare [even more so than checking tokens, which is already very rare].
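
For what it's worth, one way to cover this use case without reintroducing the "ingester" naming could be to render the raw descriptor read from the KV store as indented JSON next to the per-instance table. A minimal sketch, with `renderRawDesc` being a made-up helper rather than anything in this PR:

```go
package example

import "encoding/json"

// renderRawDesc marshals whatever descriptor was read from the KV store into
// indented JSON, so every field is visible on the debug page even if the HTML
// template doesn't know about it yet.
func renderRawDesc(desc interface{}) (string, error) {
	out, err := json.MarshalIndent(desc, "", "  ")
	if err != nil {
		return "", err
	}
	return string(out), nil
}
```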

@pracucci (Contributor, author):

> Overall the PR looks good to me, but I'm concerned about resharding. What if one compactor is working on a user and, at the same time, a new compactor is added, and suddenly it's the new compactor who owns user 1? Does the first compactor detect this and cancel the compaction? Or does it not upload compacted blocks in such a case?

@pstibrany It doesn't detect or cancel the ongoing compaction. There's no strong guarantee that two replicas will not compact the same blocks. My take is this one:
#2113 (comment)

pracucci added commits (each Signed-off-by: Marco Pracucci <[email protected]>)

@pracucci (Contributor, author):

@pstibrany I should have addressed your comments. Do you have any other feedback?

pracucci force-pushed the horizontally-scale-compactor branch from a0f3077 to 5f3566c on February 12, 2020 at 14:42

@pstibrany (Contributor) left a comment:

> I should have addressed your comments. Do you have any other feedback?

Thanks Marco!

pracucci merged commit 58ef607 into cortexproject:master on Feb 12, 2020
pracucci deleted the horizontally-scale-compactor branch on February 12, 2020 at 15:25

5 participants