Skip to content

TSDB head compactor concurrency #2172

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Feb 25, 2020
Merged

TSDB head compactor concurrency #2172

merged 21 commits into from
Feb 25, 2020

Conversation

pstibrany
Copy link
Contributor

@pstibrany pstibrany commented Feb 21, 2020

What this PR does: This PR makes TSDB head compaction interval and concurrency configurable (defaults to 2h). Shipping now happens right after compaction, and previous options to control shipping interval and concurrency have been removed. Shipping can still be disabled.

Which issue(s) this PR fixes:
Fixes #2003

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Copy link
Contributor

@gouthamve gouthamve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, will 👍 after comments are addressed :)

@@ -123,6 +123,14 @@ tsdb:
# CLI flag: -experimental.tsdb.ship-concurrency
[ship_concurrency: <int> | default = 10]

# How frequently does Cortex try to compact TSDB head. Block is only created if data covers smallest block range.
# CLI flag: -experimental.tsdb.head-compaction-interval
[head_compaction_interval: <duration> | default = 2h]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, this should be 5m and the tests should fail. Can you check why its not failing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have edited this file manually. Somehow it isn't covered by Marco's tool, but haven't looked into more detail why yet. I'll fix the default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason is simply because block-storage is a separate file, while doc tool only generates config-file-reference.md. @pracucci surely knows better, I'm just guessing based on 5min looking around.

@pracucci
Copy link
Contributor

Fixes #2174

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job @pstibrany and thanks for taking care of this! I left few comments, everything else LGTM 🎉

@@ -129,6 +145,11 @@ func NewV2(cfg Config, clientConfig client.Config, limits *validation.Overrides,
i.done.Add(1)
go i.updateLoop()

if i.cfg.TSDBConfig.HeadCompactionConcurrency > 0 && i.cfg.TSDBConfig.HeadCompactionInterval > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it possible to disable compactions? We should never allow it. Even if you run without shipper you definitely wants compactions otherwise your memory will soon explode.

I would suggest:

  1. Add a config validation to ensure that HeadCompactionConcurrency > 0
  2. Always run compactionLoop()

What's your take?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left an option to disable it when running tests. But now I think it doesn't make sense even in that scenario. I'll modify it.

@@ -108,6 +112,8 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
f.IntVar(&cfg.ShipConcurrency, "experimental.tsdb.ship-concurrency", 10, "Maximum number of tenants concurrently shipping blocks to the storage.")
f.StringVar(&cfg.Backend, "experimental.tsdb.backend", "s3", "TSDB storage backend to use")
f.IntVar(&cfg.MaxTSDBOpeningConcurrencyOnStartup, "experimental.tsdb.max-tsdb-opening-concurrency-on-startup", 10, "limit the number of concurrently opening TSDB's on startup")
f.DurationVar(&cfg.HeadCompactionInterval, "experimental.tsdb.head-compaction-interval", 5*time.Minute, "How frequently does Cortex try to compact TSDB head. Block is only created if data covers smallest block range.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5m looks a bit unfrequent to me. This is a very inexpensive operation and I would suggest to run it more frequently: 30s or 1m.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we even need to make it configurable? I'd just hardcode 1m into the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about hardcoding to 1m as well, and that was my initial comment. I then changed my mind because I think being able to configure it could help future integration tests (ie. compact every 5s to have integration tests running faster). What's your take?

P.S. I'm talking about integration tests cause I would like to add at least 1 integration test testing the compaction and shipping soon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a secret option then ;-) I'd suggest adding a check to avoid setting it too high (> 5 mins).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a secret option then ;-)

We can have a secret YAML config option with doc:"hidden" but I'm not sure it's possible to have an hidden CLI flag (I haven't investigated).

Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
This reverts commit ac372e9f065f00e909f67d7c3b3557e2974f311c.

Signed-off-by: Peter Štibraný <[email protected]>
This reverts commit 9adb4cb9b4273ac189a3bb0478090966a0db67ec.

Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
To avoid confusion with compactor, which also has "compaction_interval"
config field.

Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
They are now updated in Cortex code, when we call compaction.

Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
- decrease default to 1 min
- compaction cannot be disabled
- compaction interval must be greater than 0, and <= 5 mins

Signed-off-by: Peter Štibraný <[email protected]>
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with a nit. Thanks!

Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
@pracucci pracucci merged commit 6d4da2c into cortexproject:master Feb 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Set a max concurrency TSDB blocks compaction
3 participants