
Add planner filter #4318


Closed · wants to merge 3 commits

Conversation

@ac1214 (Contributor) commented Jun 24, 2021

Signed-off-by: Albert <[email protected]>

What this PR does:

Implements generation of parallelizable plans for the proposal outlined in #4272. Currently, the parallelizable plans are generated, but only the first plan in the list is selected to run.
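
As a point of reference, a minimal sketch of that behaviour (generate every parallelizable plan, but execute only the first) could look like the following Go snippet. All names here (plannerFilter, blockPlan, generateParallelizablePlans) are hypothetical and not taken from this PR:

```go
// Sketch only: illustrates "generate all parallelizable plans, run the first".
package compactor

import "context"

// blockPlan is one set of blocks to compact together; plans in the same slice
// are independent of each other and could be executed in parallel.
type blockPlan []string // block ULIDs

type plannerFilter struct {
	// generateParallelizablePlans stands in for the PR's plan-generation logic.
	generateParallelizablePlans func(ctx context.Context) ([]blockPlan, error)
}

// nextPlan generates every parallelizable plan but returns only the first one
// for execution, matching the current behaviour described above.
func (f *plannerFilter) nextPlan(ctx context.Context) (blockPlan, error) {
	plans, err := f.generateParallelizablePlans(ctx)
	if err != nil {
		return nil, err
	}
	if len(plans) == 0 {
		return nil, nil // nothing to compact
	}
	return plans[0], nil
}
```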

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Albert <[email protected]>
@ac1214 (Contributor, Author) commented Jun 28, 2021

Testing done for these changes

To test these changes, I collected Prometheus metrics for a couple of days without running any compaction. After generating some uncompacted blocks, I saved them so that the same compaction could be replicated multiple times, and I downloaded the blocks from S3 so that testing could be repeated without deleting the blocks in S3.
The testing I completed to verify that this compaction method works is described below.

  • Queried the same Prometheus metrics before and after compaction and compared the query results to ensure that the data remained the same.
  • Ran Prometheus’ tsdb analyze tool to compare the data in the blocks before and after compaction.
  • Compared the blocks produced by the current compaction method with those produced by the compaction/grouping method introduced in this PR. I matched blocks between the two compactions by their minTime, maxTime, and stats attributes, then compared the md5 sums of the chunks of the matching blocks to ensure the data was identical (see the sketch below).
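
To illustrate that last step, a small standalone helper (not part of this PR; directory names are placeholders) can hash every file under a block's chunks/ directory so that two matched blocks can be compared:

```go
// Sketch only: compare md5 sums of chunk files between two TSDB blocks.
package main

import (
	"crypto/md5"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// chunkHashes returns the md5 sum of every file under <blockDir>/chunks.
func chunkHashes(blockDir string) (map[string]string, error) {
	sums := map[string]string{}
	chunksDir := filepath.Join(blockDir, "chunks")
	err := filepath.Walk(chunksDir, func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() {
			return err
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()
		h := md5.New()
		if _, err := io.Copy(h, f); err != nil {
			return err
		}
		rel, _ := filepath.Rel(chunksDir, path)
		sums[rel] = fmt.Sprintf("%x", h.Sum(nil))
		return nil
	})
	return sums, err
}

func main() {
	// Placeholder paths: blocks matched on minTime, maxTime, and stats.
	oldSums, err := chunkHashes("old-compaction/block-a")
	if err != nil {
		panic(err)
	}
	newSums, err := chunkHashes("new-compaction/block-b")
	if err != nil {
		panic(err)
	}
	for name, sum := range oldSums {
		if newSums[name] != sum {
			fmt.Printf("chunk %s differs\n", name)
		}
	}
}
```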

Signed-off-by: Albert <[email protected]>
@bboreham (Contributor) commented

Sorry, but we have begun the process of cutting a new release; please rebase from master and move your CHANGELOG entry to the top under ## master / unreleased

Signed-off-by: Albert <[email protected]>
@@ -146,6 +149,7 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
"If 0, blocks will be deleted straight away. Note that deleting blocks immediately can cause query failures.")
f.DurationVar(&cfg.TenantCleanupDelay, "compactor.tenant-cleanup-delay", 6*time.Hour, "For tenants marked for deletion, this is time between deleting of last block, and doing final cleanup (marker files, debug files) of the tenant.")
f.BoolVar(&cfg.BlockDeletionMarksMigrationEnabled, "compactor.block-deletion-marks-migration-enabled", true, "When enabled, at compactor startup the bucket will be scanned and all found deletion marks inside the block location will be copied to the markers global location too. This option can (and should) be safely disabled as soon as the compactor has successfully run at least once.")
f.BoolVar(&cfg.PlannerFilterEnabled, "compactor.planner-filter-enabled", false, "Filter and plan blocks within PlannerFilter instead of through Thanos planner and grouper.")
Review comment (Contributor):

I don't think the config option should be that specific (what this CLI flag describes is an internal implementation detail). The whole purpose of the #4272 proposal is to introduce a different sharding strategy for the compactor. To keep it consistent with other Cortex services, the config option could be compactor.sharding-strategy with values default (the current one) and shuffle-sharding (the new one you're working on).
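
As an illustration of that suggestion, the option could be wired up roughly like the existing flags in this file. This is only a sketch of the reviewer's idea; the ShardingStrategy field and the constants are assumptions, not code from this PR:

```go
// Sketch only: a compactor.sharding-strategy flag as suggested above.
package compactor

import (
	"flag"
	"fmt"
)

const (
	shardingStrategyDefault         = "default"
	shardingStrategyShuffleSharding = "shuffle-sharding"
)

type Config struct {
	ShardingStrategy string
}

func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
	f.StringVar(&cfg.ShardingStrategy, "compactor.sharding-strategy", shardingStrategyDefault,
		fmt.Sprintf("The sharding strategy to use. Supported values are: %s, %s.",
			shardingStrategyDefault, shardingStrategyShuffleSharding))
}

// Validate rejects unknown strategies at startup.
func (cfg *Config) Validate() error {
	switch cfg.ShardingStrategy {
	case shardingStrategyDefault, shardingStrategyShuffleSharding:
		return nil
	default:
		return fmt.Errorf("unsupported sharding strategy: %q", cfg.ShardingStrategy)
	}
}
```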

level.Info(c.logger).Log("msg", "Compactor using planner filter")

// Create a new planner filter
f, err := NewPlannerFilter(
Review comment (Contributor):

I don't think this is the right way to build it. It's not the responsibility of the metadata fetcher to run the planning and filter out blocks belonging to other shards; that's not how the compactor was designed. You should build this feature on the compactor grouper and planner.
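
A minimal sketch of that direction, doing the shard filtering in a planner wrapper rather than in the metadata fetcher. The Planner interface below is only an approximation (the real Thanos/Cortex interfaces differ), and shardAwarePlanner, ownsBlock, and blockMeta are hypothetical names:

```go
// Sketch only: filter out blocks owned by other shards before planning.
package compactor

import "context"

// blockMeta stands in for the block metadata type used by the compactor.
type blockMeta struct {
	ULID string
}

// Planner approximates the planner abstraction used by the compactor.
type Planner interface {
	Plan(ctx context.Context, metasByMinTime []*blockMeta) ([]*blockMeta, error)
}

// shardAwarePlanner drops blocks owned by other compactor instances and then
// delegates to the wrapped planner.
type shardAwarePlanner struct {
	wrapped   Planner
	ownsBlock func(ulid string) (bool, error)
}

func (p *shardAwarePlanner) Plan(ctx context.Context, metas []*blockMeta) ([]*blockMeta, error) {
	owned := make([]*blockMeta, 0, len(metas))
	for _, m := range metas {
		ok, err := p.ownsBlock(m.ULID)
		if err != nil {
			return nil, err
		}
		if ok {
			owned = append(owned, m)
		}
	}
	return p.wrapped.Plan(ctx, owned)
}
```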

@ac1214 closed this Jul 10, 2021
@ac1214 mentioned this pull request Jul 14, 2021