Skip to content

Shared in-memory index cache for queriers with blocks storage #2189

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

pracucci
Copy link
Contributor

What this PR does:
In this PR I propose to shift to a shared in-memory index cache for queriers with blocks storage. The problem with per-tenant caches is that the total max cache size linearly increase with the number of tenants, while with a single cache it's easier to keep the max memory used under control.

Which issue(s) this PR fixes:
Fixes #2069

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@pracucci
Copy link
Contributor Author

About this TODO:

// TODO: apparently Thanos has a bug which cause a block to not be considered if the
//       query timetamp matches the block max timestamp

This issue is fixed in Thanos master but will upgrade it in a separate PR.

@pracucci pracucci force-pushed the single-querier-in-memory-cache-for-blocks-storage branch 2 times, most recently from ecc6f61 to 7167a70 Compare February 28, 2020 09:17
Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pracucci pracucci force-pushed the single-querier-in-memory-cache-for-blocks-storage branch 2 times, most recently from 0f40094 to 9ffb9d9 Compare February 28, 2020 10:20
Copy link
Contributor

@gouthamve gouthamve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice PR, only one minor nit.

Though for me there is a bigger concern that we are not prefixing anything with the tenantID. I know its extremely rare for ULIDs to match across tenants, I'm afraid that if it did, we might be leaking some data.

Can we add a TODO to make sure that we propagate the tenantID into the cache somehow?

@@ -154,7 +154,7 @@ type BucketStoreConfig struct {
func (cfg *BucketStoreConfig) RegisterFlags(f *flag.FlagSet) {
f.StringVar(&cfg.SyncDir, "experimental.tsdb.bucket-store.sync-dir", "tsdb-sync", "Directory to store synchronized TSDB index headers.")
f.DurationVar(&cfg.SyncInterval, "experimental.tsdb.bucket-store.sync-interval", 5*time.Minute, "How frequently scan the bucket to look for changes (new blocks shipped by ingesters and blocks removed by retention or compaction). 0 disables it.")
f.Uint64Var(&cfg.IndexCacheSizeBytes, "experimental.tsdb.bucket-store.index-cache-size-bytes", uint64(250*units.Mebibyte), "Size - in bytes - of a per-tenant in-memory index cache used to speed up blocks index lookups.")
f.Uint64Var(&cfg.IndexCacheSizeBytes, "experimental.tsdb.bucket-store.index-cache-size-bytes", uint64(1*units.Gibibyte), "Size in bytes of in-memory index cache used to speed up blocks index lookups (shared across multiple tenants).")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shared across all tenants is better, but only slightly :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shared between ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated it to "shared between all tenants". I hope to have correctly merged both feedback 😉

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merges can be difficult :)

pracucci and others added 6 commits February 28, 2020 14:25
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Co-Authored-By: Peter Štibraný <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
@pracucci pracucci force-pushed the single-querier-in-memory-cache-for-blocks-storage branch from 9ffb9d9 to 8f19883 Compare February 28, 2020 13:29
Signed-off-by: Marco Pracucci <[email protected]>
@pracucci
Copy link
Contributor Author

Though for me there is a bigger concern that we are not prefixing anything with the tenantID. I know its extremely rare for ULIDs to match across tenants, I'm afraid that if it did, we might be leaking some data.

Can we add a TODO to make sure that we propagate the tenantID into the cache somehow?

This is a very good point. After some discussion in #2069 we decided to not propagate the tenantID but make the block ID entropy safer, which is what I did here:
prometheus/prometheus#6867

Feel free share further feedback. I'm willing to re-consider this decision if you think it's risky.

@pracucci pracucci merged commit 151c577 into cortexproject:master Feb 28, 2020
@pracucci pracucci deleted the single-querier-in-memory-cache-for-blocks-storage branch February 28, 2020 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Shift from a per-tenant to a per-querier in-memory index cache in the blocks storage
3 participants