Refactor block storage User Scanner and introduce user index file #6780
Conversation
Thanks. This looks awesome
Any reason you split the old …
About the updater, I like the approach with the fake tenantId because it gives us the owner of the update, but it doesn't work in cases where sharding is off or the compactor is not running.
Because today different components have different behaviors. The Store Gateway will try to load blocks for deleting tenants, meaning tenants that have the deletion marker and still have data in their folder because they are within the tenant cleanup delay, while other components only care about active tenants, or active tenants plus tenants with a deletion marker. I don't want to change the Store Gateway behavior, so I separated it out; those components keep the same behavior as before.
That's correct. I thought about updating the user index as part of the scanner call but ended up splitting it out because of the ownership logic, to keep things cleaner. For the case of the compactor not running, I think it is fine: the user index has a max stale period and components can easily fall back to the list scanner. It shouldn't be a big deal, as the worst case is the same as today. When sharding is off, yes, I want to avoid updating the same user index file from all compactors. I think it is better to just fall back to list for all compactors and rely on the caching bucket for tenant iteration. I can update the doc to mention that this feature only applies when sharding is enabled. I would say that if the scale is small enough for sharding to be disabled, the user index is unnecessary for that use case.
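To make the fallback behavior described above concrete, here is a minimal sketch in Go, using hypothetical names (`Scanner`, `UserIndex`, `userIndexScanner`) rather than the PR's actual types: the scanner prefers the user index and falls back to a plain list scan when the index is missing, unreadable, or past the max stale period.

```go
package userscan

import (
	"context"
	"time"
)

// UserIndex is a hypothetical representation of the user index file contents.
type UserIndex struct {
	ActiveUsers   []string
	DeletingUsers []string
	DeletedUsers  []string
	UpdatedAt     time.Time
}

// Scanner is a hypothetical interface for any user scanning strategy.
type Scanner interface {
	ScanUsers(ctx context.Context) (active, deleting, deleted []string, err error)
}

type userIndexScanner struct {
	loadIndex func(ctx context.Context) (*UserIndex, error) // reads the index object from the bucket
	fallback  Scanner                                        // the plain list-based scanner
	maxStale  time.Duration
}

func (s *userIndexScanner) ScanUsers(ctx context.Context) ([]string, []string, []string, error) {
	idx, err := s.loadIndex(ctx)
	if err != nil || idx == nil || time.Since(idx.UpdatedAt) > s.maxStale {
		// Index missing, corrupted/unreadable, or past the max stale period:
		// behave exactly like the list strategy.
		return s.fallback.ScanUsers(ctx)
	}
	return idx.ActiveUsers, idx.DeletingUsers, idx.DeletedUsers, nil
}
```

With this shape, a stale or absent index degrades to exactly what the list strategy does today, which matches the "worst case is the same as today" point above.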
Signed-off-by: Ben Ye <[email protected]>
Signed-off-by: Ben Ye <[email protected]>
Signed-off-by: Ben Ye <[email protected]>
Signed-off-by: yeya24 <[email protected]>
Signed-off-by: yeya24 <[email protected]>
Signed-off-by: yeya24 <[email protected]>
Signed-off-by: yeya24 <[email protected]>
Signed-off-by: yeya24 <[email protected]>
Signed-off-by: yeya24 <[email protected]>
Signed-off-by: yeya24 <[email protected]>
This is great!!!
LGTM! Thanks for doing this!
This looks great!
One question -- why is there caching in the `user_index` strategy? I thought the user index would be updated on the cleanup interval, and that we always want to read the latest user index. Can you elaborate on situations where the TTL would be smaller or greater than the cleanup interval?
So the idea of caching is to share the user scan result within the same process. For example, the compactor and the cleaner run in the same process but have their own scanners, since they scan users from different goroutines. We share the base scanner so that, with client-side caching, we don't have to scan twice. This caching is most useful for the list strategy, but it works for the user index too. I don't think we have to read the latest user index: users don't change frequently, and we are OK as long as we get the latest users before …
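As an illustration of the client-side caching described above, here is a minimal sketch, again with invented names rather than the PR's actual code, of a caching wrapper that lets the compactor and cleaner goroutines in one process share a single base scanner and reuse its result until a TTL expires.

```go
package userscan

import (
	"context"
	"sync"
	"time"
)

// Scanner is the same hypothetical interface as in the earlier sketch.
type Scanner interface {
	ScanUsers(ctx context.Context) (active, deleting, deleted []string, err error)
}

// cachingScanner wraps a base scanner and reuses its result until the TTL expires,
// so multiple goroutines in the same process don't scan the bucket repeatedly.
type cachingScanner struct {
	base Scanner
	ttl  time.Duration

	mu       sync.Mutex
	lastScan time.Time
	active   []string
	deleting []string
	deleted  []string
}

func (c *cachingScanner) ScanUsers(ctx context.Context) ([]string, []string, []string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	// Serve the cached result while it is still fresh.
	if !c.lastScan.IsZero() && time.Since(c.lastScan) < c.ttl {
		return c.active, c.deleting, c.deleted, nil
	}

	active, deleting, deleted, err := c.base.ScanUsers(ctx)
	if err != nil {
		return nil, nil, nil, err
	}
	c.active, c.deleting, c.deleted, c.lastScan = active, deleting, deleted, time.Now()
	return active, deleting, deleted, nil
}
```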
Signed-off-by: yeya24 <[email protected]>
Awesome
Signed-off-by: yeya24 <[email protected]>
Signed-off-by: yeya24 <[email protected]>
What this PR does:
This PR refactors the existing block storage user scanner package into a new user scanner with a configurable strategy. The default value is `list`, which scans the object storage the same way as today. A new strategy called `user_index` is introduced in this PR. It is intended to reduce List API calls to the object storage by caching the user list in a file. It is similar to the idea of the bucket index and contains the lists of active, deleting and deleted users. The `user_index` strategy falls back to the `list` strategy in case the user index file is corrupted, not found, or exceeds the max stale period. If the `user_index` strategy is used, the compactor is responsible for updating the user index file periodically (a rough sketch of such an update loop is included after the checklist).

Which issue(s) this PR fixes:
Fixes #
Checklist
- [ ] `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`
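As referenced in the description, here is a rough sketch of what a compactor-side update loop for the user index could look like. All names here (`uploader`, `userIndex`, `updateUserIndexLoop`, the `user-index.json` object name) are illustrative assumptions, not the PR's actual code.

```go
package userscan

import (
	"context"
	"encoding/json"
	"time"
)

// uploader is a stand-in for the object storage client used to write the index.
type uploader interface {
	Upload(ctx context.Context, name string, data []byte) error
}

// userIndex mirrors the lists described in the PR: active, deleting and deleted users.
type userIndex struct {
	Active    []string  `json:"active"`
	Deleting  []string  `json:"deleting"`
	Deleted   []string  `json:"deleted"`
	UpdatedAt time.Time `json:"updated_at"`
}

// updateUserIndexLoop periodically scans users with the list strategy and uploads a
// fresh index. If a scan or upload fails, the previous index is left in place and
// readers fall back to listing once it exceeds the max stale period.
func updateUserIndexLoop(ctx context.Context, interval time.Duration,
	scan func(context.Context) (active, deleting, deleted []string, err error), up uploader) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			active, deleting, deleted, err := scan(ctx)
			if err != nil {
				continue
			}
			b, err := json.Marshal(userIndex{Active: active, Deleting: deleting, Deleted: deleted, UpdatedAt: time.Now()})
			if err != nil {
				continue
			}
			_ = up.Upload(ctx, "user-index.json", b) // hypothetical object name
		}
	}
}
```

Running this loop only on the compactor that owns the (fake) tenant ID keeps a single writer for the index when sharding is enabled, which matches the ownership discussion in the conversation above.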