Dynamic tenant shard sizes #5374
k8s has a type named IntOrString that can hold an integer or a string. It is used on some resource attributes where an integer value or a percentage is allowed. For example, you can specify the RollingUpdateStrategy.MaxUnavailable attribute for Deployments as an absolute number (for example, 5) or a percentage of desired Pods (for example, "10%"). Could we do something similar here so we don't need the …
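As a rough illustration of the IntOrString idea applied to shard sizes, here is a minimal Go sketch. The `ShardSize` type and `ParseShardSize` function are hypothetical names, not Cortex or k8s API; they just show how one value could accept either an absolute count or a percentage:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ShardSize is a hypothetical IntOrString-style value: either an absolute
// shard count (e.g. 5) or a percentage of available instances (e.g. "10%").
type ShardSize struct {
	IsPercent bool
	Value     int
}

// ParseShardSize accepts either a plain integer like "5" or a
// percentage like "10%".
func ParseShardSize(s string) (ShardSize, error) {
	if strings.HasSuffix(s, "%") {
		p, err := strconv.Atoi(strings.TrimSuffix(s, "%"))
		if err != nil || p < 0 || p > 100 {
			return ShardSize{}, fmt.Errorf("invalid percentage %q", s)
		}
		return ShardSize{IsPercent: true, Value: p}, nil
	}
	n, err := strconv.Atoi(s)
	if err != nil || n < 0 {
		return ShardSize{}, fmt.Errorf("invalid shard size %q", s)
	}
	return ShardSize{Value: n}, nil
}

func main() {
	a, _ := ParseShardSize("10%")
	b, _ := ParseShardSize("5")
	fmt.Println(a.IsPercent, a.Value) // true 10
	fmt.Println(b.IsPercent, b.Value) // false 5
}
```

With this shape, existing configs with plain integers keep working and percentages become opt-in, which is how k8s handles MaxUnavailable.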
Thanks for creating the issue! The current situation, where shards have to be adjusted manually per tenant per component, is far from ideal. Still, adding a percentage doesn't exactly make it better: with the same configuration, tenants would get more resources in large clusters and fewer in small ones. I don't think any other limit behaves that way.

What about making the number of shards depend on a per-instance value? For example, we could specify how many active series we want per ingester per user, say 150k. We could have a desired ingestion_tenant_shard_size and a real ingestion_tenant_shard_size. The desired value could be bigger than the number of ingesters, producing a metric that a Horizontal Pod Autoscaler (or any other autoscaler) could use to decide that more ingesters are desirable. For store-gateways and compactors we could use storage size, and for queriers and rulers we could use latency or something more specific.
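The desired-vs-real idea above can be sketched in a few lines of Go. Everything here is illustrative (the function name, the 150k target, and the capping rule are assumptions, not Cortex code); the point is that a desired value exceeding the instance count is the autoscaling signal:

```go
package main

import "fmt"

// desiredShardSize derives how many ingesters a tenant's shard should
// span, given the tenant's active series and a per-ingester target
// (e.g. 150k active series per user per ingester).
func desiredShardSize(activeSeries, targetSeriesPerIngester int) int {
	if targetSeriesPerIngester <= 0 {
		return 0
	}
	// Ceiling division: any remainder needs one more ingester.
	return (activeSeries + targetSeriesPerIngester - 1) / targetSeriesPerIngester
}

func main() {
	desired := desiredShardSize(1_200_000, 150_000) // 8 ingesters desired
	totalIngesters := 6

	// The real shard size is capped by what the ring actually has.
	real := desired
	if real > totalIngesters {
		real = totalIngesters
	}

	// desired > totalIngesters is the metric an autoscaler could watch
	// to decide that more ingesters should be added.
	fmt.Println(desired, real) // 8 6
}
```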
One thing that may not have been clear from the original issue is that this feature would let us set the shard size to, say, 40% for all tenants in the cluster. Every tenant would then get 40% of the queriers and store-gateways, avoiding the need for per-tenant overrides. @friedrichg - I'll respond to your concerns below:
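Resolving a percentage shard size against the current ring could look roughly like this. This is a sketch of the idea only, not the actual Cortex implementation; the function name and rounding choice are assumptions:

```go
package main

import (
	"fmt"
	"math"
)

// effectiveShardSize resolves a percentage shard size against the number
// of healthy instances currently in a component's ring. Rounding up
// ensures a small cluster still gets at least one instance for any
// non-zero percentage.
func effectiveShardSize(percent float64, healthyInstances int) int {
	if percent <= 0 || healthyInstances <= 0 {
		return 0
	}
	return int(math.Ceil(percent * float64(healthyInstances) / 100))
}

func main() {
	// With 40% configured for every tenant, the shard grows and shrinks
	// automatically as the cluster scales out or in.
	fmt.Println(effectiveShardSize(40, 10)) // 4
	fmt.Println(effectiveShardSize(40, 25)) // 10
	fmt.Println(effectiveShardSize(40, 2))  // 1
}
```

Because the shard size is recomputed from the ring, scaling the pods (e.g. via an HPA) changes every tenant's shard with no runtime-config update.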
Larger clusters either have larger tenants or more tenants, so they will naturally see a higher peak QPS. That justifies giving tenants bigger shard sizes in the query path in larger clusters and smaller shard sizes in smaller clusters. Another idea I'm looking at is provisioning (…
For ingesters this approach might work because we divide the series evenly across ingesters. On the query path, we cannot expect the load to be evenly distributed, so the per-instance value would change from pod to pod.
No concerns, just food for thought; ultimately your proposed feature looks very simple to implement. If the percentage makes sense for you and your team, please go ahead. I am happy to discuss other solutions too 😄
I know that. As a wild thought, we could count (or sum) more than just healthy instances in the ring (see lines 826 to 829 at 220683a).
We probably should make it general so it can be used for all ring types.
We can discuss different approaches on how to do this.
Hi @friedrichg - I took a stab at the percentage shard size implementation based on the above proposal. For now I've only implemented it for queriers and store-gateways because those are the components we need to auto-scale. I'd like to hear your thoughts on the implementation.
Is your feature request related to a problem? Please describe.
AWS is working on auto-scaling various components in the query path based on resource utilization. However, auto-scaling just the number of pods will not change the total shards allocated to a tenant.
Currently, the tenant shard sizes can only be constants.
It'll be really great to have the option to set shard sizes that grow or shrink dynamically with the total number of pods for each component.
Describe the solution you'd like
One approach to do this would be to introduce a `type` param on the config which can be either `percent` or `number`. If `type` is `percent`, the shard size will dynamically grow and shrink with the total pods for the component.

Describe alternatives you've considered
An alternative I've considered is a controller that watches the number of pods for, say, queriers and updates the `runtime-config` with the new shard size. This would require an additional component, not part of Cortex, to dynamically resize the shard sizes.