Dynamic tenant shard sizes #5374

Closed
harry671003 opened this issue May 31, 2023 · 5 comments · Fixed by #5393

@harry671003
Contributor

Is your feature request related to a problem? Please describe.
AWS is working on auto-scaling various components in the query path based on resource utilization. However, auto-scaling just the number of pods will not change the total shards allocated to a tenant.

Currently, the tenant shard sizes can only be constants.

max_queriers_per_tenant: 10
store_gateway_tenant_shard_size: 30
compactor_tenant_shard_size: 5
ruler_tenant_shard_size: 3
ingestion_tenant_shard_size: 100

It'll be really great to have the option to set shard sizes that grow or shrink dynamically with the total number of pods for each component.

Describe the solution you'd like
One approach would be to introduce a type param in the config, which can be either percent or number.
If the type is percent, the shard size will grow and shrink dynamically with the total number of pods in the component.

max_queriers_per_tenant:
  type: percent
  value: 40 # Sets the shard size to be 40% the total queriers.

store_gateway_tenant_shard_size:
  type: percent
  value: 40 # Sets the SG shard size to be 40% of the total pods.

ingestion_tenant_shard_size:
  type: number
  value: 100 # Sets the shard size to be 100 ingesters. This will not grow dynamically.
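
For illustration, here's a minimal sketch of how such a config could be resolved against the component's ring. The ShardSizeConfig type and EffectiveShardSize helper are hypothetical names used only for this sketch, not existing Cortex code.

package main

import (
    "fmt"
    "math"
)

// ShardSizeConfig is a hypothetical representation of the proposed setting.
type ShardSizeConfig struct {
    Type  string  // "percent" or "number"
    Value float64 // percentage (0-100) or absolute shard size
}

// EffectiveShardSize resolves the configured shard size against the current
// number of instances in the component's ring.
func EffectiveShardSize(cfg ShardSizeConfig, ringInstances int) int {
    if cfg.Type == "percent" {
        // Round up so that small rings still yield at least one instance.
        return int(math.Ceil(cfg.Value / 100 * float64(ringInstances)))
    }
    return int(cfg.Value)
}

func main() {
    // 40% of 50 queriers -> shard size 20; this tracks the ring as it scales.
    fmt.Println(EffectiveShardSize(ShardSizeConfig{Type: "percent", Value: 40}, 50))
    // An absolute value stays fixed regardless of ring size.
    fmt.Println(EffectiveShardSize(ShardSizeConfig{Type: "number", Value: 100}, 120))
}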

Describe alternatives you've considered
An alternative I've considered is to have a controller that watches the number of pods for, say, the querier and updates the runtime-config with the shard size. However, this would require an additional component outside of Cortex to dynamically resize the shard sizes.

@marianafranco
Contributor

k8s has a type named IntOrString that can hold an integer or a string. It is used on some resource attributes where an integer value or a percentage is allowed. For example, you can specify the RollingUpdateStrategy.MaxUnavailable attribute for Deployments as an absolute number (for example, 5) or a percentage of desired Pods (for example, "10%").

Could we do something similar here so we don't need the type param?
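
Sketching what that could look like: a single value that is either an absolute number ("100") or a percentage ("40%"), resolved against the ring size. The parseShardSize helper below is hypothetical; it mirrors the IntOrString idea but is not existing Cortex or Kubernetes code.

package main

import (
    "fmt"
    "math"
    "strconv"
    "strings"
)

// parseShardSize accepts either an absolute number ("100") or a percentage
// ("40%") and resolves it against the current ring size, so no separate
// type field is needed.
func parseShardSize(raw string, ringInstances int) (int, error) {
    if strings.HasSuffix(raw, "%") {
        pct, err := strconv.ParseFloat(strings.TrimSuffix(raw, "%"), 64)
        if err != nil {
            return 0, err
        }
        return int(math.Ceil(pct / 100 * float64(ringInstances))), nil
    }
    return strconv.Atoi(raw)
}

func main() {
    size, _ := parseShardSize("40%", 50) // percentage of the ring -> 20
    fmt.Println(size)
    size, _ = parseShardSize("100", 120) // fixed absolute value -> 100
    fmt.Println(size)
}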

@friedrichg
Member

Thanks for creating the issue!

The current situation, which requires shards to be adjusted manually per tenant per component, is far from ideal.

Still, adding a percentage doesn't exactly make it better. Tenants in large clusters will get more resources with the same configuration, and fewer resources in smaller clusters. I don't think any other limit behaves that way.

What about making the number of shards depend on a per-instance value? Suppose, for example, we could specify how many active series we want per user per ingester, say 150k. We could have a desired ingestion_tenant_shard_size and a real ingestion_tenant_shard_size. The desired value could be bigger than the number of ingesters, thus producing a metric that a Horizontal Pod Autoscaler (or any other autoscaler) could use to decide that more ingesters are desirable.

In the case of store-gateways and compactors we could use storage size, and for queriers and rulers we could use latency or something more specific.
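
To make the arithmetic concrete, here is a rough sketch of the desired-vs-real split, assuming a hypothetical per-ingester target of active series per tenant (the function names and the way the gap is surfaced are illustrative, not an existing Cortex API):

package main

import (
    "fmt"
    "math"
)

// desiredIngestionShardSize derives the shard size a tenant "wants" from its
// active series and a target number of series per ingester (e.g. 150k).
func desiredIngestionShardSize(activeSeries, targetSeriesPerIngester int) int {
    return int(math.Ceil(float64(activeSeries) / float64(targetSeriesPerIngester)))
}

// realIngestionShardSize caps the desired size at the ingesters actually in
// the ring; the gap between desired and real is what an autoscaler could
// watch to decide that more ingesters are needed.
func realIngestionShardSize(desired, ringIngesters int) int {
    if desired > ringIngesters {
        return ringIngesters
    }
    return desired
}

func main() {
    desired := desiredIngestionShardSize(3_000_000, 150_000) // 20 ingesters desired
    actual := realIngestionShardSize(desired, 15)            // only 15 in the ring
    fmt.Println(desired, actual)                             // 20 15 -> scale-up signal
}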

@harry671003
Contributor Author

One thing that may not have been clear from the original issue is that this feature would give us the ability to set the shard sizes to, say, 40% for all tenants in the cluster. This means every tenant gets 40% of the queriers and store-gateways, avoiding the need for per-tenant overrides.
This can be combined with resource-based scaling on, say, CPU, memory, and bandwidth: the HPA scales the number of pods up or down based on utilization, and Cortex resizes the tenant shard size based on the ring so that every tenant still gets 40%.

@friedrichg - I'll respond to your concerns below:

Still, adding a percentage doesn't exactly make it better. Tenants in large clusters will get more resources with the same configuration, and fewer resources in smaller clusters.

Larger clusters either host larger tenants or more tenants, so naturally they will see a higher peak QPS. That justifies giving tenants in the query path bigger shard sizes in larger clusters and smaller shard sizes in smaller clusters. Another idea I'm looking at is provisioning the queriers and store-gateways (minReplicas) proportionally to the active series in the cell.

What about making the number of shards depend on a per-instance value? We could have a desired ingestion_tenant_shard_size and a real ingestion_tenant_shard_size. The desired value could be bigger than the number of ingesters, thus producing a metric that a Horizontal Pod Autoscaler (or any other autoscaler) could use to decide that more ingesters are desirable.

For ingesters, this approach might work because the series are divided evenly across ingesters. On the query path, we cannot expect the load to be evenly distributed, so the per-instance value would vary across pods.
If each store-gateway or querier decides the shard size locally, how would we make all the pods agree on the shard size? Cortex needs this global agreement for sharding to work correctly.

@friedrichg
Member

I'll respond to your concerns below:

No concerns, just food for thought; ultimately your proposed feature looks quite simple to implement. If the percentage makes sense for you and your team, please go ahead.

I am happy to discuss other solutions too 😄


On the query path, we cannot expect the load to be evenly distributed, so the per-instance value would vary across pods.

I know that.

As a wild thought, we could count (or sum) more than just healthy instances in the ring:

// Count the number of healthy instances for Write operation.
if ingester.IsHealthy(Write, i.cfg.RingConfig.HeartbeatTimeout, lastUpdated) {
healthyInstancesCount++
}

We should probably make it general enough to be used in all types of rings.
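
One possible shape for that generalization, sketched with hypothetical types (the real Cortex ring types differ; this only illustrates counting healthy instances per operation in one place for any ring):

package ring

import "time"

// instanceHealth is a hypothetical abstraction over what each ring
// (ingester, store-gateway, compactor, ...) already knows about its instances.
type instanceHealth interface {
    IsHealthy(op string, heartbeatTimeout time.Duration, now time.Time) bool
}

// countHealthy returns how many instances are healthy for a given operation,
// so the same count can feed shard-size decisions for every type of ring.
func countHealthy(instances []instanceHealth, op string, heartbeatTimeout time.Duration, now time.Time) int {
    healthy := 0
    for _, inst := range instances {
        if inst.IsHealthy(op, heartbeatTimeout, now) {
            healthy++
        }
    }
    return healthy
}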

how would we make all the pods agree on the shard size?

We can discuss different approaches on how to do this.

@harry671003
Contributor Author

Hi @friedrichg - I took a stab at the percentage shard size implementation based on the above proposal.

For now, I've only implemented this for queriers and store-gateways because those are the components we need to auto-scale.

I'd like to hear your thoughts on the implementation.
