Dynamic tenant shard sizes #5374

Closed
harry671003 opened this issue May 31, 2023 · 5 comments · Fixed by #5393

@harry671003
Contributor

Is your feature request related to a problem? Please describe.
AWS is working on auto-scaling various components in the query path based on resource utilization. However, auto-scaling just the number of pods will not change the total shards allocated to a tenant.

Currently, the tenant shard sizes can only be constants.

max_queriers_per_tenant: 10
store_gateway_tenant_shard_size: 30
compactor_tenant_shard_size: 5
ruler_tenant_shard_size: 3
ingestion_tenant_shard_size: 100

It'll be really great to have the option to set shard sizes that grow or shrink dynamically with the total number of pods for each component.

Describe the solution you'd like
One approach would be to introduce a type param in the config, which can be either percent or number.
If the type is percent, the shard size will grow and shrink dynamically with the total number of pods in the component.

max_queriers_per_tenant:
  type: percent
  value: 40 # Sets the shard size to be 40% the total queriers.

store_gateway_tenant_shard_size:
  type: percent
  value: 40 # Sets the SG shard size to be 40% of the total pods.

ingestion_tenant_shard_size:
  type: number
  value: 100 # Sets the shard size to be 100 ingesters. This will not grow dynamically.
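
For illustration, here's a minimal sketch of how such a config could be resolved against the component's ring. The ShardSizeConfig type and EffectiveShardSize helper are hypothetical names used only for this sketch, not existing Cortex code.

package main

import (
    "fmt"
    "math"
)

// ShardSizeConfig is a hypothetical representation of the proposed setting.
type ShardSizeConfig struct {
    Type  string  // "percent" or "number"
    Value float64 // percentage (0-100) or absolute shard size
}

// EffectiveShardSize resolves the configured shard size against the current
// number of instances in the component's ring.
func EffectiveShardSize(cfg ShardSizeConfig, ringInstances int) int {
    if cfg.Type == "percent" {
        // Round up so that small rings still yield at least one instance.
        return int(math.Ceil(cfg.Value / 100 * float64(ringInstances)))
    }
    return int(cfg.Value)
}

func main() {
    // 40% of 50 queriers -> shard size 20; this tracks the ring as it scales.
    fmt.Println(EffectiveShardSize(ShardSizeConfig{Type: "percent", Value: 40}, 50))
    // An absolute value stays fixed regardless of ring size.
    fmt.Println(EffectiveShardSize(ShardSizeConfig{Type: "number", Value: 100}, 120))
}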

Describe alternatives you've considered
An alternative I've considered is to have a controller that watches the number of pods for, say, the querier and updates the runtime-config with the shard size. However, this would require an additional component outside of Cortex to dynamically resize the shard sizes.

@marianafranco
Contributor

k8s has a type named IntOrString that can hold an integer or a string. It is used on some resource attributes where an integer value or a percentage is allowed. For example, you can specify the RollingUpdateStrategy.MaxUnavailable attribute for Deployments as an absolute number (for example, 5) or a percentage of desired Pods (for example, "10%").

Could we do something similar here so we don't need the type param?
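
Sketching what that could look like: a single value that is either an absolute number ("100") or a percentage ("40%"), resolved against the ring size. The parseShardSize helper below is hypothetical; it mirrors the IntOrString idea but is not existing Cortex or Kubernetes code.

package main

import (
    "fmt"
    "math"
    "strconv"
    "strings"
)

// parseShardSize accepts either an absolute number ("100") or a percentage
// ("40%") and resolves it against the current ring size, so no separate
// type field is needed.
func parseShardSize(raw string, ringInstances int) (int, error) {
    if strings.HasSuffix(raw, "%") {
        pct, err := strconv.ParseFloat(strings.TrimSuffix(raw, "%"), 64)
        if err != nil {
            return 0, err
        }
        return int(math.Ceil(pct / 100 * float64(ringInstances))), nil
    }
    return strconv.Atoi(raw)
}

func main() {
    size, _ := parseShardSize("40%", 50) // percentage of the ring -> 20
    fmt.Println(size)
    size, _ = parseShardSize("100", 120) // fixed absolute value -> 100
    fmt.Println(size)
}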

@friedrichg
Member

Thanks for creating the issue!

The current situation, which requires shards to be adjusted manually per tenant per component, is far from ideal.

Still, adding a percentage doesn't exactly make it better. Tenants in large clusters will get more resources with the same configuration, and fewer resources in smaller clusters. I don't think any other limit behaves that way.

What about making the number of shards depend on a per-instance value? Suppose, for example, we could specify how many active series we want per user per ingester, say 150k. We could have a desired ingestion_tenant_shard_size and a real ingestion_tenant_shard_size. The desired value could be bigger than the number of ingesters, thus producing a metric that a Horizontal Pod Autoscaler (or any other autoscaler) could use to decide that more ingesters are desirable.

In the case of store-gateways and compactors we could use storage size, and for queriers and rulers we could use latency or something more specific.
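
To make the arithmetic concrete, here is a rough sketch of the desired-vs-real split, assuming a hypothetical per-ingester target of active series per tenant (the function names and the way the gap is surfaced are illustrative, not an existing Cortex API):

package main

import (
    "fmt"
    "math"
)

// desiredIngestionShardSize derives the shard size a tenant "wants" from its
// active series and a target number of series per ingester (e.g. 150k).
func desiredIngestionShardSize(activeSeries, targetSeriesPerIngester int) int {
    return int(math.Ceil(float64(activeSeries) / float64(targetSeriesPerIngester)))
}

// realIngestionShardSize caps the desired size at the ingesters actually in
// the ring; the gap between desired and real is what an autoscaler could
// watch to decide that more ingesters are needed.
func realIngestionShardSize(desired, ringIngesters int) int {
    if desired > ringIngesters {
        return ringIngesters
    }
    return desired
}

func main() {
    desired := desiredIngestionShardSize(3_000_000, 150_000) // 20 ingesters desired
    actual := realIngestionShardSize(desired, 15)            // only 15 in the ring
    fmt.Println(desired, actual)                             // 20 15 -> scale-up signal
}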

@harry671003
Contributor Author

One thing that may not have been clear from the original issue is that this feature would give us the ability to set the shard sizes to, say, 40% for all tenants in the cluster. This means every tenant gets 40% of the queriers and store-gateways, avoiding the need for per-tenant overrides.
This can be combined with resource-based scaling on, say, CPU, memory, and bandwidth: the HPA scales the number of pods up or down based on utilization, and Cortex resizes the tenant shard size based on the ring so that every tenant still gets 40%.

@friedrichg - I'll respond to your concerns below:

Still, adding a percentage doesn't exactly make it better. Tenants in large clusters will get more resources with the same configuration, and fewer resources in smaller clusters.

Larger clusters either host larger tenants or more tenants, so naturally they will see a higher peak QPS. That justifies giving tenants in the query path bigger shard sizes in larger clusters and smaller shard sizes in smaller clusters. Another idea I'm looking at is provisioning the queriers and store-gateways (minReplicas) proportionally to the active series in the cell.

What about making the number of shards depend on a per-instance value? We could have a desired ingestion_tenant_shard_size and a real ingestion_tenant_shard_size. The desired value could be bigger than the number of ingesters, thus producing a metric that a Horizontal Pod Autoscaler (or any other autoscaler) could use to decide that more ingesters are desirable.

For ingesters, this approach might work because the series are divided evenly across ingesters. On the query path, we cannot expect the load to be evenly distributed, so the per-instance value would vary across pods.
If each store-gateway or querier decides the shard size locally, how would we make all the pods agree on the shard size? Cortex needs this global agreement for sharding to work correctly.

@friedrichg
Member

I'll respond to your concerns below:

No concerns, just food for thought; ultimately your proposed feature looks quite simple to implement. If the percentage makes sense for you and your team, please go ahead.

I am happy to discuss other solutions too 😄


On the query path, we cannot expect the load to be evenly distributed, so the per-instance value would vary across pods.

I know that.

As a wild thought, we could count (or sum) more than just healthy instances in the ring:

// Count the number of healthy instances for Write operation.
if ingester.IsHealthy(Write, i.cfg.RingConfig.HeartbeatTimeout, lastUpdated) {
healthyInstancesCount++
}

We should probably make it general enough to be used in all types of rings.
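
One possible shape for that generalization, sketched with hypothetical types (the real Cortex ring types differ; this only illustrates counting healthy instances per operation in one place for any ring):

package ring

import "time"

// instanceHealth is a hypothetical abstraction over what each ring
// (ingester, store-gateway, compactor, ...) already knows about its instances.
type instanceHealth interface {
    IsHealthy(op string, heartbeatTimeout time.Duration, now time.Time) bool
}

// countHealthy returns how many instances are healthy for a given operation,
// so the same count can feed shard-size decisions for every type of ring.
func countHealthy(instances []instanceHealth, op string, heartbeatTimeout time.Duration, now time.Time) int {
    healthy := 0
    for _, inst := range instances {
        if inst.IsHealthy(op, heartbeatTimeout, now) {
            healthy++
        }
    }
    return healthy
}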

how would we make all the pods agree on the shard size?

We can discuss different approaches on how to do this.

@harry671003
Contributor Author

Hi @friedrichg - I took a stab at the percentage shard size implementation based on the above proposal.

For now, I've only implemented this for queriers and store-gateways because those are the components we need to auto-scale.

I'd like to hear your thoughts on the implementation.
