Closed
Description
Bug:
There is a bug in how the sub-ring for a tenant is obtained in ring.shuffleShard(). Ideally, when the shard size increases or decreases by one, the replicas for any block should change by at most one instance. In the Cortex implementation, however, more than one instance can change if shardSize is not divisible by numZones.
The shuffle sharding code in ring.shuffleShard() works as follows:
- Get all the zones from r.ringZones
- Calculate numInstancesPerZone = int(math.Ceil(float64(shardSize) / float64(numZones)))
- For each zone in r.ringZones:
  - Find the first numInstancesPerZone unique instances
Explanation
Assumptions:
- Assume there are 11 store-gateways, sg1 to sg11.
- Assume the current tenant shard size is 9 and it is increased to 10.
Initial state
In the above diagram, the store-gateway tokens are arranged in the order shown in the first figure. Assume TENANT_ID is hashed just before sg9's token.
When shardSize=9
- When shardSize = 9, the shardSize is evenly divisible by the number of zones.
  numInstancesPerZone = int(math.Ceil(float64(shardSize) / float64(numZones)))
  numInstancesPerZone = int(math.Ceil(9/3))
  numInstancesPerZone = 3
- The shuffle sharding algorithm will pick the first 3 unique instances from each zone, up to a maximum of 9 instances.
- The subRing for the tenant will be [sg9, sg5, sg2, sg4, sg1, sg6, sg7, sg8, sg3]
- We'll have 3 instances per AZ:
  - AZ1: [sg9, sg2, sg6]
  - AZ2: [sg5, sg1, sg7]
  - AZ3: [sg4, sg8, sg3]
- The replicas for the block will be [sg8, sg9, sg5]
When shardSize=10
- When shardSize = 10, the shardSize is not evenly divisible by the number of zones.
  numInstancesPerZone = int(math.Ceil(float64(shardSize) / float64(numZones)))
  numInstancesPerZone = int(math.Ceil(10/3)) = int(math.Ceil(3.33))
  numInstancesPerZone = 4
- The shuffle sharding algorithm will pick the first 4 unique instances from each zone, up to a maximum of 10 instances.
- The subRing for the tenant now has 10 instances. Two new instances were added to the subRing, [sg10, sg11], and sg3 was dropped, as the per-zone lists below show.
- We'll have 4 instances in AZ1 and AZ2 and only 2 instances in AZ3:
  - AZ1: [sg9, sg2, sg6, sg11]
  - AZ2: [sg5, sg1, sg7, sg10]
  - AZ3: [sg4, sg8]
- The replicas for the block will be [sg10, sg11, sg8]
- The block replicas also changed by 2 when the shard size increased from 9 to 10.
To Reproduce
Steps to reproduce the behavior:
- Increase the store-gateway tenant shard size by one from a value that is divisible by the number of zones.
Expected behavior
- Only one replica should change for any block when the tenant shard size increases or decreases by 1.
Possible Fix
- When shardSize is not divisible by the number of zones, only the first zone should get the extra instance.
- The number of instances per zone shouldn't differ by more than one.