Bug in ring.shuffleShard() with Zone Aware Replication #5467
Bug:
There is a bug in how the sub-ring for a tenant is obtained in ring.shuffleShard(). Ideally, when the shard size increases or decreases by one, the replicas for any block should change by at most one instance. In the Cortex implementation, however, more than one instance can change if shardSize is not divisible by numZones.

The code for shuffle sharding in ring.shuffleShard() works as follows:
- Get the zones from r.ringZones.
- Compute numInstancesPerZone = int(math.Ceil(float64(shardSize) / float64(numZones))).
- For each zone in r.ringZones: select numInstancesPerZone unique instances.

Explanation
Assumptions:
Initial state
In the above diagram, the store-gateway tokens are arranged in the order shown in the first figure. Assume TENANT_ID is hashed just before sg9's token.

When shardSize=9
With shardSize = 9, the shard size is evenly divisible by the number of zones.
[sg9, sg5, sg2, sg4, sg1, sg6, sg7, sg8, sg3]
[sg9, sg2, sg6]
[sg5, sg1, sg7]
[sg4, sg8, sg3]
[sg8, sg9, sg5]

When shardSize=10
With shardSize = 10, the shard size is not evenly divisible by the number of zones.
[sg9, sg5, sg2, sg4, sg1, sg6, sg7, sg8, sg3]
[sg10, sg11]
[sg9, sg2, sg6, sg11]
[sg5, sg1, sg7, sg10]
[sg4, sg8]
[sg10, sg11, sg8]
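
The per-zone arithmetic behind the two cases can be sketched in Go as follows. This is only an illustration of the numInstancesPerZone formula quoted in this issue, not the Cortex code itself: the helper name instancesPerZone is hypothetical, and numZones = 3 is an assumption inferred from the three per-zone lists above. The "upper bound" column is what the formula implies if every zone can supply that many instances.

```go
package main

import (
	"fmt"
	"math"
)

// instancesPerZone is a hypothetical helper that mirrors the formula
// quoted in the issue: ceil(shardSize / numZones).
func instancesPerZone(shardSize, numZones int) int {
	return int(math.Ceil(float64(shardSize) / float64(numZones)))
}

func main() {
	// Assumption: three zones, inferred from the three per-zone lists in the issue.
	const numZones = 3

	for _, shardSize := range []int{9, 10} {
		perZone := instancesPerZone(shardSize, numZones)
		// perZone * numZones is the most instances the per-zone loop could select.
		fmt.Printf("shardSize=%d perZone=%d upperBound=%d\n",
			shardSize, perZone, perZone*numZones)
	}
}
```

Under these assumptions, raising shardSize from 9 to 10 raises the per-zone count from 3 to 4 in every zone at once, which is how a single-step change in shard size can alter more than one instance in the tenant's sub-ring.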
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Possible Fix