Bug in ring.shuffleShard() with Zone Aware Replication #5467

Closed
@harry671003

Bug:

There is a bug in how the sub-ring for a tenant is obtained in ring.shuffleShard(). Ideally, when the shard size increases or decreases by one, the replicas for any block should change by at most one instance. But in the Cortex implementation, more than one instance can change if shardSize is not divisible by numZones.

The shuffle sharding code in ring.shuffleShard() works as follows:

  • Get all the zones r.ringZones
  • Calculate numInstancesPerZone = int(math.Ceil(float64(shardSize) / float64(numZones)))
  • For each Zone in r.ringZones:
    • Find the first numInstancesPerZone unique instances
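
The per-zone count in the steps above can be checked in isolation. The following is a minimal sketch, not the Cortex code itself: instancesPerZone is a hypothetical helper name that wraps the formula quoted above, and numZones = 3 is an assumption matching the three AZs used in the example later in this issue.

```go
package main

import (
	"fmt"
	"math"
)

// instancesPerZone is a hypothetical helper that applies the formula
// quoted in this issue: ceil(shardSize / numZones).
func instancesPerZone(shardSize, numZones int) int {
	return int(math.Ceil(float64(shardSize) / float64(numZones)))
}

func main() {
	const numZones = 3 // assumption: three zones, as in the example
	for shardSize := 8; shardSize <= 12; shardSize++ {
		perZone := instancesPerZone(shardSize, numZones)
		// perZone*numZones is how many instances the zones may claim in
		// total; it exceeds shardSize whenever the division is not exact.
		fmt.Printf("shardSize=%d perZone=%d perZone*numZones=%d\n",
			shardSize, perZone, perZone*numZones)
	}
}
```

Going from shardSize=9 to shardSize=10 raises the per-zone count from 3 to 4 for every zone at once, which is where the extra churn comes from.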

Explanation

Assumptions:

  • Assume there are 11 store-gateways, sg1 to sg11, spread across 3 zones.
  • Assume the current tenant shard size is 9 and it is increased to 10.

Initial state

[Figure: SG_Ring]

In the diagram above, the store-gateway tokens are arranged around the ring in the order shown. Assume TENANT_ID is hashed just before sg9's token.

When shardSize=9

[Figure: ShardSize=9]

  • When shardSize = 9, the shardSize is evenly divisible by the number of zones:
      numInstancesPerZone = int(math.Ceil(float64(shardSize) / float64(numZones)))
      numInstancesPerZone = int(math.Ceil(9.0 / 3.0))
      numInstancesPerZone = 3
  • The algorithm for shuffle sharding will pick the first 3 unique instances from each zone with a maximum of 9 instances.
  • The subRing for the tenant will be [sg9, sg5, sg2, sg4, sg1, sg6, sg7, sg8, sg3]
  • We'll have 3 instances per AZ:
    • AZ1: [sg9, sg2, sg6]
    • AZ2: [sg5, sg1, sg7]
    • AZ3: [sg4, sg8, sg3]
  • The replicas for the block will be [sg8, sg9, sg5]

When shardSize=10

[Figure: ShardSize=10]

  • When shardSize = 10, the shardSize is not evenly divisible by the number of zones:
      numInstancesPerZone = int(math.Ceil(float64(shardSize) / float64(numZones)))
      numInstancesPerZone = int(math.Ceil(10.0 / 3.0)) = int(math.Ceil(3.33))
      numInstancesPerZone = 4
  • The algorithm for shuffle sharding will pick the first 4 unique instances from each zone with a maximum of 10 instances.
  • The subRing for the tenant will now consist of sg9, sg5, sg2, sg4, sg1, sg6, sg7, sg8, sg10 and sg11.
  • Two new instances were added to the subRing, [sg10, sg11], and sg3 was dropped.
  • We'll have 4 instances in AZ1 and AZ2 and only 2 instances in AZ3:
    • AZ1: [sg9, sg2, sg6, sg11]
    • AZ2: [sg5, sg1, sg7, sg10]
    • AZ3: [sg4, sg8]
  • The replicas for the block will be [sg10, sg11, sg8]
  • The block replicas changed by 2 (sg9 and sg5 were replaced by sg10 and sg11) even though the shard size only increased from 9 to 10.
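
The two cases can be replayed with a small simulation. This is a simplified model under stated assumptions, not the Cortex implementation: pickSubRing is a hypothetical function, zones are visited in the fixed order AZ1, AZ2, AZ3, and ringByZone lists each zone's instances in the order they are assumed to be met when walking the ring from the tenant's hash (inferred from the AZ lists in this example).

```go
package main

import "fmt"

// zoneOrder is the assumed order in which zones are processed.
var zoneOrder = []string{"AZ1", "AZ2", "AZ3"}

// ringByZone holds, per zone, the instances in the assumed walk order
// starting from the tenant's hash (taken from the example's AZ lists).
var ringByZone = map[string][]string{
	"AZ1": {"sg9", "sg2", "sg6", "sg11"},
	"AZ2": {"sg5", "sg1", "sg7", "sg10"},
	"AZ3": {"sg4", "sg8", "sg3"},
}

// pickSubRing models the behaviour described in this issue: each zone
// takes up to ceil(shardSize/numZones) instances, and selection stops
// once shardSize instances have been picked in total.
func pickSubRing(shardSize int) map[string][]string {
	numZones := len(zoneOrder)
	perZone := (shardSize + numZones - 1) / numZones // integer ceiling
	picked := make(map[string][]string)
	total := 0
	for _, zone := range zoneOrder {
		for _, inst := range ringByZone[zone] {
			if len(picked[zone]) == perZone || total == shardSize {
				break
			}
			picked[zone] = append(picked[zone], inst)
			total++
		}
	}
	return picked
}

func main() {
	for _, size := range []int{9, 10} {
		sub := pickSubRing(size)
		fmt.Printf("shardSize=%d\n", size)
		for _, zone := range zoneOrder {
			fmt.Printf("  %s: %v\n", zone, sub[zone])
		}
	}
}
```

Under these assumptions the model yields 3/3/3 instances per zone for shardSize=9 and 4/4/2 for shardSize=10, so a size change of one swaps in two new instances and pushes one out of the last zone.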

To Reproduce
Steps to reproduce the behavior:

  • To reproduce, increase the store-gateway tenant shard size by one from a value that is divisible by the number of zones.

Expected behavior

  • Only one replica should change for any block when the tenant shard size increases or decreases by 1.

Possible Fix

  • When the shardSize is not divisible by the number of zones, only the first zone should get the extra instance.
  • The number of instances per zone shouldn't differ by more than one.
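
One way to read this proposal is sketched below. It is an illustration only, not necessarily the fix applied in Cortex: instancesPerZoneBalanced is a hypothetical helper, and it generalises "the first zone gets the extra instance" to "the first shardSize % numZones zones get one extra instance each", which is an assumption about how a remainder larger than one would be handled.

```go
package main

import "fmt"

// instancesPerZoneBalanced is a hypothetical helper that splits
// shardSize across numZones so that per-zone counts differ by at most
// one. The first shardSize%numZones zones receive one extra instance.
func instancesPerZoneBalanced(shardSize, numZones int) []int {
	counts := make([]int, numZones)
	base, extra := shardSize/numZones, shardSize%numZones
	for i := range counts {
		counts[i] = base
		if i < extra {
			counts[i]++
		}
	}
	return counts
}

func main() {
	const numZones = 3 // assumption: three zones, as in the example
	for _, size := range []int{9, 10, 11, 12} {
		fmt.Println(size, instancesPerZoneBalanced(size, numZones))
	}
}
```

With this split, moving from shardSize=9 to shardSize=10 changes the counts from [3 3 3] to [4 3 3], so only one zone gains an instance and no zone loses one.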
