Ingester extend-writes with AZ-awareness expects replicas in all AZs rather than just quorum #4626

Closed
@roystchiang

Describe the bug
With extend-writes and AZ-aware replication enabled on ingesters, remote_write can fail when multiple ingesters in the same AZ are unavailable.

Consider a cluster with 4 ingesters: ingester-A (az-1), ingester-B (az-1), ingester-C (az-2), and ingester-D (az-3). Ingester-A is in the leaving state, ingester-B is in an unhealthy state due to an unclean shutdown (OOM, for example), and ingester-C and ingester-D are healthy.

In https://github.com/cortexproject/cortex/blame/84f240e058eaa0e50889252f60ce72643b5a62c8/pkg/ring/ring.go#L387 we'll select all ingesters, even though ingesters A and B are in the same AZ, because ingester-A is not in a healthy state.

In https://github.com/cortexproject/cortex/blame/84f240e058eaa0e50889252f60ce72643b5a62c8/pkg/ring/replication_strategy.go#L36, since we pass in 4 ingesters, minSuccess is now (4/2) + 1 = 3. However, we only have 2 healthy instances, because both ingesters in az-1 are in a degraded state. This will trigger https://github.com/cortexproject/cortex/blame/84f240e058eaa0e50889252f60ce72643b5a62c8/pkg/ring/replication_strategy.go#L54, and fail the write immediately.
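The arithmetic above can be reproduced with a small, simplified model. This is only a sketch and not the Cortex code: the `instance` type and the `minSuccess` and `countHealthy` helpers are hypothetical names, and a replication factor of 3 is an assumption, since the report does not state it.

```go
package main

import "fmt"

// instance is a hypothetical, simplified stand-in for a ring entry.
type instance struct {
	name    string
	zone    string
	healthy bool // whether the instance is considered healthy for writes
}

// minSuccess models the quorum computation described above: when more
// instances are selected than the replication factor, the quorum is
// computed over the larger number.
func minSuccess(selected, replicationFactor int) int {
	if selected > replicationFactor {
		replicationFactor = selected
	}
	return replicationFactor/2 + 1
}

// countHealthy returns how many instances are healthy for writes.
func countHealthy(instances []instance) int {
	n := 0
	for _, in := range instances {
		if in.healthy {
			n++
		}
	}
	return n
}

func main() {
	// The cluster from the report. A replication factor of 3 is assumed.
	selected := []instance{
		{"ingester-A", "az-1", false}, // leaving
		{"ingester-B", "az-1", false}, // unhealthy after an unclean shutdown
		{"ingester-C", "az-2", true},
		{"ingester-D", "az-3", true},
	}
	need := minSuccess(len(selected), 3)
	have := countHealthy(selected)
	// Prints: minSuccess=3 healthy=2 writeAllowed=false
	fmt.Printf("minSuccess=%d healthy=%d writeAllowed=%t\n", need, have, have >= need)
}
```

Under these assumptions, two healthy replicas in two distinct AZs are not enough, because the quorum is computed over all four selected ingesters rather than over the three zones.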

A similar issue occurs when ingester-A is in the leaving state while ingester-B has shut down uncleanly but is still in the active state because it has not yet reached the heartbeat timeout.

In https://github.com/cortexproject/cortex/blame/84f240e058eaa0e50889252f60ce72643b5a62c8/pkg/ring/replication_strategy.go#L70, the distributor will require 3 ingesters for a successful write because minSuccess is 3 and the number of instances is also 3. The distributor will attempt to write to ingester-B, ingester-C, and ingester-D, and the write will fail since ingester-B is actually unavailable.
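A minimal sketch of why this second scenario leaves no room for error, assuming the number of tolerated write failures is the number of instances that pass the health check minus minSuccess. The `toleratedFailures` helper is a hypothetical name, and the replication factor of 3 is again an assumption.

```go
package main

import "fmt"

// toleratedFailures is a hypothetical helper: it returns how many of the
// instances that passed the health check may fail a write before the
// write as a whole fails. A negative result means the write is rejected
// before any request is sent.
func toleratedFailures(selected, passingHealthCheck, replicationFactor int) int {
	if selected > replicationFactor {
		replicationFactor = selected
	}
	minSuccess := replicationFactor/2 + 1
	return passingHealthCheck - minSuccess
}

func main() {
	// 4 ingesters selected; ingester-A (leaving) is filtered out, while
	// ingester-B still passes the health check despite being down.
	fmt.Println(toleratedFailures(4, 3, 3)) // prints 0

	// First scenario: ingester-A and ingester-B are both filtered out.
	fmt.Println(toleratedFailures(4, 2, 3)) // prints -1
}
```

With zero tolerated failures, the single failed write to ingester-B is enough to fail the whole request, even though ingester-C and ingester-D accept it.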

To Reproduce
Steps to reproduce the behavior:

  1. Start Cortex (SHA or version)
  2. Perform Write Operations
  3. Trigger an unclean shutdown for 1 ingester, and start shutting down another ingester in the same AZ

Expected behavior
I expect extend-writes to work with just a quorum of available ingesters, since extend-writes should be best-effort.

Environment:

  • Infrastructure: kubernetes
  • Deployment tool: helm

Storage Engine

  • Blocks
  • Chunks

Additional Context
