Ingesters failing to leave the ring in GKE #4467

Closed
@andrejbranch

Describe the bug
I'm seeing an issue where ingesters sometimes fail to leave the ring. This happens no matter which KV store is used. It looks as though there is a race condition between closing the lifecycler loop and leaving the ring. Below are example logs from a run using etcd as the KV store.

cortex-ingester-5 cortex level=info ts=2021-09-08T17:47:21.918300489Z caller=lifecycler.go:754 msg="changing instance state from" old_state=ACTIVE new_state=LEAVING ring=ingester
cortex-ingester-5 cortex {"level":"warn","ts":"2021-09-08T17:47:42.803Z","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0008901c0/#initially=[cortex-etcd-0.cortex-etcd:2379;cortex-etcd-1.cortex-etcd:2379;cortex-etcd-2.cortex-etcd:2379]","attempt":0,"error":"rpc error: code = Unavailable desc = transport is closing"}
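
The suspected ordering problem can be illustrated with a small model. This is only a sketch under assumptions, not Cortex's lifecycler code: `fakeKV`, `shutdown`, and the instance ID `ingester-5` are hypothetical names, and the real failure may have a different cause. It shows that if the KV client is closed before the unregister call runs, the call fails and the instance stays listed in the ring.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeKV is a hypothetical in-memory stand-in for a ring KV store.
type fakeKV struct {
	closed bool
	ring   map[string]string // instance ID -> state
}

var errClosed = errors.New("kv client closed")

// Delete removes an instance from the ring. It fails once the client is closed.
func (kv *fakeKV) Delete(id string) error {
	if kv.closed {
		return errClosed
	}
	delete(kv.ring, id)
	return nil
}

// Close marks the client as unusable.
func (kv *fakeKV) Close() { kv.closed = true }

// shutdown runs the two shutdown steps in the requested order and reports
// whether the instance is still listed in the ring afterwards.
func shutdown(closeFirst bool) (stillInRing bool, err error) {
	kv := &fakeKV{ring: map[string]string{"ingester-5": "LEAVING"}}
	if closeFirst {
		kv.Close()
		err = kv.Delete("ingester-5")
	} else {
		err = kv.Delete("ingester-5")
		kv.Close()
	}
	_, stillInRing = kv.ring["ingester-5"]
	return stillInRing, err
}

func main() {
	for _, closeFirst := range []bool{false, true} {
		inRing, err := shutdown(closeFirst)
		fmt.Printf("closeFirst=%v stillInRing=%v err=%v\n", closeFirst, inRing, err)
	}
}
```

In a real process the two steps would run concurrently, so the bad ordering would only show up on some restarts, which would match the intermittent behavior described here.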

To Reproduce
Steps to reproduce the behavior:
I've been able to reproduce this by starting from a completely blank deployment, spinning up some ingesters, and connecting them to the ring. The first rolling restart looks good: every ingester leaves the ring and rejoins properly. Once that restart is done, a second rolling restart causes some ingesters to fail to leave the ring. It doesn't matter whether I use memberlist, etcd, or consul.

Expected behavior
Ingesters should leave the ring no matter how many times they are restarted when unregister on shutdown is set to true.

Environment:

  • Infrastructure: kubernetes
  • Deployment tool: N/A

Storage Engine

  • Blocks
  • Chunks

Additional Context
I found this bug while testing a lower replication factor. I suspect most deployments miss it because a replication factor of 3 with extended writes hides the issue. With a lower replication factor, if an ingester fails to leave the ring, all writes fail.
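
The arithmetic behind this can be sketched as follows, assuming a simple majority quorum in which a write to n replicas needs n/2 + 1 successes (integer division). The quorum rule and the function names `minSuccess` and `toleratedFailures` are assumptions for illustration, not taken from the Cortex source.

```go
package main

import "fmt"

// minSuccess returns how many replicas must acknowledge a write
// under a simple majority rule (an assumption for this sketch).
func minSuccess(replicas int) int {
	return replicas/2 + 1
}

// toleratedFailures returns how many replicas may be unreachable
// while a write can still reach the majority.
func toleratedFailures(replicas int) int {
	return replicas - minSuccess(replicas)
}

func main() {
	for _, rf := range []int{1, 2, 3} {
		fmt.Printf("replicas=%d minSuccess=%d toleratedFailures=%d\n",
			rf, minSuccess(rf), toleratedFailures(rf))
	}
}
```

Under this rule a replication factor of 3 tolerates one unreachable replica, so a single stale ring entry goes unnoticed, while a replication factor of 1 or 2 tolerates none.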
