Description
Describe the bug
I'm seeing an issue where ingesters sometimes fail to leave the ring. This happens regardless of which KV store is used. It looks as though there is a race condition between closing the lifecycler loop and leaving the ring; a minimal sketch of the suspected ordering issue is included after the logs below. Example logs using etcd as the KV store:
cortex-ingester-5 cortex level=info ts=2021-09-08T17:47:21.918300489Z caller=lifecycler.go:754 msg="changing instance state from" old_state=ACTIVE new_state=LEAVING ring=ingester
cortex-ingester-5 cortex {"level":"warn","ts":"2021-09-08T17:47:42.803Z","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0008901c0/#initially=[cortex-etcd-0.cortex-etcd:2379;cortex-etcd-1.cortex-etcd:2379;cortex-etcd-2.cortex-etcd:2379]","attempt":0,"error":"rpc error: code = Unavailable desc = transport is closing"}
To Reproduce
Steps to reproduce the behavior:
I've been able to reproduce it starting from a completely blank deployment: spin up some ingesters and connect them to the ring, then do a rolling restart on them and everything looks good — every ingester leaves the ring and rejoins properly. After that rolling restart is done, do another rolling restart and some ingesters fail to leave the ring. It doesn't matter whether I use memberlist, etcd, or consul.
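Not part of the original repro steps, but a small Go helper like the following can make the stuck state easier to spot after each rolling restart. It polls a ring status page and reports entries that are not ACTIVE; the URL, port, and the exact strings on the page are assumptions about a typical deployment (Cortex exposes a ring status page on the distributor), so adjust them for your setup.

```go
// Hypothetical helper: poll the ring status page and flag stuck instances.
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

func main() {
	// Assumption: ring page exposed by the distributor on its HTTP port.
	const ringURL = "http://cortex-distributor:8080/ring"

	for i := 0; i < 30; i++ {
		resp, err := http.Get(ringURL)
		if err != nil {
			fmt.Println("ring page not reachable:", err)
			time.Sleep(10 * time.Second)
			continue
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()

		page := string(body)
		if strings.Contains(page, "LEAVING") || strings.Contains(page, "Unhealthy") {
			fmt.Println("found an ingester stuck in a LEAVING/Unhealthy state")
		} else {
			fmt.Println("all ring entries look ACTIVE")
		}
		time.Sleep(10 * time.Second)
	}
}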
Expected behavior
Ingesters should leave the ring no matter how many times they are restarted when unregister-on-shutdown is set to true.
Environment:
- Infrastructure: Kubernetes
- Deployment tool: N/A
Storage Engine
- Blocks
- Chunks
Additional Context
I found this bug while testing a lower replication factor. I suspect most deployments miss it because a replication factor of 3 with extend-writes hides the issue. With a lower replication factor, if an ingester fails to leave the ring, all writes fail.