Commit b611bf3

Add ring multikey proposal (#4832)
Signed-off-by: Daniel Deluiggi <[email protected]> Signed-off-by: Daniel Deluiggi <[email protected]>
1 parent 5cc2a20 commit b611bf3

3 files changed

+148
-0
lines changed

docs/proposals/ring-multikey.md

Lines changed: 148 additions & 0 deletions
@@ -0,0 +1,148 @@
1+
---
2+
title: "Ring Multikey"
3+
linkTitle: "Ring Multikey"
4+
weight: 1
5+
slug: ring-multikey
6+
---
7+
8+
- Author: [Daniel Blando](https://github.com/danielblando)
9+
- Date: August 2022
10+
- Status: Proposed
11+
12+
## Background
13+
14+
Cortex implements a ring structure to share information about the registered pods of each
service. The data stored and used by the ring must implement the Codec interface. Currently, the only Codec
supported by the ring is [Desc](https://github.com/cortexproject/cortex/blob/c815b3cb61e4d0a3f01e9947d44fa111bc85aa08/pkg/ring/ring.proto#L10).
Desc is a proto.Message with a list of instance descriptions. It stores the data for each pod and is
saved in a supported KV store. Currently, Cortex supports memberlist, consul and etcd as KV stores. Memberlist
implements a gossip protocol, while consul and etcd are KV store services.
20+
21+
The ring is used by different services, each with its own ring key to store and retrieve values from the KV store.
For example, the ingester service uses the key "ingester" to save and load data from the KV store. Because the saved
data is a single Desc struct, only one key is used for all the information.
24+
25+
## Problem
26+
27+
Because each service uses a single key to save and load information, a concurrency issue arises when multiple pods
save the same key. With memberlist, the issue is mitigated because the information is owned by all pods and a
timestamp is used to identify the latest data. With consul and etcd, all pods compete to update the key at the same
time, causing an increase in latency and failures directly related to the number of pods running. The Cortex and etcd
implementations use a version tag to make sure no data is overwritten, which leads to write failures.
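
The contention described above can be illustrated with a small simulation. This is a minimal sketch, not Cortex code:
`versionedStore`, `cas` and `simulate` are hypothetical names, and the model assumes a worst case where every pending
pod reads the version before any pod writes.

```go
package main

import "fmt"

// versionedStore is a hypothetical in-memory stand-in for a KV store that
// rejects a write when the caller's version is stale (compare-and-swap).
type versionedStore struct {
	versions map[string]int
}

// cas bumps the key's version only if the caller read the current version.
func (s *versionedStore) cas(key string, readVersion int) bool {
	if s.versions[key] != readVersion {
		return false
	}
	s.versions[key]++
	return true
}

// simulate runs rounds in which every pending pod reads its key's version and
// then all of them attempt a CAS. It returns the number of rejected writes.
func simulate(pods int, keyFor func(pod int) string) int {
	store := &versionedStore{versions: map[string]int{}}
	pending := make([]int, pods)
	for i := range pending {
		pending[i] = i
	}
	failures := 0
	for len(pending) > 0 {
		// Assumption: all pending pods read before anyone writes.
		read := make(map[int]int, len(pending))
		for _, p := range pending {
			read[p] = store.versions[keyFor(p)]
		}
		var retry []int
		for _, p := range pending {
			if !store.cas(keyFor(p), read[p]) {
				failures++
				retry = append(retry, p)
			}
		}
		pending = retry
	}
	return failures
}

func main() {
	single := simulate(300, func(int) string { return "ingester" })
	multi := simulate(300, func(p int) string { return fmt.Sprintf("ingester-%d", p) })
	fmt.Println("single key, rejected writes:", single)
	fmt.Println("one key per pod, rejected writes:", multi)
}
```

In this lockstep model, N pods sharing one key produce N(N-1)/2 rejected writes, while one key per pod produces none.
Real deployments interleave reads and writes less pessimistically, so the numbers only indicate the trend.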
32+
33+
In a test running Cortex with etcd, the distributor was scaled to 300 pods and a latency increase coming from etcd
usage was observed. 5xx errors were also observed while etcd was in use.
35+
![Latency using etcd](/images/proposals/ring-multikey-latency.png)
36+
37+
- 17:14 - Running memberlist, p99 around 5ms
- 17:25 - Running etcd, p99 around 200ms
- 17:25 to 17:34 - Migrating to multikey
- After running etcd multikey PoC, p99 around 25ms
41+
42+
## Proposal
43+
44+
### Multikey interface
45+
46+
The proposal is to split the current Desc struct, which contains a list of key-value pairs, into multiple keys.
Instead of saving a single "ingester" key, the KV store will save separate keys such as "ingester-1" and "ingester-2".
48+
49+
Current:
50+
```
Key: ingester/ring/ingester
Value:
{
  "ingesters": {
    "ingester-0": {
      "addr": "10.0.0.1:9095",
      "timestamp": 1660760278,
      "tokens": [
        1,
        2
      ],
      "zone": "us-west-2b",
      "registered_timestamp": 1660708390
    },
    "ingester-1": ...
  }
}
```
69+
70+
Proposal:
71+
```
Key: ingester/ring/ingester-0
Value:
{
  "addr": "10.0.0.1:9095",
  "timestamp": 1660760278,
  "tokens": [
    1,
    15
  ],
  "zone": "us-west-2b",
  "registered_timestamp": 1660708390
}

Key: ingester/ring/ingester-1
Value:
{
  "addr": "10.0.0.2:9095",
  "timestamp": 1660760378,
  "tokens": [
    5,
    28
  ],
  "zone": "us-west-2b",
  "registered_timestamp": 1660708572
}
```
98+
99+
The proposal is to create an interface called MultiKey. The interface allows the KV store to ask the codec to split
and join the values in separate keys.
101+
102+
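```
type MultiKey interface {
	// SplitById splits the codec into one value per key.
	SplitById() map[string]interface{}

	// JoinIds rebuilds the codec from the per-key values.
	JoinIds(map[string]interface{}) MultiKey

	// GetChildFactory returns the message type of the values produced by SplitById.
	GetChildFactory() proto.Message

	// FindDifference returns the subset to update and the keys to delete.
	FindDifference(MultiKey) (MultiKey, []string, error)
}
```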
113+
114+
* SplitById - responsible for splitting the codec into multiple keys, each mapped to an interface value.
* JoinIds - responsible for receiving multiple keys and interface values and creating the codec object.
* GetChildFactory - allows the KV store to know how to serialize and deserialize the interface returned by SplitById.
  The interface returned by SplitById needs to be a proto.Message.
* FindDifference - optimization used to know what needs to be updated or deleted from a codec. This avoids updating
  all keys every time the codec changes. The first return value is the subset of the MultiKey to be updated. The
  second is a list of keys to be deleted.
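
As an illustration of the methods listed above, here is a minimal sketch on a simplified Desc. It is not the Cortex
implementation: `InstanceDesc` and `Desc` are reduced, hypothetical stand-ins for the ring protobuf types (tokens and
zone are omitted so values can be compared with `!=`), the methods use concrete types instead of the MultiKey
interface, and GetChildFactory is left out because it depends on the protobuf library. The receiver of FindDifference
is assumed to be the current state and the argument the desired state.

```go
package main

import (
	"fmt"
	"sort"
)

// InstanceDesc and Desc are simplified, hypothetical stand-ins for the ring types.
type InstanceDesc struct {
	Addr      string
	Timestamp int64
}

type Desc struct {
	Ingesters map[string]InstanceDesc
}

// SplitById returns one value per instance ID.
func (d *Desc) SplitById() map[string]interface{} {
	out := make(map[string]interface{}, len(d.Ingesters))
	for id, inst := range d.Ingesters {
		out[id] = inst
	}
	return out
}

// JoinIds builds a new Desc from per-instance values, skipping values of an unexpected type.
func (d *Desc) JoinIds(in map[string]interface{}) *Desc {
	out := &Desc{Ingesters: make(map[string]InstanceDesc, len(in))}
	for id, v := range in {
		if inst, ok := v.(InstanceDesc); ok {
			out.Ingesters[id] = inst
		}
	}
	return out
}

// FindDifference compares d (current) with next (desired). It returns the
// instances of next that are new or changed, and the sorted IDs that exist in
// d but not in next.
func (d *Desc) FindDifference(next *Desc) (*Desc, []string) {
	toUpdate := &Desc{Ingesters: map[string]InstanceDesc{}}
	for id, inst := range next.Ingesters {
		if old, ok := d.Ingesters[id]; !ok || old != inst {
			toUpdate.Ingesters[id] = inst
		}
	}
	var toDelete []string
	for id := range d.Ingesters {
		if _, ok := next.Ingesters[id]; !ok {
			toDelete = append(toDelete, id)
		}
	}
	sort.Strings(toDelete)
	return toUpdate, toDelete
}

func main() {
	current := &Desc{Ingesters: map[string]InstanceDesc{
		"ingester-0": {Addr: "10.0.0.1:9095", Timestamp: 1660760278},
		"ingester-1": {Addr: "10.0.0.2:9095", Timestamp: 1660760378},
	}}
	desired := &Desc{Ingesters: map[string]InstanceDesc{
		"ingester-0": {Addr: "10.0.0.1:9095", Timestamp: 1660760999}, // heartbeat changed
		"ingester-2": {Addr: "10.0.0.3:9095", Timestamp: 1660761000}, // newly joined
	}}
	toUpdate, toDelete := current.FindDifference(desired)
	fmt.Println("keys to update:", len(toUpdate.Ingesters), "keys to delete:", toDelete)
}
```

Only the changed and new instances end up in the update set, which is what lets the KV store skip untouched keys.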
121+
122+
The codec implementation will change to support multiple keys. Currently, the codec interface for the KV store
supports only Encode and Decode. New methods will be added, which will be used only by the KV stores that implement
the multikey functionality.
125+
126+
```
type Codec interface {
	// Existing
	Decode([]byte) (interface{}, error)
	Encode(interface{}) ([]byte, error)
	CodecID() string

	// Proposed
	DecodeMultiKey(map[string][]byte) (interface{}, error)
	EncodeMultiKey(interface{}) (map[string][]byte, error)
}
```
138+
139+
* DecodeMultiKey - called by the KV store to decode downloaded data. This function will use the JoinIds method.
* EncodeMultiKey - called by the KV store to encode data to be saved. This function will use the SplitById method.
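
The two proposed methods could look roughly like the sketch below. It is an illustration under stated assumptions:
`encoding/json` stands in for protobuf serialization so the example has no external dependencies, `InstanceDesc` and
`Desc` are simplified hypothetical types, and the functions take concrete types rather than `interface{}`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// InstanceDesc and Desc are simplified, hypothetical stand-ins for the ring types.
type InstanceDesc struct {
	Addr      string `json:"addr"`
	Timestamp int64  `json:"timestamp"`
}

type Desc struct {
	Ingesters map[string]InstanceDesc
}

// EncodeMultiKey splits the Desc by instance ID and serializes each value on
// its own, so every instance can be stored under a separate key.
func EncodeMultiKey(d *Desc) (map[string][]byte, error) {
	out := make(map[string][]byte, len(d.Ingesters))
	for id, inst := range d.Ingesters {
		b, err := json.Marshal(inst)
		if err != nil {
			return nil, fmt.Errorf("encode %q: %w", id, err)
		}
		out[id] = b
	}
	return out, nil
}

// DecodeMultiKey deserializes each value and joins them back into one Desc.
// The InstanceDesc variable plays the role that GetChildFactory has in the
// proposal: it tells the decoder which type each value holds.
func DecodeMultiKey(data map[string][]byte) (*Desc, error) {
	d := &Desc{Ingesters: make(map[string]InstanceDesc, len(data))}
	for id, b := range data {
		var inst InstanceDesc
		if err := json.Unmarshal(b, &inst); err != nil {
			return nil, fmt.Errorf("decode %q: %w", id, err)
		}
		d.Ingesters[id] = inst
	}
	return d, nil
}

func main() {
	ring := &Desc{Ingesters: map[string]InstanceDesc{
		"ingester-0": {Addr: "10.0.0.1:9095", Timestamp: 1660760278},
		"ingester-1": {Addr: "10.0.0.2:9095", Timestamp: 1660760378},
	}}
	encoded, err := EncodeMultiKey(ring)
	if err != nil {
		panic(err)
	}
	decoded, err := DecodeMultiKey(encoded)
	if err != nil {
		panic(err)
	}
	fmt.Println("keys encoded:", len(encoded), "instances decoded:", len(decoded.Ingesters))
}
```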
141+
142+
The new KV store will know that the data being saved is a multikey reference. It will use FindDifference
to determine which keys need to be updated. The codec implementation of the new methods will use JoinIds and
SplitById to know how to separate the codec into multiple keys. DecodeMultiKey will also use GetChildFactory to know
how to decode the data stored in the KV store.
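
The update path described above can be sketched with an in-memory store. Everything here is hypothetical
(`memStore`, `load`, `update`, `findDifference`, and the `Instance` type), and the per-key version check that a real
CAS would perform is omitted to keep the example short. The point is only that a heartbeat from one instance touches
one key.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Instance is a simplified, hypothetical stand-in for one ring entry.
type Instance struct {
	Addr      string
	Timestamp int64
}

// memStore is a hypothetical in-memory KV store keyed by "<prefix>/<id>".
type memStore struct {
	data   map[string]Instance
	writes int // number of keys written, to show how little is touched
}

// load joins every key under prefix into one map, as DecodeMultiKey would.
func (s *memStore) load(prefix string) map[string]Instance {
	out := map[string]Instance{}
	for key, inst := range s.data {
		if strings.HasPrefix(key, prefix+"/") {
			out[strings.TrimPrefix(key, prefix+"/")] = inst
		}
	}
	return out
}

// findDifference returns the entries of next that are new or changed, and the
// sorted IDs that exist in cur but not in next.
func findDifference(cur, next map[string]Instance) (map[string]Instance, []string) {
	toUpdate := map[string]Instance{}
	for id, inst := range next {
		if old, ok := cur[id]; !ok || old != inst {
			toUpdate[id] = inst
		}
	}
	var toDelete []string
	for id := range cur {
		if _, ok := next[id]; !ok {
			toDelete = append(toDelete, id)
		}
	}
	sort.Strings(toDelete)
	return toUpdate, toDelete
}

// update applies f to a copy of the joined state and persists only the difference.
func (s *memStore) update(prefix string, f func(ring map[string]Instance)) {
	cur := s.load(prefix)
	next := make(map[string]Instance, len(cur))
	for id, inst := range cur {
		next[id] = inst
	}
	f(next)
	toUpdate, toDelete := findDifference(cur, next)
	for id, inst := range toUpdate {
		s.data[prefix+"/"+id] = inst
		s.writes++
	}
	for _, id := range toDelete {
		delete(s.data, prefix+"/"+id)
	}
}

func main() {
	store := &memStore{data: map[string]Instance{
		"ingester/ring/ingester-0": {Addr: "10.0.0.1:9095", Timestamp: 1660760278},
		"ingester/ring/ingester-1": {Addr: "10.0.0.2:9095", Timestamp: 1660760378},
	}}
	// ingester-1 sends a heartbeat; only its key should be written.
	store.update("ingester/ring", func(ring map[string]Instance) {
		inst := ring["ingester-1"]
		inst.Timestamp = 1660760478
		ring["ingester-1"] = inst
	})
	fmt.Println("keys written:", store.writes)
}
```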
146+
147+
Example of CAS being used with multikey design:
148+
![Sequence diagram](/images/proposals/ring-multikey-sequence.png)