| 1 | +--- |
| 2 | +title: "Ring Multikey" |
| 3 | +linkTitle: "Ring Multikey" |
| 4 | +weight: 1 |
| 5 | +slug: ring-multikey |
| 6 | +--- |
| 7 | + |
| 8 | +- Author: [Daniel Blando](https://github.com/danielblando) |
| 9 | +- Date: August 2022 |
| 10 | +- Status: Proposed |
| 11 | + |
| 12 | +## Background |
| 13 | + |
| 14 | +Cortex implements a ring structure to share information about the registered pods of each |
| 15 | +service. The data stored and used by the ring must implement the Codec interface. Currently, the only Codec supported |
| 16 | +by the ring is [Desc](https://github.com/cortexproject/cortex/blob/c815b3cb61e4d0a3f01e9947d44fa111bc85aa08/pkg/ring/ring.proto#L10). |
| 17 | +Desc is a proto.Message containing a list of instance descriptions. It stores the data for each pod and is |
| 18 | +saved in a supported KV store. Currently, Cortex supports memberlist, Consul and etcd as KV stores. Memberlist works |
| 19 | +by implementing a gossip protocol, while Consul and etcd are KV store services. |
| 20 | + |
| 21 | +Each service that uses the ring has its own ring key to store and retrieve values from the KV store. |
| 22 | +For example, the ingester service uses the key "ingester" to save and load data from the KV store. Because the saved data is a single Desc |
| 23 | +struct, only one key is used for all the information. |
| 24 | + |
| 25 | +## Problem |
| 26 | + |
| 27 | +Because each service uses a single key to save and load information, a concurrency issue arises when multiple pods write |
| 28 | +the same key. With memberlist, the issue is mitigated because the information is owned by all pods and a timestamp is used |
| 29 | +to determine the latest data. With Consul and etcd, all pods compete to update the key at the same time, causing an increase in |
| 30 | +latency and failures directly related to the number of pods running. The Consul and etcd implementations in Cortex use a version tag to |
| 31 | +make sure no data is overwritten, which is what causes the write failures. |
| 32 | + |
| 33 | +In a test running Cortex with etcd, the distributor was scaled to 300 pods and a latency increase caused by etcd usage was observed. |
| 34 | +5xx errors also occurred while etcd was in use. |
| 35 | + |
| 36 | + |
| 37 | +- 17:14 - Running memberlist, p99 around 5ms |
| 38 | +- 17:25 - Running etcd, p99 around 200ms |
| 39 | +- 17:25 to 17:34 - Migrating to multikey |
| 40 | +- After running the etcd multikey PoC, p99 around 25ms |
| 41 | + |
| 42 | +## Proposal |
| 43 | + |
| 44 | +### Multikey interface |
| 45 | + |
| 46 | +The proposal is to split the current Desc struct, which contains a list of key-value pairs, into multiple keys. Instead of saving one |
| 47 | +"ingester" key, the KV store will save separate keys such as "ingester-1" and "ingester-2". |
| 48 | + |
| 49 | +Current: |
| 50 | +``` |
| 51 | +Key: ingester/ring/ingester |
| 52 | +Value: |
| 53 | +{ |
| 54 | + "ingesters": { |
| 55 | + "ingester-0": { |
| 56 | + "addr": "10.0.0.1:9095", |
| 57 | + "timestamp": 1660760278, |
| 58 | + "tokens": [ |
| 59 | + 1, |
| 60 | + 2 |
| 61 | + ], |
| 62 | + "zone": "us-west-2b", |
| 63 | + "registered_timestamp": 1660708390 |
| 64 | + }, |
| 65 | + "ingester-1": ... |
| 66 | + } |
| 67 | +} |
| 68 | +``` |
| 69 | + |
| 70 | +Proposal: |
| 71 | +``` |
| 72 | +Key: ingester/ring/ingester-0 |
| 73 | +Value: |
| 74 | +{ |
| 75 | + "addr": "10.0.0.1:9095", |
| 76 | + "timestamp": 1660760278, |
| 77 | + "tokens": [ |
| 78 | + 1, |
| 79 | + 15 |
| 80 | + ], |
| 81 | + "zone": "us-west-2b", |
| 82 | + "registered_timestamp": 1660708390 |
| 83 | +} |
| 84 | + |
| 85 | +Key: ingester/ring/ingester-1 |
| 86 | +Value: |
| 87 | +{ |
| 88 | + "addr": "10.0.0.2:9095", |
| 89 | + "timestamp": 1660760378, |
| 90 | + "tokens": [ |
| 91 | + 5, |
| 92 | + 28 |
| 93 | + ], |
| 94 | + "zone": "us-west-2b", |
| 95 | + "registered_timestamp": 1660708572 |
| 96 | +} |
| 97 | +``` |
| 98 | + |
| 99 | +The proposal is to create an interface called MultiKey. The interface allows the KV store to ask the codec to split its |
| 100 | +value into separate keys and to join those keys back together. |
| 101 | + |
| 102 | +``` |
| 103 | +type MultiKey interface { |
| 104 | + SplitById() map[string]interface{} |
| 105 | + |
| 106 | + JoinIds(map[string]interface{}) MultiKey |
| 107 | + |
| 108 | + GetChildFactory() proto.Message |
| 109 | + |
| 110 | + FindDifference(MultiKey) (MultiKey, []string, error) |
| 111 | +} |
| 112 | +``` |
| 113 | + |
| 114 | +* SplitById - responsible for splitting the codec into multiple keys, each mapped to an interface value. |
| 115 | +* JoinIds - responsible for receiving multiple keys and interface values and building the codec object from them. |
| 116 | +* GetChildFactory - allows the KV store to know how to serialize and deserialize the interface values returned by "SplitById". |
| 117 | + The interface values returned by SplitById need to be a proto.Message. |
| 118 | +* FindDifference - optimization used to know what needs to be updated or deleted from a codec. This avoids updating all keys every |
| 119 | + time the codec changes. The first return value is a subset of the MultiKey to be updated. The second is a list of keys to |
| 120 | + be deleted. |
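
As a rough illustration of the methods listed above, the following sketch implements them on simplified stand-in types. It is not the actual Cortex code: `InstanceDesc` and `Desc` are reduced to two fields, `GetChildFactory` is left out so the example does not depend on a protobuf library, and the direction of `FindDifference` (receiver is the stored state, argument is the desired state) is an assumption.

```go
package main

import "fmt"

// InstanceDesc is a simplified stand-in for the per-instance proto message.
type InstanceDesc struct {
	Addr      string
	Timestamp int64
}

// MultiKey mirrors the proposed interface, minus GetChildFactory (assumption:
// omitted here only to keep the sketch free of protobuf dependencies).
type MultiKey interface {
	SplitById() map[string]interface{}
	JoinIds(map[string]interface{}) MultiKey
	FindDifference(MultiKey) (MultiKey, []string, error)
}

// Desc is a simplified stand-in for the ring Desc.
type Desc struct {
	Ingesters map[string]InstanceDesc
}

// SplitById returns one child value per instance ID.
func (d *Desc) SplitById() map[string]interface{} {
	out := make(map[string]interface{}, len(d.Ingesters))
	for id, inst := range d.Ingesters {
		out[id] = inst
	}
	return out
}

// JoinIds rebuilds a Desc from per-instance children, skipping unknown types.
func (d *Desc) JoinIds(in map[string]interface{}) MultiKey {
	joined := &Desc{Ingesters: make(map[string]InstanceDesc, len(in))}
	for id, v := range in {
		if inst, ok := v.(InstanceDesc); ok {
			joined.Ingesters[id] = inst
		}
	}
	return joined
}

// FindDifference treats d as the stored state and next as the desired state.
// It returns the instances to write and the instance IDs to delete.
func (d *Desc) FindDifference(next MultiKey) (MultiKey, []string, error) {
	n, ok := next.(*Desc)
	if !ok {
		return nil, nil, fmt.Errorf("unexpected type %T", next)
	}
	updated := &Desc{Ingesters: map[string]InstanceDesc{}}
	for id, inst := range n.Ingesters {
		if old, found := d.Ingesters[id]; !found || old != inst {
			updated.Ingesters[id] = inst
		}
	}
	var deleted []string
	for id := range d.Ingesters {
		if _, found := n.Ingesters[id]; !found {
			deleted = append(deleted, id)
		}
	}
	return updated, deleted, nil
}

func main() {
	stored := &Desc{Ingesters: map[string]InstanceDesc{
		"ingester-0": {Addr: "10.0.0.1:9095", Timestamp: 1660760278},
		"ingester-1": {Addr: "10.0.0.2:9095", Timestamp: 1660760378},
	}}
	next := &Desc{Ingesters: map[string]InstanceDesc{
		"ingester-0": {Addr: "10.0.0.1:9095", Timestamp: 1660760278},
		"ingester-2": {Addr: "10.0.0.3:9095", Timestamp: 1660760478},
	}}
	updated, deleted, err := stored.FindDifference(next)
	if err != nil {
		panic(err)
	}
	fmt.Println("keys to update:", len(updated.SplitById()), "keys to delete:", deleted)
}
```

In this example only "ingester-2" would be written and "ingester-1" deleted, while the unchanged "ingester-0" is left alone.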
| 121 | + |
| 122 | +The codec implementation will change to support multiple keys. Currently, the Codec interface for the KV store supports |
| 123 | +only Encode and Decode. New methods will be added that are used only by the KV stores implementing the multikey |
| 124 | +functionality. |
| 125 | + |
| 126 | +``` |
| 127 | +type Codec interface { |
| 128 | + // Existing |
| 129 | + Decode([]byte) (interface{}, error) |
| 130 | + Encode(interface{}) ([]byte, error) |
| 131 | + CodecID() string |
| 132 | + |
| 133 | + // Proposed |
| 134 | + DecodeMultiKey(map[string][]byte) (interface{}, error) |
| 135 | + EncodeMultiKey(interface{}) (map[string][]byte, error) |
| 136 | +} |
| 137 | +``` |
| 138 | + |
| 139 | +* DecodeMultiKey - called by the KV store to decode the downloaded data. This function will use the JoinIds method. |
| 140 | +* EncodeMultiKey - called by the KV store to encode the data to be saved. This function will use the SplitById method. |
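
A minimal sketch of the two proposed methods follows, under clearly simplified assumptions: JSON stands in for the proto encoding, the functions take and return a concrete `*Desc` rather than `interface{}`, and `InstanceDesc` and `Desc` are reduced stand-in types rather than the real ring messages.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// InstanceDesc is a simplified stand-in for the per-instance proto message.
type InstanceDesc struct {
	Addr      string `json:"addr"`
	Timestamp int64  `json:"timestamp"`
}

// Desc is a simplified stand-in for the ring Desc.
type Desc struct {
	Ingesters map[string]InstanceDesc
}

// EncodeMultiKey splits the Desc by instance ID and serializes each child
// separately. JSON is an assumption standing in for the proto encoding.
func EncodeMultiKey(d *Desc) (map[string][]byte, error) {
	out := make(map[string][]byte, len(d.Ingesters))
	for id, inst := range d.Ingesters {
		b, err := json.Marshal(inst)
		if err != nil {
			return nil, fmt.Errorf("encode %s: %w", id, err)
		}
		out[id] = b
	}
	return out, nil
}

// DecodeMultiKey deserializes each child and joins them into a single Desc.
func DecodeMultiKey(in map[string][]byte) (*Desc, error) {
	d := &Desc{Ingesters: make(map[string]InstanceDesc, len(in))}
	for id, b := range in {
		var inst InstanceDesc // plays the role of the value from GetChildFactory
		if err := json.Unmarshal(b, &inst); err != nil {
			return nil, fmt.Errorf("decode %s: %w", id, err)
		}
		d.Ingesters[id] = inst
	}
	return d, nil
}

func main() {
	desc := &Desc{Ingesters: map[string]InstanceDesc{
		"ingester-0": {Addr: "10.0.0.1:9095", Timestamp: 1660760278},
		"ingester-1": {Addr: "10.0.0.2:9095", Timestamp: 1660760378},
	}}
	encoded, err := EncodeMultiKey(desc)
	if err != nil {
		panic(err)
	}
	decoded, err := DecodeMultiKey(encoded)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(encoded), decoded.Ingesters["ingester-1"].Addr)
}
```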
| 141 | + |
| 142 | +The new KV store will know that the data being saved is a MultiKey value. It will use FindDifference |
| 143 | +to know which keys need to be updated. The codec implementation of the new methods will use JoinIds and SplitById |
| 144 | +to know how to separate the codec into multiple keys. DecodeMultiKey will also use GetChildFactory to know how to |
| 145 | +decode the data stored in the KV store. |
| 146 | + |
| 147 | +Example of CAS being used with multikey design: |
| 148 | + |