
Commit 4df2ff7

Move some replication guides to Administration
* Monitoring a replica set
* Reseeding a replica
* Recovering from a degraded state
* Resolving replication conflicts
1 parent b515324 commit 4df2ff7

38 files changed: +876 -23 lines changed

doc/book/admin/index.rst

Lines changed: 1 addition & 0 deletions
@@ -34,6 +34,7 @@ This chapter includes the following sections:
    security
    access_control
    vshard_admin
+   replication/index
    server_introspection
    daemon_supervision
    disaster_recovery

doc/book/admin/replication/index.rst

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+Replication
+===========
+
+.. toctree::
+   :maxdepth: 2
+
+   repl_monitoring
+   repl_recover
+   repl_reseed
+   repl_problem_solving

doc/concepts/replication/repl_monitoring.rst renamed to doc/book/admin/replication/repl_monitoring.rst

Lines changed: 2 additions & 3 deletions
@@ -42,7 +42,7 @@ these instances, issue a :doc:`/reference/reference_lua/box_info/replication` request:
 This report is for a master-master replica set of three instances, each having
 its own instance id, UUID and log sequence number.
 
-.. image:: mm-3m-mesh.svg
+.. image:: /concepts/replication/images/mm-3m-mesh.svg
    :align: center
 
 The request was issued at master #1, and the reply includes statistics for the
@@ -78,6 +78,5 @@ The primary indicators of replication health are:
 
 For better understanding, see the following diagram illustrating the ``upstream`` and ``downstream`` connections within the replica set of three instances:
 
-.. image:: replication.svg
+.. image:: /concepts/replication/images/replication.svg
    :align: left
-
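
The monitoring guide moved above is built around ``box.info.replication``. A minimal health-check sketch in Lua, using only fields that appear in the guide (the "anything other than ``follow`` is suspect" rule is our own simplification):

    -- Run on any instance of the replica set. box.info.replication is a
    -- table keyed by instance id; 'upstream' is absent for the local
    -- instance itself, so skip nil entries.
    for id, peer in pairs(box.info.replication) do
        local up = peer.upstream
        if up ~= nil and up.status ~= 'follow' then
            print(('instance %d (%s): upstream status is %s, idle %.2f s')
                :format(id, peer.uuid, up.status, up.idle))
        end
    end

A healthy replica reports ``status: follow``; any other value is worth investigating.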

doc/concepts/index.rst

Lines changed: 0 additions & 5 deletions
@@ -82,11 +82,6 @@ Several Tarantool instances can be organized in a replica set.
 They communicate and transfer data via the :ref:`iproto <box_protocol-iproto_protocol>` binary protocol.
 Learn more about Tarantool's :ref:`replication architecture <replication-architecture>`.
 
-As there are usually more database reads than writes, a popular scenario is to have
-one writable "master" node in the database and several read-only "replicas".
-If the master goes down, one of the replicas becomes the new master.
-This is called a failover.
-
 By default, replication in Tarantool is asynchronous.
 A transaction committed locally on the master node
 may not get replicated onto other instances before the client receives a success response.
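
The paragraph kept by this hunk is the key behavioral point: with asynchronous replication, a success response does not mean the change has reached the replicas. A sketch of how that can be observed, using only ``box.info`` fields shown elsewhere in this commit (the space name and replica id are placeholders):

    -- On the master: commit locally, then compare the master's LSN with
    -- what replica id 2 has acknowledged (downstream.vclock, as reported
    -- by box.info.replication).
    box.space.test:insert{1, 'value'}   -- returns after the local WAL write succeeds
    local master_id = box.info.id
    local peer = box.info.replication[2]
    local acked = (peer.downstream and peer.downstream.vclock[master_id]) or 0
    print(('master lsn: %d, replica 2 acknowledged up to: %d')
        :format(box.info.lsn, acked))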

doc/concepts/replication/index.rst

Lines changed: 0 additions & 7 deletions
@@ -18,12 +18,5 @@ This chapter includes the following sections:
    :numbered: 0
 
    repl_architecture
-   repl_bootstrap
-   repl_add_instances
-   repl_remove_instances
-   repl_monitoring
-   repl_recover
-   repl_reseed
-   repl_problem_solving
    repl_sync
    repl_leader_elect

doc/concepts/replication/repl_architecture.rst

Lines changed: 7 additions & 7 deletions
@@ -152,7 +152,7 @@ role is "read_only" (replica) for all but one instance in the replica set.
 In a master-replica configuration, every change that happens on the master will
 be visible on the replicas, but not vice versa.
 
-.. image:: mr-1m-2r-oneway.svg
+.. image:: images/mr-1m-2r-oneway.svg
    :align: center
 
 A simple two-instance replica set with the master on one machine and the replica
@@ -166,7 +166,7 @@ on a different machine provides two benefits:
 In a **master-master** configuration (also called "multi-master"), every change
 that happens on either instance will be visible on the other one.
 
-.. image:: mm-3m-mesh.svg
+.. image:: images/mm-3m-mesh.svg
    :align: center
 
 The failover benefit in this case is still present, and the load-balancing
@@ -211,7 +211,7 @@ makes potential failover easy.
 Some database products offer **cascading replication** topologies: creating a
 replica on a replica. Tarantool does not recommend such a setup.
 
-.. image:: no-cascade.svg
+.. image:: images/no-cascade.svg
    :align: center
 
 The problem with a cascading replica set is that some instances have no
@@ -221,14 +221,14 @@ is an entry in ``box.space._cluster`` system space with the replica set UUID.
 Without knowing the replica set UUID, a master refuses to accept connections from
 such instances when replication topology changes. Here is how this can happen:
 
-.. image:: cascade-problem-1.svg
+.. image:: images/cascade-problem-1.svg
    :align: center
 
 We have a chain of three instances. Instance #1 contains entries for instances
 #1 and #2 in its ``_cluster`` space. Instances #2 and #3 contain entries for
 instances #1, #2 and #3 in their ``_cluster`` spaces.
 
-.. image:: cascade-problem-2.svg
+.. image:: images/cascade-problem-2.svg
    :align: center
 
 Now instance #2 is faulty. Instance #3 tries connecting to instance #1 as its
@@ -237,7 +237,7 @@ instance #3.
 
 **Ring replication** topology is, however, supported:
 
-.. image:: cascade-to-ring.svg
+.. image:: images/cascade-to-ring.svg
    :align: center
 
 So, if you need a cascading topology, you may first create a ring to ensure all
@@ -247,7 +247,7 @@ desire.
 A stock recommendation for a master-master replication topology, however, is a
 **full mesh**:
 
-.. image:: mm-3m-mesh.svg
+.. image:: images/mm-3m-mesh.svg
    :align: center
 
 You then can decide where to locate instances of the mesh -- within the same
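
The full-mesh recommendation above translates into each instance listing every member of the mesh in its ``box.cfg.replication``. A configuration sketch (the URIs, credentials, and ports are placeholders, not taken from the files being moved):

    -- The same 'replication' list is set on all three instances of the mesh.
    box.cfg{
        listen      = 3301,
        replication = {
            'replicator:password@host1:3301',
            'replicator:password@host2:3301',
            'replicator:password@host3:3301',
        },
        read_only   = false,  -- every instance is writable in master-master
    }

Keeping the list identical everywhere, including the instance's own URI, is what makes the mesh easy to reason about and reconfigure.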

doc/how-to/index.rst

Lines changed: 1 addition & 0 deletions
@@ -22,5 +22,6 @@ If you are new to Tarantool, please see our
    db/index
    vshard_quick
    app/index
+   replication/index
    sql/index
    other/index

doc/how-to/replication/index.rst

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+.. _how-to-replication:
+
+Replication
+===========
+
+.. toctree::
+
+   repl_bootstrap
+   repl_add_instances
+   repl_remove_instances
+

doc/concepts/replication/repl_remove_instances.rst renamed to doc/how-to/replication/repl_remove_instances.rst

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ Removing instances
 Let's assume that we have the following configured replica set with 3 instances
 (*instance1*, *instance2* and *instance3*) and we want to remove *instance2*.
 
-.. image:: replication.svg
+.. image:: /concepts/replication/images/replication.svg
    :align: left
 
 To remove it politely, follow these steps:
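
The guide's full step list is not part of this diff; as a hedged sketch of its final cleanup steps, assuming *instance2* has id 2 (the URIs are placeholders):

    -- On each remaining instance: drop instance2 from the replication
    -- sources, then remove its record from the _cluster system space,
    -- where the instance id is the primary key.
    box.cfg{replication = {
        'replicator:password@instance1:3301',
        'replicator:password@instance3:3301',
    }}
    box.space._cluster:delete{2}
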
Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
+
+msgid "Monitoring a replica set"
+msgstr "Мониторинг набора реплик"
+
+msgid ""
+"To learn what instances belong in the replica set, and obtain statistics for"
+" all these instances, issue a "
+":doc:`/reference/reference_lua/box_info/replication` request:"
+msgstr ""
+"Чтобы узнать, какие экземпляры входят в набор реплик и получить статистику "
+"по всем этим экземплярам, передайте запрос "
+":doc:`/reference/reference_lua/box_info/replication`:"
+
+msgid ""
+"tarantool> box.info.replication\n"
+"---\n"
+"  replication:\n"
+"    1:\n"
+"      id: 1\n"
+"      uuid: b8a7db60-745f-41b3-bf68-5fcce7a1e019\n"
+"      lsn: 88\n"
+"    2:\n"
+"      id: 2\n"
+"      uuid: cd3c7da2-a638-4c5d-ae63-e7767c3a6896\n"
+"      lsn: 31\n"
+"      upstream:\n"
+"        status: follow\n"
+"        idle: 43.187747001648\n"
+"        peer: [email protected]:3301\n"
+"        lag: 0\n"
+"      downstream:\n"
+"        vclock: {1: 31}\n"
+"    3:\n"
+"      id: 3\n"
+"      uuid: e38ef895-5804-43b9-81ac-9f2cd872b9c4\n"
+"      lsn: 54\n"
+"      upstream:\n"
+"        status: follow\n"
+"        idle: 43.187621831894\n"
+"        peer: [email protected]:3301\n"
+"        lag: 2\n"
+"      downstream:\n"
+"        vclock: {1: 54}\n"
+"..."
+msgstr ""
+"tarantool> box.info.replication\n"
+"---\n"
+"  replication:\n"
+"    1:\n"
+"      id: 1\n"
+"      uuid: b8a7db60-745f-41b3-bf68-5fcce7a1e019\n"
+"      lsn: 88\n"
+"    2:\n"
+"      id: 2\n"
+"      uuid: cd3c7da2-a638-4c5d-ae63-e7767c3a6896\n"
+"      lsn: 31\n"
+"      upstream:\n"
+"        status: follow\n"
+"        idle: 43.187747001648\n"
+"        peer: [email protected]:3301\n"
+"        lag: 0\n"
+"      downstream:\n"
+"        vclock: {1: 31}\n"
+"    3:\n"
+"      id: 3\n"
+"      uuid: e38ef895-5804-43b9-81ac-9f2cd872b9c4\n"
+"      lsn: 54\n"
+"      upstream:\n"
+"        status: follow\n"
+"        idle: 43.187621831894\n"
+"        peer: [email protected]:3301\n"
+"        lag: 2\n"
+"      downstream:\n"
+"        vclock: {1: 54}\n"
+"..."
+
+msgid ""
+"This report is for a master-master replica set of three instances, each "
+"having its own instance id, UUID and log sequence number."
+msgstr ""
+"Данный отчет сгенерирован для набора реплик из трех экземпляров с "
+"конфигурацией мастер-мастер, у каждого из которых есть свой собственный ID "
+"экземпляра, UUID и номер записи в журнале."
+
+msgid ""
+"The request was issued at master #1, and the reply includes statistics for "
+"the other two masters, given in regard to master #1."
+msgstr ""
+"Запрос был выполнен с мастера №1, и ответ включает в себя статистику по двум"
+" другим мастерам относительно мастера №1."
+
+msgid "The primary indicators of replication health are:"
+msgstr "Основные индикаторы работоспособности репликации:"
+
+msgid ""
+":ref:`idle <box_info_replication_upstream_idle>`, the time (in seconds) "
+"since the instance received the last event from a master."
+msgstr ""
+":ref:`бездействие <box_info_replication_upstream_idle>`, время (в секундах) "
+"с момента получения последнего события от мастера."
+
+#, fuzzy
+msgid ""
+"If the master has no updates to send to the replicas, it sends heartbeat "
+"messages every :ref:`replication_timeout <cfg_replication-"
+"replication_timeout>` seconds. The master is programmed to disconnect if it "
+"does not see acknowledgments of the heartbeat messages within "
+"``replication_timeout`` * 4 seconds."
+msgstr ""
+"Если на мастере нет новых данных, требующих репликации, он отправляет на "
+"реплики сообщения контрольного сигнала каждые :ref:`replication_timeout "
+"<cfg_replication-replication_timeout>` секунд. Мастер запрограммирован на "
+"отключение, если он не получает сообщения контрольного сигнала дольше "
+"``replication_timeout`` * 4 секунд."
+
+msgid ""
+"Therefore, in a healthy replication setup, ``idle`` should never exceed "
+"``replication_timeout``: if it does, either the replication is lagging "
+"seriously behind, because the master is running ahead of the replica, or the"
+" network link between the instances is down."
+msgstr ""
+"Таким образом, в работоспособном состоянии значение ``idle`` никогда не "
+"должно превышать значение ``replication_timeout``: в противном случае, либо "
+"репликация сильно отстает, поскольку мастер опережает реплику, либо "
+"отсутствует сетевое подключение между экземплярами."
+
+msgid ""
+":ref:`lag <box_info_replication_upstream_lag>`, the time difference between "
+"the local time at the instance, recorded when the event was received, and "
+"the local time at another master recorded when the event was written to the "
+":ref:`write ahead log <internals-wal>` on that master."
+msgstr ""
+":ref:`отставание <box_info_replication_upstream_lag>`, разница во времени "
+"между локальным временем на экземпляре, зарегистрированным при получении "
+"события, и локальным временем на другом мастере, зарегистрированным при "
+"записи события в :ref:`журнал упреждающей записи <internals-wal>` на этом мастере."
+
+msgid ""
+"Since the ``lag`` calculation uses the operating system clocks from two "
+"different machines, do not be surprised if it’s negative: a time drift may "
+"lead to the remote master clock being consistently behind the local "
+"instance's clock."
+msgstr ""
+"Поскольку при расчете ``отставания`` используются часы операционной системы "
+"с двух разных машин, не удивляйтесь, получив отрицательное число: смещение "
+"во времени может привести к постоянному запаздыванию времени на удаленном "
+"мастере относительно часов на локальном экземпляре."
+
+msgid "For multi-master configurations, ``lag`` is the maximal lag."
+msgstr "Для многомастерной конфигурации значение ``lag`` равно максимальному отставанию."