Supervised failover #4265

andreyaksenov · 2024-06-05T13:49:07Z

Created a sample app with the supervised failover: supervised_failover.
Created a new Supervised failover topic.
Updated the Configuration reference:
- Added a link to a new topic in the replication.failover: supervised description.
- Added a new failover section.
- Mentioned one more use case for config.etcd.*.
Added the --failover option to the tarantool CLI reference. Note that this reference was moved from the existing Starting and stopping instances topic.
Added the tt cluster failover command reference:
- A new failover subsection in the tt cluster command reference.
- New timeout and wait options (applicable to tt cluster failover): Options

doc/concepts/replication/supervised_failover.rst

Totktonada

Thank you for the links in the pull request description. It is really convenient to look over the parts that actually need a review.

I highlighted several points in the failover algorithm overview. I think it is worth discussing face to face how to describe it in a concise way.

At least, I see that we can discuss the following points:

How to better divide it into subsections.
Should we divide the service into more-or-less simple blocks and describe them separately, or describe how the service works as a whole?
Should we highlight which actions are performed by which actor (coordinator, instance) and when exactly (on an appoint request, on switching to active mode, and so on)?

I would also like to provide some in-depth materials like the ones in https://github.com/tarantool/enterprise_doc/issues/253 (plus pictures from my presentations). I'll open a pull request with some drafts, but I will likely need your assistance to make it ready for the website.

doc/concepts/replication/supervised_failover.rst

Totktonada · 2024-06-19T17:34:24Z

doc/reference/tooling/tt_cli/cluster.rst

+
+.. code-block:: console
+
+    $ tt cluster failover switch URI INSTANCE_NAME [OPTION ...]


Maybe it is worth mentioning -w here explicitly, because this option is likely to be used often together with this command.

Totktonada

Thank you for the concise description and the reference!

p7nov

Checked the reference part. Looks good, just a couple of comments.

p7nov · 2024-06-21T04:54:58Z

doc/reference/tarantool_cli_options.rst

+.. _tarantool_cli:
+.. _configuration_command_options:
+
+tarantool command-line options


There is no introduction/definition of the tarantool executable.
I'd add a sentence (possibly a note) explaining that tarantool is an executable for running a single instance and that tt covers many more scenarios.

p7nov · 2024-06-21T05:00:43Z

doc/reference/tooling/tt_cli/cluster.rst

+``tt cluster failover switch`` appoints the specified instance to be a master.
+This command accepts the following arguments and options:
+
+-   ``URI``: A :ref:`URI <tt-cluster-uri>` of the cluster configuration storage.


I found a possible issue that applies to the whole page.
In the tt cluster reference, the URI argument refers to the config storage URI, while in the rest of the tt reference it is the instance URI. This can confuse readers. Perhaps we should change the argument name on this page to CONFIG_URI or something like that.

Renamed the URI argument to CONFIG_URI

p7nov · 2024-06-21T05:02:42Z

doc/reference/tooling/tt_cli/cluster.rst

+.. code-block:: console
+
+    $ tt cluster failover switch http://localhost:2379/myapp storage-a-002
+    To check the switching status, run:


Looks like the output needs an explanation. At first, I thought it was a part of the switch-status reference pasted here by mistake.

p7nov · 2024-06-21T05:03:46Z

doc/reference/tooling/tt_cli/cluster.rst

+This command accepts the following arguments:
+
+-   ``URI``: A :ref:`URI <tt-cluster-uri>` of the cluster configuration storage.
+-   ``TASK_ID``: An identifier of the task used to switch a master instance.


TASK_ID is never introduced. Let's explain it here or above (see the comment on the failover switch example).

p7nov · 2024-06-21T05:04:14Z

doc/reference/tooling/tt_cli/cluster.rst

+
+    $ tt cluster failover switch-status URI TASK_ID
+
+``tt cluster failover switch-status`` shows the status of switching a master instance.


What statuses do we have?

p7nov · 2024-06-21T05:05:24Z

doc/reference/tooling/tt_cli/cluster.rst

@@ -378,12 +447,24 @@ Options

    Skip validation when publishing. Default: `false` (validation is enabled).

+..  option:: --t, --timeout UINT


Suggested change

.. option:: --t, --timeout UINT

.. option:: -t, --timeout UINT

-t with one hyphen?

xuniq

Looks good to me!

andreyaksenov force-pushed the supervised-failover branch 2 times, most recently from a0b0108 to a13464d June 10, 2024 13:51

andreyaksenov linked an issue Jun 10, 2024 that may be closed by this pull request

config: failover mode and election mode consistency #3893

Closed

andreyaksenov force-pushed the supervised-failover branch 5 times, most recently from ab931e5 to 352dd4f June 14, 2024 08:42

andreyaksenov linked an issue Jun 14, 2024 that may be closed by this pull request

tt cluster failover commands #4249

Closed

andreyaksenov force-pushed the supervised-failover branch 12 times, most recently from 6d9bc82 to 18283bb June 18, 2024 13:52

andreyaksenov marked this pull request as ready for review June 18, 2024 13:57

andreyaksenov force-pushed the supervised-failover branch 2 times, most recently from ea4ef0d to 06cdab2 June 18, 2024 14:04

andreyaksenov removed a link to an issue Jun 18, 2024

config: failover mode and election mode consistency #3893

Closed

Totktonada reviewed Jun 18, 2024

doc/concepts/replication/supervised_failover.rst

andreyaksenov added 3 commits June 19, 2024 17:24

Supervised failover

2997680

Supervised failover: reference

b097b94

Supervised failover: typo

093b0f1

andreyaksenov force-pushed the supervised-failover branch from c9de62e to 093b0f1 June 19, 2024 14:24

Totktonada reviewed Jun 19, 2024

andreyaksenov force-pushed the supervised-failover branch 6 times, most recently from ff4dc5a to 71f58ec June 20, 2024 13:10

Supervised failover: update per DEV review

53cdd53

andreyaksenov force-pushed the supervised-failover branch 4 times, most recently from a8e7524 to a69199e June 20, 2024 15:13

Supervised failover: update per DEV review 2

e620bcc

andreyaksenov force-pushed the supervised-failover branch from a69199e to e620bcc June 20, 2024 15:15

Totktonada approved these changes Jun 20, 2024

Supervised failover: typo 2

c6870ee

xuniq self-requested a review June 20, 2024 17:18

p7nov approved these changes Jun 21, 2024

Supervised failover: update per TW review 1

32dbcd4

andreyaksenov force-pushed the supervised-failover branch from 5319934 to 32dbcd4 June 21, 2024 08:56

Supervised failover: update per TW review 1.1

7165c23

xuniq approved these changes Jun 21, 2024

andreyaksenov merged commit 8ff1dd9 into latest Jun 21, 2024
1 check passed

andreyaksenov deleted the supervised-failover branch June 21, 2024 11:57


