Database hot standby #4436

Merged
merged 3 commits into from
Aug 15, 2024
@@ -0,0 +1,15 @@
groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            database:
              hot_standby: true
            wal:
              dir: /tmp/wals
            snapshot:
              dir: /tmp/snapshots
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
@@ -0,0 +1 @@
instance001:
@@ -0,0 +1,15 @@
groups:
  group001:
    replicasets:
      replicaset001:
        instances:
          instance001:
            database:
              hot_standby: true
            wal:
              dir: /tmp/wals
            snapshot:
              dir: /tmp/snapshots
            iproto:
              listen:
              - uri: '127.0.0.1:3301'
@@ -0,0 +1 @@
instance001:
34 changes: 34 additions & 0 deletions doc/reference/configuration/configuration_reference.rst
@@ -1428,6 +1428,40 @@ The ``database`` section defines database-specific configuration parameters, such

.. confval:: database.hot_standby

Whether to start the server in the hot standby mode.
This mode can be used to provide failover without :ref:`replication <replication>`.

Suppose there are two cluster applications.
Each cluster has one instance with the same configuration:
@p7nov (Contributor), Aug 15, 2024:

I can't guess from this example the correct way to use this mode in multi-instance apps. Set it for all instances of two identical apps? Or maybe select a subset of instances to run in standby?
Not sure though if the deeper explanation is needed in the reference. Perhaps somewhere in a task-oriented page in the admin's guide (out of this PR's scope).

@andreyaksenov (Contributor Author), Aug 15, 2024:

> I can't guess from this example the correct way to use this mode in multi-instance apps.

The correct way is to use two cluster applications, each with one instance. The GitHub examples at the end should help the reader. I agree that we can document this feature better. But, honestly, I don't think it is a very popular use case for Tarantool (maybe I'm wrong).

> Set it for all instances of two identical apps? Or maybe select a subset of instances to run in standby?

This sounds like a feature request for a new configuration approach. For example, we could allow a user to add two instances, set replication.failover to something like hot_standby, and provide the same path to WALs/snapshots for both instances. But I'm still not sure that this scenario would be popular. cc @Totktonada


.. literalinclude:: /code_snippets/snippets/config/instances.enabled/hot_standby_1/config.yaml
    :language: yaml
    :dedent:

In particular, both instances use the same directory for storing write-ahead logs and snapshots.

When you start both cluster applications on the same machine, the instance from the first one will be the primary instance and the second will be the standby instance.
In the :ref:`logs <configuration_reference_log>` of the second cluster instance, you should see a notification:

.. code-block:: text

    main/104/interactive I> Entering hot standby mode

This means that the standby instance is ready to take over if the primary instance goes down.
During initialization, the standby instance tries to take a lock on the directory for storing write-ahead logs
but fails because the primary instance already holds a lock on this directory.

If the primary instance goes down for any reason, the lock is released.
In this case, the standby instance succeeds in taking the lock and becomes the primary instance.

``database.hot_standby`` has no effect:

* If :ref:`wal.mode <configuration_reference_wal_mode>` is set to ``none``.
* If :ref:`wal.dir_rescan_delay <configuration_reference_wal_dir_rescan_delay>` is set to a large value on macOS or FreeBSD. On these platforms, the hot standby loop repeats only once every ``wal.dir_rescan_delay`` seconds.
* If spaces are created with :ref:`engine <space_opts_engine>` set to ``vinyl``.
Contributor:

The old wording was clearer:
for spaces created with engine = ‘vinyl’

Now it's ambiguous:

  • if there are vinyl spaces?
  • if all spaces are vinyl?
  • has no effect only on vinyl spaces?
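
A minimal sketch of an instance configuration that avoids the first two conditions in the list above.
The ``wal.mode`` and ``wal.dir_rescan_delay`` values are assumptions chosen for illustration and are not part of the examples in this PR.
The ``vinyl`` condition depends on how spaces are created, so it is not shown here:

.. code-block:: yaml

    groups:
      group001:
        replicasets:
          replicaset001:
            instances:
              instance001:
                database:
                  hot_standby: true
                wal:
                  dir: /tmp/wals
                  # Assumption: any mode other than 'none' keeps hot standby effective.
                  mode: write
                  # Assumption: a small interval in seconds; relevant on macOS and FreeBSD.
                  dir_rescan_delay: 2
                snapshot:
                  dir: /tmp/snapshots

As in the example configurations, both cluster applications would need to use the same ``wal.dir`` and ``snapshot.dir`` values.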


Examples on GitHub: `hot_standby_1 <https://github.com/tarantool/doc/tree/latest/doc/code_snippets/snippets/config/instances.enabled/hot_standby_1>`_, `hot_standby_2 <https://github.com/tarantool/doc/tree/latest/doc/code_snippets/snippets/config/instances.enabled/hot_standby_2>`_

|
| Type: boolean
| Default: false