Update the leader election article with the info on resolving the split-brain #2983

Merged
merged 2 commits into from
Jun 30, 2022
14 changes: 14 additions & 0 deletions doc/book/replication/repl_leader_elect.rst
@@ -108,6 +108,20 @@ When the fencing is on, the leader resigns its leadership if it has less than th
of alive connections to the cluster nodes. The resigning leader receives the status of a follower in the current election term and becomes read-only.
Fencing applies to the instances that have the :ref:`election_mode <repl_leader_elect_config>` set to "candidate" or "manual".

.. _repl_leader_elect_splitbrain:

A replica set can still end up with two leaders working independently (a so-called *split-brain*).
This can happen, for example, if a user mistakenly lowers the :ref:`replication_synchro_quorum <repl_leader_elect_config>` below ``N / 2 + 1``.
In this situation, to preserve data integrity, an instance that detects the split-brain anomaly in the incoming replication data
breaks the connection with the instance sending the data and writes the ``ER_SPLIT_BRAIN`` error to the log.
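
The ``N / 2 + 1`` bound can be checked with a few lines of arithmetic. The sketch below is only an illustration, not part of Tarantool: it assumes ``N`` is the number of nodes in the replica set and that the division is integer division, and the function names ``majority_quorum`` and ``split_brain_possible`` are hypothetical. It shows that two disjoint groups of nodes can each gather a quorum only when the quorum is below ``N / 2 + 1``.

```python
def majority_quorum(n: int) -> int:
    """Smallest quorum for which any two quorums of n nodes overlap.

    Assumption: N / 2 + 1 is evaluated with integer division.
    """
    return n // 2 + 1


def split_brain_possible(n: int, quorum: int) -> bool:
    """Return True if two disjoint groups of nodes can both reach `quorum`.

    Two non-overlapping groups need 2 * quorum nodes in total,
    so this is possible only when 2 * quorum <= n.
    """
    return 2 * quorum <= n


if __name__ == "__main__":
    # Hypothetical replica set sizes, chosen only for illustration.
    for n in (3, 4, 5):
        safe = majority_quorum(n)
        print(n, safe, split_brain_possible(n, safe), split_brain_possible(n, safe - 1))
```

With a quorum of ``N / 2 + 1`` any two quorums share at least one node, so two leaders cannot both confirm their writes; lowering the value by one already removes that guarantee.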

Eventually, there will be two sets of nodes with diverged data,
and every node from one set is disconnected from every node from the other set with the ``ER_SPLIT_BRAIN`` error.

On noticing the error, a user can choose any representative from each of the sets and inspect the data on them.
To reconcile the data, the user should remove it from the nodes of one set and reconnect those nodes to the nodes of the other set, which have the correct data.


Also, if election is enabled on the node, it won't replicate from any nodes except
the newest leader. This is done to avoid the issue when a new leader is elected,
but the old leader has somehow survived and tries to send more changes