DISTMYSQL-466: RestartReplicationQuick called even from Orchestrator cluster where recovery has been globally disabled #51
https://perconadev.atlassian.net/browse/DISTMYSQL-466
Problem:
When recovery is disabled globally, the UnreachableMasterWithLaggingReplicas and UnreachableIntermediateMasterWithLaggingReplicas cases still cause the replica IO thread to be restarted.
Cause:
Commit 98bd7f0 added the feature that allows recovery to be disabled globally.
Commit fc33f3e moved the implementation to executeCheckAndRecoverFunction.
Commit b761fa3 introduced the runEmergencyOperations() function. Its purpose was to read the topology instance to speed up recovery. The instance was read, then recovery was skipped if it was disabled globally.
Commits 464a3c1 and 684d6e2 caused the regression. They introduced the call to emergentlyRestartReplicationOnTopologyInstance() from runEmergencyOperations().
openark/orchestrator#572 and openark/orchestrator#1005 explain in detail why this was done.
Solution:
If recovery is disabled globally and this is not a forced discovery, skip the restart of replicas.
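
The guard described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code from this PR: the names analysisCode, shouldRestartReplicas, and runEmergencyOperationsSketch, the boolean parameters, and the string results are all hypothetical and exist only to show the decision.

```go
package main

import "fmt"

// analysisCode is a hypothetical stand-in for the failure analysis codes
// named in this PR; it is not the type Orchestrator uses.
type analysisCode string

const (
	unreachableMasterWithLaggingReplicas             analysisCode = "UnreachableMasterWithLaggingReplicas"
	unreachableIntermediateMasterWithLaggingReplicas analysisCode = "UnreachableIntermediateMasterWithLaggingReplicas"
)

// shouldRestartReplicas reports whether the emergency replica restart may run.
// Assumption: the restart is skipped only when recovery is disabled globally
// and the operation was not forced.
func shouldRestartReplicas(recoveryDisabledGlobally, forced bool) bool {
	return !recoveryDisabledGlobally || forced
}

// runEmergencyOperationsSketch models only the decision, not the real restart.
// It returns a label describing what would have happened.
func runEmergencyOperationsSketch(code analysisCode, recoveryDisabledGlobally, forced bool) string {
	switch code {
	case unreachableMasterWithLaggingReplicas, unreachableIntermediateMasterWithLaggingReplicas:
		if !shouldRestartReplicas(recoveryDisabledGlobally, forced) {
			return "skipped"
		}
		return "restarted"
	}
	// Other analysis codes do not trigger a replica restart in this sketch.
	return "none"
}

func main() {
	// Recovery disabled globally, not forced: the restart is skipped.
	fmt.Println(runEmergencyOperationsSketch(unreachableMasterWithLaggingReplicas, true, false))
	// Recovery disabled globally, but forced: the restart still runs.
	fmt.Println(runEmergencyOperationsSketch(unreachableMasterWithLaggingReplicas, true, true))
	// Recovery enabled: behavior is unchanged.
	fmt.Println(runEmergencyOperationsSketch(unreachableIntermediateMasterWithLaggingReplicas, false, false))
}
```

Keeping the check in a small predicate makes the three combinations easy to cover in a unit test without a live topology.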
Additionally, fixed the Instance object read from Orchestrator's backend DB. Such an object was missing the QSP (Query String Provider) member. As a consequence, any query that depends on the master/slave <-> source/replica terminology could not be resolved and failed, because a nil query string was executed.