REP-6088 Tolerate high numbers of mismatches #117

Merged

Collaborator

@FGasper FGasper commented Jun 2, 2025

Previously, all document mismatches were recorded directly in the verification task. As a result, if a task encompassed a large number of mismatched or missing documents, the verifier could fail to persist all of the mismatches and crash.

(The usual cause of excess mismatched/missing documents is starting migration-verifier before initial sync finishes, but it can also reasonably happen without REP-6129’s fix for queries against pre-v5 servers. See HELP-75910.)

This changeset makes the verifier save mismatches to a dedicated collection instead, one document per mismatch.
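The storage change described above can be pictured with a minimal sketch. It uses plain Go structs rather than the verifier's real types; `Mismatch`, `MismatchRecord`, `toRecords`, and all field names are hypothetical and chosen only for illustration. The point is that each mismatch becomes its own small record that refers back to its task, so the task document no longer grows with the number of mismatches.

```go
package main

import "fmt"

// Mismatch describes one discrepancy found while comparing a document.
// The field names are illustrative, not the verifier's actual schema.
type Mismatch struct {
	DocID  string // _id of the mismatched or missing document
	Detail string // human-readable description of the discrepancy
}

// MismatchRecord is what would be written to the dedicated collection:
// one record per mismatch, linked back to the task that found it.
type MismatchRecord struct {
	TaskID string
	Mismatch
}

// toRecords converts a task's mismatches into standalone records,
// one per mismatch, instead of embedding them all in the task.
func toRecords(taskID string, found []Mismatch) []MismatchRecord {
	records := make([]MismatchRecord, 0, len(found))
	for _, m := range found {
		records = append(records, MismatchRecord{TaskID: taskID, Mismatch: m})
	}
	return records
}

func main() {
	found := []Mismatch{
		{DocID: "a1", Detail: "missing on destination"},
		{DocID: "b2", Detail: "field value differs"},
	}
	for _, r := range toRecords("task-1", found) {
		fmt.Printf("%s %s: %s\n", r.TaskID, r.DocID, r.Detail)
	}
}
```

In a real deployment each record would be inserted as a separate document, so no single write has to carry every mismatch for a task.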

This change upends some familiar workflows for investigating mismatches: querying the verification_tasks collection alone no longer suffices, since the actual mismatches are recorded in a separate collection. To address this, the documentation now gives an aggregation pipeline that yields a similarly useful result.
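For orientation, a join of this kind might look roughly like the sketch below. This is not the pipeline from the verifier's documentation: the mismatch collection name and the field that links a mismatch to its task are passed in as parameters because they are assumptions here, and `tasksWithMismatches` is a hypothetical helper. Only the `$lookup` and `$match` stages themselves are standard MongoDB aggregation operators.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tasksWithMismatches builds an aggregation pipeline, intended to run against
// verification_tasks, that attaches each task's mismatches and keeps only
// tasks that have at least one. Both arguments are assumed names.
func tasksWithMismatches(mismatchColl, taskField string) []map[string]any {
	return []map[string]any{
		// Join each task with the mismatch documents that reference it.
		{"$lookup": map[string]any{
			"from":         mismatchColl,
			"localField":   "_id",
			"foreignField": taskField,
			"as":           "mismatches",
		}},
		// Keep only tasks whose joined array is non-empty.
		{"$match": map[string]any{
			"mismatches.0": map[string]any{"$exists": true},
		}},
	}
}

func main() {
	// "mismatches" and "task" are placeholders, not confirmed names.
	pipeline := tasksWithMismatches("mismatches", "task")
	out, err := json.MarshalIndent(pipeline, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The printed JSON can be pasted into a shell's `aggregate` call once the real collection and field names are substituted from the project documentation.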

This entails a metadata version change. Since the metadata version is changing anyway, this also renames the task type verify to verifyDocuments. (That required some sorting workarounds in tests, which were tightly coupled to the task type strings.)

@FGasper FGasper requested review from khodakovski and tdq45gj June 3, 2025 09:55
@FGasper FGasper marked this pull request as ready for review June 3, 2025 09:56
Collaborator

@tdq45gj tdq45gj left a comment

Looks good in general. I've left some comments on small things.

return errors.Wrapf(err, "starting session")
}

sctx := mongo.NewSessionContext(ctx, sess)
Collaborator

Are we reading in a session just to get the cluster time?

Collaborator Author

Yes. That, per the driver team, is the approved way to do this.

Collaborator

Could we add a comment to explain the purpose of a session here?

Collaborator Author

done

@FGasper FGasper requested a review from tdq45gj June 3, 2025 14:35
Collaborator

@tdq45gj tdq45gj left a comment

LGTM. Thanks!

Collaborator

@khodakovski khodakovski left a comment

LGTM!

@FGasper FGasper merged commit d7b456e into mongodb-labs:main Jun 4, 2025
50 checks passed
@FGasper FGasper deleted the REP-6088-tolerate-excess-mismatches branch June 4, 2025 20:10
FGasper added a commit that referenced this pull request Jun 23, 2025
- Mismatches were previously shown in an indeterminate order. Now they’re
  consistently sorted by the mismatched documents’ `_id`.
- Documents with missing fields were being logged as entirely missing.
  That logic is corrected here.
- The logic to create the table of missing/changed documents previously
  iterated through the _ids persisted in the task rather than the actual
  missing/changed documents. This was appropriate when that list stored
  mismatches but is no longer correct, since the list now always stores
  the documents to check in the task. Thus, if there were only a
  handful of missing documents in a recheck task that contained thousands
  of document IDs, all of that task’s document IDs would be logged as
  missing. This was an oversight from PR #117, which should have updated
  the logic to build that table, as it did for the
  mismatched-documents table. This changeset does the necessary update.