Reduce variance of model evaluation in references #4730

Closed
NicolasHug opened this issue Oct 25, 2021 · 0 comments · Fixed by #5819

Comments

NicolasHug (Member) commented Oct 25, 2021

There can be a bit of variance in the model evaluation, due to several factors (see #4559, although the timeline there can be confusing to follow because I was relying on incorrect assumptions).

We addressed this in #4609 for the classification reference. We should do the same for the rest of the references (detection, segmentation, similarity, video_classification); a sketch of what these changes could look like is given after the list:

  • Remove the cuDNN auto-benchmarking when --test-only is passed.
  • Set shuffle=False for the test data loader.
  • Add a --use-deterministic-algorithms flag to the scripts.
  • Add a warning when the number of samples processed during validation differs from len(dataset) (this one might not be relevant for the detection scripts).
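For reference, here is a minimal sketch of what these changes could look like in one of the reference scripts. The names (`setup_determinism`, `build_test_loader`, the simplified `evaluate` loop) are illustrative rather than the actual reference code, and in a distributed run the processed-sample count would also need to be reduced across processes, as was done for classification in #4609:

```python
import argparse
import warnings

import torch


def get_args_parser():
    parser = argparse.ArgumentParser(description="Reference evaluation (sketch)")
    parser.add_argument("--test-only", action="store_true", help="only run evaluation")
    parser.add_argument(
        "--use-deterministic-algorithms",
        action="store_true",
        help="force PyTorch to use deterministic algorithms where available",
    )
    return parser


def setup_determinism(args):
    # Disable cuDNN auto-benchmarking for evaluation-only runs so that the
    # selected convolution algorithms (and hence the numerics) do not vary
    # between runs.
    if args.test_only:
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True
    if args.use_deterministic_algorithms:
        torch.use_deterministic_algorithms(True)


def build_test_loader(dataset_test, batch_size, num_workers):
    # Keep the evaluation order fixed: shuffle=False (sequential sampler).
    return torch.utils.data.DataLoader(
        dataset_test,
        batch_size=batch_size,
        shuffle=False,
        num_workers=num_workers,
        pin_memory=True,
    )


@torch.inference_mode()
def evaluate(model, data_loader, device):
    model.eval()
    num_processed_samples = 0
    for images, _targets in data_loader:
        images = images.to(device, non_blocking=True)
        _outputs = model(images)  # metric computation omitted in this sketch
        num_processed_samples += images.shape[0]

    # Warn if the number of evaluated samples differs from len(dataset),
    # e.g. because a DistributedSampler padded or dropped samples.
    if num_processed_samples != len(data_loader.dataset):
        warnings.warn(
            f"Evaluated {num_processed_samples} samples, but the dataset has "
            f"{len(data_loader.dataset)} samples; the reported metrics may be biased."
        )
```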

Tackling this issue requires access to at least one GPU, to make sure the new evaluation scores are similar to, and more stable than, the previous ones.

cc @datumbox
