Reduce variance of model evaluation in references #4730

Closed
@NicolasHug

Description

There can be some variance in model evaluation results, due to several factors (see #4559; the timeline there can be a bit confusing to follow because I was relying on incorrect assumptions at the time).

We addressed this in #4609 for the classification reference. We should do the same for the rest of the references (detection, segmentation, similarity, video_classification); a sketch of the changes follows the list below:

  • Remove cuDNN auto-benchmarking when --test-only is passed.
  • Set shuffle=False for the test data loader.
  • Add a --use-deterministic-algorithms flag to the scripts.
  • Add a warning when the number of samples processed during validation differs from len(dataset) (this one might not be relevant for the detection scripts).
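A minimal sketch of what these changes could look like in a reference script, mirroring what was done for the classification reference in #4609. The helper names (setup_determinism, make_test_loader, check_num_processed) and the exact wiring of the args attributes are assumptions for illustration, not the actual implementation:

```python
# Sketch only: helper names and args wiring are hypothetical.
import warnings

import torch
import torch.utils.data


def setup_determinism(args):
    if args.use_deterministic_algorithms:
        # Force deterministic kernels (may raise on ops without a
        # deterministic implementation).
        torch.backends.cudnn.benchmark = False
        torch.use_deterministic_algorithms(True)
    elif args.test_only:
        # Disable the cuDNN auto-tuner so repeated evaluations pick the
        # same kernels and produce stable numbers.
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True


def make_test_loader(dataset, args):
    # Never shuffle the validation set: a fixed sample order keeps metrics
    # comparable across runs.
    if args.distributed:
        sampler = torch.utils.data.distributed.DistributedSampler(dataset, shuffle=False)
    else:
        sampler = torch.utils.data.SequentialSampler(dataset)
    return torch.utils.data.DataLoader(
        dataset, batch_size=args.batch_size, sampler=sampler, num_workers=args.workers
    )


def check_num_processed(num_processed_samples, dataset):
    # Warn when evaluation did not see exactly len(dataset) samples,
    # e.g. because a DistributedSampler padded or dropped samples.
    if num_processed_samples != len(dataset):
        warnings.warn(
            f"Processed {num_processed_samples} samples but the dataset has "
            f"{len(dataset)}; evaluation metrics may be slightly off."
        )
```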

Tackling this issue requires access to at least one GPU, to verify that the new evaluation scores are similar to the previous ones and more stable across runs.

cc @datumbox
