There can be a bit of variance in model evaluation results, due to several factors (see #4559, although the timeline there can be confusing to follow because I was relying on incorrect assumptions at the time).

We addressed this in #4609 for the classification reference. We should do the same for the rest of the references (detection, segmentation, similarity, video_classification):
- Remove the cuDNN auto-benchmarking when `--test-only` is set (see the first sketch after this list).
- Set `shuffle=False` for the test dataloader.
- Add a `--use-deterministic-algorithms` flag to the scripts.
- Add a warning when the number of samples processed during validation differs from `len(dataset)` (this one might not be relevant for the detection scripts; see the second sketch below).
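A minimal sketch of how the first three items could look in a reference script; the helper names (`setup_determinism`, `make_test_dataloader`) and the argument defaults are placeholders, not the actual reference code:

```python
import argparse

import torch
import torch.utils.data


def get_args_parser():
    parser = argparse.ArgumentParser(description="Reference evaluation (sketch)")
    parser.add_argument("--test-only", action="store_true", help="only run evaluation")
    parser.add_argument(
        "--use-deterministic-algorithms", action="store_true",
        help="force the use of deterministic algorithms only",
    )
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--workers", type=int, default=4)
    return parser


def setup_determinism(args):
    # cuDNN auto-benchmarking picks the fastest kernels per input shape, but
    # the chosen kernels can differ between runs; only enable it for training.
    torch.backends.cudnn.benchmark = not args.test_only

    if args.use_deterministic_algorithms:
        torch.backends.cudnn.deterministic = True
        # Raises an error if an op has no deterministic implementation.
        torch.use_deterministic_algorithms(True)


def make_test_dataloader(dataset_test, args):
    # A sequential sampler fixes the evaluation order across runs,
    # which is the single-process equivalent of shuffle=False.
    test_sampler = torch.utils.data.SequentialSampler(dataset_test)
    return torch.utils.data.DataLoader(
        dataset_test,
        batch_size=args.batch_size,
        sampler=test_sampler,
        num_workers=args.workers,
        pin_memory=True,
    )
```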
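And for the last item, a hedged sketch of the sample-count check; the warning text and the distributed reduction are assumptions here, not the exact code from #4609:

```python
import warnings

import torch
import torch.distributed as dist


def evaluate(model, data_loader, device):
    model.eval()
    num_processed_samples = 0
    with torch.inference_mode():
        for image, target in data_loader:
            image = image.to(device, non_blocking=True)
            output = model(image)
            # ... accumulate accuracy / mAP / IoU from output and target ...
            num_processed_samples += image.shape[0]

    # In a distributed run each process only sees its shard of the data,
    # so sum the per-process counts before comparing to the dataset size.
    if dist.is_available() and dist.is_initialized():
        count = torch.tensor(num_processed_samples, device=device)
        dist.all_reduce(count)
        num_processed_samples = int(count.item())

    if hasattr(data_loader.dataset, "__len__") and num_processed_samples != len(data_loader.dataset):
        warnings.warn(
            f"Evaluated {num_processed_samples} samples, but the dataset has "
            f"{len(data_loader.dataset)}. The sampler likely duplicated or "
            "dropped samples, which can skew the reported scores."
        )
```

If samples get duplicated (e.g. by `DistributedSampler` padding the last batch), the averaged metrics are biased, which is exactly the kind of run-to-run variance this issue is about.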
Tackling this issue requires access to at least 1 GPU, to verify that the new evaluation scores are similar to, and more stable than, the previous ones.
cc @datumbox