Description
There is a subtle known bug in the evaluation code of the classification references (and other references as well, but not all):
vision/references/classification/train.py, lines 65 to 66 in 261cbf7:

# FIXME need to take into account that the datasets
# could have been padded in distributed setup
It deserves some attention, because it's easy to miss and yet can impact our reported results, and those of research papers.
As the comment above describes, when computing the accuracy of a model on a validation set in a distributed setting, some images are counted more than once if len(dataset) isn't divisible by batch_size * world_size.[^1]

On top of that, since the test_sampler uses shuffle=True by default, the duplicated images aren't even the same across executions, which means that evaluating the same model on the same dataset can lead to different results every time.
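The duplication can be reproduced in isolation, without launching any processes. A minimal sketch (assuming only stock PyTorch; the sampler here is torch.utils.data.DistributedSampler, which is what the references use) that instantiates one sampler per simulated rank and counts how often each dataset index is drawn:

```python
import collections

import torch
from torch.utils.data import DistributedSampler, TensorDataset

# A toy "dataset" of 10 samples; 10 is not divisible by 3 ranks.
dataset = TensorDataset(torch.arange(10))

# One sampler per simulated rank. shuffle=False keeps the padding deterministic
# so the duplicated indices are visible; with shuffle=True they change per seed.
samplers = [
    DistributedSampler(dataset, num_replicas=3, rank=r, shuffle=False)
    for r in range(3)
]

# Collect every index drawn across all ranks.
all_indices = [i for s in samplers for i in s]
counts = collections.Counter(all_indices)
dupes = sorted(i for i, c in counts.items() if c > 1)

print(len(all_indices), dupes)  # 12 indices drawn in total; 0 and 1 appear twice
```

The sampler pads each rank up to ceil(10 / 3) = 4 samples by wrapping around to the start of the index list, so 12 samples are evaluated instead of 10 and any metric averaged over them is biased toward the duplicated images.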
Should we try to fix this, or should we just leave it and wait for the new lightning recipes to handle it? And as a follow-up question, is there a built-in way in lightning to mitigate this at all? (I'm not familiar with lightning, so this one may not make sense.)
cc @datumbox
Footnotes

[^1]: For example, if we have 10 images and 2 workers with a batch_size of 3, we will have something like:

    worker1: img1, img2, img3
    worker2: img4, img5, img6
    worker1: img7, img8, img9
    worker2: img10, **img1, img2**
                    ^^^^^^^^^^^^ "padding": duplicated images which will affect the validation accuracy
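The footnote's batch layout can be reproduced with plain Python. This is only an illustrative sketch of the padding arithmetic (the names images, per_rank, etc. are made up for the example, not taken from train.py): the sample list is padded until it splits evenly into batch_size * world_size, which duplicates the first images.

```python
import math

images = [f"img{i}" for i in range(1, 11)]  # 10 images
batch_size, world_size = 3, 2

# Pad so every worker gets the same number of full batches:
# ceil(10 / (3 * 2)) = 2 batches per worker, i.e. 6 samples each, 12 in total.
per_rank = math.ceil(len(images) / (batch_size * world_size)) * batch_size
total = per_rank * world_size
padded = images + images[: total - len(images)]  # img1, img2 appended at the end

# Hand out batches round-robin, as in the footnote's diagram.
batches = [padded[i : i + batch_size] for i in range(0, total, batch_size)]
for b, batch in enumerate(batches):
    print(f"worker{b % world_size + 1}: {', '.join(batch)}")
```

The last printed batch is worker2: img10, img1, img2, i.e. img1 and img2 are scored twice while every other image is scored once.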