Randomness in reference scripts with --test-only #4587

Closed
NicolasHug opened this issue Oct 11, 2021 · 4 comments

Comments

@NicolasHug
Member

While running some experiments related to #4559, I tried the following:

diff --git a/references/classification/train.py b/references/classification/train.py
index a71d337a..1a429801 100644
--- a/references/classification/train.py
+++ b/references/classification/train.py
@@ -147,7 +147,7 @@ def load_data(traindir, valdir, args):
     print("Creating data loaders")
     if args.distributed:
         train_sampler = torch.utils.data.distributed.DistributedSampler(dataset)
-        test_sampler = torch.utils.data.distributed.DistributedSampler(dataset_test)
+        test_sampler = torch.utils.data.distributed.DistributedSampler(dataset_test, shuffle=False)
     else:
         train_sampler = torch.utils.data.RandomSampler(dataset)
         test_sampler = torch.utils.data.SequentialSampler(dataset_test)
torchrun --nproc_per_node=1 references/classification/train.py --model resnet18 --test-only --pretrained

I was hoping to get reproducible results across executions, i.e. always the same accuracy, but it seems I still get a bit of variation across runs:

Test:  Acc@1 69.766 Acc@5 89.068
Test:  Acc@1 69.764 Acc@5 89.072
Test:  Acc@1 69.764 Acc@5 89.072
Test:  Acc@1 69.766 Acc@5 89.068

Does anyone know where this randomness might come from?

@datumbox
Contributor

datumbox commented Oct 11, 2021

Have you tried setting batch_size to 1 to see if it's related to the padding discussed at #4559? A faster alternative to confirm this might be to set drop_last=True, since testing the whole ImageNet val set one record at a time might be too slow.
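To see why batch_size matters here, a minimal plain-Python sketch (no torch; the helper name and the 50,000-image ImageNet val size are assumptions for illustration) of how a dataset whose size is not a multiple of batch_size yields a smaller trailing batch, and how drop_last=True would discard it:

```python
def batch_sizes(n_samples, batch_size, drop_last=False):
    """Return the size of each batch a sequential loader would yield."""
    full, rem = divmod(n_samples, batch_size)
    sizes = [batch_size] * full
    if rem and not drop_last:
        sizes.append(rem)  # the partial trailing batch
    return sizes

sizes = batch_sizes(50_000, 32)
print(len(sizes), sizes[-1])  # 1563 batches; the last one holds only 16 samples
print(len(batch_sizes(50_000, 32, drop_last=True)))  # 1562 full batches, 16 samples dropped
```

With batch_size=1 every batch is full, which is why that setting side-steps the padding question entirely.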

@NicolasHug
Member Author

NicolasHug commented Oct 11, 2021

Ah, yes, thanks.

With batch_size=1 I get the same results across runs:

Test:  Acc@1 69.758 Acc@5 89.072
Test:  Acc@1 69.758 Acc@5 89.072
Test:  Acc@1 69.758 Acc@5 89.072
Test:  Acc@1 69.758 Acc@5 89.072

This confirms that we can't get exactly reproducible results unless len(dataset) % (batch_size * world_size) == 0.

Just setting the number of GPUs to 1 isn't enough.
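The divisibility condition above can be checked directly (hypothetical helper; the 50,000-image ImageNet val size is an assumption for illustration):

```python
def exactly_covered(n_samples, batch_size, world_size):
    """True iff every per-process batch is full, so no padding or partial batch."""
    return n_samples % (batch_size * world_size) == 0

print(exactly_covered(50_000, 32, 1))  # False: a trailing partial batch remains
print(exactly_covered(50_000, 1, 1))   # True: batch_size=1 always divides evenly
print(exactly_covered(50_000, 32, 8))  # False: the sampler must pad across 8 processes
```

Note that world_size=1 only removes the cross-process padding; the batch_size term can still break the condition, which matches the observation that a single GPU isn't enough.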

@NicolasHug
Member Author

NicolasHug commented Oct 11, 2021

BTW, it looks like the actual accuracy of resnet18 is a fair bit higher than what we report in the docs: 69.494 | 88.882

EDIT: I was looking at an old doc version, the difference isn't that big (69.758 | 89.078)

@fmassa
Member

fmassa commented Oct 11, 2021

Just setting the number of GPUs to 1 isn't enough.

Yep, looks like I was wrong here, my bad!
