Randomness in reference scripts with --test-only #4587

Closed
NicolasHug opened this issue Oct 11, 2021 · 4 comments

Comments

@NicolasHug
Member

While running some experiments related to #4559, I tried the following:

diff --git a/references/classification/train.py b/references/classification/train.py
index a71d337a..1a429801 100644
--- a/references/classification/train.py
+++ b/references/classification/train.py
@@ -147,7 +147,7 @@ def load_data(traindir, valdir, args):
     print("Creating data loaders")
     if args.distributed:
         train_sampler = torch.utils.data.distributed.DistributedSampler(dataset)
-        test_sampler = torch.utils.data.distributed.DistributedSampler(dataset_test)
+        test_sampler = torch.utils.data.distributed.DistributedSampler(dataset_test, shuffle=False)
     else:
         train_sampler = torch.utils.data.RandomSampler(dataset)
         test_sampler = torch.utils.data.SequentialSampler(dataset_test)
torchrun --nproc_per_node=1 references/classification/train.py --model resnet18 --test-only --pretrained

I was hoping to get reproducible results across executions, i.e. always the same accuracy, but it seems I still get a bit of variation across runs:

Test:  Acc@1 69.766 Acc@5 89.068
Test:  Acc@1 69.764 Acc@5 89.072
Test:  Acc@1 69.764 Acc@5 89.072
Test:  Acc@1 69.766 Acc@5 89.068

Does anyone know where this randomness might come from?

@datumbox
Contributor

datumbox commented Oct 11, 2021

Have you tried setting batch_size to 1 to see if it's related to the padding discussed at #4559? A faster alternative to confirm this might be to set drop_last=True, since testing the whole ImageNet val set one record at a time might be too slow.
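To see why batch_size matters here, a minimal plain-Python sketch (no torch; the helper name and the 50,000-image ImageNet val size are assumptions for illustration) of how a dataset whose size is not a multiple of batch_size yields a smaller trailing batch, and how drop_last=True would discard it:

```python
def batch_sizes(n_samples, batch_size, drop_last=False):
    """Return the size of each batch a sequential loader would yield."""
    full, rem = divmod(n_samples, batch_size)
    sizes = [batch_size] * full
    if rem and not drop_last:
        sizes.append(rem)  # the partial trailing batch
    return sizes

sizes = batch_sizes(50_000, 32)
print(len(sizes), sizes[-1])  # 1563 batches; the last one holds only 16 samples
print(len(batch_sizes(50_000, 32, drop_last=True)))  # 1562 full batches, 16 samples dropped
```

With batch_size=1 every batch is full, which is why that setting side-steps the padding question entirely.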

@NicolasHug
Member Author

NicolasHug commented Oct 11, 2021

Ah, yes, thanks.

With batch_size=1 I get the same results across runs:

Test:  Acc@1 69.758 Acc@5 89.072
Test:  Acc@1 69.758 Acc@5 89.072
Test:  Acc@1 69.758 Acc@5 89.072
Test:  Acc@1 69.758 Acc@5 89.072

This confirms that we can't get exactly reproducible results unless len(dataset) % (batch_size * world_size) == 0.

Just setting the number of GPUs to 1 isn't enough.
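The divisibility condition above can be checked directly (hypothetical helper; the 50,000-image ImageNet val size is an assumption for illustration):

```python
def exactly_covered(n_samples, batch_size, world_size):
    """True iff every per-process batch is full, so no padding or partial batch."""
    return n_samples % (batch_size * world_size) == 0

print(exactly_covered(50_000, 32, 1))  # False: a trailing partial batch remains
print(exactly_covered(50_000, 1, 1))   # True: batch_size=1 always divides evenly
print(exactly_covered(50_000, 32, 8))  # False: the sampler must pad across 8 processes
```

Note that world_size=1 only removes the cross-process padding; the batch_size term can still break the condition, which matches the observation that a single GPU isn't enough.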

@NicolasHug
Member Author

NicolasHug commented Oct 11, 2021

BTW, it looks like the actual accuracy of resnet18 is a fair bit higher than what we report in the docs: 69.494 | 88.882

EDIT: I was looking at an old doc version, the difference isn't that big (69.758 | 89.078)

@fmassa
Member

fmassa commented Oct 11, 2021

Just setting the number of GPUs to 1 isn't enough.

Yep, looks like I was wrong here, my bad!
