Reduce variance of classification references evaluation #4609
Conversation
references/classification/train.py (Outdated)

```python
# See FIXME above
num_processed_samples = utils.reduce_across_processes(num_processed_samples)
if hasattr(data_loader.dataset, "__len__") and len(data_loader.dataset) != num_processed_samples:
    warnings.warn(
```
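The check under discussion can be sketched in isolation. This is a simplified stand-in: the real `utils.reduce_across_processes` is a distributed all-reduce, which is modeled here as a plain sum over hypothetical per-rank counts so the logic runs without torch:

```python
import warnings

def check_sample_count(per_rank_counts, dataset_len):
    # Stand-in for utils.reduce_across_processes: in the real script this is
    # a distributed all-reduce; here we just sum hypothetical per-rank counts.
    num_processed_samples = sum(per_rank_counts)
    if dataset_len is not None and dataset_len != num_processed_samples:
        warnings.warn(
            f"Processed {num_processed_samples} samples, but the dataset "
            f"contains {dataset_len}. This can happen e.g. when a "
            "DistributedSampler pads the dataset to split it evenly across "
            "processes, and it may slightly bias the reported accuracy."
        )
    return num_processed_samples
```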
We might want to only warn if `rank == 0`, to avoid having `world_size` warnings? It's not really a problem when running with submitit, but it can be a bit too much when running with torchrun directly.
I agree with warning only when `rank == 0`.
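A minimal, stdlib-only sketch of the rank-0 guard, assuming the `RANK` environment variable that torchrun exports to each worker (the `is_main_process` helper below is hypothetical, standing in for torchvision's own utility):

```python
import os
import warnings

def is_main_process() -> bool:
    # torchrun exports RANK for every worker; a plain single-process run
    # has no RANK set, so we default to rank 0.
    return int(os.environ.get("RANK", "0")) == 0

def warn_on_main_process(message: str) -> None:
    # Emit the warning once instead of world_size times.
    if is_main_process():
        warnings.warn(message)
```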
```diff
@@ -277,6 +297,10 @@ def main(args):
         model_ema.load_state_dict(checkpoint["model_ema"])

+    if args.test_only:
+        # We disable the cudnn benchmarking because it can noticeably affect the accuracy
+        torch.backends.cudnn.benchmark = False
+        torch.backends.cudnn.deterministic = True
```
I deliberately chose not to set `torch.use_deterministic_algorithms(True)` here because:

- just removing the cudnn benchmarking is enough to get constant results, at least for resnet18 (maybe not for others)
- using `torch.use_deterministic_algorithms(True)` forces the user to set some env variables like `CUBLAS_WORKSPACE_CONFIG=:4096:8`, otherwise the script would crash
- users can always set the new `--use-deterministic-algorithms` flag if they really want to
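The interplay between the two options can be sketched with a stand-in for `torch.backends.cudnn` (the `CudnnFlags` dataclass below is hypothetical, so the logic runs without torch or a GPU; the argument parsing mirrors the flags discussed here):

```python
import argparse
from dataclasses import dataclass

@dataclass
class CudnnFlags:
    # Hypothetical stand-in for torch.backends.cudnn.
    benchmark: bool = True       # the training path typically enables this
    deterministic: bool = False

def apply_eval_settings(args: argparse.Namespace, cudnn: CudnnFlags) -> CudnnFlags:
    if args.test_only:
        # Disabling cudnn benchmarking is enough for stable results,
        # at least for resnet18 (maybe not for other models).
        cudnn.benchmark = False
        cudnn.deterministic = True
    if args.use_deterministic_algorithms:
        # The real script would additionally call
        # torch.use_deterministic_algorithms(True), which requires env
        # variables such as CUBLAS_WORKSPACE_CONFIG=:4096:8 to be set.
        cudnn.benchmark = False
        cudnn.deterministic = True
    return cudnn

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--test-only", dest="test_only", action="store_true")
    parser.add_argument("--use-deterministic-algorithms",
                        dest="use_deterministic_algorithms", action="store_true")
    return parser
```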
LGTM, thanks Nicolas.
I left minor comments which are non-blocking and purely optional.
Edit: Do you plan to bring similar updates on the rest of the references in separate PRs?
Thanks for the review!
Yes, I just wanted to test the waters with this one first :)
Reviewed By: fmassa
Differential Revision: D31649961
fbshipit-source-id: 34d72840adcc84a50d8b96c2cf4776e5450343e8
Closes #4559
Closes #4600
This is a follow up to #4559 (comment). This PR does a few things:

- disables cudnn benchmarking (and enables cudnn determinism) when running with `--test-only`, to reduce the variance of the evaluation results
- adds a `--use-deterministic-algorithms` flag to the scripts
- warns when the number of processed samples differs from `len(dataset)`

Out of these, I think only the first one is really important. I don't feel strongly about the rest, LMK what you think.
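To check that variance actually drops, one could evaluate the same checkpoint several times and compare the spread of the reported accuracies. A hypothetical helper for that comparison (not part of this PR):

```python
import statistics

def accuracy_spread(accuracies):
    """Summarize repeated evaluation runs of the same checkpoint.

    Returns (range, population stdev); smaller values mean the
    evaluation is more reproducible.
    """
    return max(accuracies) - min(accuracies), statistics.pstdev(accuracies)
```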
cc @datumbox