Added Exponential Moving Average support to classification reference script #4381

prabhat00155 · 2021-09-07T22:34:50Z

Resolves #4346, resolves #4281

datumbox

Thanks @prabhat00155, overall it looks good. I left a couple of comments for your consideration. Let me know what you think.

Also, do you plan to investigate how we can avoid loading two models on the GPU?

references/classification/train.py

NicolasHug

Thanks @prabhat00155, I made some minor comments/questions.

Also, it'd be good to validate the new code somehow. We don't really have tests for the references, but perhaps it would be relevant to report the results of a model trained with the --model_avg option and compare them to the baseline?

references/classification/utils.py

prabhat00155 · 2021-09-08T09:30:05Z

> Thanks @prabhat00155, I made some minor comments/questions.
>
> Also, it'd be good to validate the new code somehow. We don't really have tests for the references, but perhaps it would be relevant to report the results of a model trained with the --model_avg option and compare them to the baseline?

Thanks @NicolasHug! I ran tests locally (on CPU) on a toy dataset and on an AWS cluster for 5 epochs. I will run it to completion to verify the results before merging this PR.

prabhat00155 · 2021-09-08T09:32:11Z

> Thanks @prabhat00155, overall it looks good. I left a couple of comments for your consideration. Let me know what you think.
>
> Also, do you plan to investigate how we can avoid loading two models on the GPU?

Thanks @datumbox! Yes, I was thinking of doing that in a follow-up PR.

datumbox · 2021-09-08T09:54:51Z

@prabhat00155 sounds good to me.

The only thing worth addressing here is passing the non-parallel model to the EMA (see my earlier comment). I believe what you have now will fail to handle checkpoints properly on a multi-GPU setup. It's worth confirming by running two epochs on AWS and resuming from a checkpoint. If it works, you can leave it as is.

Everything else is a nit that can be handled later in separate PRs.
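
For illustration, here is a minimal sketch of one plausible way the parallel wrapper can affect checkpoint keys. It assumes the EMA model is built on torch.optim.swa_utils.AveragedModel, and it uses nn.DataParallel on a toy nn.Linear as a CPU-friendly stand-in for DistributedDataParallel. The variable names are hypothetical and do not come from the PR.

```python
import torch
from torch.optim.swa_utils import AveragedModel

# Toy model (assumption). nn.DataParallel stands in for
# DistributedDataParallel: both expose the wrapped network as `.module`.
model = torch.nn.Linear(4, 2)
parallel_model = torch.nn.DataParallel(model)

# Averaging the wrapped model nests a second "module." prefix in the keys.
ema_wrapped = AveragedModel(parallel_model)
# Averaging the plain model keeps only AveragedModel's own "module." prefix.
ema_plain = AveragedModel(model)

wrapped_keys = sorted(ema_wrapped.state_dict().keys())
plain_keys = sorted(ema_plain.state_dict().keys())
print(wrapped_keys)
print(plain_keys)
```

The two key sets differ, so a checkpoint saved from one variant would not load into the other with strict key matching unless the keys are remapped. A resume-from-checkpoint run is the kind of check that would surface such a mismatch.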

datumbox

@prabhat00155 As we discussed offline, this is a great contribution, so let's unblock the merge and defer further investigation to follow-up PRs.

prabhat00155 · 2021-09-09T15:56:54Z

> @prabhat00155 As we discussed offline, this is a great contribution, so let's unblock the merge and defer further investigation to follow-up PRs.

Makes sense, thanks!

…eference script (#4381)

Summary:
* Added Exponential Moving Average support to classification reference script
* Addressed review comments
* Updated model argument

Reviewed By: kazhang
Differential Revision: D30898332
fbshipit-source-id: 1c9aaa2b9b1e8773fce155063bfa4de32c4c1c1e

…65495)

Summary: While implementing EMA (pytorch/vision#4381, which extends AveragedModel) in torchvision, update_parameters() from AveragedModel could not be used as it did not handle state_dict(), so a custom update_parameters() needed to be defined in the EMA class (pytorch/vision#4406). This PR aims to handle this scenario, removing the need for this custom update_parameters() implementation.

Discussion: pytorch/vision#4406 (review)
Pull Request resolved: #65495
Reviewed By: datumbox
Differential Revision: D31176742
Pulled By: prabhat00155
fbshipit-source-id: 326d14876018f21cf602bab5eaba344678dbabe2

…65495) (#65755)

* Added option to update parameters using state_dict in AveragedModel (#65495)

Summary: While implementing EMA (pytorch/vision#4381, which extends AveragedModel) in torchvision, update_parameters() from AveragedModel could not be used as it did not handle state_dict(), so a custom update_parameters() needed to be defined in the EMA class (pytorch/vision#4406). This PR aims to handle this scenario, removing the need for this custom update_parameters() implementation.

Discussion: pytorch/vision#4406 (review)
Pull Request resolved: #65495
Reviewed By: datumbox
Differential Revision: D31176742
Pulled By: prabhat00155
fbshipit-source-id: 326d14876018f21cf602bab5eaba344678dbabe2
(cherry picked from commit 2ea724b)

* Added validation of mode parameter in AveragedModel (#65921)

Summary:
Discussion: #65495 (comment)
Pull Request resolved: #65921
Reviewed By: albanD
Differential Revision: D31310105
Pulled By: prabhat00155
fbshipit-source-id: 417691832a7c793744830c11e0ce53e3972d21a3
(cherry picked from commit c7748fc)
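
To make the state_dict point in the commit summaries above concrete, here is a minimal sketch of an exponential moving average applied to every entry of a state-dict-like mapping, so that buffers are averaged along with parameters. Plain floats stand in for tensors, and `ema_update`, `decay`, and the key names are hypothetical. This is not the torchvision or PyTorch implementation.

```python
def ema_update(ema_state, model_state, decay):
    """Blend each entry of model_state into ema_state in place.

    Uses the standard EMA rule: ema = decay * ema + (1 - decay) * value.
    """
    for name, value in model_state.items():
        ema_state[name] = decay * ema_state[name] + (1.0 - decay) * value
    return ema_state


# Hypothetical state: one parameter and one buffer, as plain floats.
ema_state = {"fc.weight": 1.0, "bn.running_mean": 0.0}
model_state = {"fc.weight": 3.0, "bn.running_mean": 2.0}

ema_update(ema_state, model_state, decay=0.5)
print(ema_state)
```

Iterating over the full mapping, rather than over parameters only, is what lets entries such as batch-norm running statistics take part in the average.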

Added Exponential Moving Average support to classification reference script

3943ca9

facebook-github-bot added the cla signed label Sep 7, 2021

Merge branch 'master' into prabhat00155/ema_support

1d7ed26

datumbox self-requested a review September 7, 2021 22:36

prabhat00155 added the module: reference scripts label Sep 7, 2021

datumbox reviewed Sep 7, 2021

references/classification/train.py (resolved)

references/classification/train.py (outdated, resolved)

NicolasHug reviewed Sep 8, 2021

references/classification/utils.py (outdated, resolved)

NicolasHug reviewed Sep 8, 2021

references/classification/utils.py (outdated, resolved)

Addressed review comments

8ca6212

datumbox mentioned this pull request Sep 9, 2021

[RFC] TorchVision with Batteries included - Phase 1 #3911

Closed

16 tasks

datumbox approved these changes Sep 9, 2021

Updated model argument

7985619

Merge branch 'master' into prabhat00155/ema_support

9d6bec9

prabhat00155 merged commit 12fd3a6 into pytorch:main Sep 9, 2021

prabhat00155 deleted the prabhat00155/ema_support branch September 9, 2021 16:05

This was referenced Sep 9, 2021

Investigate if model_without_ddp is needed #4385

Closed

Investigate Exponential Moving Average result in classification script #4391

Closed

prabhat00155 mentioned this pull request Sep 22, 2021

Added option to update parameters using state_dict in AveragedModel pytorch/pytorch#65495

Closed

prabhat00155 mentioned this pull request Sep 28, 2021

Added option to update parameters using state_dict in AveragedModel (#65495) pytorch/pytorch#65755

Merged

prabhat00155 added the enhancement label Jan 4, 2022
