Fixed width multiplier #1005


Merged 3 commits into pytorch:master on Jul 2, 2019

Conversation

yaysummeriscoming (Contributor):

Layer channels are now rounded to a multiple of 8, as per the official TensorFlow implementation. I found this fix while looking through: https://github.com/d-li14/mobilenetv2.pytorch

Fixes #973

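For reference, a minimal sketch of the rounding rule this change adopts, modelled on the TF Slim helper that the linked repository also uses; the exact signature and defaults below are illustrative, not necessarily the final torchvision API:

def _make_divisible(v, divisor=8, min_value=None):
    # Round v to the nearest multiple of divisor.
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Guard: never round down by more than ~10% of the original value.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

# Example: with width_mult=1.4 the stem goes from 32 * 1.4 = 44.8 to 48 channels;
# with width_mult=0.75, 32 * 0.75 = 24 is already a multiple of 8 and is kept.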
codecov-io commented Jun 8, 2019:

Codecov Report

Merging #1005 into master will increase coverage by 1.2%.
The diff coverage is 72.72%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #1005     +/-   ##
=========================================
+ Coverage   62.63%   63.83%   +1.2%     
=========================================
  Files          65       66      +1     
  Lines        5101     5301    +200     
  Branches      765      800     +35     
=========================================
+ Hits         3195     3384    +189     
- Misses       1683     1684      +1     
- Partials      223      233     +10
Impacted Files Coverage Δ
torchvision/models/mobilenet.py 86.66% <72.72%> (-3.04%) ⬇️
torchvision/datasets/mnist.py 51.18% <0%> (-12.06%) ⬇️
torchvision/transforms/transforms.py 81.76% <0%> (-0.14%) ⬇️
torchvision/datasets/coco.py 29.26% <0%> (ø) ⬆️
torchvision/datasets/voc.py 22.1% <0%> (ø) ⬆️
torchvision/models/__init__.py 100% <0%> (ø) ⬆️
torchvision/models/mnasnet.py 82.71% <0%> (ø)
torchvision/models/detection/roi_heads.py 56.89% <0%> (+0.24%) ⬆️
torchvision/models/resnet.py 88.27% <0%> (+0.45%) ⬆️
torchvision/datasets/cityscapes.py 20% <0%> (+0.55%) ⬆️
... and 6 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

fmassa (Member) left a comment:

Thanks for the fix!

I think that we should let the rounding be parametrized by the user.

Another thought (but I'm not 100% sure about this): maybe we should keep divisor = 1 as the default value for backwards compatibility?

Thoughts?

@@ -74,12 +94,12 @@ def __init__(self, num_classes=1000, width_mult=1.0, inverted_residual_setting=N
                 "or a 4-element list, got {}".format(inverted_residual_setting))

         # building first layer
-        input_channel = int(input_channel * width_mult)
-        self.last_channel = int(last_channel * max(1.0, width_mult))
+        input_channel = _make_divisible(input_channel * width_mult, 8)
fmassa (Member) commented on this line:

Should we add the 8 as a parameter, potentially letting the user choose different values? I believe the only reason they used 8 here is that the underlying kernels might be optimized for multiples of 8.
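A rough sketch of what exposing the divisor could look like; the keyword name round_nearest and the defaults below are assumptions for illustration, and it reuses the _make_divisible sketch above:

import torch.nn as nn

class MobileNetV2(nn.Module):
    # Sketch of the constructor only; the real class also builds the
    # inverted residual blocks and the classifier head.
    def __init__(self, num_classes=1000, width_mult=1.0,
                 inverted_residual_setting=None, round_nearest=8):
        super().__init__()
        input_channel = 32
        last_channel = 1280
        # round_nearest=8 matches the TF Slim behaviour; round_nearest=1
        # effectively disables the multiple-of-8 rounding.
        input_channel = _make_divisible(input_channel * width_mult, round_nearest)
        self.last_channel = _make_divisible(last_channel * max(1.0, width_mult), round_nearest)

Keeping 8 as the default reproduces the TF Slim channel counts; passing 1 keeps the channel counts close to the plain int(channels * width_mult) values used before this PR.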

yaysummeriscoming (Contributor, Author):

Good point on the divisor, I'll make that an option.

Regarding the default value, I would suggest keeping it at 8. Many people, including myself, are using this implementation for benchmarking new algorithms, for which it needs to match the official TensorFlow implementation exactly. The only problem with this is the pretrained models. I guess I can convert those from the official TensorFlow implementation; otherwise I have a training recipe that should provide slightly better results.

fmassa (Member) commented Jun 11, 2019:

@yaysummeriscoming for pre-trained models, can we use the scripts from references/classification? I used those scripts to train those models, and I should share the command-line arguments that I used to train mobilenet.

yaysummeriscoming (Contributor, Author):

@fmassa you were able to get up-to-par MobileNet results using that script, including the stepped LR? If so I'm quite envious; I had to use some tricks to get results up to scratch. I still think there's an argument for using the TensorFlow weights, however. I forget where, but there was some commotion about Keras and PyTorch models having slightly different accuracy levels.

In any case, please share the command-line arguments :)

fmassa (Member) commented Jun 11, 2019:

Yes, the model that I released uses that script.
Here are the arguments I used:

--model mobilenet_v2 --epochs 300 --lr 0.045 --wd 0.00004 --lr-step-size 1 --lr-gamma 0.98

With the following results:

Top 1: 71.878
Top 5: 90.286

And I'd like to use weights trained in PyTorch as much as possible. Reproducibility is very important, and importing weights from a different framework means that reproducing that model is not straightforward.
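For anyone mapping those flags onto PyTorch objects, a minimal sketch of the optimizer and LR schedule they correspond to, assuming the usual SGD setup of the references/classification script (momentum and batch size are not stated above and are assumed here):

import torch
import torchvision

model = torchvision.models.mobilenet_v2()
# --lr 0.045 --wd 0.00004 (momentum=0.9 is an assumed default)
optimizer = torch.optim.SGD(model.parameters(), lr=0.045,
                            momentum=0.9, weight_decay=4e-5)
# --lr-step-size 1 --lr-gamma 0.98: multiply the LR by 0.98 after every epoch.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.98)

for epoch in range(300):  # --epochs 300
    # train_one_epoch(model, optimizer, data_loader, ...)  # training loop elided
    scheduler.step()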

Mxbonn mentioned this pull request on Jun 19, 2019.
fmassa (Member) commented Jun 19, 2019:

@yaysummeriscoming could you expose the 8 as a parameter to be passed? We can discuss which default it should have, but I think it is important to expose it (at least for some backwards compatibility).

The official TensorFlow Slim MobileNet v2 implementation rounds the number of channels in each layer to a multiple of 8. This is now user configurable; a value of 1 turns off rounding.
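As a usage sketch (the keyword name is assumed here, matching the constructor sketch earlier in this thread):

from torchvision.models import MobileNetV2

# Default: channel counts rounded to multiples of 8, as in TF Slim.
model = MobileNetV2(width_mult=1.4)

# Passing 1 turns the multiple-of-8 rounding off (parameter name assumed).
model_unrounded = MobileNetV2(width_mult=1.4, round_nearest=1)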
yaysummeriscoming (Contributor, Author):

Apologies for the delay, just put the change in.

fmassa (Member) commented Jul 2, 2019:

@yaysummeriscoming there is a linter error

./torchvision/models/mobilenet.py:152:1: W293 blank line contains whitespace

could you fix it?

Fixed error: ./torchvision/models/mobilenet.py:152:1: W293 blank line contains whitespace
fmassa (Member) left a comment:

LGTM, thanks!

Just waiting for CI to finish

fmassa merged commit 8350645 into pytorch:master on Jul 2, 2019.
meijieru commented Jul 8, 2019:

@fmassa Have you reproduced the result with width_multiplier 1.4 using that script?

fmassa (Member) commented Jul 9, 2019:

@meijieru I haven't tried any training runs other than width_multiplier=1.0 yet, and probably won't have time before August.

Successfully merging this pull request may close these issues.

Mobilenet v2 width multiplier incorrect