Fixed width multiplier #1005
Conversation
Layer channels are now rounded to a multiple of 8, as per the official tensorflow implementation. I found this fix when looking through: https://github.com/d-li14/mobilenetv2.pytorch
Codecov Report
@@ Coverage Diff @@
## master #1005 +/- ##
=========================================
+ Coverage 62.63% 63.83% +1.2%
=========================================
Files 65 66 +1
Lines 5101 5301 +200
Branches 765 800 +35
=========================================
+ Hits 3195 3384 +189
- Misses 1683 1684 +1
- Partials 223 233 +10
Thanks for the fix!
I think we should let the rounding be parametrized by the user.
Another thought (though I'm not 100% sure about this): maybe we should keep divisor = 1 as the default value, for backwards compatibility?
Thoughts?
torchvision/models/mobilenet.py
@@ -74,12 +94,12 @@ def __init__(self, num_classes=1000, width_mult=1.0, inverted_residual_setting=N
                 "or a 4-element list, got {}".format(inverted_residual_setting))

         # building first layer
-        input_channel = int(input_channel * width_mult)
         self.last_channel = int(last_channel * max(1.0, width_mult))
+        input_channel = _make_divisible(input_channel * width_mult, 8)
Should we add the 8 as a parameter, potentially letting the user choose different values? I believe the only reason they used 8 here was because the underlying kernels might be optimized for multiples of 8.
Good point on the divisor, I'll make that an option. Regarding the default value, I would suggest keeping it at 8. Many people including myself are using this for benchmarking new algorithms, for which it's necessary to have this implementation match the official tensorflow implementation 100%. The only problem with this is the pretrained models. I guess I can convert these from the official tensorflow implementation; otherwise I have a training recipe that should provide slightly better results.
@yaysummeriscoming for pre-trained models, can we use the scripts from
@fmassa you were able to get up-to-par mobilenet results using that script, including stepped LR? If so I'm quite envious, I had to use some tricks to get up-to-scratch results. I still think there's an argument for using the tensorflow weights, however? I forget where, but there was some commotion about keras & pytorch models having slightly different accuracy levels. In any case, please share the command line arguments :)
Yes, the model that I released uses that script.
With the following results
And I'd like to use weights trained in PyTorch as much as possible. The reason is that reproducibility is very important, and importing weights from a different framework means that reproducing that model is not straightforward.
@yaysummeriscoming could you expose the
The official tensorflow slim mobilenet v2 implementation rounds the number of channels in each layer to a multiple of 8. This is now user-configurable; a value of 1 turns off rounding.
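To illustrate what the divisor does, here is a sketch of its effect on the MobileNetV2 per-block base channel counts at width_mult = 0.35 (the helper below follows the TF slim rounding rule; the name `_make_divisible` is an assumption, and note that divisor = 1 becomes plain nearest-integer rounding rather than the old truncation):

```python
def _make_divisible(v, divisor, min_value=None):
    # Sketch of the TF slim rounding rule (name is an assumption).
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

# Base output channels of the MobileNetV2 inverted-residual blocks.
base_channels = [16, 24, 32, 64, 96, 160, 320]
width_mult = 0.35

results = {}
for divisor in (8, 1):
    results[divisor] = [
        _make_divisible(c * width_mult, divisor) for c in base_channels
    ]
    print(divisor, results[divisor])
```

With divisor = 8 several thin layers land on different widths than with divisor = 1, which is exactly why matching the official rounding matters for benchmarking against the reference model.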
Apologies for the delay, just put the change in.
@yaysummeriscoming there is a linter error, could you fix it?
Fixed error: ./torchvision/models/mobilenet.py:152:1: W293 blank line contains whitespace
LGTM, thanks!
Just waiting for CI to finish
@fmassa Have you reproduced the result with width multiplier 1.4 using that script?
@meijieru I haven't tried any trainings other than the
Fixes #973