
[RFC] Use pretrained=True to load the best available pre-trained weights #5015


Closed
datumbox opened this issue Nov 30, 2021 · 7 comments

@datumbox
Contributor

datumbox commented Nov 30, 2021

🚀 RFC

Background Info

To access pre-trained models in TorchVision, one needs to pass pretrained=True to the model builders. Example:

from torchvision.models import resnet50

# With weights:
model = resnet50(pretrained=True)

# Without weights:
model = resnet50(pretrained=False)

Unfortunately, the above API does not allow us to support multiple pre-trained weights. This feature is necessary when we want to provide improved weights on the same dataset (for example, better Acc@1 on ImageNet) or additional weights trained on a different dataset (for example, VOC instead of COCO for Object Detection). With the completion of the Multi-weight support prototype, the TorchVision model builders can now support more than one set of weights:

from torchvision.prototype.models import resnet50, ResNet50_Weights

# Old weights:
model = resnet50(weights=ResNet50_Weights.ImageNet1K_V1)

# New weights:
model = resnet50(weights=ResNet50_Weights.ImageNet1K_V2)

# No weights:
model = resnet50(weights=None)

The above prototype API is now available in the nightly builds, where users can test it and provide feedback. Once the feedback is gathered and acted upon, we will consider releasing the new API in the main area of TorchVision.

What should be the behaviour of pretrained=True?

Upon release, the legacy pretrained=True parameter will be deprecated and will be removed in a future version of TorchVision (TBD when). The question of this RFC is what the behaviour of pretrained=True should be until its removal. There are currently two obvious candidates:

Option 1: Using the Legacy weights

With pretrained=True, the new API should return the same legacy weights as those used by the current API.

This is how the prototype is currently implemented. The following calls are all equivalent:

# Legacy weights with accuracy 76.130%
model = resnet50(weights=ResNet50_Weights.ImageNet1K_V1) 
model = resnet50(pretrained=True)
model = resnet50(True)

Why select this option:

  • It is aligned with TorchVision's strong Backwards Compatibility guarantees
  • It requires a "manual opt-in" from users to switch to the new weights
  • It's the safest option

Option 2: Using the Best available weights

With pretrained=True, the new API should return the best available weights.

The following calls will be made equivalent:

# New weights with accuracy 80.674%
model = resnet50(weights=ResNet50_Weights.ImageNet1K_V2)
model = resnet50(weights=ResNet50_Weights.default)
model = resnet50(pretrained=True)
model = resnet50(True)

Why select this option:

  • The users will benefit automatically from the major accuracy improvement.
  • In practice, TorchVision didn't actually offer BC guarantees on the weights. There are several instances where we previously modified the weights in place [1, 2, 3, 4]. Due to this, one could argue that the semantics of pretrained=True always meant "give me the current best weights".
  • The in-place modification of weights is commonplace in other libraries [1, 2].
  • It emphasises the fact that ResNet50 and other older architectures achieve very high accuracies when trained with modern approaches, which can have a positive influence on research [1].

To address some of the cons of adopting this option, we will:

  • Raise warnings informing users that they are receiving the new weights, and include instructions in the warning on how to switch back to the old behaviour (a minimal sketch follows this list).
  • Inform downstream libraries and users about the upcoming change via blog posts, social media and even by opening PRs to their projects (especially for Meta-backed projects).
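
As a rough illustration of the first bullet, here is a minimal sketch of how the legacy flag could be mapped to the new weights while emitting a warning. The helper name and the warning text are hypothetical, not the actual TorchVision implementation:

import warnings

from torchvision.prototype.models import resnet50, ResNet50_Weights

# Hypothetical helper, for illustration only: translate the legacy flag
# into the new enum and warn about the behaviour change.
def resolve_legacy_pretrained(pretrained, new_weights=ResNet50_Weights.ImageNet1K_V2):
    if not pretrained:
        return None
    warnings.warn(
        "pretrained=True is deprecated and now loads the best available weights. "
        "To keep the old behaviour, pass weights=ResNet50_Weights.ImageNet1K_V1 explicitly.",
        UserWarning,
    )
    return new_weights

model = resnet50(weights=resolve_legacy_pretrained(pretrained=True))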

Feedback

We would love to hear your thoughts on the matter. You can vote on this topic on Twitter or explain your position in the comments.

@fibbonnaci

fibbonnaci commented Dec 1, 2021

I like option 2; it is in line with the common user experience. As long as we inform the user that we have upgraded the weights to reflect the latest ones, and document how they could use the old weights if needed, this is the preferred approach.

@fmassa
Member

fmassa commented Dec 1, 2021

I lean towards option 1.

The reason being that paper implementations don't generally get updated once they are released to the public. Reproducing research is fundamental, and option 2 introduces a silent BC break that can make reproducing the results of a paper impossible, without any warning to the user about why (and raising warnings all the time is very annoying).

In the same way, if a downstream user is on an older version of torchvision (say 0.11) and is pulling a repository which used torchvision 0.13 (with pretrained=True meaning "get the new weights"), they will also not be able to reproduce the results, adding one extra layer of complication when figuring out why the results don't match.

@davidpicard

This is a complex problem, because both options have pros and cons:

Option 1:

  • Does not break legacy code with a simple update. There are probably some repositories that use unit tests to check that the weights are consistent with what's expected (like checking a hash or checking the output on a very specific image). Changing the default weights without planning is not nice to people who care about maintaining products.
  • Forces research projects to consciously select the better weights.
  • We're living in the past.

Option 2:

  • Get the better model without having to add boilerplate for it, as any improvement should be.
  • Probably breaks some code because the change is unexpected.
  • Allows some research to unknowingly report better results just because of a better backbone, leading to less reproducibility.

Honestly, I don't think it is the role of a toolkit to fix the lack of reproducibility or robustness of current research papers. Because there is no clear view of when the option will be removed, I would vote for a hybrid solution with a warning associated with the option pretrained=True, like: "WARNING: pretrained=True will change the default weights on July 15th 2021. This significantly increases the accuracy on ImageNet and changes many published results." Or something along those lines. There will be a warning for when the option is removed, right? So why not have one before the keyword changes meaning?
If the option is to be removed much quicker than I anticipate, then option 1 is fine.

@yassersouri

Since pretrained=True is getting deprecated, I suggest giving some hints regarding the v2 weights in the deprecation message and going with option 1.
The main reason is reproducibility.

@lkskstlr

lkskstlr commented Dec 1, 2021

I lean towards option 1 for similar reasons as @fmassa. I would maybe propose the following two additions:

  • Consider adding a warning for pretrained=True which states that the weights are legacy ones with suboptimal performance and explains how to use the current ones. Also, add an option to silence this warning (e.g. by switching to an explicit weights=ResNet50_Weights.ImageNet1K_V1). I agree that constant warnings are a problem, but I don't have a clear solution here; from my perspective, there could also be a global flag that disables all similar warnings.
  • Consider adding an option that lets one get the current SOTA weights without specifying them exactly: either weights="best", weights=ResNet50_Weights.BEST, or pretrained=True, best=True, or something similar (a small sketch follows below). This could be an easy fix for users who just want the best performance. It would also future-proof libraries that want to always use the best weights at any time, which is not possible if the weights have to be specified explicitly.
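
For reference, the prototype already exposes a "default" alias that tracks the best available weights (see the Option 2 snippet above); the string-based spelling below is only a hypothetical illustration of this suggestion:

from torchvision.prototype.models import resnet50, ResNet50_Weights

# Existing alias in the prototype that always points to the best weights:
model = resnet50(weights=ResNet50_Weights.default)

# Hypothetical string-based spelling of the same idea (not implemented):
# model = resnet50(weights="best")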

Thank you all for working on PyTorch, it is overall one of the best libraries I know, often because the user interface is carefully designed.

@jamt9000
Contributor

jamt9000 commented Dec 1, 2021

I have quite often needed to precompute features using the torchvision resnets (such as for indexing with FAISS) and relied on being able to get a comparable feature for new images by just creating a new resnet18(pretrained=True), so this sort of use case would break quite badly if the weights changed. (I would at least want loud warnings so that incompatible features aren't silently added to an index after upgrading torchvision.)
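
A minimal sketch of that kind of workflow, assuming FAISS is installed and the input images are already resized and normalized; the details are illustrative, not taken from an actual project:

import faiss
import torch
from torchvision.models import resnet18

# Fixed feature extractor built from the pre-trained backbone.
model = resnet18(pretrained=True)
model.fc = torch.nn.Identity()  # drop the classifier, keep the 512-d features
model.eval()

index = faiss.IndexFlatL2(512)

@torch.no_grad()
def add_to_index(batch):  # batch: (N, 3, 224, 224) normalized images
    feats = model(batch)
    index.add(feats.numpy())

# If a torchvision upgrade silently changed the weights behind pretrained=True,
# newly extracted features would no longer be comparable with those already in
# the index.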

@voldemortX
Contributor

voldemortX commented Dec 6, 2021

I'd suggest using the legacy weights. Reproducibility/fairness and BC are crucial for research. Imagine a 2021 paper says it used the torchvision ImageNet pre-trained weights and a 2022 paper that uses the better weights says the same thing: how can we compare what's what? Of course the new weights are also important for people to move forward, so maybe just keep the legacy weights as the default and throw a warning about the existence of much better weights. This way people can find out about the usage of better weights by looking at the code instead of the dependency versions.
