Skip to content

new model weights loading breaks backwards compatibility for existing models #5836

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
d4l3k opened this issue Apr 19, 2022 · 1 comment · Fixed by #5992
Closed

new model weights loading breaks backwards compatibility for existing models #5836

d4l3k opened this issue Apr 19, 2022 · 1 comment · Fixed by #5992

Comments

@d4l3k
Copy link
Member

d4l3k commented Apr 19, 2022

🐛 Describe the bug

The new model loading format breaks numerous models that depend on custom loading logic via model_urls. This is used a lot for various academic repos and will make many of them no longer run on PT 1.12.

import torchvision

torchvision.models.resnet.model_urls["resnet18"]

Examples from notable papers/partners:

Looking at all of github for just resnet there's hundreds of usages of resnet.model_urls: https://cs.github.com/?q=resnet.model_urls[

16,460 code results that use model_urls https://github.com/search?p=1&q=torchvision+model_urls&type=Code

Versions

PyTorch version: 1.12.0.dev20220419+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Arch Linux (x86_64)
GCC version: (GCC) 11.2.0
Clang version: Could not collect
CMake version: version 3.22.2
Libc version: glibc-2.35

Python version: 3.10.2 (main, Mar 15 2022, 12:43:00) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.16.11-arch1-1-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: 11.6.55
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/libcudnn.so.8.3.1
/usr/lib/libcudnn_adv_infer.so.8.3.1
/usr/lib/libcudnn_adv_train.so.8.3.1
/usr/lib/libcudnn_cnn_infer.so.8.3.1
/usr/lib/libcudnn_cnn_train.so.8.3.1
/usr/lib/libcudnn_ops_infer.so.8.3.1
/usr/lib/libcudnn_ops_train.so.8.3.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.22.3
[pip3] torch==1.12.0.dev20220419+cu113
[pip3] torchaudio==0.12.0.dev20220419+cu113
[pip3] torchvision==0.13.0.dev20220419+cu113
[conda] blas 1.0 mkl
[conda] magma-cuda110 2.5.2 1 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-include 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py39h7f8727e_0
[conda] mkl_fft 1.3.1 py39hd3c417c_0
[conda] mkl_random 1.2.2 py39h51133e4_0
[conda] mypy_extensions 0.4.3 py39h06a4308_0
[conda] numpy 1.20.3 py39hf144106_0
[conda] numpy-base 1.20.3 py39h74d4b33_0
[conda] numpydoc 1.1.0 pyhd3eb1b0_1

@datumbox
Copy link
Contributor

Thanks for reporting.

For the benefit of the community, let me provide a summary on why we decided to remove the model_urls parameter. The specific "mechanism" was an internal implementation detail and since it wasn't declared in __all__, it wasn't properly unit-tested or kept consistent across the code-base. As a result, it was neither available for all models (for example 1, 2, 3), nor was using consistent API and naming conventions (for example 4, 5, 6).

Concerning its use on downstream projects, this is a classic example of Hyrum's Law. Though the quoted numbers are blown up by the number of forks or copy-pasted implementations, we are aware that some downstream projects ended up using it, so we have already taken action to inform them and work with them to move way from it (for example see open-mmlab/mmcv#1848). So far we've found that migrating the users to the new solution is very straightforward but we are open to examine reinstating it temporarily with deprecation warnings.

The reason we rolled out this change on main branch several months prior the next release was to have sufficient time to gather feedback and iron out the details. We actually have a dedicated issue for gathering feedback for the new API at #5088. If that's alright with you, I recommend closing this one and move your input on that ticket to keep them in one place and make your feedback more prominent to the community. Up to you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants