
0.14.0 Regression in wheel metadata #6869


Closed
groodt opened this issue Oct 30, 2022 · 17 comments


@groodt

groodt commented Oct 30, 2022

🐛 Describe the bug

The wheel metadata for the 0.14.0 release seems corrupt. The wheel cannot be installed with installer, which means it probably can't be installed with Hatch or PDM either.

The error is:

AssertionError: In /Users/groodt/Downloads/torchvision-0.14.0-cp39-cp39-manylinux1_x86_64.whl, torchvision-0.14.0.dist-info/RECORD is not mentioned in RECORD

Full reproduction and traceback here:

python3 -m pip install installer
python3 -m installer --destdir . ~/Downloads/torchvision-0.14.0-cp39-cp39-manylinux1_x86_64.whl
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/private/tmp/blah/.venv/lib/python3.10/site-packages/installer/__main__.py", line 85, in <module>
    _main(sys.argv[1:], "python -m installer")
  File "/private/tmp/blah/.venv/lib/python3.10/site-packages/installer/__main__.py", line 81, in _main
    installer.install(source, destination, {})
  File "/private/tmp/blah/.venv/lib/python3.10/site-packages/installer/_core.py", line 96, in install
    for record_elements, stream, is_executable in source.get_contents():
  File "/private/tmp/blah/.venv/lib/python3.10/site-packages/installer/sources.py", line 158, in get_contents
    assert record is not None, "In {}, {} is not mentioned in RECORD".format(
AssertionError: In /Users/groodt/Downloads/torchvision-0.14.0-cp39-cp39-manylinux1_x86_64.whl, torchvision-0.14.0.dist-info/RECORD is not mentioned in RECORD
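The assertion means that a file inside the archive, here RECORD itself, has no row in RECORD. The wheel format requires RECORD to list every file in the wheel, including its own entry, which carries an empty hash and size (for example torchvision-0.14.0.dist-info/RECORD,,). A minimal standard-library sketch of the same membership check is below; unlisted_files is a hypothetical helper that approximates the check installer performs, not installer's own code.

```python
import csv
import io
import zipfile


def unlisted_files(wheel):
    """Return the wheel's file members that have no row in its RECORD.

    `wheel` is a path or a binary file object. Directory entries (names ending
    in "/") are skipped because RECORD lists files only.
    """
    with zipfile.ZipFile(wheel) as whl:
        names = [n for n in whl.namelist() if not n.endswith("/")]
        # RECORD sits one level down: <name>-<version>.dist-info/RECORD
        records = [n for n in names if n.endswith(".dist-info/RECORD") and n.count("/") == 1]
        if len(records) != 1:
            raise ValueError(f"expected exactly one .dist-info/RECORD, found {records}")
        text = whl.read(records[0]).decode("utf-8")
    # Each RECORD row is: path,hash,size (hash and size are empty for RECORD itself)
    listed = {row[0] for row in csv.reader(io.StringIO(text)) if row}
    return [n for n in names if n not in listed]
```

For an affected wheel such as torchvision-0.14.0-cp39-cp39-manylinux1_x86_64.whl, the returned list would be expected to include torchvision-0.14.0.dist-info/RECORD; for a well-formed wheel it should be empty.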

Versions

0.14.0

@groodt
Author

groodt commented Oct 31, 2022

I've discovered that the following steps corrects the issue.

python3 -m wheel unpack --dest=/Users/groodt/Downloads/ ~/Downloads/torchvision-0.14.0-cp39-cp39-manylinux1_x86_64.whl
python3 -m wheel pack --dest-dir=/Users/groodt/Downloads/ ~/Downloads/torchvision-0.14.0/
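If only the missing self-entry needs restoring, the archive can also be rewritten under its original file name. The sketch below assumes RECORD is otherwise correct and only appends the missing path,, row; add_record_self_entry is a hypothetical helper, not part of wheel or installer. Because the wheel's bytes change, the result will not match any hash published for the original file, so this is a local workaround rather than a substitute for a corrected upload.

```python
import csv
import io
import os
import tempfile
import zipfile


def add_record_self_entry(wheel_path):
    """Rewrite wheel_path so that its RECORD lists itself as '<path>,,'.

    Hypothetical helper: ZIP members cannot be edited in place, so every member
    is copied into a temporary archive, RECORD gets the missing self-entry,
    and the temporary file then replaces the original.
    """
    directory = os.path.dirname(os.path.abspath(wheel_path))
    fd, tmp_path = tempfile.mkstemp(suffix=".whl", dir=directory)
    os.close(fd)
    try:
        with zipfile.ZipFile(wheel_path) as src, zipfile.ZipFile(tmp_path, "w") as dst:
            for info in src.infolist():
                data = src.read(info)
                name = info.filename
                if name.endswith(".dist-info/RECORD") and name.count("/") == 1:
                    text = data.decode("utf-8")
                    listed = {row[0] for row in csv.reader(io.StringIO(text)) if row}
                    if name not in listed:
                        if text and not text.endswith("\n"):
                            text += "\n"
                        text += f"{name},,\n"  # RECORD's own row carries no hash or size
                    data = text.encode("utf-8")
                # Reusing the source ZipInfo keeps the name, timestamp,
                # permissions and compression method of each member.
                dst.writestr(info, data)
        os.replace(tmp_path, wheel_path)
    except BaseException:
        os.remove(tmp_path)
        raise
```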

@biphasic

biphasic commented Nov 1, 2022

Thanks for clarifying this. Installing with PDM indeed fails with that error.

@ofek

ofek commented Nov 1, 2022

tried Hatch?

@1e100
Contributor

1e100 commented Nov 11, 2022

Bazel also fails with the same error when using the pip_parse rule from rules_python.

@weiwangmeta
Copy link

Looks like the fix would be to use @groodt's command and re-upload the binaries?

@malfet malfet assigned atalman and unassigned weiwangmeta Nov 14, 2022
@malfet
Contributor

malfet commented Nov 14, 2022

Do we know what caused the regression?
Also, IMO we should produce correct binaries and publish the fix for 1.13.1 (if we are doing it).
This likely affects other wheels that we publish as well.

@groodt
Author

groodt commented Nov 14, 2022

I don’t think my “fix” would work exactly as is. It changes the wheel name. I’ve not looked too much into it, but something has changed that prevents this line from executing. I agree with finding the cause of the regression.

f.write(f"{rel_file},,\n")

@weiwangmeta

Confirmed 0.13.1 works "python3 -m installer --destdir . torchvision-0.13.1-cp39-cp39-manylinux1_x86_64.whl", so indeed a regression.
We will compare the difference between 0.13.1 and 0.14.0 cc @atalman

@bingoct

bingoct commented Nov 17, 2022

Version 0.13.1 still fails to install with PDM.
Python version 3.8.10
Pdm version 2.2.1

  File "C:\Users\bingo\AppData\Roaming\Python\Python38\site-packages\installer\sources.py", line 158, in get_contents
    assert record is not None, "In {}, {} is not mentioned in RECORD".format(
AssertionError: In C:\Users\bingo\AppData\Local\Temp\pdm-build-5er6om93\torchvision-0.13.1-cp38-cp38-win_amd64.whl, torchvision-0.13.1.dist-info/LICENSE is not mentioned in RECORD


@groodt
Author

groodt commented Nov 25, 2022

I'm running this now to see if I can identify any issues:
env PYTHON_VERSION=3.9 PYTORCH_VERSION=1.13.0 UNICODE_ABI=no CU_VERSION=cpu packaging/build_wheel.sh

I added some print output to see if the conditional would match.

full_file: /vision/dist/.wheel-process/torchvision-0.15.0.dev20221125+cpu.dist-info/RECORD
record_file: /vision/dist/.wheel-process/torchvision-0.15.0.dev20221125+cpu.dist-info/RECORD
rel_file: torchvision-0.15.0.dev20221125+cpu.dist-info/RECORD

Interestingly, the wheel built from master seems to install just fine.

python3.9 -m installer --destdir /tmp/blah ./dist/torchvision-0.15.0.dev20221125+cpu-cp39-cp39-linux_x86_64.whl

The above command has no issues. I'll try the specific v0.14.0 tag to see if I can reproduce a broken wheel.

Interestingly, this works as well. 🤷

[root@f6a287c09ebc vision]# git describe --tags
v0.14.0
[root@f6a287c09ebc vision]# env PYTHON_VERSION=3.9 PYTORCH_VERSION=1.13.0 UNICODE_ABI=no CU_VERSION=cpu packaging/build_wheel.sh
....
[root@f6a287c09ebc vision]# python3.9 -m installer --destdir /tmp/blah dist/torchvision-0.14.0.dev20221125+cpu-cp39-cp39-linux_x86_64.whl

Perhaps the uploaded wheel was just created incorrectly or uploaded incorrectly?

@malfet
Contributor

malfet commented Nov 29, 2022

This affects only binaries uploaded to the official PyPI index, which makes me suspect that it might be due to a bug in https://github.com/pytorch/builder/blob/85fc34387957e3810602fdeea97c1a9687af04b3/release/pypi/prep_binary_for_pypi.sh#L93

@groodt
Author

groodt commented Nov 29, 2022

Ah, @malfet That looks interesting. I didn't know about that script. I'll try it when I get a chance. Unless somebody else gets to it first. It does look like this PR could be a potential candidate for the regression: pytorch/builder#1164

@groodt
Author

groodt commented Nov 29, 2022

I think you're on to something @malfet

That script definitely corrupts the archive. I took the working wheel I produced above with packaging/build_wheel.sh and passed it through the ./release/pypi/prep_binary_for_pypi.sh script as follows:

./release/pypi/prep_binary_for_pypi.sh ../vision/dist/torchvision-0.14.0.dev20221125+cpu-cp39-cp39-linux_x86_64.whl

I then tried to use installer on the newly compressed wheel and was able to reproduce the problem.

(.venv) [root@d9bbef4fac9a builder]# python3.9 -m installer --destdir /tmp/blah torchvision-0.14.0.dev20221125-cp39-cp39-linux_x86_64.whl
Traceback (most recent call last):
  File "/opt/_internal/cpython-3.9.15/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/_internal/cpython-3.9.15/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/builder/.venv/lib/python3.9/site-packages/installer/__main__.py", line 85, in <module>
    _main(sys.argv[1:], "python -m installer")
  File "/builder/.venv/lib/python3.9/site-packages/installer/__main__.py", line 81, in _main
    installer.install(source, destination, {})
  File "/builder/.venv/lib/python3.9/site-packages/installer/_core.py", line 96, in install
    for record_elements, stream, is_executable in source.get_contents():
  File "/builder/.venv/lib/python3.9/site-packages/installer/sources.py", line 158, in get_contents
    assert record is not None, "In {}, {} is not mentioned in RECORD".format(
AssertionError: In torchvision-0.14.0.dev20221125-cp39-cp39-linux_x86_64.whl, torchvision-0.14.0.dev20221125.dist-info/RECORD is not mentioned in RECORD

Notice that I did need to remove the +cpu local version identifier from the wheel name to reproduce the error.
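The rename only touches the version field of the file name. Wheel file names follow {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl, and the distribution part contains no hyphen, so a PEP 440 local label such as +cpu can be dropped as in the sketch below (strip_local_version is a hypothetical helper). The error above also shows the .dist-info directory renamed to torchvision-0.14.0.dev20221125.dist-info, so the paths listed in RECORD have to be regenerated after such a rename.

```python
def strip_local_version(wheel_filename):
    """Drop a PEP 440 local version label such as "+cpu" from a wheel file name.

    Wheel names have the form
    {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
    and the distribution part contains no "-", so the version is the second field.
    """
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {wheel_filename!r}")
    fields = wheel_filename[: -len(".whl")].split("-")
    if len(fields) not in (5, 6):
        raise ValueError(f"unexpected number of fields in {wheel_filename!r}")
    fields[1] = fields[1].split("+", 1)[0]  # "0.14.0.dev20221125+cpu" -> "0.14.0.dev20221125"
    return "-".join(fields) + ".whl"


print(strip_local_version("torchvision-0.14.0.dev20221125+cpu-cp39-cp39-linux_x86_64.whl"))
# torchvision-0.14.0.dev20221125-cp39-cp39-linux_x86_64.whl
```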

@groodt
Author

groodt commented Dec 1, 2022

I believe I identified the issue and have submitted a PR with a proposed fix here: pytorch/builder#1215

The issue is that the RECORD file is deleted before make_wheel_record is called. The make_wheel_record function expects to write a RECORD file. Changing the deletion to a truncation makes the logic work correctly.
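The failure mode can be reproduced in a small Python sketch. make_record is a hypothetical stand-in for make_wheel_record, not the code in prep_binary_for_pypi.sh; it assumes, as the f.write(f"{rel_file},,\n") line and the full_file/record_file debug output earlier in this thread suggest, that RECORD's own row is written only when the directory walk encounters an existing RECORD file. Deleting RECORD beforehand therefore drops the self-entry, while truncating it keeps the file in the walk.

```python
import base64
import hashlib
import os
import tempfile


def make_record(unpacked_dir, record_path):
    """Hypothetical stand-in for make_wheel_record: regenerate RECORD for an unpacked wheel.

    RECORD's own row ('<path>,,') is written only if the walk finds RECORD on disk.
    """
    record_path = os.path.abspath(record_path)
    top = os.path.abspath(unpacked_dir)
    rows = []
    for root, dirs, files in os.walk(top):
        dirs.sort()  # deterministic order
        for name in sorted(files):
            full_file = os.path.join(root, name)
            rel_file = os.path.relpath(full_file, top).replace(os.sep, "/")
            if full_file == record_path:
                rows.append(f"{rel_file},,\n")  # RECORD lists itself with no hash or size
            else:
                with open(full_file, "rb") as fh:
                    data = fh.read()
                digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest())
                rows.append(f"{rel_file},sha256={digest.rstrip(b'=').decode('ascii')},{len(data)}\n")
    # Opening with "w" recreates RECORD even if it was deleted, just without its own row.
    with open(record_path, "w", encoding="utf-8") as f:
        f.writelines(rows)


def truncate(path):
    """Empty the file but keep it on disk."""
    open(path, "w").close()


def record_lists_itself(prepare):
    """Build a toy unpacked wheel, apply `prepare` to RECORD, regenerate it, check the self-entry."""
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "pkg"))
        os.makedirs(os.path.join(tmp, "demo-1.0.dist-info"))
        truncate(os.path.join(tmp, "pkg", "__init__.py"))
        record = os.path.join(tmp, "demo-1.0.dist-info", "RECORD")
        with open(record, "w") as f:
            f.write("stale contents\n")
        prepare(record)
        make_record(tmp, record)
        with open(record, encoding="utf-8") as f:
            return "demo-1.0.dist-info/RECORD,," in f.read().splitlines()


print(record_lists_itself(os.remove))  # False: RECORD was deleted before the walk
print(record_lists_itself(truncate))   # True: the empty RECORD is still found by the walk
```

With the deletion, the regenerated RECORD lists every other file but not itself, which is the condition installer rejects.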

@groodt
Author

groodt commented Dec 9, 2022

The fix to the build script has been merged. Are there any plans to release a new version of torchvision?

@malfet
Copy link
Contributor

malfet commented Dec 14, 2022

Yes, 1.13.1, which among other things will contain the change, will likely be available tomorrow.

@groodt
Copy link
Author

groodt commented Dec 16, 2022

I can confirm this is now fixed in torch==1.13.1 and torchvision==0.14.1

@groodt groodt closed this as completed Dec 16, 2022