Should manylinux1 comes before linux_* in precedence? #3844

Closed
dholth opened this issue Jul 14, 2016 · 13 comments
Labels
auto-locked (Outdated issues that have been locked by automation)
C: wheel (The wheel format and 'pip wheel' command)
state: needs discussion (This needs some more discussion)
type: enhancement (Improvements to functionality)

Comments

@dholth
Member

dholth commented Jul 14, 2016

I was surprised to see 'manylinux1' come first in pip.pep425tags.get_supported(). Would this cause pip to prefer a pypi-hosted manylinux1 wheel over a locally built linux_x86_64 wheel? I would expect the 'linux_x86_64' wheel to count as "more specific to the current system" and needing to come first.

Propose switching the order of each pair of elements in this list from pip.pep425tags.get_supported():

[('cp35', 'cp35m', 'manylinux1_x86_64'),
 ('cp35', 'cp35m', 'linux_x86_64'),
 ('cp35', 'abi3', 'manylinux1_x86_64'),
 ('cp35', 'abi3', 'linux_x86_64'),
 ('cp35', 'none', 'manylinux1_x86_64'),
 ('cp35', 'none', 'linux_x86_64'),
 ('py3', 'none', 'manylinux1_x86_64'),
 ('py3', 'none', 'linux_x86_64'),
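
To make the question concrete, here is a minimal Python sketch of how an ordered tag list decides between two candidate wheels. It is not pip's actual code and does not call pip internals: the helper names `swap_pairs` and `pick_best` are hypothetical, and only the tag triples are taken from the output above.

```python
# Illustrative sketch only; not pip's implementation.
# The first four tag triples from the get_supported() output quoted above.
CURRENT_ORDER = [
    ("cp35", "cp35m", "manylinux1_x86_64"),
    ("cp35", "cp35m", "linux_x86_64"),
    ("cp35", "abi3", "manylinux1_x86_64"),
    ("cp35", "abi3", "linux_x86_64"),
]


def swap_pairs(tags):
    """Swap each adjacent pair of entries, as proposed in this issue."""
    swapped = list(tags)
    for i in range(0, len(swapped) - 1, 2):
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
    return swapped


def pick_best(candidates, supported):
    """Return the candidate tag that appears earliest in `supported`, or None."""
    rank = {tag: index for index, tag in enumerate(supported)}
    usable = [tag for tag in candidates if tag in rank]
    return min(usable, key=rank.__getitem__) if usable else None


# One locally built wheel and one PyPI-hosted wheel for the same version.
candidates = [
    ("cp35", "cp35m", "linux_x86_64"),
    ("cp35", "cp35m", "manylinux1_x86_64"),
]

best_now = pick_best(candidates, CURRENT_ORDER)
best_swapped = pick_best(candidates, swap_pairs(CURRENT_ORDER))
print(best_now[2])      # manylinux1_x86_64
print(best_swapped[2])  # linux_x86_64
```

With the current order the manylinux1 wheel wins; with each pair swapped, the locally built linux_x86_64 wheel wins.
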
@dholth
Member Author

dholth commented Dec 16, 2016

I designed this tagging system and understand how it was intended to work. Putting manylinux1 first goes against the intent of the design, and has personally caused me problems in practice because a locally built wheel cannot override a broken remote manylinux1 wheel. Instead, manylinux1 has to be completely disabled to fix a problem with a pypi wheel.

@dstufft
Member

dstufft commented Dec 16, 2016

Congratulations? I'm not sure what relevance that has to anything.

Later comment was edited to include rest of statement.

@dstufft
Member

dstufft commented Dec 16, 2016

In #3921 you have two pip core developers who are against putting generic linux wheels after the more specific manylinux1 wheels. You yourself in both the PEP, this issue and in #3921 state that the platform tags are intended to prefer the most specific to the least specific.

Given that you did not design the manylinux1 tag your appeal to authority here is not particularly relevant since it post-dates your design and whether or not tags should be preferred in the most specific to least specific is not being debated. However, it is the opinion of myself, @xavfernandez, and apparently the folks who did design manylinux1 that manylinux1 is the more specific tag (given that the PR that implemented the PEP explicitly called out the fact that it put manylinux1 first as it was more specific, added tests for it, and they participated in said PR and never made any claim otherwise).

The fact that wheel currently defaults to a more general tag when compiling on Linux does not magically mean that tag is somehow more specific. If you want a more specific tag to be used by default, the suggested linux_$(cat /etc/machine-id) seems like a good design to me.
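
As a rough illustration of the linux_$(cat /etc/machine-id) idea, the following Python sketch builds such a tag. It is an assumption about how one might do it, not something wheel or pip provides: the function name `machine_platform_tag`, its parameters, and the `unknown` fallback are all hypothetical.

```python
import re
from pathlib import Path


def machine_platform_tag(machine_id_path="/etc/machine-id", fallback="unknown"):
    """Build a hypothetical 'linux_<machine-id>' platform tag.

    Falls back to `fallback` when the file is missing, unreadable, or empty.
    """
    try:
        machine_id = Path(machine_id_path).read_text().strip() or fallback
    except OSError:
        machine_id = fallback
    # Platform tags use underscores in place of other punctuation.
    safe_id = re.sub(r"[^A-Za-z0-9]", "_", machine_id)
    return "linux_" + safe_id


print(machine_platform_tag())
```

A wheel tagged this way would only match on the machine that built it, which is the point of the suggestion.
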

@dholth
Member Author

dholth commented Dec 16, 2016

I'm describing a practical problem and I had not seen the recent PR resolution. It's discouraging to argue with you, and I usually step away from attempts to contribute to Python packaging for a while after doing so, so you won't have to worry about me for a while. A random machine id tag would be a different way to solve the same problem of selectively overriding broken pypi wheels, with its own workarounds for sharing wheels between similar machines.

@dstufft
Member

dstufft commented Dec 16, 2016

> I'm describing a practical problem and I had not seen the recent PR resolution. ... A random machine id tag would be a different way to solve the same problem of selectively overriding broken pypi wheels, with its own workarounds for sharing wheels between similar machines.

I agree this is a problem, and had offered two solutions in #3921 that would, in my opinion, solve it better, one that is automatic and allows a "zero configuration" approach and one which is opt-in and allows a "shared amongst many machines" approach. I believe these options are better because they don't involve making something that is logically less specific be considered more specific (and on top of that, they remove the requirement to manage compatibility externally to pip).

I believe that the only reason it's defensible at all to swap the order of these two things, is because generally people will put manylinux1 wheels on PyPI and generally they will not put the linux_* wheels on PyPI, but wheel will generate them by default. However there is nothing stopping them from publishing a linux_* wheel to PyPI today (and in fact, it could be super useful in non C library cases, like if you want to ship man pages to Linux but not to Windows for instance).

> It's discouraging to argue with you, and I usually step away from attempts to contribute to Python packaging for a while after doing so, so you won't have to worry about me for a while.

I was going to just ignore this, but I don't think that's a very good thing to do. First off, I do not want anyone to feel discouraged from contributing to Python packaging. If you can point out what it is that I do (either here or privately is fine) that you find discouraging I can try to work on it. I suspect that it largely has to do with my own personal frustration over my perception that your solutions to problems and feature ideas tend to focus on expediency and what you think would be cool, rather than long term maintainability and an overall consistency of the related concepts. I don't know if that's how you actually think/view it or not, but that is how it comes across to me, and that frustrates me and I suspect that frustration might come out.

@dholth
Member Author

dholth commented Dec 19, 2016

That is a pretty good summary. You probably could not point to something I've done that is not the expedient solution. I like expedient solutions because I need a solution. From my point of view the roof is leaking and I have a tarp. Someone may eventually have to replace the roof, but in the meantime the tarp will prevent the house from rotting. Here's the problem: if tarps are not allowed, does that make it more likely that someone will repair the roof in time? Or will the house be destroyed? You consistently bet that the house will still exist by the time someone can afford to replace the roof, while I wonder why you want to destroy houses. The tarp is not the enemy of the roof. Without the tarp there will never be a new roof, because the house will be gone.
It's not just you, I have repeatedly observed the 'incremental fixes are not allowed' problem in the Python community, but it could be just me :) It is a mechanism for coercing people to do pending cleanup work as compensation for having their (orthogonal) problem addressed, or a way for an overworked maintainer to delay having to cope with a pull request. Or it is just one of the people (not you) who disagree with everything as a hobby and who the new contributor must learn to ignore.
An example of a subject I'm interested in but have not spent that much time on is "useful Python applications" that are not libraries. It is painful to use virtualenv and Python packaging to distribute these apps. No one is left to advocate for them, because they are being written in nodejs or possibly Rust, so if you ask about it on our Python packaging list it appears there is no problem.
An Alex Martelli PyCon talk about perfection https://youtu.be/yo4Uqq7NXQc

@dholth
Member Author

dholth commented Dec 21, 2016

@dstufft The most frustrating thing is feeling like I am not given the benefit of the doubt when discussing my own (successful) system. I understood it before writing it down. It is difficult for me to accept that others now understand the system better than I do after reading my imperfect prose, or just think that it is shit. On the other hand it is very understandable that the non-intuitive tagging system confuses people, I have been told repeatedly that it is hard to understand. For example, people expect that OSX version parsing should be a part of the wheel ranking algorithm instead of the supported tag generation code, but it works in the opposite way, because version parsing is complicated and set membership queries are fast.
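
The OSX point can be sketched in a few lines of Python. This is a deliberately simplified illustration, assuming a single architecture and ignoring the extra architecture groupings real tag generation handles; `macos_platform_tags` and `is_compatible` are hypothetical names. All the version reasoning happens once, when the supported list is generated, and matching a wheel is then a plain set membership test.

```python
def macos_platform_tags(major, minor, arch="x86_64"):
    """List platform tags for this and every older minor release, newest first."""
    return [
        "macosx_{}_{}_{}".format(major, older_minor, arch)
        for older_minor in range(minor, -1, -1)
    ]


# Generated once for the running system (assumed here to be 10.12).
supported = macos_platform_tags(10, 12)
supported_set = set(supported)


def is_compatible(wheel_platform):
    """No version parsing at match time: just a set membership query."""
    return wheel_platform in supported_set


print(is_compatible("macosx_10_6_x86_64"))   # True
print(is_compatible("macosx_10_13_x86_64"))  # False
```
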
I did not participate in the manylinux1 design, but it is also very understandable that the inventor of a tag would expect it to come first without anticipating the 'broken wheel takes precedence over local wheel' problem.
From my point of view linux_x86_64 has always meant 'that is all we know, don't share' and is the tag used for wheels built 'here'. This has more to do with the behavior of bdist_wheel and is not related to the spelling of the tag. The PEP should say 'most likely to work on this machine' sorts first and that is most definitely 'compiled on this machine', not 'tag with the best specification'. Magic!
I did not know that linux_x86_64 wheels were allowed on PyPI now. Weren't they blocked?
Perhaps an acceptable solution that avoids philosophical debate would be to introduce a WHEEL_PLATFORM environment variable that bdist_wheel uses to generate wheels, and that pip places on the top. It would provide a workaround for the not-that-serious problem of broken manylinux1 wheels and make it easier to generate manylinux1 wheels without retagging.
I would prefer an opaque tag over linux_(machine-id) because I have a thing for long hexadecimal strings, but the recommendation should at least include the arch in case you are running a 32 and 64 bit Python on the same OS.
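
A minimal Python sketch of the WHEEL_PLATFORM proposal, on the consuming side, might look like the following. The variable is only proposed in this thread and is not an existing pip or wheel feature; `supported_platforms` and the example tag values are hypothetical.

```python
import os


def supported_platforms(defaults, environ=None):
    """Return platform tags in priority order.

    If the (proposed, hypothetical) WHEEL_PLATFORM variable is set, its value
    goes first and any duplicate further down the list is dropped.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("WHEEL_PLATFORM")
    if not override:
        return list(defaults)
    return [override] + [tag for tag in defaults if tag != override]


defaults = ["manylinux1_x86_64", "linux_x86_64"]

# An opaque tag that still carries the architecture, as recommended above.
custom = supported_platforms(defaults, {"WHEEL_PLATFORM": "mysite_x86_64"})
print(custom)  # ['mysite_x86_64', 'manylinux1_x86_64', 'linux_x86_64']
```

Passing the environment as a parameter keeps the function easy to test; a real implementation would also need bdist_wheel to write the same tag when building.
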

@pfmoore
Member

pfmoore commented Dec 21, 2016

@dholth I have to admit that I always found the tagging and tag precedence system confusing, even back when I was working with you on the early wheel stuff. I recall a long debate we had over why consumers had to specify precisely what they wanted to accept, rather than saying what they "were" and having the matching algorithm check for a match. I never really felt comfortable with what we have - maybe because I didn't understand your intentions properly.

With that in mind, I always assumed that linux_x86_64 would work the same as win_amd64 - meaning "for the general Linux (Windows) 64-bit platform". If as you say you always intended linux_x86_64 to mean "not going to work on any other machine" then your suggestion here makes sense - but I don't see how to reconcile it with the form the Windows tags take.

I get the impression that most other people also use the analogy with the Windows tag and assume the Linux tag is intended as "generic" - sure, that makes it fairly useless in practice, that's the whole issue with Linux ABI incompatibilities. Given that is the case, maybe we should simply accept that the linux_x86_64 tag is a source of confusion, and switch to some other tag specific to the current system - such as the proposed linux_$(cat /etc/machine-id) - and retire the linux_x86_64 tag. It's a shame if we lose the ability to use the "simple" tag, but if we gain by users being less confused, maybe the result is a win anyway. From what I understand of your position, linux_$(cat /etc/machine-id) would mean exactly the same as what you actually intended linux_x86_64 to mean, but with the added benefit that if people don't respect the "do not share" rule that you mentioned (but people don't seem to realise applies), then the code will enforce it for them.

@dholth
Member Author

dholth commented Dec 21, 2016

I'm OK with changing the #1 Linux tag now by a config option. The reconciliation with Windows is that two Windows machines are more similar than two Linux machines, win is more like manylinux1 than it is like generic Linux. Imagine trying hard enough and producing a binary that crashed unless Photoshop 4.3 was installed, except that is easy to do on Linux. Sometimes generic Linux binaries work but fail often enough that a pip user would rather wait for the compile than roll the dice on someone else's binary.
Even the machine id is not perfect, you might install or uninstall things. RPM deals with that by recording a list of filenames linked to by the package as part of its metadata as well as the explicit dependencies. Giving up, having it break sometimes, more caching and less sharing seemed reasonable. Even if you could express the library and ABI dependency metadata in a tag or otherwise, what would the point be to share a binary that worked on so few machines?

@mboisson

I ran into the same problem today when I tried installing a wheel compiled locally using pip.

Running pip install in verbose mode, I see:

Local files found: /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/generic/numpy-1.12.0-cp27-cp27mu-linux_x86_64.whl
Using version 1.12.0 (newest of versions: 1.6.0, 1.6.1, 1.6.2, 1.7.0, 1.7.1, 1.7.2, 1.8.0, 1.8.1, 1.8.2, 1.9.0, 1.9.1, 1.9.2, 1.9.3, 1.10.0, 1.10.1, 1.10.2, 1.10.3, 1.10.4, 1.11.0, 1.11.1, 1.11.2, 1.11.3, 1.12.0)
"GET /packages/cb/47/19e96945ee6012459e85f87728633f05b1e8791677ae64370d16ac4c849e/numpy-1.12.0-cp27-cp27mu-manylinux1_x86_64.whl HTTP/1.1" 200 16497183
Downloading numpy-1.12.0-cp27-cp27mu-manylinux1_x86_64.whl (16.5MB)

which makes no sense. It finds an acceptable link locally, it is the most recent version, and yet pip retrieves the online version.

@mboisson

Note that using some machine ID would not work. On HPC clusters, those would likely be different, yet the environment is the same and the local wheel should still be preferred. Letting one specify a preferred tag in the pip config would be best.

@dstufft changed the title from "manylinux1 comes first?" to "Should manylinux1 comes before linux_* in precedence?" on Mar 31, 2017
@pradyunsg added the "type: enhancement", "C: wheel", and "state: needs discussion" labels on Mar 5, 2018
@pradyunsg
Member

This is a little outside my ballpark here but I'm willing to give this a shot at some point in the future -- given that #3921 is now outdated.

I'll be more than happy to let someone who is more comfortable with the details and nuances of wheel tags take this up if I don't come around to this soon enough. :)

@pradyunsg self-assigned this on Mar 5, 2018
@pradyunsg removed their assignment on Feb 3, 2019
@chrahunt
Member

Given that #3921 was closed in favor of #6523, which seemed to reach a consensus (and with a PR pending a PEP update), should this be closed?

@lock (bot) added the "auto-locked" label on Aug 9, 2019
@lock (bot) locked as resolved and limited conversation to collaborators on Aug 9, 2019
6 participants