Using multiple PIP indexes on the same hostname with different credentials does not work #10902
Comments
I think this is expected behaviour. I suggest reaching out to Azure support, asking them to explore solutions for this on their end. Our authentication mechanisms have the requirement/assumption baked in that each domain will only have a single username-password pair (or auth token) associated with it. A PR documenting this would be welcome.
Can you confirm that the behaviour is the same with the current release of pip? |
Just tested and confirm that the behaviour is the same on the current release. Let me also raise a PR to document this. I will explore the Azure support suggestion too. It would be nice if it worked, as it is possible to pass different authentication to the same domain and it's natural to expect it to work. It took me around an hour to understand why my pipeline was not working, as this was all hidden behind something like this in Azure pipelines that automatically populates the
|
I believe it's something that should be fixed on the
|
In which case, a PR fixing this would be welcome. :) |
Actually I've planned to take a look at adding it :) |
I looked a bit at the code, I think the fix should be simple by replacing netloc with the "root repository URL" when we cache credentials over here: pip/src/pip/_internal/network/auth.py Line 202 in ec8edbf
What I am not clear about however is how does one define a "root repository URL" exactly. PEP503 seems to define it to stop after However, in the Azure feed case, my index URL looks like Two issues with this:
|
Yep. looks like the right place.
I believe PEP 503 does not define the "root" repository to have "simple". The "simple" is just an example (this is how PyPI indexes define it) but it can be an arbitrary path (also one including '/' - that's how I read it at least). However it does define that there must be a "project" as the last part of the path or how long the "root" URL is.
I think that should not really matter. From https://datatracker.ietf.org/doc/html/rfc7617#section-2.2 - the "security" context can also match a prefix, not the whole URL including the path (also, if you look further, it does not decide what to do in case of overlapping contexts; it leaves that decision to the client). So I think what should happen here is that the authentication information should be kept in an array of (prefix, authinfo) tuples, not a dictionary, and the check should not look for an exact match but for the longest match among all registered prefixes. |
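The longest-prefix lookup described above can be sketched roughly as follows. This is only an illustration of the idea, not pip's code and not the implementation from any PR: `PrefixCredentialStore`, its methods, and the `pkgs.example.com` URLs are hypothetical names chosen for the example.

```python
from typing import List, Optional, Tuple

Credentials = Tuple[str, str]  # (username, password)


def _with_slash(url: str) -> str:
    """Normalize a URL so prefixes only match on whole path segments."""
    return url.rstrip("/") + "/"


class PrefixCredentialStore:
    """Hypothetical store that keeps (prefix, credentials) pairs in a list."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, Credentials]] = []

    def add(self, prefix: str, credentials: Credentials) -> None:
        self._entries.append((_with_slash(prefix), credentials))

    def lookup(self, url: str) -> Optional[Credentials]:
        """Return the credentials registered under the longest matching prefix."""
        target = _with_slash(url)
        best: Optional[Tuple[str, Credentials]] = None
        for prefix, credentials in self._entries:
            if target.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
                best = (prefix, credentials)
        return best[1] if best is not None else None


# Hypothetical feeds on one host, each with its own credentials.
store = PrefixCredentialStore()
store.add("https://pkgs.example.com/feed1", ("user1", "token1"))
store.add("https://pkgs.example.com/feed2", ("user2", "token2"))

print(store.lookup("https://pkgs.example.com/feed1/simple/package1/"))
print(store.lookup("https://pkgs.example.com/feed2/simple/package2/"))
```

The trailing-slash normalization is a design choice of this sketch: without it, a prefix registered for `feed1` would also match a sibling path such as `feed10`.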
PEP 503 defines the API in terms of the root URL. I don't know enough about URL semantics to be definitive, but I'm pretty sure the expectation was that a root URL would just be something like So yeah, if you're looking at "what does the PEP say we should do", I think the answer isn't clear. I feel as though the Azure behaviour is unusual enough that this is likely to be just the tip of the iceberg here, and we may want to hold off until we better understand how (or even if) their behaviour conforms to the relevant standards - possibly even updating the standards as needed. I don't want pip's behaviour to end up being a "de facto" requirement for all tools, so I'm concerned about making behavioural decisions here as opposed to just following the spec. Ping @zooba for his Azure expertise. |
Any comments on RFC 7617, @pfmoore? For me this is a "lower-level" (and pretty established) standard about generic basic-authentication behaviour, which includes authentication scopes. At least as I read it, it is quite clear: this is basic authentication, and the path is part of the authentication scope, so if a user matches "URL/path", it does not match "URL/other_path". I am not sure PEP 503 has any relation to that? |
(BTW. I am just about to submit PR "fixing" it so we can discuss looking at the code proposal) |
The https://datatracker.ietf.org/doc/html/rfc7617#section-2.2 defines multi-domain authentication behaviour and authentication scopes for basic authentication. This change improves the implementation of the multi-domain matching to be RFC 7617 compliant * path matching (including longest match) * scheme validation matching Closes: pypa#10902
Submitted #10904 - which should be pretty complete implementation of multi-domain matching (@roman-kouzmenko - you might want to install via |
That's how I interpret this as well but I don't think it will solve the azure issue as mentioned above as it might transform URL/path to URL/better_path_for_azure (still not sure if it's done by pip through redirects or HATEOAS). Anyway, will try your PR tomorrow morning (CET) and report back. |
I am also not sure if this is a "full" solution (for me this is a first deeper look at There is a part about redirect and it is HTTP 401 unauthorized - so I am pretty sure there is no HATEOAS. It works in the way that it stored the authentication URL also for the redirect URL, which I am not sure is the best thing to do so it might need some slight changes too:
UPDATE: Not sure about the redirect (401 is unauthorized of course, silly me) - but I am almost sure there is no HATEOAS and the prefix matching should work just fine (providing of course that the direction of solving the problem will be accepted). |
Also @roman-kouzmenko - it should not matter in your case because I am matching the longest prefix and take the original urls only (which are - I understand |
Nope, I have no knowledge of (or interest in) the intricacies of URL handling. Note that the documentation on how pip handles multiple indexes is here. It's not very much, as the algorithm is pretty simple currently. This discussion adds a lot of complexity, and whatever conclusion we come to should be incorporated in that section of the docs. And personally, I'm of the view that if we can't describe the behaviour sufficiently concisely (by which I mean, if we double the size of that section it's too much) then the behaviour is too complex. Also, I should say that I'm not even convinced that we should be supporting this. I appreciate that refusing to support Azure would be a pretty big issue, but conversely, I think we should be discussing whether Azure could implement something that requires less complexity from client tools (and remember, pip is not the only client in this situation - devpi-server, for example, will presumably have the same sort of issues mirroring Azure indexes). |
@potiuk, thanks for the PR, I like your solution. Note there is a bug here I believe, this does not compute the length of the matching prefix but the number of matching characters at the same position anywhere within the two strings: We could use something like this instead
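For illustration, the difference between the two counts can be sketched as below. Both function names are hypothetical and this is not the code from the PR: `positional_matches` mimics the behaviour described as buggy, while `common_prefix_length` stops at the first mismatch. The `pkgs.example.com` URLs are made up for the example.

```python
from itertools import takewhile


def positional_matches(a: str, b: str) -> int:
    """Count positions where both strings hold the same character (the buggy measure)."""
    return sum(1 for x, y in zip(a, b) if x == y)


def common_prefix_length(a: str, b: str) -> int:
    """Count matching characters from the start, stopping at the first mismatch."""
    return sum(1 for _ in takewhile(lambda pair: pair[0] == pair[1], zip(a, b)))


first = "https://pkgs.example.com/feed1/simple/"
second = "https://pkgs.example.com/feed2/simple/"

# The positional count also includes the "/simple/" tail after the mismatch,
# so it overstates how much of the two URLs is actually a shared prefix.
print(positional_matches(first, second))
print(common_prefix_length(first, second))
```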
|
I don't think anything would need to change in the documentation, the proposal simply improves how credentials are cached: right now, it is already possible to use two indexes on the same domain (such as https://pkgs.dev.azure.com/feed1 and https://pkgs.dev.azure.com/feed2) but only if credentials are the same for the two indexes. If they are different, one of them will randomly overwrite the other as the cache key is simply the domain name. |
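A minimal sketch of why a cache keyed only on the domain cannot hold two credential pairs for the same host. This is a simplified model written for illustration, not pip's actual code; `cache_by_netloc` and the credential values are hypothetical.

```python
from typing import Dict, Iterable, Tuple
from urllib.parse import urlsplit

Credentials = Tuple[str, str]  # (username, password)


def cache_by_netloc(entries: Iterable[Tuple[str, Credentials]]) -> Dict[str, Credentials]:
    """Store credentials under the URL's netloc only, ignoring the path."""
    cache: Dict[str, Credentials] = {}
    for url, credentials in entries:
        # Both feeds share one netloc, so the second write replaces the first.
        cache[urlsplit(url).netloc] = credentials
    return cache


entries = [
    ("https://pkgs.dev.azure.com/feed1", ("user1", "token1")),
    ("https://pkgs.dev.azure.com/feed2", ("user2", "token2")),
]
cache = cache_by_netloc(entries)
print(cache)  # only one entry survives for pkgs.dev.azure.com
```

In this model the surviving pair is simply whichever one was written last; which write comes last in a real run depends on the order in which the indexes are contacted.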
I suspect most likely requests is following redirects automatically which is its default behaviour. |
Yeah. I fixed it in the latest fixup - and added a test there. This is now quite a bit simpler but also strictly follows the RFC (i.e. only a full match of the URL with authentication is used). Previously partial matches were also possible. I've also proposed a documentation update and NEWS update to explain the change in behaviour. |
I've only skimmed the discussion, but probably it should be caching anything feed related against the index URL as provided by the caller rather than just the netloc. I noted this when implementing the keyring interface, and there's logic somewhere that keeps track of the original index URL leading to a request for this purpose, but I didn't have a reason to change the netloc-based caching at the time. |
Yep. That's what my proposed PR #10904 does (and it also follows RFC 3986 with regard to prefix matching (though that part might not really be needed, I can hardly imagine two |
I think this issue should also be closed, since it seems like the plan is to keep it the way it is right now #10904 (comment) |
Please do not close this issue. It also affects users of GitLab, where different groups have their own package repositories, each using their own credentials. It used to work, but no longer does (I don't know since which version of pip). It now forces us to use separate requirements files for a single project, since each file basically supports one set of credentials. |
The answer is here. Long story short, we (yes, I work at GitLab) deployed a security fix where we now enforce the credentials in the file download endpoints. Due to the situation described in this issue (multiple indexes with different credentials), I described a possible workaround that might work depending on your situation: using a group deploy token where the target group contains all projects targeted by the index urls. |
I just stumbled on those comments while catching up with some Note - this is not a complaint or critique (so please I initially raised this issue as a security issue (formally, using responsible disclosure) before I opened the PR. I still think this is an exploitable security vulnerability that is CVE-worthy, but that's my personal opinion. In Airflow I am one of the PMC members deciding on the fate of security issues raised; over the last few months I assigned, solved and published some 6 CVEs reported to us, but I have no powers to decide in In private discussions that followed my responsible disclosure, and as also confirmed in the public comment #10904 (comment) mentioned above, it's been downplayed by I think maintainers in any project (we've done that in Airflow too) have the full right to declare they do not follow some standards (RFC 7617 in this case), and just leave the problem to the companies implementing So after long discussions and iterating on the PR to try to fix it, and especially after the last comment mentioned, I decided to close it. I generally like to make the world a bit better place one commit at a time but, in this case, if the maintainers declared it as a non-security issue and since it does not impact my or my project's use of I have no good advice for others, other than offering this summary of the state of the issue (at least this is how I see and remember it). That was my own decision for my PR, but of course anyone is free to pick up my PR and maybe they will have better luck. Or maybe approach it differently, learning from my story. I hope this will be a helpful summary for anyone interested in this issue. |
This is an active problem affecting users in Azure DevOps. I have faced this exact same issue this week, trying to connect to two feeds from the same Azure domain but different organizations. More specifically:
Workarounds for those coming from Azure DevOps: |
There's another workaround, one can give a direct link to a wheel using EDIT: Example:
|
I probably made this possible with https://github.com/pypa/pip/pull/11698/files#diff-a88f002c8cad3308467fd2d7f55ae33f8e0538dfa275d1052797fbd93e0e3099R120, which is available since 23.1. I tested it across two ADO organizations, with |
Works for me with pip 23.3.1. Thank you. Where previously we had to do something like this (dependencies installed separately first):
We can now do this (install just the package we need with dependencies being pulled in by pip):
While the two options look very similar, the first one rarely worked properly and usually required specifying versions for the dependencies that matched exactly what package3 needed. |
My scenario is a little bit different; here is the tl;dr: I am trying to whitelist multiple GitLab package registry groups (they have different deploy tokens) as part of the Databricks init scripts.
MY_FIRST_INDEX_URL = "https://__token__:[email protected]/api/v4/groups/MY_FIRST_GROUP/-/packages/pypi/simple"
MY_SECOND_INDEX_URL = "https://__token__:[email protected]/api/v4/groups/MY_SECOND_GROUP/-/packages/pypi/simple"
Now, I am leveraging
pip config --global set global.extra-index-url "${MY_FIRST_INDEX_URL} ${MY_SECOND_INDEX_URL}"
Now, I can see that, only reference to the Any thoughts? Or maybe progress on this |
Just adding a comment to +1 this issue, we are encountering it with JFrog Artifactory, which is deployed on-prem. I encountered this using Python 3.11.1 with pip 24.3.1 (the latest as of this writing) on Linux. Our use-case is the following:
We are trying to set up a
[global]
index-url = https://artifactory.my.company.com/api/pypi/virtual-pypi/simple
extra-index-url =
https://user1:[email protected]/api/pypi/restricted-pypi-repo1/simple
https://user2:[email protected]/api/pypi/restricted-pypi-repo2/simple
As noted by others, what ends up happening is that when we try to This is really unintuitive to me as a user and it took me a while to figure out what was going on and find this issue. I would expect that either the credentials are not cached at all, or that the credential caching takes into account the full URL, not just the domain portion. I think that the current assumption of one user per domain completely ignores the use-case of tools like Artifactory that allow you to host multiple, distinct repositories with completely unique permissions applied to them in the same place.
As a workaround, we were able to embed the private repositories in the domain portion of the URL. This causes
[global]
index-url = https://virtual-pypi.artifactory.my.company.com/api/pypi/virtual-pypi/simple
extra-index-url =
https://user1:[email protected]/api/pypi/restricted-pypi-repo1/simple
https://user2:[email protected]/api/pypi/restricted-pypi-repo2/simple |
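A small diagnostic along these lines can flag configurations that will hit this problem. It is a sketch under assumptions: `find_credential_conflicts` is a hypothetical helper, the `example.com` URLs and credentials are made up, and it only inspects credentials embedded in the URLs themselves.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple
from urllib.parse import urlsplit

Pair = Tuple[str, Optional[str]]  # (username, password)


def find_credential_conflicts(index_urls: Iterable[str]) -> Dict[str, Set[Pair]]:
    """Return hosts that appear with more than one distinct credential pair."""
    seen: Dict[str, Set[Pair]] = defaultdict(set)
    for url in index_urls:
        parts = urlsplit(url)
        if parts.username is not None and parts.hostname is not None:
            seen[parts.hostname].add((parts.username, parts.password))
    return {host: pairs for host, pairs in seen.items() if len(pairs) > 1}


# Hypothetical layout with two restricted repositories on one host.
shared_host = [
    "https://user1:pw1@artifactory.example.com/api/pypi/restricted-pypi-repo1/simple",
    "https://user2:pw2@artifactory.example.com/api/pypi/restricted-pypi-repo2/simple",
]
# Hypothetical layout using the subdomain workaround.
per_repo_host = [
    "https://user1:pw1@repo1.artifactory.example.com/api/pypi/restricted-pypi-repo1/simple",
    "https://user2:pw2@repo2.artifactory.example.com/api/pypi/restricted-pypi-repo2/simple",
]

print(find_credential_conflicts(shared_host))
print(find_credential_conflicts(per_repo_host))
```

With the subdomain layout each repository gets its own hostname, so no host is associated with more than one credential pair.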
One question - does this problem affect |
I've heard of, but never used,
Setup
First, I got a copy of Then, I restored the
[global]
index-url = https://artifactory.my.company.com/api/pypi/virtual-pypi/simple
extra-index-url =
https://user1:[email protected]/api/pypi/restricted-pypi-repo1/simple
https://user2:[email protected]/api/pypi/restricted-pypi-repo2/simple
Here is my
[[index]]
name = "virtual-pypi"
url = "https://artifactory.my.company.com/api/pypi/virtual-pypi/simple"
default = true
[[index]]
name = "restricted-pypi-repo1"
url = "https://user1:[email protected]/api/pypi/restricted-pypi-repo1/simple"
[[index]]
name = "restricted-pypi-repo2"
url = "https://user2:[email protected]/api/pypi/restricted-pypi-repo2/simple"
I tried two different tests, making sure to uninstall & clear both the
Test 1
Install package
With
With
Test 2
Install
With
With
|
The proposed change at https://github.com/pypa/pip/pull/10904/files#diff-e0745416d0b420cb690b37e3c49a9ad53eb59da87eb553840b2ebece989da9ceL187-R237 should fix it. The issue is entirely on pip caching credentials using |
Description
I have a need to simultaneously access two PIP indexes from the same hostname (pkgs.dev.azure.com) but using different credentials.
When configuring it like this:
pip seems to try the feed2 credentials for both feed1 and feed2, failing my builds.
I've worked around this for now by setting the same credentials for both feeds.
Expected behavior
feed1 credentials are used with feed1 and feed2 credentials are used with feed2
pip version
21.1.3
Python version
3.9
OS
linux
How to Reproduce
PIP_EXTRA_INDEX_URL=https://build:[email protected]/org2/_packaging/org-feed/pypi/simple
pip install package1 package2
Output
pip interactively prompts for a username, breaking the build instead of installing the two packages.