Skip to content

Commit cf285b9

Browse files
committed
Implement RFC7617-compliant multi-domain basic authentication
The https://datatracker.ietf.org/doc/html/rfc7617#section-2.2 defines multi-domain authentication behaviour and authentication scopes for basic authentication. This change improves the implementation of the multi-domain matching to be RC7617 compliant * path matching (including longest match) * scheme validation matching Closes: #10902
1 parent 3820b0e commit cf285b9

File tree

5 files changed

+223
-20
lines changed

5 files changed

+223
-20
lines changed

docs/html/topics/authentication.md

Lines changed: 48 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,54 @@ token as the "username" and do not provide a password:
1616
https://[email protected]/simple
1717
```
1818

19+
When you specify several indexes, each of them can come with its own
20+
authentication information. When the domains and schemes of multiple
21+
indexes partially overlap, you can specify different authentication for each of them
22+
For example you can have:
23+
24+
```
25+
PIP_INDEX_URL=https://build:[email protected]/feed1
26+
PIP_EXTRA_INDEX_URL=https://build:[email protected]/feed2
27+
```
28+
29+
If you specify multiple identical index URLs with different authentication information,
30+
authentication from the first index will be used.
31+
32+
```{versionchanged} 22.1
33+
The basic authentication is now compliant with RFC 7617
34+
```
35+
36+
In compliance with [RFC7617](https://datatracker.ietf.org/doc/html/rfc7617#section-2.2) if the indexes
37+
overlap, the authentication information from the prefix-match will be reused for the longer index if
38+
the longer index does not contain the authentication information. In case multiple indexes are
39+
prefix-matching, the authentication of the first of the longest matching prefixes is used.
40+
41+
For example in this case, build:password authentication will be used when authenticating with the extra
42+
index URL.
43+
44+
```
45+
PIP_INDEX_URL=https://build:[email protected]/
46+
PIP_EXTRA_INDEX_URL=https://pkgs.dev.azure.com/feed1
47+
```
48+
49+
```{note}
50+
Prior to version 22.1 reusing of basic authentication between URLs was not RFC7617 compliant.
51+
This could lead to the situation that custom-built indexes could receive the authentication
52+
provided for the index path, to download files outside fof the security domain of the path.
53+
54+
For example if your index at https://username:[email protected]/simple served files from
55+
https://pypi.company.com/file.tar.gz - the username and password provided for the "/simple" path
56+
was also used to authenticate download of the `file.tar.gz`. This is not RFC7617 compliant and as of
57+
version 22.1 it will not work automatically. If you encounter a problem where your files are being
58+
served from different security domain than your index and authentication is not used for them, you
59+
should (ideally) fix it on your server side or (as temporary workaround)
60+
specify your file download location as extra index url:
61+
62+
PIP_EXTRA_INDEX_URL=https://username:[email protected]/
63+
64+
```
65+
66+
1967
### Percent-encoding special characters
2068

2169
```{versionadded} 10.0

news/10904.feature.rst

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
When you specify multiple indexes with common domains and schemes, basic authentication for each of
2+
the indexes can be different - compliant with RFC 7617. Previous versions of pip reused basic
3+
authentication credentials for all urls within the same domain.

src/pip/_internal/network/auth.py

Lines changed: 57 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55
"""
66

77
import urllib.parse
8-
from typing import Any, Dict, List, Optional, Tuple
8+
from typing import Any, List, Optional, Tuple
99

1010
from pip._vendor.requests.auth import AuthBase, HTTPBasicAuth
1111
from pip._vendor.requests.models import Request, Response
@@ -76,7 +76,7 @@ def __init__(
7676
) -> None:
7777
self.prompting = prompting
7878
self.index_urls = index_urls
79-
self.passwords: Dict[str, AuthInfo] = {}
79+
self.passwords: List[Tuple[str, AuthInfo]] = []
8080
# When the user is prompted to enter credentials and keyring is
8181
# available, we will offer to save them. If the user accepts,
8282
# this value is set to the credentials they entered. After the
@@ -163,6 +163,49 @@ def _get_new_credentials(
163163

164164
return username, password
165165

166+
def _get_matching_credentials(self, original_url: str) -> Optional[AuthInfo]:
167+
"""
168+
Find matching credentials based on the longest matching prefix found.
169+
170+
According to https://datatracker.ietf.org/doc/html/rfc7617#section-2.2
171+
the authentication scope is defined by removing the last part after
172+
the last "/" of the resource - but in case of `pip` the end of path is always
173+
"/project", so we can treat the `original_url` as full `authentication scope'
174+
for the request. The RFC specifies that prefix matching of the scope is
175+
also within the protection scope specified by the realm value, so we can
176+
safely assume that if we find any match, the longest matching prefix is
177+
the right authentication information we should use.
178+
179+
The specification does not decide which of the matching scopes should be
180+
used if there are more of them. Our decision is to choose the longest
181+
matching one. In case exactly the same prefix will be added several times,
182+
the authentication information from the first one is used.
183+
184+
According to the RFC, the authentication scope includes both scheme and netloc,
185+
Both have to match in order to share the authentication scope.
186+
187+
:param original_url: original URL of the request
188+
:return: Stored Authentication info matching the authentication scope or
189+
None if not found
190+
"""
191+
max_matched_prefix_length = 0
192+
matched_auth_info = None
193+
no_auth_url = remove_auth_from_url(original_url)
194+
parsed_original = urllib.parse.urlparse(no_auth_url)
195+
for prefix, auth_info in self.passwords:
196+
parsed_stored = urllib.parse.urlparse(prefix)
197+
if (
198+
parsed_stored.netloc != parsed_original.netloc
199+
or parsed_stored.scheme != parsed_original.scheme
200+
):
201+
# only consider match when both scheme and netloc match
202+
continue
203+
if no_auth_url.startswith(prefix):
204+
if len(prefix) > max_matched_prefix_length:
205+
matched_auth_info = auth_info
206+
max_matched_prefix_length = len(prefix)
207+
return matched_auth_info
208+
166209
def _get_url_and_credentials(
167210
self, original_url: str
168211
) -> Tuple[str, Optional[str], Optional[str]]:
@@ -184,12 +227,14 @@ def _get_url_and_credentials(
184227
# Do this if either the username or the password is missing.
185228
# This accounts for the situation in which the user has specified
186229
# the username in the index url, but the password comes from keyring.
187-
if (username is None or password is None) and netloc in self.passwords:
188-
un, pw = self.passwords[netloc]
189-
# It is possible that the cached credentials are for a different username,
190-
# in which case the cache should be ignored.
191-
if username is None or username == un:
192-
username, password = un, pw
230+
if username is None or password is None:
231+
matched_credentials = self._get_matching_credentials(original_url)
232+
if matched_credentials:
233+
un, pw = matched_credentials
234+
# It is possible that the cached credentials are for a
235+
# different username, in which case the cache should be ignored.
236+
if username is None or username == un:
237+
username, password = un, pw
193238

194239
if username is not None or password is not None:
195240
# Convert the username and password if they're None, so that
@@ -200,7 +245,7 @@ def _get_url_and_credentials(
200245
password = password or ""
201246

202247
# Store any acquired credentials.
203-
self.passwords[netloc] = (username, password)
248+
self.passwords.append((remove_auth_from_url(url), (username, password)))
204249

205250
assert (
206251
# Credentials were found
@@ -273,7 +318,9 @@ def handle_401(self, resp: Response, **kwargs: Any) -> Response:
273318
# Store the new username and password to use for future requests
274319
self._credentials_to_save = None
275320
if username is not None and password is not None:
276-
self.passwords[parsed.netloc] = (username, password)
321+
self.passwords.append(
322+
(remove_auth_from_url(resp.url), (username, password))
323+
)
277324

278325
# Prompt to save the password to keyring
279326
if save and self._should_save_password_to_keyring():

tests/functional/test_install_config.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -432,7 +432,7 @@ def test_prioritize_url_credentials_over_netrc(
432432
server.mock.side_effect = [
433433
package_page(
434434
{
435-
"simple-3.0.tar.gz": "/files/simple-3.0.tar.gz",
435+
"simple-3.0.tar.gz": "/simple/files/simple-3.0.tar.gz",
436436
}
437437
),
438438
authorization_response(str(data.packages / "simple-3.0.tar.gz")),

tests/unit/test_network_auth.py

Lines changed: 114 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,12 @@
11
import functools
22
from typing import Any, List, Optional, Tuple
3+
from unittest.mock import Mock, PropertyMock
34

45
import pytest
56

67
import pip._internal.network.auth
78
from pip._internal.network.auth import MultiDomainBasicAuth
9+
from src.pip._vendor.requests import PreparedRequest
810
from tests.lib.requests_mocks import MockConnection, MockRequest, MockResponse
911

1012

@@ -50,40 +52,143 @@ def test_get_credentials_parses_correctly(
5052
(username is None and password is None)
5153
or
5254
# Credentials were found and "cached" appropriately
53-
auth.passwords["example.com"] == (username, password)
55+
(url, (username, password)) in auth.passwords
5456
)
5557

5658

59+
def test_handle_prompt_for_password_successful() -> None:
60+
auth = MultiDomainBasicAuth()
61+
resp = Mock()
62+
resp.status_code = 401
63+
resp.url = "http://example.com"
64+
resp.raw = Mock()
65+
resp.content = PropertyMock()
66+
resp.content = ""
67+
resp.request = PreparedRequest()
68+
resp.request.headers = {}
69+
auth._prompt_for_password = Mock() # type: ignore[assignment]
70+
auth._prompt_for_password.return_value = ("user", "password", True)
71+
auth.handle_401(resp)
72+
auth._prompt_for_password.assert_called_with("example.com")
73+
expected = ("http://example.com", ("user", "password"))
74+
assert len(auth.passwords) == 1
75+
assert auth.passwords[0] == expected
76+
77+
78+
def test_handle_prompt_for_password_unsuccessful() -> None:
79+
auth = MultiDomainBasicAuth()
80+
resp = Mock()
81+
resp.status_code = 401
82+
resp.url = "http://example.com"
83+
resp.raw = Mock()
84+
resp.content = PropertyMock()
85+
resp.content = ""
86+
resp.request = PreparedRequest()
87+
resp.request.headers = {}
88+
auth._prompt_for_password = Mock() # type: ignore[assignment]
89+
auth._prompt_for_password.return_value = (None, None, False)
90+
auth.handle_401(resp)
91+
auth._prompt_for_password.assert_called_with("example.com")
92+
assert len(auth.passwords) == 0
93+
94+
5795
def test_get_credentials_not_to_uses_cached_credentials() -> None:
5896
auth = MultiDomainBasicAuth()
59-
auth.passwords["example.com"] = ("user", "pass")
97+
auth.passwords.append(("http://example.com", ("user", "pass")))
6098

6199
got = auth._get_url_and_credentials("http://foo:[email protected]/path")
62100
expected = ("http://example.com/path", "foo", "bar")
63101
assert got == expected
64102

65103

66-
def test_get_credentials_not_to_uses_cached_credentials_only_username() -> None:
104+
def test_get_credentials_not_to_use_cached_credentials_only_username() -> None:
105+
auth = MultiDomainBasicAuth()
106+
auth.passwords.append(("https://example.com", ("user", "pass")))
107+
108+
got = auth._get_url_and_credentials("https://[email protected]/path")
109+
expected = ("https://example.com/path", "foo", "")
110+
assert got == expected
111+
112+
113+
def test_multi_domain_credentials_match() -> None:
114+
auth = MultiDomainBasicAuth()
115+
auth.passwords.append(("http://example.com", ("user", "pass")))
116+
auth.passwords.append(("http://example.com/path", ("user", "pass2")))
117+
118+
got = auth._get_url_and_credentials("http://[email protected]/path")
119+
expected = ("http://example.com/path", "user", "pass2")
120+
assert got == expected
121+
122+
123+
def test_multi_domain_credentials_longest_match() -> None:
124+
auth = MultiDomainBasicAuth()
125+
auth.passwords.append(("http://example.com", ("user", "pass")))
126+
auth.passwords.append(("http://example.com/path", ("user", "pass2")))
127+
auth.passwords.append(("http://example.com/path/subpath", ("user", "pass3")))
128+
129+
got = auth._get_url_and_credentials("http://[email protected]/path")
130+
expected = ("http://example.com/path", "user", "pass2")
131+
assert got == expected
132+
133+
134+
def test_multi_domain_credentials_partial_match_only() -> None:
67135
auth = MultiDomainBasicAuth()
68-
auth.passwords["example.com"] = ("user", "pass")
136+
auth.passwords.append(("http://example.com/path1", ("user", "pass")))
69137

70-
got = auth._get_url_and_credentials("http://foo@example.com/path")
71-
expected = ("http://example.com/path", "foo", "")
138+
got = auth._get_url_and_credentials("http://example.com/path2")
139+
expected = ("http://example.com/path2", None, None)
72140
assert got == expected
73141

74142

75143
def test_get_credentials_uses_cached_credentials() -> None:
76144
auth = MultiDomainBasicAuth()
77-
auth.passwords["example.com"] = ("user", "pass")
145+
auth.passwords.append(("https://example.com", ("user", "pass")))
146+
147+
got = auth._get_url_and_credentials("https://example.com/path")
148+
expected = ("https://example.com/path", "user", "pass")
149+
assert got == expected
150+
151+
152+
def test_get_credentials_not_uses_cached_credentials_different_scheme_http() -> None:
153+
auth = MultiDomainBasicAuth()
154+
auth.passwords.append(("http://example.com", ("user", "pass")))
155+
156+
got = auth._get_url_and_credentials("https://example.com/path")
157+
expected = ("https://example.com/path", None, None)
158+
assert got == expected
159+
160+
161+
def test_get_credentials_not_uses_cached_credentials_different_scheme_https() -> None:
162+
auth = MultiDomainBasicAuth()
163+
auth.passwords.append(("https://example.com", ("user", "pass")))
78164

79165
got = auth._get_url_and_credentials("http://example.com/path")
80-
expected = ("http://example.com/path", "user", "pass")
166+
expected = ("http://example.com/path", None, None)
81167
assert got == expected
82168

83169

84170
def test_get_credentials_uses_cached_credentials_only_username() -> None:
85171
auth = MultiDomainBasicAuth()
86-
auth.passwords["example.com"] = ("user", "pass")
172+
auth.passwords.append(("http://example.com", ("user", "pass")))
173+
174+
got = auth._get_url_and_credentials("http://[email protected]/path")
175+
expected = ("http://example.com/path", "user", "pass")
176+
assert got == expected
177+
178+
179+
def test_get_credentials_uses_cached_credentials_wrong_username() -> None:
180+
auth = MultiDomainBasicAuth()
181+
auth.passwords.append(("http://example.com", ("user", "pass")))
182+
183+
got = auth._get_url_and_credentials("http://[email protected]/path")
184+
expected = ("http://example.com/path", "user2", "")
185+
assert got == expected
186+
187+
188+
def test_get_credentials_added_multiple_times() -> None:
189+
auth = MultiDomainBasicAuth()
190+
auth.passwords.append(("http://example.com", ("user", "pass")))
191+
auth.passwords.append(("http://example.com", ("user", "pass2")))
87192

88193
got = auth._get_url_and_credentials("http://[email protected]/path")
89194
expected = ("http://example.com/path", "user", "pass")

0 commit comments

Comments
 (0)