-
Notifications
You must be signed in to change notification settings - Fork 1.2k
remote: http: raise exception when response with error status code #2794
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4ae7fe0
to
4f898fe
Compare
dvc/exceptions.py
Outdated
@@ -338,3 +338,11 @@ def __init__(self, url, cause=None): | |||
), | |||
cause=cause, | |||
) | |||
|
|||
|
|||
class HTTPErrorStatusCodeException(DvcException): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to invent a new exception class for each situation? This would be a nightmare if people will some time in the future use Repo
class and try to handle errors. Also creates bloat.
This is question goes to everybody not only @pared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that might be excessive in some situations, though, what alternative do we have?
Do you think I should use requests.HTTPError?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably discuss it separately. I didn't mean this a blocker for this particular PR, just used it as an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And it can't be requests.HTTPError
as we need to descend this from DvcException()
.
dvc/exceptions.py
Outdated
"Server responded with error status code: '{}' and message: " | ||
"'{}'".format(code, reason) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code and reason should go next to each other, separate by space. This how it is presented everywhere. We might also want to say that this is an HTTP error in the message.
tests/func/test_repro.py
Outdated
@@ -43,6 +43,7 @@ | |||
from tests.func.test_data_cloud import get_ssh_url | |||
from tests.func.test_data_cloud import TEST_AWS_REPO_BUCKET | |||
from tests.func.test_data_cloud import TEST_GCP_REPO_BUCKET | |||
from tests.utils.httpd import ContentMD5Handler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't split imports like that anymore)
tests/unit/remote/test_http.py
Outdated
@@ -13,3 +17,23 @@ def test_no_traverse_compatibility(dvc_repo): | |||
|
|||
with pytest.raises(ConfigError): | |||
RemoteHTTP(dvc_repo, config) | |||
|
|||
|
|||
@pytest.mark.parametrize("response_code", [404, 403, 500]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any value in testing it 3 times?
tests/unit/remote/test_http.py
Outdated
|
||
with pytest.raises(HTTPErrorStatusCodeException): | ||
remote._download( | ||
URLInfo(os.path.join(url, "file.txt")), "file.txt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simply use URLInfo(url) / "file.txt"
or use URLInfo
from the start. This will save you from an error of using os.path
instead of posixpath
.
tests/utils/httpd.py
Outdated
self._lock.acquire() | ||
handler_class = ETagHandler if handler == "etag" else ContentMD5Handler | ||
self.response_handler = handler_class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really need to store this to an attribute?
5e06a19
to
aa5ca84
Compare
@pared Please rebase 😉 |
dvc/exceptions.py
Outdated
class HTTPError(DvcException): | ||
def __init__(self, code, reason): | ||
super(HTTPError, self).__init__( | ||
"HTTP error: '{} {}'".format(code, reason) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will show:
HTTPError: HTTP Error: 404 Not Found
dvc/remote/http.py
Outdated
request = self._request("GET", from_info.url, stream=True) | ||
response = self._request("GET", from_info.url, stream=True) | ||
if response.status_code != 200: | ||
raise HTTPError(response.status_code, response.reason) | ||
with Tqdm( | ||
total=None if no_progress_bar else self._content_length(from_info), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
total=None if no_progress_bar else self._content_length(from_info), | |
total=None if no_progress_bar else self._content_length(response), |
I wonder why we are still doing this extra request, also ._content_length()
doesn't need to work with url really.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is outside of your PR actually. But since you are here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But isn't the extra request done only if response does not contain Content-Length
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you pass url (path_info) and not response then you always make a second request. _content_length()
should stop accepting urls and might be even inlined, this will prevent this inefficiency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is right, sorry, I'll change accordingly. Though, I would leave _content_length
as it is, in order to not obstruct _download
method content.
dvc/remote/http.py
Outdated
@@ -45,7 +47,7 @@ def _download(self, from_info, to_file, name=None, no_progress_bar=False): | |||
disable=no_progress_bar, | |||
) as pbar: | |||
with open(to_file, "wb") as fd: | |||
for chunk in request.iter_content(chunk_size=self.CHUNK_SIZE): | |||
for chunk in response.iter_content(chunk_size=self.CHUNK_SIZE): | |||
fd.write(chunk) | |||
fd.flush() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need flush. It is probably a remnant from the time we did http resume.
tests/unit/remote/test_http.py
Outdated
class ErrorStatusRequestHandler(BaseHTTPRequestHandler): | ||
def do_GET(self): | ||
self.send_response(404, message="Not found") | ||
self.end_headers() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need it? Can we simply ask for some missing url?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, 404 is quite easy to generate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question now is, should we leave changes made to StaticFileServer.
@Suor what do you think? I think they should stay, as the previous version was mapping string to its handler class, which was unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's cleaner the new way.
tests/unit/remote/test_http.py
Outdated
remote = RemoteHTTP(dvc_repo, config) | ||
|
||
with pytest.raises(HTTPError): | ||
remote._download(URLInfo(url) / "file.txt", "file.txt") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use "missing.txt"
to make it more obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
thanks, @pared ! looks good to me) |
@Suor Please take a look 🙂 |
dvc/remote/http.py
Outdated
request = self._request("GET", from_info.url, stream=True) | ||
response = self._request("GET", from_info.url, stream=True) | ||
if response.status_code != 200: | ||
raise HTTPError(response.status_code, response.reason) | ||
with Tqdm( | ||
total=None if no_progress_bar else self._content_length(from_info), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you pass url (path_info) and not response then you always make a second request. _content_length()
should stop accepting urls and might be even inlined, this will prevent this inefficiency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
dvc/remote/http.py
Outdated
self._request("HEAD", url_or_request).headers, | ||
) | ||
def _content_length(self, response): | ||
headers = getattr(response, "headers", {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for this getattr()
anymore, simply request.headers
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Co-Authored-By: Alexander Schepanovski <[email protected]>
❗ Have you followed the guidelines in the Contributing to DVC list?
📖 Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.
❌ Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addresses. Please review them carefully and fix those that actually improve code or fix bugs.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏
Fixes #2510