-
Notifications
You must be signed in to change notification settings - Fork 1.2k
fix get-url not fetching from url that does not support HEAD request #4646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
bug description: 1, environment: OS which default character is not 'UTF-8'. For example windows 10 which default character is 'gbk'. 2, operation: python setup.py install Traceback (most recent call last): File "setup.py", line 149, in <module> long_description=open("README.rst", "r").read(), UnicodeDecodeError: 'gbk' codec can't decode byte 0xa2 in position 41: illegal multibyte sequence
Hi, @hl-a-k. Thanks for contributing. π π It'd be helpful if you could provide more information regarding the PR/fix in the description above. Thanks, we'll review it soon. |
dvc/repo/get_url.py
Outdated
if in_repo: | ||
dep.save() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unfortunate issue with not calling save
is that our exceptions in the logic down below are not well unified, so it won't present a nice error when, for example, the source doesn't exist π I.e. download
will trigger a method in one of the Tree classes (in dvc/tree/), and they all might throw a different backend-specific error, which is pretty ugly π We are on our way to unifying tree exceptions, but we are not there yet.
Might be missing something, but the only reason we do this save()
call here is to check if the source exists. If we replace this with if not dep.exists(): raise DependencyDoesNotExistError(...)
, that should do the trick for now
Also, if I understand correctly, you've introduced this flag to avoid changing the API behavior, right? So like for a backward compatibility? If so, we could totally simply switch to the new behaviour and remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. By introducing this flag, we can keep the original logic unaffected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. So let's get rid of it and just replace
if in_repo:
dep.save()
with
if not dep.exists:
raise dep.DoesNotExistError(dep)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes committed. Seems nothing happened. ^_^
@@ -147,7 +147,7 @@ def run(self): | |||
name="dvc", | |||
version=version, | |||
description="Git for data scientists - manage your code and data together", | |||
long_description=open("README.rst", "r").read(), | |||
long_description=open("README.rst", "r", encoding="UTF-8").read(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, nice catch!
@hl-a-k Thank you for your patience and contribution π A few comments in addition to the ones above:
Thank you! π Btw, the reason we didn't reply earlier is that we were waiting for the PR description update and when you did edit it, github didn't send a notification (it doesn't for message edits, so that's expected). Please feel free to leave a comment in such cases next time, we'll be sure to get to it ASAP. π |
|
I have completed the changes. |
@hl-a-k Looks like you didn't add |
Sorry, I don't use |
@@ -16,6 +16,7 @@ def get_url(url, out=None): | |||
|
|||
(dep,) = dependency.loads_from(None, [url]) | |||
(out,) = output.loads_from(None, [out], use_cache=False) | |||
dep.save() | |||
if not dep.exists: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I totally forgot that the issue itself was about exist
not working for some urls π So this won't actually solve it. My mistake, sorry for the confusion.
Ok, let me see if the exception approach mentioned above is feasible to implement...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least, this approach work for #4251 . I have tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the current situationοΌ
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't get to research this yet. Need to confirm this and we'll merge if it is indeed fixed. Looks good as is. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hl-a-k Thanks for your patience! Indeed, it was the get_hash
that was causing that issue, so the current solution works like charm. Thank you! π
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! π
β I have followed the Contributing to DVC checklist.
π If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Fix #4251
The reason we do this
save()
call here is just to check if the source exists. So I replace this withif in_repo: dep.save()
By the way, fixed a minor bug.
bug description:
1, environment: OS which default character is not 'UTF-8'. For example windows 10 which default character is 'gbk'.
2, operation:
python setup.py install
Traceback (most recent call last):
File "setup.py", line 149, in
long_description=open("README.rst", "r").read(),
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa2 in position 41: illegal multibyte sequence