Skip to content

fix get-url not fetching from url that does not support HEAD request #4646

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Oct 12, 2020

Conversation

hl-a-k
Copy link
Contributor

@hl-a-k hl-a-k commented Sep 30, 2020

Fix #4251
The reason we do this save() call here is just to check if the source exists. So I replace this with
if in_repo: dep.save()
By the way, fixed a minor bug.
bug description:
1, environment: OS which default character is not 'UTF-8'. For example windows 10 which default character is 'gbk'.
2, operation:
python setup.py install
Traceback (most recent call last):
File "setup.py", line 149, in
long_description=open("README.rst", "r").read(),
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa2 in position 41: illegal multibyte sequence

zwh added 3 commits September 30, 2020 15:14
bug description:
1, environment: OS which default character is not 'UTF-8'. For example windows 10 which default character is 'gbk'.
2, operation:
python setup.py install
Traceback (most recent call last):
  File "setup.py", line 149, in <module>
    long_description=open("README.rst", "r").read(),
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa2 in position 41: illegal multibyte sequence
@skshetry
Copy link
Collaborator

skshetry commented Sep 30, 2020

Hi, @hl-a-k. Thanks for contributing. πŸŽ‰ πŸ™‚

It'd be helpful if you could provide more information regarding the PR/fix in the description above. Thanks, we'll review it soon.

@efiop efiop added the awaiting response we are waiting for your reply, please respond! :) label Oct 3, 2020
Comment on lines 19 to 20
if in_repo:
dep.save()
Copy link
Contributor

@efiop efiop Oct 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unfortunate issue with not calling save is that our exceptions in the logic down below are not well unified, so it won't present a nice error when, for example, the source doesn't exist πŸ™ I.e. download will trigger a method in one of the Tree classes (in dvc/tree/), and they all might throw a different backend-specific error, which is pretty ugly πŸ™ We are on our way to unifying tree exceptions, but we are not there yet.

Might be missing something, but the only reason we do this save() call here is to check if the source exists. If we replace this with if not dep.exists(): raise DependencyDoesNotExistError(...), that should do the trick for now

Also, if I understand correctly, you've introduced this flag to avoid changing the API behavior, right? So like for a backward compatibility? If so, we could totally simply switch to the new behaviour and remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. By introducing this flag, we can keep the original logic unaffected.

Copy link
Contributor

@efiop efiop Oct 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. So let's get rid of it and just replace

if in_repo:
    dep.save()

with

if not dep.exists:
    raise dep.DoesNotExistError(dep)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes committed. Seems nothing happened. ^_^

@@ -147,7 +147,7 @@ def run(self):
name="dvc",
version=version,
description="Git for data scientists - manage your code and data together",
long_description=open("README.rst", "r").read(),
long_description=open("README.rst", "r", encoding="UTF-8").read(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, nice catch!

@efiop
Copy link
Contributor

efiop commented Oct 3, 2020

@hl-a-k Thank you for your patience and contribution πŸ™‚ A few comments in addition to the ones above:

  1. Looks like github is not able to associate your commits with your github profile. Please go to https://github.com/settings/emails and add the email used in your commits.

  2. Please adjust the PR title to something more meaningful.

  3. Please use Fix #issue-number (e.g. Fix get-url should just GET filesΒ #4251) in your PR description, so github will automatically close the issue when your PR is merged.

Thank you! πŸ™

Btw, the reason we didn't reply earlier is that we were waiting for the PR description update and when you did edit it, github didn't send a notification (it doesn't for message edits, so that's expected). Please feel free to leave a comment in such cases next time, we'll be sure to get to it ASAP. πŸ™‚

@hl-a-k hl-a-k changed the title #4251 bug fixed Fixed the bug #4251 by a compatible way, and fixed a minor bug which causes of default character. Oct 3, 2020
@hl-a-k hl-a-k changed the title Fixed the bug #4251 by a compatible way, and fixed a minor bug which causes of default character. Fixed the bug #4251 by a backward compatible way, and fixed a minor bug which causes of default character. Oct 3, 2020
@hl-a-k hl-a-k changed the title Fixed the bug #4251 by a backward compatible way, and fixed a minor bug which causes of default character. Fixed the bug #4251 by a backward compatible way, and fixed a minor bug which causes of OS default character. Oct 3, 2020
@hl-a-k
Copy link
Contributor Author

hl-a-k commented Oct 3, 2020

@hl-a-k Thank you for your patience and contribution πŸ™‚ A few comments in addition to the ones above:

  1. Looks like github is not able to associate your commits with your github profile. Please go to https://github.com/settings/emails and add the email used in your commits.
  2. Please adjust the PR title to something more meaningful.
  3. Please use Fix #issue-number (e.g. Fix get-url should just GET filesΒ #4251) in your PR description, so github will automatically close the issue when your PR is merged.

Thank you! πŸ™

Btw, the reason we didn't reply earlier is that we were waiting for the PR description update and when you did edit it, github didn't send a notification (it doesn't for message edits, so that's expected). Please feel free to leave a comment in such cases next time, we'll be sure to get to it ASAP. πŸ™‚

@hl-a-k
Copy link
Contributor Author

hl-a-k commented Oct 3, 2020

I have completed the changes.

@efiop
Copy link
Contributor

efiop commented Oct 3, 2020

@hl-a-k Looks like you didn't add [email protected](used in your commits) to https://github.com/settings/emails . If you did, then maybe it is just github that needs some time to reparse this πŸ™‚

hl-a-k added 3 commits October 3, 2020 12:37
We can check if the source exists by using `dep.exists`
@hl-a-k hl-a-k changed the title Fixed the bug #4251 by a backward compatible way, and fixed a minor bug which causes of OS default character. Fixed the bug #4251, and fixed a minor bug which causes of OS default character. Oct 3, 2020
@hl-a-k
Copy link
Contributor Author

hl-a-k commented Oct 3, 2020

@hl-a-k Looks like you didn't add [email protected](used in your commits) to https://github.com/settings/emails . If you did, then maybe it is just github that needs some time to reparse this πŸ™‚

Sorry, I don't use [email protected] anymore. I have updated my git configuration now.

@@ -16,6 +16,7 @@ def get_url(url, out=None):

(dep,) = dependency.loads_from(None, [url])
(out,) = output.loads_from(None, [out], use_cache=False)
dep.save()
if not dep.exists:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I totally forgot that the issue itself was about exist not working for some urls πŸ™ So this won't actually solve it. My mistake, sorry for the confusion.

Ok, let me see if the exception approach mentioned above is feasible to implement...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least, this approach work for #4251 . I have tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the current situation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't get to research this yet. Need to confirm this and we'll merge if it is indeed fixed. Looks good as is. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hl-a-k Thanks for your patience! Indeed, it was the get_hash that was causing that issue, so the current solution works like charm. Thank you! πŸ™

Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! πŸš€

@efiop efiop merged commit 59105ec into iterative:master Oct 12, 2020
@efiop efiop added bugfix fixes bug and removed awaiting response we are waiting for your reply, please respond! :) labels Oct 12, 2020
@skshetry skshetry changed the title Fixed the bug #4251, and fixed a minor bug which causes of OS default character. fix get-url not fetching from url that does not support HEAD request Oct 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix fixes bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

get-url should just GET files
3 participants