repo: Support streaming and pulling files on RepoTree/DvcTree.open() #3810

Merged: 7 commits merged into iterative:master on May 18, 2020

@pmrowla pmrowla commented May 18, 2020

  • ❗ I have followed the Contributing to DVC checklist.

  • 📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here. If the CLI API is changed, I have updated tab completion scripts.

  • ❌ I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

  • RepoTree now supports all BaseTree methods
  • Adds support for fetching or streaming DVC outs directly from remote on RepoTree/DvcTree.open().
    • If an out is already in local cache, it will always be opened from cache.
    • For outs that are not in the local cache, if DvcTree was initialized with fetch=True or stream=True, the out will be fetched or streamed; otherwise an error will be raised.
  • Repo.open_by_relpath() now uses RepoTree.open()
  • RepoTree/DvcTree.walk() now supports walking dir contents for dir outs
    • If tree was initialized with fetch=True or stream=True, tree will pull dir cache for dir outs on walk when needed
    • Otherwise, dir contents will not be walked (same as existing behavior)
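
The open() rules listed above can be sketched with a toy model. This is only an illustration and not DVC's implementation: `ToyDvcTree`, the dict-based `cache` and `remote`, and the use of `FileNotFoundError` are all hypothetical stand-ins, and the choice to check `fetch` before `stream` when both are set is an assumption this PR description does not specify.

```python
import io


class ToyDvcTree:
    """Hypothetical stand-in for the described behavior; not DVC's real class."""

    def __init__(self, cache, remote, fetch=False, stream=False):
        self.cache = cache    # dict path -> bytes, plays the role of the local cache
        self.remote = remote  # dict path -> bytes, plays the role of the remote
        self.fetch = fetch
        self.stream = stream

    def open(self, path):
        # An out that is already in the local cache is always opened from cache.
        if path in self.cache:
            return io.BytesIO(self.cache[path])
        if path not in self.remote:
            raise FileNotFoundError(path)
        # fetch=True: pull the out into the local cache, then open it from there.
        # (Checking fetch before stream is an assumption of this sketch.)
        if self.fetch:
            self.cache[path] = self.remote[path]
            return io.BytesIO(self.cache[path])
        # stream=True: read straight from the remote, leaving the cache untouched.
        if self.stream:
            return io.BytesIO(self.remote[path])
        # Neither flag set: the out is unavailable locally, so raise.
        # (The concrete exception type is an assumption of this sketch.)
        raise FileNotFoundError(f"{path} is not in the local cache")


remote = {"data.csv": b"a,b\n1,2\n"}
streaming_tree = ToyDvcTree(cache={}, remote=remote, stream=True)
with streaming_tree.open("data.csv") as fobj:
    first_line = fobj.readline()
```

In this sketch the streaming tree reads the file without adding it to `cache`, whereas a tree built with `fetch=True` would populate `cache` as a side effect of the first open().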

Related to #3811.

@pmrowla pmrowla self-assigned this May 18, 2020
if out.is_dir_checksum and (self.fetch or self.stream):
    # will pull dir cache if needed
    with self.repo.state:
        cache = out.collect_used_dir_cache()

This might be slow to collect. You only need out.dir_cache, so it may be worth extracting the logic that fetches the .dir file into a method on the out. But I'm OK with keeping it as is for now.

efiop commented May 18, 2020

Ok, looks great. We are on the right track. Merging.

@efiop efiop merged commit afbb3a0 into iterative:master May 18, 2020
@pmrowla pmrowla deleted the repotree-streaming branch May 19, 2020 03:19