import: git sparse-checkout #3438

Closed
casperdcl opened this issue Mar 4, 2020 · 5 comments
Labels
enhancement Enhances DVC p2-medium Medium priority, should be done, but less important performance improvement over resource / time consuming tasks

Comments

@casperdcl
Contributor

dvc import https://some/git/repo/ some_file should probably do a sparse checkout.

Currently looks like it does a full clone:

    self.deps[0].download(self.outs[0])

would call

    def download(self, to):
        with self._make_repo() as repo:
            if self.def_repo.get(self.PARAM_REV_LOCK) is None:
                self.def_repo[self.PARAM_REV_LOCK] = repo.scm.get_rev()
            if hasattr(repo, "cache"):
                repo.cache.local.cache_dir = self.repo.cache.local.cache_dir
            repo.pull_to(self.def_path, to.path_info)

On a related note, is the repo locally cached by default

    if hasattr(repo, "cache"):
        repo.cache.local.cache_dir = self.repo.cache.local.cache_dir
so that dvc update won't re-clone?
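
For reference, a minimal sketch of what a sparse checkout of a single file could look like when driven from Python. This is not DVC code: the helper names `sparse_checkout_commands` and `sparse_checkout` are hypothetical, and it assumes a fairly recent Git (one where `git sparse-checkout set` accepts `--no-cone`) and a server that supports partial clone filters.

```python
import subprocess


def sparse_checkout_commands(url, path, dest):
    """Build the git commands that fetch only `path` from `url` into `dest`.

    Hypothetical helper for illustration; not part of DVC.
    """
    pattern = "/" + path.lstrip("/")  # anchor the pattern at the repo root
    return [
        # Shallow, blob-less clone that leaves the working tree empty for now.
        ["git", "clone", "--no-checkout", "--depth", "1",
         "--filter=blob:none", url, dest],
        # Non-cone mode takes gitignore-style patterns, so a single file works.
        ["git", "-C", dest, "sparse-checkout", "set", "--no-cone", pattern],
        # Populate the working tree; only paths matching the pattern are written.
        ["git", "-C", dest, "checkout"],
    ]


def sparse_checkout(url, path, dest):
    """Run the commands above, raising CalledProcessError if git fails."""
    for cmd in sparse_checkout_commands(url, path, dest):
        subprocess.run(cmd, check=True)
```

Calling `sparse_checkout("https://some/git/repo/", "some_file", "repo_dir")` would then leave only `some_file` in the working tree of `repo_dir`. Building the command list separately from running it keeps the sketch easy to inspect without network access.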

@ghost ghost added the triage Needs to be triaged label Mar 4, 2020
@shcheklein
Member

@Suor you should have more info about this, I think?

@efiop
Contributor

efiop commented Mar 17, 2020

Related to #3473

@efiop efiop added the enhancement Enhances DVC label Mar 17, 2020
@ghost ghost removed the triage Needs to be triaged label Mar 17, 2020
@efiop efiop added performance improvement over resource / time consuming tasks triage Needs to be triaged labels Mar 17, 2020
@ghost ghost removed the triage Needs to be triaged label Mar 17, 2020
@efiop efiop added p2-medium Medium priority, should be done, but less important triage Needs to be triaged labels Mar 17, 2020
@ghost ghost removed the triage Needs to be triaged label Mar 17, 2020
@Suor
Contributor

Suor commented Mar 17, 2020

Local caches are only used within one dvc execution or one Python session.

Sparse checkouts, or at least shallow checkouts, might be used. This will complicate our external_repo() abstraction, including its caching, but it looks like there is no way around it.
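
To illustrate what "cached within one Python session" means, here is a rough sketch of a per-process clone cache. It is not the actual external_repo() implementation: `cached_clone`, `_session_clones`, and the injected `clone_fn` callback are all hypothetical names, and the real code may key and clean up its clones differently.

```python
import atexit
import shutil
import tempfile

# Maps (url, rev) to the temporary directory holding that clone.
# Lives only as long as the current Python process.
_session_clones = {}


def cached_clone(url, rev, clone_fn):
    """Return a clone directory for (url, rev), cloning at most once per session.

    `clone_fn(url, rev, path)` is a hypothetical callback that does the
    actual clone (full, shallow, or sparse) into `path`.
    """
    key = (url, rev)
    if key not in _session_clones:
        path = tempfile.mkdtemp(prefix="external-repo-")
        clone_fn(url, rev, path)
        _session_clones[key] = path
    return _session_clones[key]


@atexit.register
def _cleanup_session_clones():
    # Nothing persists across runs: every clone is removed at interpreter exit.
    for path in _session_clones.values():
        shutil.rmtree(path, ignore_errors=True)
    _session_clones.clear()
```

A sparse or shallow clone complicates this kind of cache because the key would also need to capture which paths or how much history the clone contains, not just the URL and revision.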

@casperdcl
Contributor Author

I think we had a discussion with @efiop and @shcheklein about which is more important to fix first (persistent repo cache or sparse checkout) and there was no clear outcome. Will open a new issue for persistent repo cache...

@skshetry
Collaborator

We don't have a persistent repo checkout, but we do shallow clone these days. Closing...


5 participants