
dvc get from subdir #2847


Closed
1 task
dmpetrov opened this issue Nov 26, 2019 · 5 comments
Labels
enhancement Enhances DVC feature request Requesting a new feature

Comments

@dmpetrov
Member

dmpetrov commented Nov 26, 2019

$ dvc -V
0.70.0+38be14

Regular pull works just fine for subdirs:

$ git clone https://github.com/dmpetrov/dataset
$ cd dataset
$ dvc pull dir1.dvc
$ ls dir1
file1 file2

dvc get does not work:

$ dvc get https://github.com/dmpetrov/dataset dir1/file1
ERROR: failed to get 'dir1/file1' from 'https://github.com/dmpetrov/dataset' - Output 'dir1/file1' not found in target repository 'https://github.com/dmpetrov/dataset'


Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

This is quite a basic scenario, so it's a high-priority bug.

  • Please make sure it is covered by unit tests

EDITED: It worked before, so this is a regression.

@dmpetrov dmpetrov added bug Did we break something? p0-critical labels Nov 26, 2019
@dmpetrov
Member Author

It seems this is not a regression. It fails only for files inside data directories, not for data files inside a regular directory (which worked):

$ dvc get https://github.com/dmpetrov/dataset mydir/myfile

I'm removing the high-priority tag.

@shcheklein
Member

@dmpetrov is the dataset public, so I can reproduce this? It looks like I don't have access to it.

@shcheklein
Member

If I understand it correctly, then yes, I think this was never implemented. DVC in general has never provided file-level granularity for directories. E.g., you can't dvc pull dir1/file1; you have to pull the whole dir1. For largely the same reason, dvc get operates under the same assumption: a directory is a single manageable entity that can't be split.

So I would consider this a feature request. It makes sense to me, even though it would be the first time DVC goes down to the intra-directory level of data management.

An initial implementation would be inefficient for the reasons I described: we don't have a means to fetch a single file from a directory into the cache.
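To illustrate where the gap is, here is a minimal Python sketch of what resolving a single file inside a directory output could look like. It assumes that a directory's cache entry is a JSON manifest listing each file as {"md5": ..., "relpath": ...} and that cache objects live at <cache>/<first two hex chars>/<rest of hash>. split_target, cache_path and resolve_file_in_dir are hypothetical helpers, not DVC's actual API, and the hashes are placeholders.

```python
import json
import os
from typing import List, Optional, Tuple


def split_target(target: str, outputs: List[str]) -> Optional[Tuple[str, str]]:
    """Find the directory output that contains `target`.

    Returns (output_path, path_inside_output), or None if no output contains it.
    """
    target = target.strip("/")
    for out in outputs:
        prefix = out.strip("/") + "/"
        if target.startswith(prefix):
            return out, target[len(prefix):]
    return None


def cache_path(cache_dir: str, md5: str) -> str:
    # Assumed layout: the first two hex chars of the hash form a subdirectory.
    return os.path.join(cache_dir, md5[:2], md5[2:])


def resolve_file_in_dir(manifest_json: str, relpath: str, cache_dir: str) -> Optional[str]:
    """Look up one file in an (assumed) directory manifest and return its cache path."""
    for entry in json.loads(manifest_json):
        if entry["relpath"] == relpath:
            return cache_path(cache_dir, entry["md5"])
    return None


# Hypothetical manifest for dir1 (hashes are placeholders, not real data).
manifest = json.dumps([
    {"md5": "d41d8cd98f00b204e9800998ecf8427e", "relpath": "file1"},
    {"md5": "0cc175b9c0f1b6a831c399e269772661", "relpath": "file2"},
])

out, inner = split_target("dir1/file1", ["dir1"])
print(out, inner)
print(resolve_file_in_dir(manifest, inner, ".dvc/cache"))
```

This only covers locating the file's hash; fetching just that one object into the cache, without the rest of the directory, is the part described above as missing.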

Also, should we support subdirectories? Wildcards? Any other considerations?
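On subdirectories and wildcards, one possible selection rule, sketched in Python over the same assumed manifest shape: a plain path selects either one file or everything under a subdirectory prefix, and a pattern containing glob characters falls back to shell-style matching with the standard fnmatch module. select_entries and the sample entries are hypothetical. Note that fnmatch's "*" also matches "/", so "sub/*" includes nested files; that is the kind of detail that would need a decision.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

Entry = Dict[str, str]


def select_entries(entries: List[Entry], pattern: str) -> List[Entry]:
    """Pick manifest entries matching a file path, a subdirectory, or a glob pattern."""
    pattern = pattern.strip("/")
    if any(ch in pattern for ch in "*?["):
        # Glob: fnmatch's "*" also matches "/", so it crosses subdirectories.
        return [e for e in entries if fnmatchcase(e["relpath"], pattern)]
    prefix = pattern + "/"
    # Exact file, or every file under the given subdirectory.
    return [e for e in entries if e["relpath"] == pattern or e["relpath"].startswith(prefix)]


# Hypothetical entries for one directory output (hashes are placeholders).
entries = [
    {"md5": "aa11", "relpath": "file1"},
    {"md5": "bb22", "relpath": "file2"},
    {"md5": "cc33", "relpath": "sub/file3"},
    {"md5": "dd44", "relpath": "sub/deeper/file4"},
]

print([e["relpath"] for e in select_entries(entries, "sub")])
print([e["relpath"] for e in select_entries(entries, "file?")])
```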

@dmpetrov dmpetrov changed the title dvc get from subdir got broken dvc get from subdir is broken Nov 26, 2019
@dmpetrov dmpetrov added enhancement Enhances DVC and removed bug Did we break something? labels Nov 26, 2019
@dmpetrov dmpetrov changed the title dvc get from subdir is broken dvc get from subdir Nov 26, 2019
@efiop efiop added the feature request Requesting a new feature label Nov 26, 2019
@efiop
Contributor

efiop commented Nov 26, 2019

Related #2458

@efiop
Contributor

efiop commented Nov 26, 2019

Closing as a duplicate of #2458
