
dvc: support .dvcignore #1499

Closed
efiop opened this issue Jan 14, 2019 · 7 comments · Fixed by #1820
Labels: enhancement (Enhances DVC), feature request (Requesting a new feature), p2-medium (Medium priority, should be done, but less important)

Comments

@efiop
Contributor

efiop commented Jan 14, 2019

Same as .gitignore, but for dvc. It will tell dvc which paths to ignore when caching data. Would be extremely useful for ignoring auto-generated/temporary files/directories that are created as a side effect and don't carry anything useful.

The first iteration should make Repo.stages() skip paths that match patterns listed in .dvcignore when collecting stages.

echo '/directory_with_millions_of_files' > .dvcignore
dvc status # should not enter `directory_with_millions_of_files`

The second iteration should support dvc add/run. We have a separate issue for it at #1876.
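
To make the first iteration concrete, here is a minimal Python sketch of collecting stage files while pruning ignored directories. It is an illustration only, not DVC's implementation: `read_patterns`, `is_ignored`, and `collect_stage_files` are hypothetical names, stage files are assumed to be files ending in `.dvc`, and the matching is a simplified subset of .gitignore semantics (a leading `/` anchors a pattern at the repo root, anything else is matched against the last path component).

```python
import fnmatch
import os


def read_patterns(root):
    """Return the non-empty, non-comment lines of <root>/.dvcignore."""
    path = os.path.join(root, ".dvcignore")
    if not os.path.isfile(path):
        return []
    with open(path) as fobj:
        lines = [line.strip() for line in fobj]
    return [line for line in lines if line and not line.startswith("#")]


def is_ignored(relpath, patterns):
    """Simplified matching (assumption, not full .gitignore semantics)."""
    relpath = relpath.replace(os.sep, "/")
    for pattern in patterns:
        if pattern.startswith("/"):
            # Anchored pattern: compare against the path relative to the root.
            if fnmatch.fnmatchcase(relpath, pattern[1:].rstrip("/")):
                return True
        elif fnmatch.fnmatchcase(os.path.basename(relpath), pattern.rstrip("/")):
            # Unanchored pattern: compare against the last path component.
            return True
    return False


def collect_stage_files(root):
    """Walk root and return relative paths of *.dvc files, skipping ignored paths."""
    patterns = read_patterns(root)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        # Assigning to dirnames[:] prunes the walk in place, so os.walk
        # never descends into an ignored directory.
        dirnames[:] = [
            d for d in dirnames
            if not is_ignored(os.path.normpath(os.path.join(rel_dir, d)), patterns)
        ]
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            if name.endswith(".dvc") and not is_ignored(rel, patterns):
                found.append(rel)
    return sorted(found)
```

The important part is the in-place pruning of `dirnames`: filtering results after the walk would still pay the cost of entering the huge directory, which is exactly what the `dvc status` example above should avoid.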

efiop added the enhancement and feature request labels Jan 14, 2019
@dmpetrov
Member

dmpetrov commented Jan 14, 2019

@efiop could you please clarify a bit more? Is it for users mostly or for internal DVC usage? Should we include .dvc/ into .dvcignore in the case of internal usage?

@efiop
Contributor Author

efiop commented Jan 14, 2019

It is for users mostly as a convenient way to ignore some paths/files/dirs from being cached/tracked by dvc. Good point about .dvc/ though, we will act as if it is in .dvcignore by default :)

@ghost

ghost commented Jan 14, 2019

Is this somehow related to #1471?

I saw in the chat that the problem was that the MD5 of a directory changed because the file system created a .DS_Store file in it, so the directory was detected as changed.

I can see the same stuff happening for other things if you are not careful enough (e.g. vim swap files, file locking mechanisms that create dotfiles, IDE specific files / .tags, etc.)
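
As an illustration of that effect, the following Python sketch computes a simplified checksum over a directory and shows why a stray .DS_Store changes it. This is not DVC's actual checksum algorithm; `dir_checksum` and its `ignored` parameter are hypothetical, and ignoring is done by exact file name only.

```python
import hashlib
import os


def dir_checksum(root, ignored=frozenset()):
    """MD5 over the relative paths and contents of all files under root.

    Files whose name is in `ignored` are skipped (hypothetical ignore set).
    """
    digest = hashlib.md5()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            if name in ignored:
                continue
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            digest.update(rel.encode("utf-8"))
            with open(path, "rb") as fobj:
                digest.update(fobj.read())
    return digest.hexdigest()
```

With this sketch, `dir_checksum(path)` changes as soon as the OS drops a .DS_Store file into the directory, while `dir_checksum(path, ignored={".DS_Store"})` stays the same, which is the behavior an ignore file would provide.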

If we are introducing .dvcignore, it would also be great to have a global .dvcignore, the same way you can have a global .gitignore.
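
One way the layering could work is sketched below in Python: built-in defaults first (the `.dvc/` default mentioned earlier in the thread), then a global file, then the repo's own .dvcignore. Everything here is an assumption for illustration: the names `read_pattern_file` and `effective_patterns` are hypothetical, the global file location is passed in by the caller because the thread does not specify one, and duplicates are simply dropped rather than handled with .gitignore-style precedence rules.

```python
import os

# Built-in default, per the remark above that .dvc/ is treated as ignored.
DEFAULT_PATTERNS = [".dvc/"]


def read_pattern_file(path):
    """Return non-empty, non-comment lines of a pattern file, or [] if absent."""
    if path is None or not os.path.isfile(path):
        return []
    with open(path) as fobj:
        lines = [line.strip() for line in fobj]
    return [line for line in lines if line and not line.startswith("#")]


def effective_patterns(repo_root, global_file=None):
    """Merge defaults, global patterns, and repo patterns, keeping first-seen order."""
    layers = (
        DEFAULT_PATTERNS,
        read_pattern_file(global_file),
        read_pattern_file(os.path.join(repo_root, ".dvcignore")),
    )
    merged = []
    for layer in layers:
        for pattern in layer:
            if pattern not in merged:
                merged.append(pattern)
    return merged
```

A global file would be the natural home for OS and editor noise such as `.DS_Store` or `*.swp`, leaving the per-repo .dvcignore for project-specific paths.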

@efiop
Contributor Author

efiop commented Jan 14, 2019

@MrOutis Not related to #1471; this one is exclusively about dvc tracking files.

Great idea about a global .dvcignore! I didn't know .gitignore could be global.

@mhham
Contributor

mhham commented Feb 1, 2019

I confirm that .DS_Store files are a pain when running dvc add on directories on macOS, and I completely agree with the need for a .dvcignore file.

@alguevara7

alguevara7 commented Mar 12, 2019

I think an equally pressing issue is that having a large (4 million files) un-cached folder slows down dvc, as it needs to traverse the whole folder before executing any command.

Adding support for .dvcignore would add the required capability to address this issue.

@shcheklein
Member

Please, create a ticket or a page to document the changes.
