dvc: use dvcignore when adding data with dvc add/run #1876

Closed
efiop opened this issue Apr 11, 2019 · 7 comments · Fixed by #2161
Assignees
Labels
enhancement Enhances DVC feature request Requesting a new feature p1-important Important, aka current backlog of things to do

Comments

@efiop
Contributor

efiop commented Apr 11, 2019

echo 'dir/foo' > .dvcignore
dvc add dir # should ignore foo and not take it into account when computing mtime and size
@efiop efiop added c13-half-a-week enhancement Enhances DVC feature request Requesting a new feature p2-medium Medium priority, should be done, but less important labels Apr 11, 2019
@pared
Contributor

pared commented Apr 23, 2019

Note about get_mtime_and_size:
The initial idea and implementation (available at https://github.com/pared/dvc/tree/1499_original) was to calculate dict_md5 of a map(path : mtime) for each file in the directory. That way we could detect new files and file modifications, yet still "dvcignore" files.
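
To make the idea above concrete, here is a minimal sketch of hashing a path-to-mtime map while skipping ignored files. It is not the code from the linked branch: `is_ignored` and `dir_mtime_digest` are hypothetical names, the patterns are assumed to be relative to the directory being hashed, and `fnmatch` globbing is only a rough stand-in for real .dvcignore matching.

```python
import fnmatch
import hashlib
import json
import os


def is_ignored(rel_path, patterns):
    """True if rel_path matches any glob pattern (simplified, not gitignore rules)."""
    return any(fnmatch.fnmatch(rel_path, pat) for pat in patterns)


def dir_mtime_digest(root, patterns=()):
    """MD5 over a sorted {relative path: mtime} map, skipping ignored files."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            if is_ignored(rel, patterns):
                continue  # ignored files never reach the map
            mtimes[rel] = os.stat(full).st_mtime_ns
    # Sorting the keys keeps the digest independent of walk order.
    payload = json.dumps(mtimes, sort_keys=True).encode("utf-8")
    return hashlib.md5(payload).hexdigest()
```

With this shape, touching or rewriting an ignored file leaves the digest unchanged, while a new or modified non-ignored file changes it.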

@PeterFogh
Contributor

Hi, this issue relates to my question regarding dvcignore from this Discord message.

Can it be used for ignoring .pyc files in a folder which a DVC file is dependent on?

In more detail, my use case is a DVC stage that depends on a folder containing Python code. Executing the stage makes the Python code in the folder produce the "__pycache__" folder with ".pyc" files. However, when reproducing the DVC stage, these ".pyc" files may change even if the Python code did not, which forces the DVC stage to be reproduced when it is not needed.

Therefore, I think the dvcignore feature should exclude these ".pyc" files when computing the DVC stage dependency checksum, if the .dvcignore file defines the exclusion of the "__pycache__" folder or "*.pyc" files.

@pared
Contributor

pared commented May 29, 2019

Well then, I guess we should get to spreading dvcignore to all usages of walk in the project. We will need to revisit get_mtime_and_size for directories and come up with a way of calculating a dir checksum that will not be updated by ignored files/dirs (mostly mtime will be a problem).
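
One possible shape for an ignore-aware walk, as a rough sketch rather than DVC's actual implementation: `walk_files` and `mtime_and_size` are hypothetical helpers, and `fnmatch` patterns relative to the walked root are assumed in place of real .dvcignore semantics. The mtime is taken from the non-ignored files only, never from the directory itself, because a directory's own mtime changes whenever an ignored file is created or removed inside it.

```python
import fnmatch
import os


def walk_files(root, patterns=()):
    """Yield (relative path, full path) for non-ignored files under root."""
    def ignored(rel):
        return any(fnmatch.fnmatch(rel, pat) for pat in patterns)

    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        prefix = "" if rel_dir == "." else rel_dir.replace(os.sep, "/") + "/"
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if not ignored(prefix + d)]
        for name in filenames:
            rel = prefix + name
            if not ignored(rel):
                yield rel, os.path.join(dirpath, name)


def mtime_and_size(root, patterns=()):
    """Latest mtime (ns) and total size over non-ignored files only."""
    latest, total = 0, 0
    for _rel, full in walk_files(root, patterns):
        st = os.stat(full)
        latest = max(latest, st.st_mtime_ns)
        total += st.st_size
    return latest, total
```

Called as `mtime_and_size("src", ["__pycache__", "*.pyc"])`, the result would stay the same when only files under a top-level `__pycache__` change.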

@nik123
Contributor

nik123 commented Jun 6, 2019

I've been sent here by one of the developers from this conversation on StackOverflow. In general, I really need the dvc status command to ignore the .DS_Store files automatically created by macOS.

However, I'm not sure whether my request is part of this issue or should be a separate feature request.

@efiop
Contributor Author

efiop commented Jun 6, 2019

Hi @nik123 !

It is indeed part of this issue, so just hang on tight! 🙂

@efiop efiop added p1-important Important, aka current backlog of things to do and removed p2-medium Medium priority, should be done, but less important labels Jun 18, 2019
@efiop
Contributor Author

efiop commented Jun 18, 2019

Related to #2010; it may be worth switching from dulwich to pathspec along the way.

@jorgeorpinel
Contributor

jorgeorpinel commented Sep 19, 2019

@nik123 BTW, in case you didn't notice, DVC can now ignore .DS_Store as you suggested. See https://stackoverflow.com/a/58016907/761963.

5 participants