project: Support .gitignore to exclude directories while searching for stage files #1471

Closed
ghost opened this issue Jan 5, 2019 · 9 comments
Labels
feature request Requesting a new feature

Comments

@ghost

ghost commented Jan 5, 2019

Respecting .gitignore would keep DVC from walking directories like node_modules.

Maybe this isn't the right way to speed up the stage-file search, but it may help.
It could also do more harm than good, so I'm opening this issue for discussion.
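
To make the proposal concrete, here is a minimal sketch of a stage-file search that prunes ignored directories before descending into them. It is not DVC's actual implementation: `find_stage_files` and `ignored_dir_patterns` are hypothetical names, the patterns are plain `fnmatch` globs matched against directory names rather than full .gitignore syntax, and stage files are assumed to end in `.dvc`.

```python
import fnmatch
import os


def find_stage_files(root, ignored_dir_patterns):
    """Yield paths of *.dvc files under root, skipping ignored directories.

    ignored_dir_patterns is a hypothetical, simplified stand-in for parsed
    .gitignore rules: fnmatch globs matched against directory names only.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into ignored directories.
        dirnames[:] = [
            d for d in dirnames
            if not any(fnmatch.fnmatch(d, p) for p in ignored_dir_patterns)
        ]
        for name in filenames:
            # Assumption: stage files are recognized by the ".dvc" suffix.
            if name.endswith(".dvc"):
                yield os.path.join(dirpath, name)
```

Calling `list(find_stage_files(".", ["node_modules", ".git"]))` would then skip both directories entirely instead of filtering their contents after the walk.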

@ghost ghost added the feature request Requesting a new feature label Jan 5, 2019
@ghost
Author

ghost commented Jan 5, 2019

Another strategy would be to store every created stage in the state, but DVC would then need a command to refresh or sync the state file with the current workspace (maybe it's not worth trading ease of use for some speed).

@efiop
Contributor

efiop commented Jan 6, 2019

Using state is a great idea. Introducing refresh or sync commands doesn't seem right though :). If we were to go with this strategy, we could just save mtimes for directories and rescan them only if they've changed. This is the same way we track dep/out directories right now.
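
A rough sketch of the mtime idea, under stated assumptions: `scan_with_mtime_cache` is a hypothetical helper, the `cache` dict stands in for DVC's state database, and stage files are assumed to end in `.dvc`. A directory's mtime only changes when entries are added, removed, or renamed directly inside it, so every directory still has to be stat'ed, but unchanged ones don't need to be listed again.

```python
import os


def scan_with_mtime_cache(root, cache):
    """Return stage files under root, re-listing only changed directories.

    cache maps a directory path to (mtime_ns, subdir names, stage file names).
    It is a hypothetical in-memory stand-in for a persistent state store.
    """
    found = []
    stack = [root]
    while stack:
        dirpath = stack.pop()
        mtime = os.stat(dirpath).st_mtime_ns
        entry = cache.get(dirpath)
        if entry is None or entry[0] != mtime:
            # Directory is new or changed: list it again and refresh the cache.
            subdirs, stages = [], []
            with os.scandir(dirpath) as entries:
                for e in entries:
                    if e.is_dir(follow_symlinks=False):
                        subdirs.append(e.name)
                    elif e.name.endswith(".dvc"):
                        stages.append(e.name)
            entry = (mtime, subdirs, stages)
            cache[dirpath] = entry
        _, subdirs, stages = entry
        found.extend(os.path.join(dirpath, name) for name in stages)
        # Subdirectories have their own mtimes, so they are always visited.
        stack.extend(os.path.join(dirpath, name) for name in subdirs)
    return found
```

One caveat with this kind of sketch: on filesystems with coarse timestamp granularity, a change that lands within the same tick as the cached mtime could be missed.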

That being said, using .gitignore seems like a very reasonable, simple and elegant approach, especially since it also covers directories that change frequently, so we wouldn't have to rescan them every time, as the state approach would.

@shcheklein
Member

I like the idea with .gitignore!

I think we can parallelize the search (if it isn't already, of course :)) and cache mtimes indeed; that alone should make some difference.

@ghost ghost mentioned this issue Jan 14, 2019
@efiop
Contributor

efiop commented Apr 4, 2019

Related to #1820. Adding node_modules to .dvcignore would prevent dvc from searching for dvcfiles there.

It seems unlikely, but I could imagine someone adding dvcfiles to .gitignore and still wanting dvc to discover them. Using .dvcignore seems like a more direct and clear approach.

@ghost
Author

ghost commented Apr 6, 2019

Indeed, @efiop, but it is like an extra step that the user needs to do.

A lot of people add their virtual environments to Docker images because they forgot about, or don't know about, .dockerignore. It's not a huge deal in this case, though: searching through the virtual environment, node_modules, or even the .git directory would add a few seconds at most, maybe something unnoticeable.

@ghost
Author

ghost commented Apr 6, 2019

Maybe we can include it in an "Advanced Users" guide, along with information about reflinks and other performance enhancements.

@efiop
Contributor

efiop commented Apr 6, 2019

> Indeed, @efiop, but it is like an extra step that the user needs to do.

@MrOutis True. If I remember correctly, our .dvcignore implementation supports !path, which allows you to explicitly stop ignoring something. If we decide to go with the approach you suggest, a !path entry in .dvcignore could make dvc stop ignoring that particular path even if it is listed in .gitignore. I'm just not sure that approach is OK to make the default. Need to give it a better thought...
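
As a rough illustration of how such an override could behave (not how dvc actually resolves ignore rules): in the sketch below, `is_ignored` is a hypothetical helper, patterns are simple `fnmatch` globs matched against the whole relative path, and .dvcignore rules are evaluated after .gitignore rules so that the last matching rule wins.

```python
import fnmatch


def is_ignored(path, gitignore_patterns, dvcignore_patterns):
    """Decide whether the stage-file search should skip path.

    Simplified, hypothetical semantics: .gitignore patterns are applied
    first, .dvcignore patterns (including "!pattern" negations) after them,
    and the last pattern that matches decides the outcome.
    """
    ignored = False
    for pattern in list(gitignore_patterns) + list(dvcignore_patterns):
        negated = pattern.startswith("!")
        if negated:
            pattern = pattern[1:]
        if fnmatch.fnmatch(path, pattern):
            # A "!pattern" match re-includes the path; any other match ignores it.
            ignored = not negated
    return ignored
```

With these rules, `is_ignored("data/raw", ["data/*"], ["!data/raw"])` evaluates to `False`, while the same call with an empty .dvcignore list evaluates to `True`. Real .gitignore matching has more rules than this (anchoring, directory-only patterns, and so on), so this only shows the precedence idea.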

@ghost
Author

ghost commented Apr 7, 2019

> Need to give it a better thought...

Agreed, @efiop, let's leave this open and see how it goes with .dvcignore.

@efiop
Contributor

efiop commented Jul 23, 2019

Seems like .dvcignore does the job. Let's close this for now; I don't see any users requesting this specifically. If anyone requests it, please feel free to reopen.

@efiop efiop closed this as completed Jul 23, 2019