LocalRemoteTree: use repo tree as work_tree with local outputs #4125
Conversation
@@ -173,3 +173,27 @@ def test_ignore_blank_line(tmp_dir, dvc):
     tmp_dir.gen(DvcIgnore.DVCIGNORE_FILE, "foo\n\ndir/ignored")

     assert _files_set("dir", dvc.tree) == {"dir/other"}


+def test_ignore_in_added_dir(tmp_dir, dvc):
I did consider writing a test for LocalRemoteTree only, but I guess, in the end, we want to be sure that one can actually ignore something in an added directory.
I think the real underlying issue in #4110 is that we still use "remotes" in DVC outputs rather than purely using trees.
This causes problems for LocalOutput, since we treat regular local DVC output paths the same way as a local external dependency (we handle them both using LocalRemoteTree) even though regular output paths should be repo tree paths rather than remote paths.
So for a regular dvc add-ed dir output, right now when we run
def save_info(self):
    return self.remote.save_info(self.path_info)
we eventually walk the LocalRemoteTree to determine what files should go in dir cache and generate our dir hash. This is what causes the bug w/not using DVC ignore for that directory, and why wrapping the remote work tree w/cleantree fixes the bug.
I think what we should really be doing is treating regular outputs separately from local external dependencies.
We do want to use CleanTree for regular outputs, since we are dealing with actual DVC repo paths (I'm not sure whether that means we use a RepoTree or wrap the LocalRemoteTree work tree w/CleanTree for regular outputs).
But for other types of local "remote" paths (including external dependencies) we should not need to use CleanTree.
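To illustrate the distinction, here is a minimal sketch, not DVC's actual implementation. `walk_files`, `make_example_dir`, and the `ignored` predicate are hypothetical stand-ins for walking a plain work tree versus a CleanTree-wrapped one. It only shows why the file list that feeds the dir hash depends on which kind of tree does the walk.

```python
import os
import tempfile


def walk_files(root, ignored=None):
    """Return sorted relative file paths under root.

    `ignored` is a hypothetical predicate standing in for .dvcignore
    filtering; passing None mimics walking an unfiltered tree.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            rel = rel.replace(os.sep, "/")
            if ignored is None or not ignored(rel):
                found.append(rel)
    return sorted(found)


def make_example_dir():
    """Create a temporary directory containing dir/ignored and dir/other."""
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "dir"))
    for name in ("ignored", "other"):
        with open(os.path.join(root, "dir", name), "w") as fobj:
            fobj.write(name)
    return root


root = make_example_dir()
# Unfiltered walk: both files would be collected for the dir cache.
unfiltered = walk_files(root)
# Filtered walk: the ignored file is skipped, as expected for a regular output.
filtered = walk_files(root, ignored=lambda rel: rel == "dir/ignored")
```

In this sketch the unfiltered walk corresponds to the buggy behavior (ignored files end up in the dir cache), and the filtered walk corresponds to what a regular in-repo output should see.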
dvc/ignore.py
Outdated
@@ -216,7 +216,7 @@ def _parents_exist(self, path):
                 [os.path.abspath(path), self.tree_root]
             )
         ):
-            return False
+            return True
This works as a temporary solution, but it still seems unintuitive.
It seems to me that dvcignore should only ever apply to a DVC repo, and by definition anything outside the repo root directory should not exist in the clean tree.
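As a rough sketch of the "outside the repo root" test being discussed (this is an illustration, not the code in dvc/ignore.py; `is_outside_root` is a hypothetical helper name):

```python
import os


def is_outside_root(path, tree_root):
    """Return True if path does not live under tree_root.

    Hypothetical helper: compares the path relative to the root and
    treats anything that climbs out via ".." as outside.
    """
    try:
        rel = os.path.relpath(os.path.abspath(path), os.path.abspath(tree_root))
    except ValueError:
        # On Windows, relpath raises ValueError for paths on different drives.
        return True
    return rel == os.pardir or rel.startswith(os.pardir + os.sep)
```

Under the view expressed above, a clean tree would treat every path for which such a check returns True as nonexistent, rather than returning True from `_parents_exist`.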
dvc/remote/local.py
Outdated
@@ -66,7 +67,7 @@ def work_tree(self):
         # GitTree arbitarily. When repo.tree is GitTree, local cache needs to
         # use its own WorkingTree instance.
         if self.repo:
-            return WorkingTree(self.repo.root_dir)
+            return CleanTree(WorkingTree(self.repo.root_dir))
Same as before, this works as part of a temporary fix but I'm not sure it makes sense as a real solution.
For local remotes and the DVC local cache, we should not need to wrap the work tree with CleanTree. A local remote should only contain files which have been pushed to it, and we should not care about filtering/ignoring anything inside the remote. Likewise, for cache, we should not care about filtering anything inside .dvc/cache.
The main point of this tree refactor was that local remotes and local cache should only be dealing with paths inside the actual remote or cache. We run into issues when we start mixing remote/cache/repo tree paths and treating them all as just "local filesystem paths".
These changes do fix the user issue, and CleanTree used to allow paths outside the repo root before the remote tree refactoring, so I'd be ok with merging this as a bug fix for now. But long term I think we should keep the other things I mentioned in mind.
force-pushed from e311fa1 to 8000d80
force-pushed from 8000d80 to ce61e28
@pmrowla I tried to make the behavior less hackish by determining which tree should be used on [...]. I do believe that the current behavior is still hackish.
dvc/output/local.py
Outdated
    @cached_property
    def remote(self):
        if self._remote:
            return self._remote

        work_tree = None
        if self.repo and self.is_in_repo and is_working_tree(self.repo.tree):
            work_tree = self.repo.tree
        tree = LocalRemoteTree(self.repo, {}, work_tree)
        return LocalRemote(tree)
Could we do this remote setting in __init__? Just feels a bit weird introducing cached property & _remote mix in both here and base class for this hack.
We could, but in that case we would set remote inside the parent constructor and then override it here. That's what I wanted to avoid by creating a cached_property.
@pared but we can set it after super(), no? Or pass the new instance to the constructor.
Had a similar issue with config and it really hurts. Thinking that maybe we should tackle #4050 right away and get rid of that is_working_tree mess once and for all. Otherwise, we might be introducing many more hidden issues than we are solving (though in this case this is clearly very serious). Let's discuss this during planning today, maybe someone will be able to look into this part of trees deeply ASAP. If not, we'll have to merge the hack, indeed.
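For reference, the "set it after super()" alternative mentioned above could look roughly like the sketch below. `BaseOutputSketch` and `LocalOutputSketch` are hypothetical stand-ins, not the real DVC output classes, and the string and tuple values only mark which code path assigned the remote.

```python
class BaseOutputSketch:
    """Hypothetical stand-in for the base output class."""

    def __init__(self, repo, remote=None):
        self.repo = repo
        # The base class picks a default remote when none is passed in.
        self.remote = remote if remote is not None else "default-remote"


class LocalOutputSketch(BaseOutputSketch):
    """Hypothetical subclass that overrides the remote after super()."""

    def __init__(self, repo, remote=None):
        super().__init__(repo, remote)
        if remote is None:
            # Override the default that the parent constructor just set.
            self.remote = ("local-remote", repo)


out_default = LocalOutputSketch("repo")
out_explicit = LocalOutputSketch("repo", remote="explicit-remote")
```

The trade-off raised in the thread is visible here: the parent constructor still builds a default that is immediately replaced, which is what the cached_property approach was meant to avoid.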
force-pushed from ce61e28 to aaea4e2
force-pushed from aaea4e2 to 99ad242
I have followed the Contributing to DVC checklist.
If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)
Thank you for the contribution - we'll try to review it as soon as possible.
EDIT:
Fixes #4110
Fixes #4197