dvcignore: Fix incorrect ignored output computation #4986
c24980a to 0d9ca7b
@karajan1001 Could you please take a look?
@anotherbugmaster
Of course. I'm a little busy today, so I will do this tomorrow.
@@ -329,10 +329,14 @@ def run_copy(tmp_dir, dvc):
    )

def run_copy(src, dst, **run_kwargs):
    wdir = pathlib.Path(run_kwargs.get("wdir", "."))
    wdir = pathlib.Path("../" * len(wdir.parts))
Sorry, I don't follow this part. Judging by this code and its result, we are inside a subdirectory (dvc_root/copy/) running a script located at the top of the DVC repo (dvc_root/copy.py). I just can't find where we change the working directory.
Thank you, but my question is about this line:
tmp_dir.gen({".dvcignore": "/*.log", "copy": {"foo": "foo content"}})
Here we appear to be inside tmp_dir. Then, after calling
run_copy("foo", "foo.log", name="copy", wdir="copy")
wdir equals .. and the parameters passed to the subsequent dvc.run are
cmd=f"python ../copy.py foo foo.log",
outs=["foo.log"],
deps=["foo", f"../copy"],
At that point we seem to be inside tmp_dir/copy. I just can't find where we changed the working directory.
Ah, I think I see what you mean. run_kwargs contains the wdir key, so when we execute dvc.run, the parameters are as follows:
cmd=f"python ../copy.py foo foo.log",
outs=["foo.log"],
deps=["foo", f"../copy"],
wdir="copy",
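For readers following the wdir discussion, here is a minimal standalone sketch of how the two lines from the diff turn a working directory into a path back to the repo root. The helper names `path_back_to_root` and `build_run_params` are hypothetical, and pointing the second dependency at the script itself is an assumption; this is not DVC's actual fixture.

```python
import pathlib


def path_back_to_root(wdir="."):
    """Hypothetical helper: one "../" per component of wdir."""
    parts = pathlib.PurePosixPath(wdir).parts  # "." has no parts, "copy" has one
    return pathlib.PurePosixPath("../" * len(parts))


def build_run_params(src, dst, **run_kwargs):
    """Hypothetical helper: assemble the keyword arguments discussed above."""
    up = path_back_to_root(run_kwargs.get("wdir", "."))
    script = (up / "copy.py").as_posix()
    return {
        "cmd": f"python {script} {src} {dst}",
        "outs": [dst],
        # Assumption: the dependency is the copy script itself.
        "deps": [src, script],
        # wdir (and name) are forwarded unchanged, so the command is
        # executed from inside the subdirectory.
        **run_kwargs,
    }


params = build_run_params("foo", "foo.log", name="copy", wdir="copy")
print(params["cmd"])
print(params["wdir"])
```

The point is that the test process itself never changes directory; the paths are only rewritten so that they resolve correctly from the wdir that is forwarded with the other keyword arguments.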
Oh, sorry. I looked into it yesterday but forgot to
Sorry for the late response; this issue is more complex than I thought. Actually, there are two issues with it.
Two wrongs added up to one right, so it passed all of our tests. But now they cause a deadlock in merging: fixing either of them would cause failures in some tests. Anyway, your dvcignore refactoring is great and more elegant than my old code, but we shouldn't change the logic inside. In my opinion, we can keep the relative path fix and the dvcignore refactoring, but restore the original dvcignore logic, and then find some way to merge these two PRs. @efiop
@karajan1001 Great research! Thank you so much for looking into this! There are two options:
@@ -340,7 +350,11 @@ def check_ignore(self, target):
    def is_ignored(self, path):
        # NOTE: can't use self.check_ignore(path).match for now, see
        # https://github.com/iterative/dvc/issues/4555
        return self.is_ignored_dir(path) or self.is_ignored_file(path)
        if os.path.isfile(path):
I see us using os.path in a few other places instead of self.tree. That might be a potential bug with non-local trees (e.g. GitTree). We need to double-check that; this PR might make it worse.
Also, these two ifs are clearly an optimization, right? For use in OutputBase or elsewhere?
> I see us using os.path in a few other places instead of using the self.tree. Might be a potential bug in non-local trees (e.g. GitTree). Need to double check that, this PR might worsen it.

Yes, but the problem is that our self.tree.isfile and self.tree.isdir rely on dvcignore themselves. If we used self.tree here, we would create a circular dependency.
> Also, these two ifs are clearly an optimization, right? For use in OutputBase or elsewhere?

No, the original one is a bug: a file might match a dir-only pattern.
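To make the dir-only pattern bug concrete, here is a simplified sketch. It uses `fnmatch` as a stand-in for real gitignore-style matching, and all function names are hypothetical rather than DVC's. The "old" variant mirrors the shape of `is_ignored_dir(path) or is_ignored_file(path)`, and the "new" variant checks the path's actual type first.

```python
import fnmatch


def matches(pattern, name, is_dir):
    """Simplified matcher: a trailing "/" restricts a pattern to directories."""
    if pattern.endswith("/") and not is_dir:
        return False
    return fnmatch.fnmatchcase(name, pattern.rstrip("/"))


def is_ignored_old(patterns, name, is_dir):
    # Tries both interpretations regardless of what the path really is,
    # so the real type (is_dir) is never consulted.
    return any(
        matches(p, name, True) or matches(p, name, False) for p in patterns
    )


def is_ignored_new(patterns, name, is_dir):
    # Only the interpretation that agrees with the path's real type counts.
    return any(matches(p, name, is_dir) for p in patterns)


patterns = ["data/"]  # dir-only pattern
print(is_ignored_old(patterns, "data", is_dir=False))
print(is_ignored_new(patterns, "data", is_dir=False))
print(is_ignored_new(patterns, "data", is_dir=True))
```

With the dir-only pattern `data/`, the old shape reports a regular file named `data` as ignored, while the type-aware shape ignores only the directory.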
@efiop @karajan1001, maybe we need to go back to using the use_dvcignore flag?
tests/func/test_ignore.py
if raise_error:
    with pytest.raises(OutputIsIgnoredError):
        run_copy("foo", "foo.log", name="copy")
else:
I'm not sure we should parametrize this test if that requires an if condition. We can split it into two tests, or use a try/except block and verify the type of the exception thrown; that way we don't need a condition that depends on the parameters.
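A rough sketch of the "split it into two tests" suggestion, kept self-contained by stubbing everything: the `OutputIsIgnoredError` class below is a local stand-in for DVC's exception, and this `run_copy` (including its `ignore_patterns` parameter) is a hypothetical stub, not the real fixture. In the real test suite, the first test would more naturally use `pytest.raises`.

```python
import fnmatch


class OutputIsIgnoredError(Exception):
    """Local stand-in for DVC's exception of the same name."""


def run_copy(src, dst, ignore_patterns=()):
    """Hypothetical stub: refuse to produce an output that is ignored."""
    if any(fnmatch.fnmatchcase(dst, p) for p in ignore_patterns):
        raise OutputIsIgnoredError(dst)
    return dst


def test_run_with_ignored_output_raises():
    # One scenario per test, so no branching on a parameter is needed.
    try:
        run_copy("foo", "foo.log", ignore_patterns=["*.log"])
    except OutputIsIgnoredError:
        return
    raise AssertionError("expected OutputIsIgnoredError")


def test_run_with_unignored_output_succeeds():
    assert run_copy("foo", "foo.txt", ignore_patterns=["*.log"]) == "foo.txt"


test_run_with_ignored_output_raises()
test_run_with_unignored_output_succeeds()
```

Each test then reads as a single scenario, at the cost of a little duplicated setup.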
@anotherbugmaster @karajan1001 Hi guys! Thanks for working on this! Just wanted to check, what's the current status of this PR?
Currently, two PRs have been merged, but this PR changed the logic of [...]. I had assumed that @anotherbugmaster would do the follow-up work. If he doesn't have time, I can do it myself.
Yeah, I'm going to work on it on Wednesday
@anotherbugmaster
@karajan1001, could you please rebase? The test failures should be fixed.
Unignored patterns were shown in the dvc status output and caused an error. Also refactored `matches` and `ignore` a bit.
Not sure if I follow, but tests assume that unignored files should trigger `check-ignore`. What was the intention?
1. use is_ignored to judge DVC ignore status. 2. use check_ignore to show ignore patterns.
a80de64 to 0e2f97b
Sorry guys, overestimated myself :D
Thank you @karajan1001 and @anotherbugmaster!
I have followed the Contributing to DVC checklist.
If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Fixes #4985
_validate_output_path
dvc check-ignore, when it lists unignored files as ignored and returns non-zero exit code