Skip to content

dvcignore: Fix incorrect ignored output computation #4986

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

anotherbugmaster
Copy link
Contributor

@anotherbugmaster anotherbugmaster commented Nov 28, 2020

Fixes #4985

  • Failing test created
  • Fixed _validate_output_path
  • Fix awkward behavior of dvc check-ignore, when it lists unignored files as ignored and returns non-zero exit code
  • More tests

@anotherbugmaster anotherbugmaster force-pushed the 4985_dvcignore_incorrect_outputs branch 6 times, most recently from c24980a to 0d9ca7b Compare November 29, 2020 11:06
@anotherbugmaster anotherbugmaster marked this pull request as ready for review November 29, 2020 14:02
@efiop efiop requested a review from karajan1001 November 29, 2020 17:46
@efiop
Copy link
Contributor

efiop commented Nov 29, 2020

@karajan1001 Could you please take a look? πŸ™

@karajan1001
Copy link
Contributor

@anotherbugmaster
Thanks for your contribution, Looks pretty cool.

@efiop

@karajan1001 Could you please take a look?

Of course, I'm a little busy today and will do this tomorrow.

@@ -329,10 +329,14 @@ def run_copy(tmp_dir, dvc):
)

def run_copy(src, dst, **run_kwargs):
wdir = pathlib.Path(run_kwargs.get("wdir", "."))
wdir = pathlib.Path("../" * len(wdir.parts))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, I didn't understand here. In these codes and the result,
We are inside a sub dir ( dvc_root/copy/ ) running a script on the top of a DVC repo (dvc_root/copy.py) I just can't find where did we change our working directory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@karajan1001 karajan1001 Dec 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, but the question is that here

    tmp_dir.gen({".dvcignore": "/*.log", "copy": {"foo": "foo content"}})

It seems we are inside tmp_dir.
Then by calling

run_copy("foo", "foo.log", name="copy", wdir="copy")

wdir equals to ..
and the parameter of the following dvc.run are

            cmd=f"python ../copy.py foo foo.log",
            outs=["foo.log"],
            deps=["foo", f"../copy"],

Here it seems that we are inside dir tmp_dir/copy, I just didn't found where had we changed our working dir?

By the way, so strange I can't @ you
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I got you. run_kwargs contains wdir key, so when we execute dvc.run, the params are following:

	cmd=f"python ../copy.py foo foo.log",
	outs=["foo.log"],
	deps=["foo", f"../copy"],
	wdir="copy",

@karajan1001
Copy link
Contributor

Oh, sorry. I looked into it yesterday but forgot to finish my review. In this case, the comments can only be seen by myself.

@karajan1001
Copy link
Contributor

karajan1001 commented Dec 11, 2020

@anotherbugmaster

Sorry, for the late response, this issue is more complex than I used to think.

Actually, there are two issues with it.

First is this one dvc outputs computes a wrong relative path.
Second is the #5046, we didn't use a dvcignore correctly. We use check_ignore which should only be used in dvc check-ignore cmd instead is_ignored.

Two wrongs adding up to one right, it passed all of our tests. But now they make a deadlock in merging, fix any of them would cause failures in some tests.

Any way your dvcignore refactoring is great, they are more elegant than my old ones, but we shouldn't change the logic inside. In my opinion, we can keep the relative path fixing and the dvcignore refactoring, but restore to the original dvcignore logic. Then find some way to merge these two PRs.

@efiop
Excuse me, how can we solve this deadlock, Only merging simultaneously can they prevent tests fails. Or first, merge #5070 into this one, and then merge into the master branch?

@pared pared self-requested a review December 11, 2020 14:52
@efiop
Copy link
Contributor

efiop commented Dec 11, 2020

@karajan1001 Great research! Thank you so much for looking into this! πŸ™

There are two options:

  1. You have maintainer rights, so you could clone @anotherbugmaster 's repo and apply your patch on top of his and push into his branch. Here is how I usually do it:
mkdir anotherbugmaster
cd anotherbugmaster
git clone [email protected]:anotherbugmaster/dvc
cd dvc
git checkout 4985_dvcignore_incorrect_outputs
# do the necessary changes and commit them
dvc push
  1. If you approve this PR as is, we could merge both PRs at the same time into master.

@karajan1001
Copy link
Contributor

@anotherbugmaster

Sorry for the late response, I got some unexpected events over the last weekend. Now I had merged my branch into this one. I had closed #5070, and relinked #5046 to this PR. We can continue this work now.

@karajan1001 karajan1001 linked an issue Dec 14, 2020 that may be closed by this pull request
@@ -340,7 +350,11 @@ def check_ignore(self, target):
def is_ignored(self, path):
# NOTE: can't use self.check_ignore(path).match for now, see
# https://github.com/iterative/dvc/issues/4555
return self.is_ignored_dir(path) or self.is_ignored_file(path)
if os.path.isfile(path):
Copy link
Contributor

@efiop efiop Dec 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see us using os.path in a few other places instead of using the self.tree. Might be a potential bug in non-local trees (e.g. GitTree). Need to double check that, this PR might worsen it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, these two ifs are clearly an optimization, right? For use in OutputBase or elsewhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see us using os.path in a few other places instead of using the self.tree. Might be a potential bug in non-local trees (e.g. GitTree). Need to double check that, this PR might worsen it.

Yes, but the question is that our self.tree.isfile and self.tree.isdir rely on dvcignore itself. If we use self.tree here, they would become circular dependent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, these two ifs are clearly an optimization, right? For use in OutputBase or elsewhere?

The original one is a bug, a file might match a dir-only pattern.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efiop @karajan1001, maybe we need to get back to using use_dvcignore flag?

Comment on lines 416 to 419
if raise_error:
with pytest.raises(OutputIsIgnoredError):
run_copy("foo", "foo.log", name="copy")
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we need to parametrize this test if we need if condition. We can split it into two tests, or go with
try/catch block, and verifying type of exception thrown, that way we don't have to put condition depending on parameters.

@efiop
Copy link
Contributor

efiop commented Dec 20, 2020

@anotherbugmaster @karajan1001 Hi guys! Thanks for working on this! Just wanted to check, what's the current status of this PR?

@karajan1001
Copy link
Contributor

@anotherbugmaster @karajan1001 Hi guys! Thanks for working on this! Just wanted to check, what's the current status of this PR?

Currently, two PR had been merged, but this PR had changed the logic of check-ignore and also the tests of it. we need to revert to the original logic and tests. (Didn't recommend keeping the current logic, and change the documents on dvc.org, because it is not related to these two issues.)

I used to believe that @anotherbugmaster would do the following work? If he didn't have time, I can do it myself.

@anotherbugmaster
Copy link
Contributor Author

Yeah, I'm going to work on it on Wednesday

@karajan1001
Copy link
Contributor

karajan1001 commented Jan 14, 2021

@anotherbugmaster
I'm sorry, this issue causes some severe issues #5256. Can't wait for your work and I had solved it myself, thanks again for your contributions.

@skshetry skshetry assigned skshetry and karajan1001 and unassigned skshetry Jan 14, 2021
@skshetry

This comment has been minimized.

@mergify

This comment has been minimized.

@skshetry
Copy link
Collaborator

@karajan1001, could you please rebase? The test failures should be fixed.

anotherbugmaster and others added 8 commits January 20, 2021 17:56
Unignored patterns were shown in the dvc status output and caused Error.

Also made a bit of a refactoring in `matches` and `ignore`.
Not sure if I follow, but tests assume that unignored files should trigger `check-ignore`. What was the intention?
1. use is_ignored to judge DVC ignore status.
2. use check_ignore to show ignore patterns.
@karajan1001 karajan1001 force-pushed the 4985_dvcignore_incorrect_outputs branch from a80de64 to 0e2f97b Compare January 20, 2021 09:58
@karajan1001 karajan1001 self-requested a review January 20, 2021 10:22
@karajan1001 karajan1001 requested a review from pared January 20, 2021 10:34
@anotherbugmaster
Copy link
Contributor Author

Sorry guys, overestimated myself :D

Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @karajan1001 and @anotherbugmaster ! πŸ™

@efiop efiop merged commit 60d83a7 into iterative:master Jan 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dvc ignore "!" syntax different behavior from git DVC incorrectly ignores outputs
5 participants