Skip to content

dvc check-ignore command #4282

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Aug 3, 2020
Merged

dvc check-ignore command #4282

merged 17 commits into from
Aug 3, 2020

Conversation

karajan1001
Copy link
Contributor

@karajan1001 karajan1001 commented Jul 25, 2020

update some tests first
fix #3736

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

@karajan1001 karajan1001 changed the title Update some tests first dvc check-ignore command Jul 25, 2020
karajan1001 and others added 2 commits July 26, 2020 17:42
@efiop efiop requested a review from pared July 27, 2020 00:27
Copy link
Contributor

@pared pared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!
Some notes, one change request (_get_normalize_path)

"file,ret,output",
[
("file", 0, "{}:1:f*\tfile\n".format(DvcIgnore.DVCIGNORE_FILE)),
("foo", 0, "{}:2:!foo\tfoo\n".format(DvcIgnore.DVCIGNORE_FILE)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we return 1 in this use case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, just tested it with git and it seems that even if you exclude file, 0 is returned.
According to doc 0 should be returned when given path is ignored. So I would say that this is a bug in git. @karajan1001 @efiop what do you think?

Copy link
Contributor Author

@karajan1001 karajan1001 Jul 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pared
This is what I would like to discuss.

Yes In Git, 0 should be returned when a given path is ignored. But 0 returns on any pattern match the file either included or excluded, 1 returns when none of the pattern match. (It might be a bug or a feature not clearly documented. I sent an email to Git last time but no reply)
But I felt illogical on this results. 0 for ignored and 1 for included or not matched might be more reasonable.

As a result. I left matches in

CheckIgnoreResult = namedtuple(
    "CheckIgnoreResult", ["file", "matches", "patterns"]
)

which could be used for marking whether it is ignored or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karajan1001
To me, it seems that either docs is wrong (because it says that 0 is returned if file is ignored) or implementation (which returns 0 upon any matching spec). I'll try to reach git-devel IRC, maybe we can get an answer faster.
For now, we can leave it as is.

Comment on lines +119 to +121
# skip system pattern
if not pattern.file_info:
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't that result in no matches for .git and return 1 when we do check-ignore .git?
I guess in one way this is right behavior, since its not specified in any file, but, its also against tree.walk behavior which will ignore .git.
But I guess main point of check-ignore is to debug .dvcignores, so its fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just followed Git.
image
Maybe I can return

$ dvc check-ignore .git
::System pattern   .git
$ echo $?
0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's not too much trouble, sure. But if its, feel free to keep it that way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
I think this is the same problem with the above one.
If 0 or 1 depends on whether the file is excluded, then we should return 0 here, but If it depends on whether any pattern in .dvcignore matches the file, then we should return 1 here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karajan1001 you mean the proper git return code?
To avoid unnecessary change requests, let's leave it as is.

@karajan1001
Copy link
Contributor Author

karajan1001 commented Jul 28, 2020

@pared
Something to be noticed, replacing

if (conditions): 
    return False

with

try: 
    ... 
except exceptions:
    return False

runs faster when conditions happens rarely, but it runs much slower when conditions is a common one. The exception handling cost lots of resources.
I don't know whether these functions are called with some out of workspace path frequently.

@pared
Copy link
Contributor

pared commented Jul 28, 2020

@karajan1001
I think the most common out of workspace use case is with external dependencies. From the development point of view, I think we should stick to conditions because they are much more readable and don't require us to leave a long comment why are we doing things in a particular way.

@karajan1001
Copy link
Contributor Author

karajan1001 commented Jul 28, 2020

@karajan1001
I think the most common out of workspace use case is with external dependencies. From the development point of view, I think we should stick to conditions because they are much more readable and don't require us to leave a long comment why are we doing things in a particular way.

Thank you, if-condition is more readable and try-catch separates exception handling and function logic (All Exceptions been handled at one place and one level). The best practice might be sharing the same standard all over the project for all contributors. Because it makes us more ease to understand each other's code.
@pared @efiop I'm not sure if this project has a universal rule or code style? Or at what condition should we raise an exception and what situation to use if-condition

karajan1001 and others added 2 commits July 29, 2020 13:48
@jorgeorpinel
Copy link
Contributor

p.s. being able to review docs and core at the same time has some great benefits, especially when the draft doc is so thorough and with comprehensive examples so kudos and thanks again @karajan1001 !

@efiop
Copy link
Contributor

efiop commented Jul 29, 2020

@karajan1001 Sorry for the delay :)

I'm not sure if this project has a universal rule or code style? Or at what condition should we raise an exception and what situation to use if-condition

There is no strict policy. As you've noticed we mix and match these at different places in the code. For example, stuff like if self.tree.exists is usually used because we didn't finalize tree unification yet, where we need to unify the exceptions that all tree types raise, so that we could handle them in a general way. And sometimes in places where performance doesn't matter much, we just use ifs because they seem more readable in that particular place. So go with your gut in this case, we'll help out if we see that something could be improved. πŸ™‚

As always, thank you for contributing amazing stuff! πŸ™

@efiop efiop mentioned this pull request Jul 30, 2020
2 tasks
efiop added a commit to efiop/dvc that referenced this pull request Jul 30, 2020
Part of the issue is a bug from iterative#3806

Kudos to @karajan1001 for finding this bug while working on iterative#4282
efiop added a commit that referenced this pull request Jul 30, 2020
Part of the issue is a bug from #3806

Kudos to @karajan1001 for finding this bug while working on #4282
1. Argument `targets`'s description.
2. Error handling of `_get_normalize_path`
@karajan1001
Copy link
Contributor Author

karajan1001 commented Jul 30, 2020

@pared @efiop
According to discussions from in Discord

Reverted to the previous version, but several problems still remained. Maybe it's better to create independent issues for each?

  1. Return 0 or 1 for files matching a ! started pattern.
  2. Result insistency of __call__ ,is_ignored_dir, and is_ignored_file with paths outside the repo.
  3. Sub-repo detection refactoring.

CI had failed for

No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.

dvc/main.py Outdated
@@ -84,7 +84,7 @@ def main(argv=None): # noqa: C901
ret = 255

try:
if ret != 0:
if ret != 0 and ret != 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is no longer needed after the rebase

Suggested change
if ret != 0 and ret != 1:
if ret != 0:

Copy link
Contributor Author

@karajan1001 karajan1001 Jul 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efiop
Still got this warning after I deleted ret != 1
image
Return numbers are used for those who want to do some ignore-check in a script. We might solve it in two ways:

  1. Return number only available in a --quiet mode, without -q they are always 0
  2. Alert not show for some non-zero numbers.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karajan1001, or just do something like this for now:

if ret == 1 and args and args.cmd == "check-ignore":
    pass

Not ideal, but should work for now.

karajan1001 and others added 2 commits July 31, 2020 08:42
Co-authored-by: Ruslan Kuprieiev <[email protected]>
1. Add a new test for the out side repo cases
2. check ignore now only check one file not file lists
Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool stuff! Thank you @karajan1001 ! πŸ™

@efiop efiop merged commit 11d906e into iterative:master Aug 3, 2020
Comment on lines +65 to +68
"targets",
nargs="+",
help="Exact or wildcard paths of files or directories to check "
"ignore patterns.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sentence is a bit confusing. How about "File or directory paths, or ignore patterns to check"

Copy link
Contributor Author

@karajan1001 karajan1001 Aug 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorgeorpinel
I think the targets we checked here are wildcard patterns not ignore patterns. It's a pattern used in file searches so the full name need not be typed provided by the file system. For example, we have

aa
ab
bb

in our current path.
The command dvc check-ignore a* equals to dvc check-ignore aa ab. Our program received a list of files instead of a single pattern a*.

Only my personal opinion, not good at English, maybe ignore patterns can represent the same meaning?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the targets we checked here are wildcard patterns not ignore patterns

Good to know, thanks! Confusing probably... but makes sense. So those depend on the OS? I.e. POSIX shells have accept different wildcards than, say, Windows PowerShell.

And yeah no worries, it's getting too deep. I'll take care of rewriting it as needed in iterative/dvc.org/pull/1629 and submit a matching PR to the core repo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went for "File or directory paths to check (wildcards supported)" in iterative/dvc.org/pull/1629.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just "File or directory paths to check" now... See #1673



def add_parser(subparsers, parent_parser):
ADD_HELP = "Debug DVC ignore/exclude files"
Copy link
Contributor

@jorgeorpinel jorgeorpinel Aug 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about something like "Check whether files or directories are excluded due to .dvcignore." ?

Per https://github.com/iterative/dvc.org/pull/1629/files

Copy link
Contributor Author

@karajan1001 karajan1001 Aug 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johntharian
No problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong handle πŸ˜†

@karajan1001
Copy link
Contributor Author

@jorgeorpinel
I'm sorry Seems that it had been merged, and I can't change it anymore.
Maybe we could open another Issue (GoodFirstOne) for these problems?

@jorgeorpinel
Copy link
Contributor

No problem, I'll submit a PR when I get a chance.

jorgeorpinel added a commit to karajan1001/dvc.org that referenced this pull request Aug 7, 2020
jorgeorpinel added a commit that referenced this pull request Aug 7, 2020
efiop added a commit that referenced this pull request Aug 11, 2020
* check_ignore: update help output
per #4282 (review)
and #4282 (review)

* check-ignore: more output string updates
per #4323 et al.

* check-ignore: min text update
to match iterative/dvc.org/pull/1673

Co-authored-by: Ruslan Kuprieiev <[email protected]>
@karajan1001 karajan1001 deleted the fix_3736 branch April 7, 2021 07:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Introduce check-ignore for dvc ignore command.
5 participants