Skip to content

Negation "!" in .dvcignore doesn't unignore #10122

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Eve-ning opened this issue Nov 29, 2023 · 5 comments
Open

Negation "!" in .dvcignore doesn't unignore #10122

Eve-ning opened this issue Nov 29, 2023 · 5 comments
Labels
bug Did we break something? git Related to git and git backends p2-medium Medium priority, should be done, but less important

Comments

@Eve-ning
Copy link

Bug Report

Issue name

Negation "!", commonly used in .gitignore to "unignore" files, doesn't work in DVC.

Description

Given 2 files:

  • ignore.txt
  • no-ignore.txt

We can git ignore them using this .gitignore file

ignore.txt
!no-ignore.txt

However, in DVC, it doesn't "unignore".

Reproduce

Output shown in comments

touch ignore.txt no-ignore.txt
echo -e 'ignore.txt\n!no-ignore.txt' > .dvcignore
echo -e 'ignore.txt\n!no-ignore.txt' > .gitignore
cat .dvcignore
# ignore.txt
# !no-ignore.txt
git check-ignore ignore.txt no-ignore.txt
# ignore.txt
dvc check-ignore ignore.txt no-ignore.txt
# ignore.txt
# no-ignore.txt

Expected

Ideally, both check-ignores should be the same

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 3.30.1 (pip)
-------------------------
Platform: Python 3.10.12 on Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 2.22.0
        dvc_objects = 1.2.0
        dvc_render = 0.6.0
        dvc_task = 0.3.0
        scmrepo = 1.4.1
Supports:
        gs (gcsfs = 2023.10.0),
        http (aiohttp = 3.9.0, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.0, aiohttp-retry = 2.8.3)
Config:
        Global: /home/jc/.config/dvc
        System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: 9p on drvfs
Caches: local
Remotes: gs
Workspace directory: 9p on drvfs
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/67bc00a31a88271f0f2653ea5494f098

Additional Information (if any):

@dberenbaum dberenbaum added bug Did we break something? p2-medium Medium priority, should be done, but less important git Related to git and git backends labels Dec 1, 2023
@anunayasri
Copy link

@Eve-ning I think dvcignore feature ie the patterns is working fine, the bug is in dvc check-ignore command. dvc add unignores the files marked with !. You can try the following.

file: .dvcignore
to_ignore*
!to_ignore-NOT.txt

$ mkdir data
$ cat '1' > data/to_ignore.txt
$ cat '1' > data/to_ignore-NOT.txt

$ dvc add data

$ tree .dvc/cache/files/md5

└── c1
    └── ba58b05f6245f221ad65391fa6690b    <<-- md5 for data/to_ignore-NOT.txt is added to cache

$ md5 data/*

MD5 (data/to_ignore-NOT.txt) = c1ba58b05f6245f221ad65391fa6690b
MD5 (data/to_ignore.txt) = 919d117956d3135c4c683ff021352f5c

It looks like this is the expected behaviour of check-ignore. this is the test for the behaviour.

@dberenbaum Do you think this test is correct, or am I missing something here? Tagging you since the last activity on this ticket was yours.

@dberenbaum
Copy link
Contributor

Yes, I think that is correct. It looks related to a previous issue in #5046. DVC uses both of these methods and they don't seem to always be consistent:

dvc/dvc/ignore.py

Lines 395 to 409 in 9b5772f

def check_ignore(self, target):
# NOTE: can only be used in `dvc check-ignore`, see
# https://github.com/iterative/dvc/issues/5046
full_target = self.fs.abspath(target)
if not self._outside_repo(full_target):
dirname, basename = self.fs.split(self.fs.normpath(full_target))
pattern = self._get_trie_pattern(dirname)
if pattern:
matches = pattern.matches(
dirname, basename, self.fs.isdir(full_target), True
)
if matches:
return CheckIgnoreResult(target, True, matches)
return _no_match(target)

dvc/dvc/ignore.py

Lines 411 to 424 in 9b5772f

def is_ignored(
self, fs: "FileSystem", path: str, ignore_subrepos: bool = True
) -> bool:
# NOTE: can't use self.check_ignore(path).match for now, see
# https://github.com/iterative/dvc/issues/4555
if fs.protocol != Schemes.LOCAL:
return False
if fs.isfile(path):
return self.is_ignored_file(path, ignore_subrepos)
if fs.isdir(path):
return self.is_ignored_dir(path, ignore_subrepos)
return self.is_ignored_file(path, ignore_subrepos) or self.is_ignored_dir(
path, ignore_subrepos
)

@anunayasri
Copy link

git check-ignore and dvc check-ignore behaves differently for the same ignore patterns.

For .dvcignore and .gitignore -

data/data1
to_ignore*
!to_ignore-NOT.txt

Following outputs differ -

dvc-demo-2 on  master [!?] via 🐍 v3.11.4 (.venv)
❯ git check-ignore data/*
data/data1
data/to_ignore.txt

dvc-demo-2 on  master [!?] via 🐍 v3.11.4 (.venv)
❯ dvc check-ignore data/*
== I am LOCAL DVC ==
data/data1
data/to_ignore-NOT.txt
data/to_ignore.txt

@dberenbaum This definitely seems like a bug. I believe the test cases are wrong and needs to be updated. Can the team confirm?

@shcheklein
Copy link
Member

@anunayasri good catch. Yes, I think DVC check-ignore should behave the same way as git.

I believe the test cases are wrong and needs to be updated. Can the team confirm?

only test case, or the implementation as well?

@anunayasri
Copy link

@shcheklein Of course, we have to update both the test case and the implementation. By test case I meant that the expected behaviour is misaligned.
I working on another issue. Will try to fix this post that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Did we break something? git Related to git and git backends p2-medium Medium priority, should be done, but less important
Projects
None yet
Development

No branches or pull requests

4 participants