Skip to content

better error when output is dvcignored #3538

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jorgeorpinel opened this issue Mar 26, 2020 · 16 comments · Fixed by #4386
Closed

better error when output is dvcignored #3538

jorgeorpinel opened this issue Mar 26, 2020 · 16 comments · Fixed by #4386
Assignees
Labels
enhancement Enhances DVC ui user interface / interaction

Comments

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Mar 26, 2020

UPDATE: Skip to #3538 (comment)


$ dvc version
DVC version: 0.90.2
Python version: 3.7.5
Platform: Windows-10-10.0.18362-SP0
Binary: True
Package: exe
Supported remotes: azure, gdrive, gs, hdfs, http, https, s3, ssh, oss
Filesystem type (workspace): ('NTFS', 'C:\\')

Having data in .dvcignore makes DVC commands ignore all file names starting with that string e.g data2, dataset. https://dvc.org/doc/user-guide/dvcignore links to gitignore patterms but Git doesn't do that.

UPDATE: Skip to #3538 (comment)

@jorgeorpinel jorgeorpinel added the bug Did we break something? label Mar 26, 2020
@efiop
Copy link
Contributor

efiop commented Mar 26, 2020

@jorgeorpinel So having data, right? Git does the same thing, hence why we use /data in .gitignore when we dvc add data.

@efiop efiop closed this as completed Mar 26, 2020
@efiop efiop reopened this Mar 26, 2020
@efiop efiop added the awaiting response we are waiting for your reply, please respond! :) label Mar 26, 2020
@jorgeorpinel
Copy link
Contributor Author

jorgeorpinel commented Mar 26, 2020

Maybe it behaves differently in distinct OSs? I'm on Windows (Cmder/ Git Bash):

λ git init
λ touch data data2 data3 data4
λ echo data > .gitignore
λ ls
data  data2  data3  data4
λ git status
...
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        .gitignore
        data2
        data3
        data4
λ git add data
The following paths are ignored by one of your .gitignore files:
data
Use -f if you really want to add them.
λ git add .
λ git status
...
        new file:   .gitignore
        new file:   data2
        new file:   data3
        new file:   data4

@efiop
Copy link
Contributor

efiop commented Mar 26, 2020

@jorgeorpinel Oh, I misunderstood your report. Need to check that, indeed. @pared Maybe you have any ideas?

@jorgeorpinel jorgeorpinel removed the awaiting response we are waiting for your reply, please respond! :) label Apr 7, 2020
@jorgeorpinel jorgeorpinel changed the title dvcignore: bad pattern matching dvcignore: bad pattern matching [qa] Apr 7, 2020
@pared
Copy link
Contributor

pared commented Apr 20, 2020

Will check this, sorry for late response

@pared pared self-assigned this Apr 20, 2020
@pared
Copy link
Contributor

pared commented Apr 30, 2020

@jorgeorpinel that was happening on windows right?
following test is passing on my linux machine for 0.90.2:

def test_ignore_data(tmp_dir, dvc):
    tmp_dir.gen({"data": "data", "data1": "data1", "data2": "data2",
                 "data3":"data3", ".dvcignore": "data"})


    files = []
    for r, d, fs in dvc.tree.walk("."):
        files.extend([os.path.join(r, f) for f in fs])

    assert set(files) == set(["./data1", "./data2", "./data3", "./.dvcignore"])

@jorgeorpinel
Copy link
Contributor Author

Sorry guys. I don't remember which DVC commands I noticed originally that don't respect files in .dvcignore. It wasn't a great report... But I'm QAing this and I think there's definitely some funny stuff happening:

$ echo data > data
$ echo data > .dvcignore
$ dvc add data
100% Add|██...  # Works and puts it in .gitignore. Should ignore? E.g.:

$ git add data
The following paths are ignored by one of your .gitignore files:
data
Use -f if you really want to add them.

I'm on 0.93.0 from exe installer now. Yes, Windows (Git Bash on Cmder)

Also, the first example in https://dvc.org/doc/user-guide/dvcignore#examples doesn't work for me (all files inside data/ dir are added and cached). I didn't check the others.

@pared
Copy link
Contributor

pared commented May 4, 2020

@jorgeorpinel hmm, that does not seem right.
I think we should behave in a similar way as git does, which is informing the user that his output collides with an entry of some .dvcignore file.
This functionality can be implemented probably after we introduce something like git check-ignore for .dvcignore. I think there was an issue for that already, but I am unable to find it.

@jorgeorpinel
Copy link
Contributor Author

Hello! Update here, I'm on DVC 1.3 and this seems solved, although the message I get,

$ dvc add data
Adding...
ERROR: output 'data' does not exist

could be improved, similar to what git add uses for files in .gitignore patterns:

The following paths are ignored by one of your .gitignore files:
data
Use -f if you really want to add them.

And, should we have a -f/--force option for add as well?

Thanks

@efiop
Copy link
Contributor

efiop commented Aug 6, 2020

@jorgeorpinel Thanks for the update! Indeed, that message could be improved.

And, should we have a -f/--force option for add as well?

I would rather not do that. We've seen it causing very odd bugs in git, wouldn't want that for dvc unless there are some very good scenarios. Would wait for someone to ask for it.

@efiop efiop changed the title dvcignore: bad pattern matching [qa] better error when output is dvcignored Aug 6, 2020
@jorgeorpinel jorgeorpinel added enhancement Enhances DVC ui user interface / interaction and removed bug Did we break something? labels Aug 6, 2020
@autayeu
Copy link

autayeu commented Aug 12, 2020

It seems the conflation of concepts "exists" and "ignored" leads to confusing error messages. In my case one of "run" outputs was by mistake ignored in .dvcignore, but the respective message was confusingly saying the file "does not exist". Possibly related source: dvc/tree/local.py#L79.

@pared
Copy link
Contributor

pared commented Aug 12, 2020

@autayeu hi!
Could you share some info on your setup?
.dvcignore contained your run output before you actually dvc run it, right?

@autayeu
Copy link

autayeu commented Aug 12, 2020

Sure. Attached dvc.message.repro.zip is a minimum setup to reproduce the issue. To reproduce:

Unpack, change into dvc.message.repro directory and run:

dvc run -n preprocess -d preprocess.py -o data.txt -o data.log "python -u preprocess.py | tee -i data.log"

You'll see:

Running stage 'preprocess' with command:
	python -u preprocess.py | tee -i data.log
screen content
ERROR: output 'data.log' does not exist

However:

(dvc) zzz:dvc.message.repro zzz$ ls -l
total 24
-rw-r--r--  1 zzz  zzz  15 Aug 12 15:45 data.log
-rw-r--r--  1 zzz  zzz  12 Aug 12 15:45 data.txt
-rw-r--r--  1 zzz  zzz  84 Aug 12 15:39 preprocess.py

So data.log does exist. A bit unexpectedly, dvc also created empty .gitignore file:

(dvc) zzz:dvc.message.repro zzz$ ls -l .gitignore
-rw-r--r--  1 zzz  zzz  0 Aug 12 16:05 .gitignore

Other relevant outputs:

(dvc) zzz:dvc.message.repro zzz$ cat .dvcignore
*.log
(qac) zzz:dvc.message.repro zzz$ dvc version
DVC version: 1.4.0+12c701

---------------------------------
Platform: Python 3.8.5 on macOS-10.15.6-x86_64-i386-64bit
Supports: azure, gdrive, gs, hdfs, http, https, s3, ssh, oss
Cache types: reflink, hardlink, symlink
Repo: dvc, git

@pared
Copy link
Contributor

pared commented Aug 13, 2020

@autayeu Thank you! It seems to me that we should rethink the message.

@pared
Copy link
Contributor

pared commented Aug 14, 2020

@autayeu, @jorgeorpinel Fix for this issue is already on the master, will be available on the next release.

@autayeu
Copy link

autayeu commented Aug 14, 2020

👍 Thank you!

@jorgeorpinel
Copy link
Contributor Author

Thanks guys.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhances DVC ui user interface / interaction
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants