Closed
Description
Bug Report
I have a large dataset (~60 GB) consisting mostly of large files. Each file has additional JSON files associated with it.
After I run dvc add FOLDER, all files are replaced with symlinks, and some of them reference the wrong files in the cache (though still files from the same dataset). Moreover, most of the links reference the same file, so the links are not just shuffled.
I tried to reproduce this issue with only the JSON files, but did not succeed.
However, the issue persists if I include the large files, although it is somewhat random: each time I run dvc add, the links point to different files.
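For anyone trying to confirm the same symptom, here is a minimal sketch that groups the symlinks in the added folder by the file they resolve to. It is not part of DVC; the function names links_by_target and shared_targets are made up for illustration, and it only assumes that the workspace files are symlinks into the cache, as described above. Identical files legitimately share one cache entry, so a shared target is a hint rather than proof of a wrong link.

```python
import os
from collections import defaultdict


def links_by_target(folder):
    """Map each resolved symlink target to the sorted list of links pointing at it."""
    groups = defaultdict(list)
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            # Only symlinks are of interest; regular files are skipped.
            if os.path.islink(path):
                groups[os.path.realpath(path)].append(path)
    return {target: sorted(links) for target, links in groups.items()}


def shared_targets(folder):
    """Return only the targets that more than one link points at."""
    return {
        target: links
        for target, links in links_by_target(folder).items()
        if len(links) > 1
    }


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python check_links.py FOLDER
    for target, links in shared_targets(sys.argv[1]).items():
        print(f"{len(links)} links -> {target}")
        for link in links:
            print(f"    {link}")
```

If files that are known to differ show up under the same target, that matches the behaviour reported here.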
Information about my setup
Output of dvc version:
$ dvc version
DVC version: 1.2.2
Python version: 3.6.8
Platform: Linux-4.15.0-107-generic-x86_64-with-debian-stretch-sid
Binary: False
Package: pip
Supported remotes: http, https, s3, ssh
Filesystem type: ('nfs4', EDITED_OUT_THE_PATH)
Additional Information:
The cache is stored outside of the repository on an NFS4-mounted disk.