Description
The chance of a hash collision between two different files is extraordinarily low, unless both files are empty, in which case they both have the MD5 checksum d41d8cd98f00b204e9800998ecf8427e.
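For reference, a minimal check with Python's standard `hashlib` confirms that this value is simply the MD5 digest of zero bytes of input, so every empty file hashes to it. The helper name `md5_of` is only illustrative and is not part of DVC.

```python
import hashlib

# MD5 digest of zero bytes of input, as quoted above.
EMPTY_MD5 = "d41d8cd98f00b204e9800998ecf8427e"


def md5_of(data: bytes) -> str:
    """Return the hex MD5 digest of the given bytes (illustrative helper)."""
    return hashlib.md5(data).hexdigest()


# Any two empty files produce the same digest, regardless of name or mtime.
print(md5_of(b"") == EMPTY_MD5)
```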
Empty files can get created for various reasons, including by workflow tools such as Snakemake. Snakemake creates .snakemake_timestamp files that exist only for their mtime, which is then lost when the file is added to the cache. (#8602)
There is not much to be gained by caching or tracking these empty files either. We could explicitly ignore them via .dvcignore when we know that they might turn up, but perhaps DVC could ignore empty files by default?
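As a stopgap for the .dvcignore route, here is a rough sketch of how the entries could be generated rather than written by hand. It assumes that .dvcignore accepts gitignore-style patterns and that a leading `/` anchors a pattern to the directory containing the .dvcignore file. The function names `find_empty_files` and `dvcignore_lines` are hypothetical, not part of DVC, and file names containing gitignore special characters would need escaping that this sketch does not do.

```python
import os
from pathlib import Path


def find_empty_files(root: str) -> list[str]:
    """Return POSIX-style paths, relative to root, of zero-byte regular files."""
    root_path = Path(root)
    empty = []
    for dirpath, dirnames, filenames in os.walk(root_path):
        # Do not descend into the internal directories of DVC or Git.
        dirnames[:] = [d for d in dirnames if d not in {".dvc", ".git"}]
        for name in filenames:
            path = Path(dirpath) / name
            # Skip symlinks so only real files in the workspace are reported.
            if path.is_symlink() or not path.is_file():
                continue
            if path.stat().st_size == 0:
                empty.append(path.relative_to(root_path).as_posix())
    return sorted(empty)


def dvcignore_lines(paths: list[str]) -> list[str]:
    """Turn relative paths into anchored, gitignore-style patterns."""
    return ["/" + p for p in paths]
```

Running `print("\n".join(dvcignore_lines(find_empty_files("."))))` from the repository root would print one candidate line per empty file, which could then be reviewed and appended to .dvcignore. This only captures files that are empty at the time it runs, which is part of why a built-in option seems more attractive.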
By "ignore", I think I mean "leave in the workspace, don't add to cache". Not sure if they should be tracked by .dir files.
Not sure if there would be unintended consequences. If so, perhaps "ignore empty files" could be configurable.