@bobertlo the md5 that we use is not, strictly speaking, real md5; it is something like md5(dos2unix(data)), so I'm wondering if you are confusing it with actual corruption. Does dvc status after dvc add say that stuff is corrupted? What if you remove .dvc/tmp, what does dvc status say then?
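To illustrate the distinction drawn above, here is a minimal sketch comparing a plain md5 with an md5 taken after line-ending normalization. It assumes dos2unix can be approximated as replacing CRLF with LF; `plain_md5` and `normalized_md5` are hypothetical helper names, not DVC functions, and DVC's own logic may differ (for example in how it decides which files to normalize).

```python
import hashlib


def plain_md5(data: bytes) -> str:
    """md5 of the bytes exactly as given."""
    return hashlib.md5(data).hexdigest()


def normalized_md5(data: bytes) -> str:
    """md5 after CRLF -> LF conversion.

    Assumption: this stands in for md5(dos2unix(data)); it is an
    approximation for illustration, not DVC's actual implementation.
    """
    return hashlib.md5(data.replace(b"\r\n", b"\n")).hexdigest()


crlf_sample = b"line one\r\nline two\r\n"
lf_sample = b"line one\nline two\n"

# With CRLF line endings the two digests disagree, so a checker that
# uses plain md5 could flag a file that is not actually corrupted.
print(plain_md5(crlf_sample) == normalized_md5(crlf_sample))  # False
# The normalized digest equals the plain digest of the LF-only content.
print(normalized_md5(crlf_sample) == plain_md5(lf_sample))  # True
```

If the files reported as corrupted all contain CRLF sequences, that would point toward this hashing difference rather than a read/write error.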
Bug Report
@efiop I was able to reproduce this very reliably. Sorry to keep finding this stuff ;)
Description
dvc cache files are (rarely but reproducibly) being read/written incorrectly, and the contents of the cache do not match their hash after being inserted into .dvc/cache.
Reproduce
I have been able to reliably reproduce this. I am finding this with larger datasets. I was testing a remote setup when I discovered this.
In this example I used the go 1.17.2 installation from my home directory. The contents are 11880 files and 17 are corrupted. The same files are reproducibly corrupted. I was using a similar but different dataset initially.
I caught this while testing an http remote that verifies the hashes while accepting uploads.
Expected
Files in .dvc/cache should be inserted under the correct hash.
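A check along these lines can be sketched as a short script. It assumes the cache layout is `.dvc/cache/<first two hex digits>/<remaining digits>` and skips entries with a suffix such as `.dir`. The function names are hypothetical. The hash function is pluggable because a plain md5 mismatch alone does not prove corruption if the tool hashes a normalized form of the data.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Plain md5 of a file's raw bytes, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fobj:
        for chunk in iter(lambda: fobj.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(cache_dir, hash_func=md5_of_file):
    """Return (path, expected, actual) for cache entries whose hash differs.

    Assumption: each entry lives at <cache_dir>/<2 hex chars>/<rest of hash>,
    so the expected hash is the directory name plus the file name.
    """
    mismatches = []
    for path in sorted(Path(cache_dir).glob("??/*")):
        # Skip non-files and suffixed entries (e.g. directory listings).
        if not path.is_file() or path.suffix:
            continue
        expected = path.parent.name + path.name
        actual = hash_func(path)
        if actual != expected:
            mismatches.append((str(path), expected, actual))
    return mismatches


if __name__ == "__main__":
    for path, expected, actual in find_mismatches(".dvc/cache"):
        print(f"{path}: expected {expected}, got {actual}")
```

Running it from the repository root lists every cache entry whose content hash does not match its path under the chosen hash function.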
Environment information
Output of dvc doctor:
I was first running 2.8.2, then reproduced with a pip upgrade in place after deleting the cache and tmp directories.
Additional Information (if any):