Bug Report
Description
The RAW file has the following md5sum:
fd0de1350b92b00d60afd53b015f6aea 214089_JAI.raw
But DVC calculates it as
md5: 0b4d86bc06ee3260e8172b2196805382 size: 63232000 path: 214089_JAI.raw
This happens because DVC identifies the file as a text file and runs the dos2unix replacement (CRLF to LF) on its contents before hashing:
https://github.com/iterative/dvc/blob/1.11/dvc/utils/__init__.py#L39 -> https://github.com/iterative/dvc/blob/1.11/dvc/istextfile.py#L34
This still happens in version 2.4.3:
https://github.com/iterative/dvc/blob/2.4.3/dvc/utils/__init__.py#L37 -> https://github.com/iterative/dvc/blob/2.4.3/dvc/istextfile.py#L22
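To illustrate the mechanism, here is a minimal Python sketch of how a text-detection heuristic combined with a CRLF-to-LF replacement can change the digest of a binary file. It is a simplified model, not DVC's actual code: the function names, the 512-byte probe size, and the 30% threshold are assumptions made for this example.

```python
import hashlib

# Assumption: printable ASCII plus common whitespace count as "text" bytes.
TEXT_CHARS = bytes(range(32, 127)) + b"\n\r\t\f\b"


def looks_like_text(block: bytes) -> bool:
    """Hypothetical heuristic: no NUL byte and at most 30% non-text bytes."""
    if not block:
        return True
    if b"\0" in block:
        return False
    nontext = block.translate(None, TEXT_CHARS)
    return len(nontext) / len(block) <= 0.30


def raw_md5(data: bytes) -> str:
    """MD5 of the bytes exactly as stored on disk (what md5sum reports)."""
    return hashlib.md5(data).hexdigest()


def normalized_md5(data: bytes, probe_size: int = 512) -> str:
    """MD5 after CRLF -> LF replacement, applied only if the head looks like text."""
    if looks_like_text(data[:probe_size]):
        data = data.replace(b"\r\n", b"\n")
    return hashlib.md5(data).hexdigest()


# Synthetic stand-in for a RAW file: no NUL bytes in the probed head,
# but the byte pair 0x0D 0x0A occurs in the payload.
sample = bytes([0x41, 0x42, 0x0D, 0x0A, 0x43, 0x44]) * 100

print("raw:       ", raw_md5(sample))
print("normalized:", normalized_md5(sample))
print("match:     ", raw_md5(sample) == normalized_md5(sample))
```

For this sample the two digests differ, because every 0x0D 0x0A pair is rewritten before hashing. A file whose probed head contains a NUL byte is treated as binary in this sketch and hashes identically both ways.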
When the file is uploaded through the gocloud.dev library, the upload fails the MD5 check, because the checksum calculated by DVC and the real checksum of the file are not the same:
https://github.com/google/go-cloud/blob/v0.23.0/blob/blob.go#L328
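The failure on the receiving side can be modeled with a short Python sketch. This is not go-cloud's code (which is written in Go); `verify_content_md5` is a hypothetical helper that only shows the general idea of comparing a declared digest against the bytes actually received.

```python
import hashlib


def verify_content_md5(body: bytes, declared_md5_hex: str) -> bool:
    """Hypothetical receiver-side check: does the declared MD5 match the received bytes?"""
    return hashlib.md5(body).hexdigest() == declared_md5_hex.lower()


# The bytes sent over the wire are the original, unmodified file contents.
body = b"AB\r\nCD" * 100

# Assumption: the client declares a digest computed after CRLF -> LF replacement.
declared_by_client = hashlib.md5(body.replace(b"\r\n", b"\n")).hexdigest()

print(verify_content_md5(body, declared_by_client))              # False
print(verify_content_md5(body, hashlib.md5(body).hexdigest()))   # True
```

The first check fails because the declared digest describes different bytes than the ones uploaded; the second passes because the digest is taken over the raw contents.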
Reproduce
- dvc init
- dvc remote modify --local our-proxy password 123123
- Copy 214089_JAI.raw to the directory
- dvc add 214089_JAI.raw
- dvc push
Expected
The file is expected to upload correctly. Instead, the upload is canceled because the MD5 of the file and the MD5 sent by DVC do not match.
Environment information
Output of dvc doctor:
$ dvc doctor
DVC version: 1.11.16 (pip)
---------------------------------
Platform: Python 3.8.5 on Linux-5.4.0-65-generic-x86_64-with-glibc2.29
Supports: http, https, ssh
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p1
Caches: local
Remotes: https
Workspace directory: ext4 on /dev/nvme0n1p1
Repo: dvc, git
Additional Information (if any):
https://github.com/atekoa/dvc-rawfile