push: RAW file considered as text file (bad MD5) #6253

Closed
@themaikelman

Bug Report

Description

The RAW file has the following md5sum:

fd0de1350b92b00d60afd53b015f6aea 214089_JAI.raw

But DVC calculates it as:

md5: 0b4d86bc06ee3260e8172b2196805382
size: 63232000
path: 214089_JAI.raw

This happens because DVC identifies the RAW file as a text file and applies the dos2unix replacement (CRLF to LF) before computing the hash:
https://github.com/iterative/dvc/blob/1.11/dvc/utils/__init__.py#L39 -> https://github.com/iterative/dvc/blob/1.11/dvc/istextfile.py#L34

It still happens in version 2.4.3
https://github.com/iterative/dvc/blob/2.4.3/dvc/utils/__init__.py#L37 -> https://github.com/iterative/dvc/blob/2.4.3/dvc/istextfile.py#L22
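To illustrate the mechanism, here is a minimal sketch, not DVC's actual code. It assumes the text check only inspects a leading block of the file and that CRLF is replaced with LF before hashing. The names `looks_like_text`, `md5_raw`, `md5_normalized`, the 512-byte head size, and the 30% threshold are assumptions made for this example.

```python
import hashlib

# Hypothetical approximation of a "is this a text file?" heuristic.
# Printable ASCII plus common control characters count as text.
TEXT_CHARS = bytes(range(32, 127)) + b"\n\r\t\f\b"


def looks_like_text(head: bytes) -> bool:
    """Guess whether a leading block of bytes belongs to a text file."""
    if not head:
        return True
    if b"\0" in head:
        return False
    # Drop every "text" byte; what remains is the non-text portion.
    nontext = head.translate(None, TEXT_CHARS)
    return len(nontext) / len(head) <= 0.30


def md5_raw(data: bytes) -> str:
    """MD5 of the bytes exactly as stored on disk (what md5sum reports)."""
    return hashlib.md5(data).hexdigest()


def md5_normalized(data: bytes, head_size: int = 512) -> str:
    """MD5 after CRLF -> LF replacement when the head looks like text."""
    if looks_like_text(data[:head_size]):
        data = data.replace(b"\r\n", b"\n")
    return hashlib.md5(data).hexdigest()


# Synthetic stand-in for a RAW image: the head is plain ASCII, but binary
# data containing the byte pair 0x0D 0x0A appears later in the file.
sample = b"A" * 512 + b"\x00\x01\r\n\xff" * 4
hashes_match = md5_raw(sample) == md5_normalized(sample)
print("hashes match:", hashes_match)
```

With this synthetic input the two digests differ, because the leading block passes the text check and the later `\r\n` byte pairs are rewritten. A RAW image whose first bytes happen to look like ASCII would be affected in the same way.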

When the file is uploaded through the gocloud.dev library, the upload fails the MD5 check, because the MD5 calculated by DVC and the real MD5 of the file are not the same:
https://github.com/google/go-cloud/blob/v0.23.0/blob/blob.go#L328
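The receiving side's check can be sketched roughly as follows. This is a Python illustration of the idea only, not the gocloud.dev implementation (which is written in Go); `verify_upload` and its parameters are hypothetical names. It assumes the server hashes the bytes it actually received and rejects the upload when that digest differs from the one the client declared.

```python
import hashlib
import hmac


def verify_upload(body: bytes, declared_md5_hex: str) -> bool:
    """Return True only if the received bytes match the declared MD5."""
    actual = hashlib.md5(body).hexdigest()
    return hmac.compare_digest(actual, declared_md5_hex.lower())


# Synthetic payload containing CRLF byte pairs.
body = b"pixel-data\r\nmore-pixel-data\r\n"

# Digest of the bytes as stored on disk.
real_md5 = hashlib.md5(body).hexdigest()

# Digest a client would declare if it hashed a CRLF -> LF normalized copy.
declared_md5 = hashlib.md5(body.replace(b"\r\n", b"\n")).hexdigest()

print("real digest accepted:", verify_upload(body, real_md5))
print("normalized digest accepted:", verify_upload(body, declared_md5))
```

The first check passes and the second fails, which mirrors the situation in this report: the bytes on the wire are the original file, but the declared digest was computed over a normalized copy.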

Reproduce

  1. dvc init
  2. dvc remote modify --local our-proxy password 123123
  3. Copy 214089_JAI.raw to the directory
  4. dvc add 214089_JAI.raw
  5. dvc push

Expected

The file should upload correctly. Instead, because the MD5 of the file and the MD5 sent by DVC do not match, the upload is canceled.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 1.11.16 (pip)
---------------------------------
Platform: Python 3.8.5 on Linux-5.4.0-65-generic-x86_64-with-glibc2.29
Supports: http, https, ssh
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p1
Caches: local
Remotes: https
Workspace directory: ext4 on /dev/nvme0n1p1
Repo: dvc, git

Additional Information (if any):
https://github.com/atekoa/dvc-rawfile

Labels: awaiting response
