repro: acknowledge that text file might come from different OS #6314

Closed as not planned
@pared

Discussed in #6313

Originally posted by SebbanSms July 14, 2021
I have some issues using an S3 bucket to push and pull input data with dvc.

I have a stage whose deps are the outs of a prior stage.

When I reproduce the dvc.yaml and push the data on OSX (Mac), then pull it on another machine running Windows and reproduce the dvc.yaml again, dvc.lock shows the same hash but different file sizes for that stage's deps. The stage is then rerun completely, and its outs also show different file sizes and hashes.

In my IDE, the git diff of the .lock file shows changes in the deps of that stage only for the sizes:

[image: screenshot of the dvc.lock diff]

Any idea how the file size in deps could change if the hash is the same?

What I tried so far:
- Deleting the files on OSX, pulling them again from S3, and reproducing dvc.yaml -> no changes detected, dvc.lock stays the same.
- Deleting the files on Windows, pulling them again from S3, and reproducing dvc.yaml -> changes detected, dvc.lock shows different file sizes for the files in the deps of that stage.

On both systems, I am definitely using the same git commit.

It seems that running repro on a different file system retriggers existing stages just because the size of a text file on a Unix system differs from its size on Windows.
We should probably account for the OS when adding the file (as we already do when calculating the hash). The problem is that changing this behaviour now would probably affect all repositories on Windows that use text files.
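
To illustrate the mismatch, here is a minimal sketch, assuming the hash is computed over content with CRLF line endings normalized to LF while the size is taken from the raw bytes on disk. `normalized_md5` is a hypothetical helper written for this example, not dvc's actual function.

```python
import hashlib


def normalized_md5(data: bytes) -> str:
    """MD5 of the content after converting CRLF to LF (hypothetical helper)."""
    return hashlib.md5(data.replace(b"\r\n", b"\n")).hexdigest()


# The same text file as it might look on Unix and on Windows.
unix_text = b"a,b\n1,2\n3,4\n"
windows_text = unix_text.replace(b"\n", b"\r\n")

# Raw sizes differ by one byte per line...
sizes = (len(unix_text), len(windows_text))
# ...but the normalized hashes are identical.
same_hash = normalized_md5(unix_text) == normalized_md5(windows_text)

print(sizes, same_hash)
```

Under that assumption, a lock file that records both values would show an unchanged hash next to a changed size, which matches the diff described above.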

Maybe we could add a check on Windows that verifies that a given file is unchanged even if the sizes do not match?
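
One possible shape for such a check, as a rough sketch only: compare sizes after discounting the extra byte that each CRLF adds. Both function names are hypothetical, and the sketch assumes the recorded size came from an LF-only copy of a text file; a binary file that happens to contain `\r\n` would be miscounted.

```python
def lf_normalized_size(data: bytes) -> int:
    """Size the content would have with every CRLF replaced by LF (hypothetical)."""
    return len(data) - data.count(b"\r\n")


def size_unchanged(recorded_size: int, data: bytes) -> bool:
    """True if the raw size or the LF-normalized size matches the recorded one.

    Assumption: recorded_size was written on a system using LF line endings.
    """
    return len(data) == recorded_size or lf_normalized_size(data) == recorded_size


recorded = len(b"a,b\n1,2\n")          # size stored on Unix
on_windows = b"a,b\r\n1,2\r\n"          # same file with CRLF endings
print(size_unchanged(recorded, on_windows))
```

A size match like this would only be a cheap pre-check; the hash comparison would still decide whether the content really is the same.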

Labels: bug, research