Description
Discussed in #6313
Originally posted by SebbanSms July 14, 2021
I have some issues using an S3 bucket to push and pull input data with dvc.
I have a stage where the deps are outs of a prior stage.
When I reproduce dvc.yaml and push the data on macOS, then pull it on another machine running Windows and reproduce dvc.yaml again, dvc.lock shows the same hash but different file sizes for that stage's deps. DVC then reruns the stage completely, which also produces different file sizes and hashes for the outs.
In my IDE, git marks a diff in the deps of that stage in the .lock file, and only for the sizes.
Any idea how the file size in deps could change if the hash is the same?
What I tried so far:
Deleting the files on macOS, pulling them again from S3, and reproducing dvc.yaml -> no changes detected, dvc.lock stays the same
Deleting the files on Windows, pulling them again from S3, and reproducing dvc.yaml -> changes detected, dvc.lock shows different file sizes for the files in deps of that stage
On both systems, I definitely use the same git commit.
It seems that running repro on a different file system will retrigger existing stages just because the file size recorded on a Unix system differs from the one on Windows.
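To illustrate how the size can differ while the hash stays the same, here is a minimal sketch. It assumes the hash is computed over content with CRLF line endings converted to LF, which is what the comment about hash calculation suggests; `normalized_md5` is a hypothetical helper, not DVC's actual code.

```python
import hashlib


def normalized_md5(data: bytes) -> str:
    """MD5 of the content after converting CRLF line endings to LF.

    Hypothetical helper for illustration only; not DVC's implementation.
    """
    return hashlib.md5(data.replace(b"\r\n", b"\n")).hexdigest()


# The same text file as it might be stored on each OS.
unix_bytes = b"id,value\n1,a\n2,b\n"
windows_bytes = unix_bytes.replace(b"\n", b"\r\n")

# Sizes differ by one byte per line ending...
print(len(unix_bytes), len(windows_bytes))  # 17 20

# ...but the line-ending-insensitive hash is identical.
print(normalized_md5(unix_bytes) == normalized_md5(windows_bytes))  # True
```

If the recorded size is the raw on-disk size, any tool that rewrites line endings on checkout would be enough to produce this mismatch.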
We should probably take the OS into account when adding the file (the same way we do when calculating the hash). The problem is that changing this behaviour now would probably affect all repositories on Windows that use text files.
Maybe we could do an additional check on Windows that verifies that a given file is unchanged even if the sizes do not match?
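One possible shape for such a check, as a rough sketch only: when the recorded entry and the current file disagree, fall back to comparing the line-ending-insensitive hash and ignore the size. The function name `dep_unchanged` and the `{"md5": ..., "size": ...}` record layout are assumptions made for this example, not DVC's API.

```python
import hashlib


def normalized_md5(data: bytes) -> str:
    """MD5 over content with CRLF converted to LF (hypothetical helper)."""
    return hashlib.md5(data.replace(b"\r\n", b"\n")).hexdigest()


def dep_unchanged(data: bytes, recorded: dict) -> bool:
    """Return True if the file content matches the recorded entry.

    `recorded` is assumed to look like {"md5": "...", "size": 123}.
    """
    current = {"md5": normalized_md5(data), "size": len(data)}
    if current == recorded:
        return True
    # Sizes differ: a size mismatch alone is not conclusive, because it can
    # come purely from line endings. Trust the normalized hash instead.
    return current["md5"] == recorded["md5"]


# Entry recorded on a Unix machine.
unix_bytes = b"id,value\n1,a\n2,b\n"
recorded = {"md5": normalized_md5(unix_bytes), "size": len(unix_bytes)}

# Same file with CRLF endings on Windows: treated as unchanged.
print(dep_unchanged(unix_bytes.replace(b"\n", b"\r\n"), recorded))

# Genuinely different content: still detected as changed.
print(dep_unchanged(b"id,value\n1,a\n2,c\n", recorded))
```

The trade-off is that the size no longer acts as a cheap early signal of change for text files, so the hash has to be recomputed whenever the sizes disagree.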