Description
We want to use DVC to store the media files of a static site that is built with Jupyter Book (Sphinx). However, `dvc fetch` / `dvc pull` sets the mtime of files downloaded from remote storage (AWS S3) into the local DVC cache to the current time instead of the last-modified time of the remote object. This triggers a complete rebuild of the entire documentation, which consists of more than 1000 pages. The files are then checked out from the cache into the local repository with `dvc checkout` (or `dvc pull`, which won't re-download anything after a `fetch`) using the `symlink` link type. That checkout step does preserve the mtime of the object in the local DVC cache; the problem is the download from remote storage into the cache.
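To illustrate why the cache mtime is what matters, here is a small, simplified model (an assumption, not Sphinx's actual code) of an mtime-based incremental build check. A symlinked workspace file reports the mtime of the cache object it points to, because `os.stat` follows symlinks. So a cache object whose mtime is "now" looks newer than the last build, while one carrying the remote object's older timestamp does not. The file names and the one-hour/one-day offsets are hypothetical.

```python
from __future__ import annotations

import os
import tempfile
import time
from pathlib import Path


def is_outdated(path: Path, last_build_time: float) -> bool:
    """Simplified incremental-build check (illustrative, not Sphinx's code):
    a file counts as changed if its mtime is newer than the last build."""
    # os.stat follows symlinks, so a symlinked workspace file reports the
    # mtime of the cache object it points to.
    return os.stat(path).st_mtime > last_build_time


def demo() -> tuple[bool, bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        cache_obj = tmp_path / "cache_object"   # stands in for a DVC cache file
        cache_obj.write_bytes(b"image bytes")
        link = tmp_path / "figure.png"          # stands in for the checked-out symlink
        link.symlink_to(cache_obj)

        remote_mtime = time.time() - 86_400     # pretend the S3 object is a day old
        last_build_time = time.time() - 3_600   # pretend the docs were built an hour ago

        # Case 1: cache mtime preserved from the remote object.
        os.utime(cache_obj, (remote_mtime, remote_mtime))
        link_follows_cache = os.stat(link).st_mtime == os.stat(cache_obj).st_mtime
        preserved = is_outdated(link, last_build_time)

        # Case 2: cache mtime set to "now", as after a fresh download.
        now = time.time()
        os.utime(cache_obj, (now, now))
        fresh = is_outdated(link, last_build_time)

        return link_follows_cache, preserved, fresh


print(demo())  # (True, False, True)
```

Only the second case would make the builder treat the file (and the pages that depend on it) as changed.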
It would be great if DVC set the mtime of files in the cache to the last-modified time of the remote storage object, to help avoid the rebuild. Otherwise we would need to use the AWS CLI or a custom script, instead of `dvc fetch`, to download the remote folder into the local cache directory.
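As a possible interim workaround, a script could run after `dvc fetch` and copy each S3 object's `LastModified` onto the matching cache file, instead of replacing the download entirely. This is a sketch under several assumptions: the remote and the local cache share the same relative layout (e.g. `files/md5/ab/cdef...` in DVC 3.x), the cache lives at the default `.dvc/cache`, and boto3 credentials are configured. The helper names `cache_path_for_key`, `restore_mtimes` and `list_remote_objects` are hypothetical, not part of DVC.

```python
import os
from datetime import datetime
from pathlib import Path
from typing import Iterable, Iterator, Tuple


def _norm_prefix(prefix: str) -> str:
    """'dvc-store' or 'dvc-store/' -> 'dvc-store/'; '' stays '' (bucket root)."""
    return prefix.rstrip("/") + "/" if prefix else ""


def cache_path_for_key(key: str, remote_prefix: str, cache_dir: Path) -> Path:
    """Map an S3 key under the (assumed) DVC remote prefix to the local cache path."""
    prefix = _norm_prefix(remote_prefix)
    if not key.startswith(prefix):
        raise ValueError(f"{key!r} is not under prefix {remote_prefix!r}")
    return cache_dir / key[len(prefix):]


def restore_mtimes(objects: Iterable[Tuple[str, datetime]],
                   remote_prefix: str, cache_dir: Path) -> int:
    """Set atime/mtime of each existing cache file to the remote LastModified.

    Returns the number of cache files updated."""
    updated = 0
    for key, last_modified in objects:
        path = cache_path_for_key(key, remote_prefix, cache_dir)
        if path.is_file():  # skip objects that were not fetched locally
            ts = last_modified.timestamp()
            os.utime(path, (ts, ts))
            updated += 1
    return updated


def list_remote_objects(bucket: str, prefix: str) -> Iterator[Tuple[str, datetime]]:
    """Yield (key, LastModified) for every object under the prefix (needs boto3)."""
    import boto3  # imported lazily so the pure helpers work without boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=_norm_prefix(prefix)):
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["LastModified"]
```

For a remote at a hypothetical `s3://my-bucket/dvc-store`, this would be run after `dvc fetch` as `restore_mtimes(list_remote_objects("my-bucket", "dvc-store"), "dvc-store", Path(".dvc/cache"))`. Checking out with the `symlink` link type afterwards would then expose the remote timestamps to the docs build.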