dvc fetch: Files downloaded from remote storage (AWS S3) to the DVC cache should have mtime restored #10347

Open
@aschuh-hf

Description

We want to use DVC to store the media files of a static site that is built with Jupyter Book (Sphinx doc). However, dvc fetch / dvc pull sets the mtime of files downloaded from remote storage in AWS S3 to the local DVC cache to the current time instead of the last modified time of the remote file object. This triggers a complete rebuild of the entire documentation, which consists of more than 1000 pages. The files are then checked out to the local repository using dvc checkout (or dvc pull, which won't re-download anything after a fetch) with link type symlink. That step does preserve the mtime of the object in the local DVC cache. The problem is the download from remote storage to the local cache.

It would be great if DVC set the mtime of the files in the cache to the last modified time of the remote storage object, which would avoid the rebuild. Otherwise we would need to use the AWS CLI or a custom script, instead of dvc fetch, to download the remote folder to the local cache directory.
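
For illustration, the "custom script" route could look roughly like the sketch below. It is not DVC functionality: restore_mtimes is a hypothetical helper, and it assumes you have already built a mapping from cache-relative paths to remote last-modified datetimes yourself (for example from the LastModified field of an S3 object listing) and that those relative paths match the layout of your local cache. It only applies the timestamps with os.utime after dvc fetch has run.

```python
import os
from datetime import datetime, timezone
from pathlib import Path


def restore_mtimes(cache_dir, last_modified):
    """Apply remote last-modified times to files in a local cache directory.

    cache_dir: root of the local cache (assumed layout, not verified here).
    last_modified: hypothetical mapping {relative_path: datetime}, built by
        the caller from the remote listing. Timezone-aware datetimes are
        recommended; naive ones are interpreted as local time.
    Returns the relative paths whose mtime was updated.
    """
    cache_dir = Path(cache_dir)
    updated = []
    for rel_path, modified in last_modified.items():
        target = cache_dir / rel_path
        if not target.is_file():
            continue  # object was not fetched locally, nothing to fix
        # Keep the current access time and change only the mtime.
        atime = target.stat().st_atime
        os.utime(target, (atime, modified.timestamp()))
        updated.append(rel_path)
    return updated
```

Because the checkout step with symlinks already preserves the mtime of the cached object, running something like this between dvc fetch and dvc checkout should be enough for the build tool to see stable timestamps, assuming the mapping is accurate.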
