Ignore empty files #9329

Closed
@johnyaku

Description

The chance of a hash collision between two different files is extraordinarily low, unless both files are empty, in which case they both have the MD5 checksum d41d8cd98f00b204e9800998ecf8427e.
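To illustrate, here is a minimal Python sketch (not DVC code; `md5_of_file` and `demo_empty_collision` are hypothetical helper names) showing that any two zero-byte files produce this same digest, assuming the checksum is a plain MD5 over the file contents:

```python
import hashlib
import tempfile
from pathlib import Path

# MD5 digest of zero bytes of input, as quoted in the issue text.
EMPTY_MD5 = "d41d8cd98f00b204e9800998ecf8427e"


def md5_of_file(path):
    """Return the hex MD5 digest of a file's contents (hypothetical helper)."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def demo_empty_collision():
    """Create two distinct empty files and return both of their digests."""
    with tempfile.TemporaryDirectory() as tmp:
        a = Path(tmp) / "a.timestamp"
        b = Path(tmp) / "b.timestamp"
        a.touch()  # zero-byte file
        b.touch()  # another zero-byte file with a different name
        return md5_of_file(a), md5_of_file(b)


if __name__ == "__main__":
    digest_a, digest_b = demo_empty_collision()
    print(digest_a == digest_b == EMPTY_MD5)
```

The two files differ in name and mtime but not in content, so a content-addressed cache would store them as a single object.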

Empty files can be created for various reasons, including by workflow tools such as Snakemake. Snakemake creates .snakemake_timestamp files that exist only for their mtime, which is then lost when the file is added to the cache (#8602).

There is not much to be gained by caching/tracking these empty files either. We could explicitly ignore them via .dvcignore when we know that they might turn up, but perhaps DVC could ignore empty files by default?
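As a stopgap for the explicit approach, something like the following Python sketch could list the zero-byte files in a workspace so that they can be reviewed and added to .dvcignore by hand. This is only an assumption about how one might do it, not an existing DVC feature: `find_empty_files` is a hypothetical helper, and skipping the `.git` and `.dvc` directories is my own choice.

```python
import os
from pathlib import Path

# Directories to skip while walking; an assumption, adjust as needed.
SKIP_DIRS = {".git", ".dvc"}


def find_empty_files(root):
    """Return sorted paths (relative to root, POSIX style) of zero-byte regular files."""
    root = Path(root)
    empty = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = Path(dirpath) / name
            # Ignore symlinks; only report real files of size 0.
            if path.is_symlink() or not path.is_file():
                continue
            if path.stat().st_size == 0:
                empty.append(path.relative_to(root).as_posix())
    return sorted(empty)


if __name__ == "__main__":
    # Print one candidate path per line for manual review.
    for rel_path in find_empty_files("."):
        print(rel_path)
```

The output is a plain list of relative paths. Paths containing characters that are special in ignore patterns would need escaping before being pasted into .dvcignore.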

By "ignore", I think I mean "leave in the workspace, don't add to cache". Not sure if they should be tracked by .dir files.

Not sure if there would be unintended consequences. If so, perhaps "ignore empty files" could be configurable.

Metadata

    Labels

    A: data-management (Related to dvc add/checkout/commit/move/remove), feature request (Requesting a new feature)
