
Keep persistent outputs protected by default? #6562

Open
@menegais1

Description


To save storage, some of my pipeline stages have persistent outputs that I manage with Python scripts. Such a stage receives a JSON file as input and outputs a folder of image files. Since there is no need to regenerate those files on every execution, the script checks whether each file is already present in the folder (a read-only check) and skips reprocessing it if so.

As the dataset grows, space consumption becomes worrisome: every `dvc repro` has to unprotect all the files in the folder, copying them from the cache into the workspace. If an output could be marked as safe, meaning its files are only ever added or removed and never modified in place, the unprotect step could be avoided, reducing space usage.
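For illustration, a stage script following the skip-if-present pattern described above might look roughly like this. It is a hypothetical sketch, not the script from this issue: the input format (a JSON list of objects with an `"id"` field), the one-`.png`-per-entry naming, and the `render_image` and `sync_outputs` helpers are all assumptions. It shows why such an output only needs append/remove operations: existing files are checked for existence and never opened for writing.

```python
import json
import sys
from pathlib import Path


def render_image(entry: dict) -> bytes:
    # Hypothetical stand-in for the real (expensive) image generation step.
    return json.dumps(entry, sort_keys=True).encode("utf-8")


def sync_outputs(input_json: Path, out_dir: Path) -> tuple[list[str], list[str]]:
    """Bring out_dir in line with input_json using only add/remove operations.

    Existing files are never opened for writing, so they can stay read-only.
    Returns the sorted names of the files (added, removed) during this run.
    """
    entries = json.loads(input_json.read_text())
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: one image per JSON entry, named after its id.
    wanted = {f"{entry['id']}.png": entry for entry in entries}

    added = []
    for name, entry in wanted.items():
        target = out_dir / name
        if target.exists():  # produced by an earlier run: skip it
            continue
        target.write_bytes(render_image(entry))
        added.append(name)

    removed = []
    for path in out_dir.glob("*.png"):
        if path.name not in wanted:  # entry was dropped from the input
            # On POSIX, deleting needs write access to the directory, not the file.
            path.unlink()
            removed.append(path.name)

    return sorted(added), sorted(removed)


if __name__ == "__main__" and len(sys.argv) == 3:
    added, removed = sync_outputs(Path(sys.argv[1]), Path(sys.argv[2]))
    print(f"added {len(added)} file(s), removed {len(removed)} file(s)")
```

It would be invoked as, e.g., `python sync_outputs.py input.json images/` (script name hypothetical). Because no existing file is ever rewritten, such a stage would in principle work against protected (read-only, linked) outputs; whether DVC can skip unprotecting them is what this issue asks.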

Metadata


Labels: A: pipelines, feature request, p2-medium, performance, research
