introduce a readonly property for better parallelization #4979

Closed
@johnnychen94

Training a bunch of models with different parameters is now much easier with the parameterization feature, which is a great improvement because it deduplicates dvc.yaml. However, one thing still blocks us from training multiple models asynchronously.

stages:
  train:
    foreach: ${models}
    do:
      cmd: julia --project=train train/main.jl data/train/data.h5 models/${item.name} ${item.config}
      deps:
      - data/train/data.h5
      - train
      outs:
      - models/${item.name}

Naturally, we want to schedule multiple jobs asynchronously on different devices without errors, e.g.:

mkdir -p log
# Launch one training job per GPU in the background, each with its own log file.
for i in 0 1 2 3; do
    CUDA_VISIBLE_DEVICES=$i dvc repro train@model_$i > log/model_$i.txt 2>&1 &
    sleep 2
done

However, data/train/data.h5 gets locked, so only one job can run at a time. (We could work around this by creating multiple copies or symlinks, but that's not elegant...)

I'm wondering whether it's possible to introduce a looser kind of read lock that users specify in dvc.yaml, e.g.:

 stages:
   train:
     foreach: ${models}
     do:
       cmd: julia --project=train train/main.jl data/train/data.h5 models/${item.name} ${item.config}
       deps:
-      - data/train/data.h5
+      - data/train/data.h5:readonly
       - train
       outs:
       - models/${item.name}

When a user adds this property, they are explicitly saying: "Okay, I plan to use this in a read-only way, and I'll take responsibility for whatever bugs may occur due to improper usage." DVC could then choose not to add an entry for that dependency in rwlock, which enables concurrency.
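To make the proposal concrete, here is a minimal Python sketch of how the suffix could be interpreted. It is not DVC's implementation: the `:readonly` marker is only the syntax proposed in this issue, and `split_readonly` and `paths_to_lock` are hypothetical helper names made up for illustration.

```python
READONLY_SUFFIX = ":readonly"  # hypothetical marker proposed in this issue


def split_readonly(dep):
    """Return (path, is_readonly) for a single entry from `deps`."""
    if dep.endswith(READONLY_SUFFIX):
        return dep[: -len(READONLY_SUFFIX)], True
    return dep, False


def paths_to_lock(deps):
    """Return the paths that would still get a lock entry.

    Dependencies marked read-only are skipped, which is the behavior
    requested above; everything else is locked as before.
    """
    return [path for path, readonly in map(split_readonly, deps) if not readonly]


deps = ["data/train/data.h5:readonly", "train"]
print(paths_to_lock(deps))
```

With the `deps` from the example stage, only `train` would remain in the set of locked paths, so several `train@...` stages could read data/train/data.h5 at the same time.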

I'm not sure whether DVC plans to support concurrent job scheduling natively. With #4976, it would be very promising if dvc repro train@* -s scheduled multiple jobs in parallel.

It would also be nice to support passing environment variables, but this is already doable by passing values from params.yaml to the language's own utilities (e.g., os.environ["CUDA_VISIBLE_DEVICES"] = str(config["gpu_device"])).
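The workaround mentioned above might look like the following Python sketch. It assumes the training script has already loaded its parameters into a dict with a `gpu_device` key; the helper name `select_gpu` is hypothetical. The variable has to be set before any CUDA-backed library initializes, and environment values must be strings.

```python
import os


def select_gpu(config, environ=os.environ):
    """Expose only the configured GPU to this process.

    `config` stands in for parameters loaded from the params file;
    `environ` is injectable so the helper can be tried without
    touching the real process environment.
    """
    device = config.get("gpu_device")
    if device is not None:
        # Environment values must be strings, even if the config holds an int.
        environ["CUDA_VISIBLE_DEVICES"] = str(device)
    return environ.get("CUDA_VISIBLE_DEVICES")


config = {"gpu_device": 2}  # stand-in for values read from the params file
env = {}                    # scratch mapping instead of the real os.environ
select_gpu(config, env)
print(env)
```

In a real training script the call would be `select_gpu(config)` near the top of the entry point, before importing or initializing the GPU framework.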

Labels: feature request (Requesting a new feature), p3-nice-to-have (It should be done this or next sprint)
