Execution order for dvc DAG #9958

Closed
kosmitive opened this issue Sep 18, 2023 · 4 comments
Labels
awaiting response, feature request, p3-nice-to-have

Comments

@kosmitive
kosmitive commented Sep 18, 2023

Imagine the stages

  1. experiment: experiments named e.1.1, e.1.2, e.2.1, e.2.2, e.3.1 and e.3.2
  2. plot: plots p.1, p.2 and p.3

in dvc.yaml. With the default execution order, dvc evaluates the whole experiment stage first, i.e. all experiments e.1.1 to e.3.2, and only afterwards the plot stage, i.e. p.1 to p.3. It would be useful to have an option --prioritize-stage plot that prioritizes the plot stage. This would result in the execution order e.1.1, e.1.2, p.1, and so on up to e.3.1, e.3.2, p.3. The advantage is that every stage is reached faster, so errors might be detected earlier.

For the toy example the feature could be included by

  1. a flag --prioritize-stage plot
  2. a per-stage option in dvc.yaml, priority: <num>
  3. an option specifying the execution order, breadth-first or depth-first
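To make the requested ordering concrete, here is a minimal sketch of a priority-aware topological sort. It is not DVC code and says nothing about how DVC schedules stages internally. It assumes, purely for illustration, that each plot p.N depends on the experiments e.N.1 and e.N.2; the helper names `prioritized_order` and `stage_priority` are hypothetical.

```python
import heapq


def prioritized_order(deps, priority):
    """Return a topological order, picking the highest-priority ready node first.

    deps maps every node to the set of nodes it depends on.
    priority maps a node name to a number; larger values run earlier
    among the nodes whose dependencies are already satisfied.
    """
    # Count unmet dependencies and record the reverse edges.
    pending = {node: len(parents) for node, parents in deps.items()}
    dependents = {node: [] for node in deps}
    for node, parents in deps.items():
        for parent in parents:
            dependents[parent].append(node)

    # Declaration order is the tie-break between equal priorities.
    index = {node: i for i, node in enumerate(deps)}
    ready = [(-priority(n), index[n], n) for n, count in pending.items() if count == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        _, _, node = heapq.heappop(ready)
        order.append(node)
        for child in dependents[node]:
            pending[child] -= 1
            if pending[child] == 0:
                heapq.heappush(ready, (-priority(child), index[child], child))
    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order


# Assumed dependency graph for the toy example (not stated in the issue).
deps = {
    "e.1.1": set(), "e.1.2": set(),
    "e.2.1": set(), "e.2.2": set(),
    "e.3.1": set(), "e.3.2": set(),
    "p.1": {"e.1.1", "e.1.2"},
    "p.2": {"e.2.1", "e.2.2"},
    "p.3": {"e.3.1", "e.3.2"},
}


def stage_priority(name):
    # Hypothetical equivalent of "--prioritize-stage plot".
    return 1 if name.startswith("p.") else 0


default_order = prioritized_order(deps, lambda name: 0)
plots_first = prioritized_order(deps, stage_priority)
print(default_order)
print(plots_first)
```

With equal priorities the sketch runs all experiments before any plot; with the plot stage prioritized, each plot runs as soon as its two experiments have finished.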

The issue is related to #5181.

@efiop added the feature request and p3-nice-to-have labels Sep 19, 2023
@skshetry
Collaborator

DVC is not a general-purpose task runner or a pipeline orchestrator. It is a simple build system, like make. The only guarantee about the order of execution is that dependencies are run before their dependents. Anything beyond that is an implementation detail.
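As a rough illustration of that guarantee (not DVC's actual implementation), the sketch below uses Python's standard-library `graphlib` to produce one valid order for a small graph and checks only the property that matters: every stage comes after its dependencies. The stage names and the helper `respects_dependencies` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each stage maps to the set of stages it depends on.
deps = {
    "prepare": set(),
    "featurize": {"prepare"},
    "train": {"featurize"},
    "report": {"prepare"},
    "plot": {"train", "report"},
}


def respects_dependencies(order, deps):
    """Check that every stage appears after all of its dependencies."""
    position = {stage: i for i, stage in enumerate(order)}
    return all(
        position[parent] < position[stage]
        for stage, parents in deps.items()
        for parent in parents
    )


# static_order() yields one of possibly many valid topological orders.
order = list(TopologicalSorter(deps).static_order())
print(order)
print(respects_dependencies(order, deps))
```

Several different orders can pass this check for the same graph, for example running report before or after train, which is why the relative order of unrelated stages should not be relied upon.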

Here it looks like you are using independent stages with no dependencies, only outputs, which is more of an edge case for DVC. We used to call this a callback stage, because such stages always run and can be used to trigger (or not trigger) other stages. We probably won't support this use case for this scenario. But maybe that was just an example.

For debugging, you can use dvc repro --single-stage, which supports multiple targets; the stages will run in the order you specify rather than in the topological order of the graph.
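A small sketch of how such a debugging run could be assembled from Python. It only builds and prints the command line; the stage names are taken from the toy example and are assumed to be valid targets in your dvc.yaml, and the helper `single_stage_command` is hypothetical.

```python
import shlex


def single_stage_command(targets):
    """Build a `dvc repro --single-stage` command for the given targets, in the given order."""
    if not targets:
        raise ValueError("at least one target is required")
    return ["dvc", "repro", "--single-stage", *targets]


# Assumed target names; adjust to the stage names in your own dvc.yaml.
cmd = single_stage_command(["e.1.1", "e.1.2", "p.1"])
print(shlex.join(cmd))

# To actually run it inside a DVC repository:
# import subprocess
# subprocess.run(cmd, check=True)
```

Because --single-stage skips the dependency check, it is up to the caller to list the targets in an order that makes sense.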

Do you mind elaborating on your use case and the benefit of having a different execution order? You mention that stages are executed faster and that errors are detected earlier.
But both of these depend on the kind of dependency graph you have. Faster execution is debatable, and we could argue that errors might be raised earlier with a depth-first approach, as (a subtree of) related stages are executed in order.

I do think we'll have some priority when we implement #755.

@skshetry added the awaiting response label Sep 22, 2023
@kosmitive
Author

kosmitive commented Sep 29, 2023

We use it for running experiments, and they take a lot of time. Sometimes we change, e.g., the plot stage and rerun it; it then takes a while until we detect, e.g., plotting issues. We also had cases where the issues were connected to the first stage in the pipeline and we needed to rerun the whole pipeline. We would have saved time if we could have inspected the (partial) results earlier. But this could be circumvented by including better testing in the process.

@kosmitive
Author

kosmitive commented Sep 29, 2023

@skshetry Just tried dvc repro -s and it is a solution for debugging. Thanks for the hint.

@skshetry
Collaborator

skshetry commented Mar 25, 2024

Closing as we are unlikely to support different execution orders.

@skshetry closed this as not planned Mar 25, 2024
3 participants