ref: document dvc queue
#3715
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
# queue | ||
|
||
A set of commands to manage the | ||
[DVC experiments](/doc/user-guide/experiment-management/experiments-overview) | ||
task queue: [start](/doc/command-reference/queue/start), | ||
Comment on lines +3 to +5

I'm not sure about this terminology: "experiment task queue". Why not just "experiment queue"? Are "tasks" important enough to differentiate from experiments? (At first glance it sounds like an implementation detail.) WDYT @dberenbaum? Could impact strings in the core codebase.

I think it's important to differentiate between "experiment" vs "queue task" due to commands where there's overlap (like
The current plan is also to emphasize this in other commands.

@jorgeorpinel Maintaining two separate concepts is important, but feel free to suggest more useful terminology 🙏.

OK I see there's some rationale and justification for the special terminology. I still think "task" is an implementation detail and could be avoided...
I'm thinking that it should be clear that But it's something we can follow-up on later if needed (still unsure about the release timeline we're looking at here).

It is in main now and should be in the next DVC release.

Everyone raises good points here. Agreed that
In this case, When the Ideas to further improve:

My only specific suggestion for now is to avoid term "task". Be descriptive instead e.g. "queued experiment", "experiment from queue", even "entry from exps queue" if needed.

@jorgeorpinel if

I'm not strongly opposed but it seems unnecessary to me: the whole task management aspect of this is an implementation detail (ultimately irrelevant for users).

But "queue" is the name of the command. Hard to avoid that term. p.s. term "worker process" also seems redundant and too deep. We already have "jobs" for the option name, I'd stick to that. |
||
[stop](/doc/command-reference/queue/stop), | ||
[status](/doc/command-reference/queue/status), | ||
[logs](/doc/command-reference/queue/logs), | ||
[remove](/doc/command-reference/queue/remove), | ||
[kill](/doc/command-reference/queue/kill) | ||
|
||
## Synopsis | ||
|
||
```usage | ||
usage: dvc queue [-h] [-q | -v] | ||
|
||
{start,stop,status,logs,remove,kill} ... | ||
|
||
positional arguments: | ||
COMMAND | ||
start Start experiments queue workers. | ||
stop Stop experiments queue workers. | ||
status List the status of the queue tasks and workers. | ||
logs Show output logs for a task in the experiments queue. | ||
remove Remove tasks in experiments queue. | ||
kill Kill tasks in experiments queue. | ||
``` | ||
|
||
## Description | ||
|
||
`dvc queue` subcommands provide specialized ways to manage queued experiment | ||
tasks. | ||
Comment on lines +28 to +31

Let's provide just a bit more context instead like "You can use

@pmrowla What do you think? Could you add some phrasing like this? |
||
|
||
## Options | ||
|
||
- `-h`, `--help` - prints the usage/help message and exits. | ||
|
||
- `-q`, `--quiet` - do not write anything to standard output. | ||
|
||
- `-v`, `--verbose` - displays detailed tracing information. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
## queue kill | ||
|
||
|
||
Kill actively running | ||
[DVC Experiment](/doc/user-guide/experiment-management/experiments-overview) | ||
tasks. | ||
|
||
## Synopsis | ||
|
||
```usage | ||
usage: dvc queue kill [-h] [-q | -v] [<task> ...] | ||
|
||
positional arguments: | ||
<task> Tasks in queue to kill. | ||
``` | ||
|
||
## Description | ||
|
||
Forcefully stops execution of the specified (running) experiment tasks. Killed | ||
tasks are treated as failed runs. | ||
|
||
This command does not stop the queue worker process. After the specified task | ||
has been killed, the worker process will consume and execute the next experiment | ||
task in the queue. | ||
|
||
To kill all running experiment tasks and also stop queue processing, you can use | ||
`dvc queue stop --kill`. | ||
|
||
<admon type="warn"> | ||
|
||
Note that killed experiment tasks will be considered failed runs and will not be | ||
re-added to the queue for future execution. | ||
|
||
</admon> | ||
|
||
## Options | ||
|
||
- `-h`, `--help` - prints the usage/help message and exits. | ||
|
||
- `-q`, `--quiet` - do not write anything to standard output. | ||
|
||
- `-v`, `--verbose` - displays detailed tracing information. |
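
As a rough illustration of the behavior described above, the following toy Python model contrasts killing a single task with stopping the whole queue. It is not DVC code: `QueueModel`, the status names `Queued` and `Running`, and the assumption that queued tasks stay queued after `dvc queue stop --kill` are all hypothetical and used only for this sketch.

```python
from collections import deque


class QueueModel:
    """Toy model of one worker and a queue of experiment tasks (not DVC code)."""

    def __init__(self, tasks):
        self.pending = deque(tasks)  # tasks waiting to be run
        self.status = {task: "Queued" for task in tasks}
        self.worker_alive = True
        self.current = None
        self._next()

    def _next(self):
        # The worker consumes the next queued task, if there is one.
        self.current = self.pending.popleft() if self.pending else None
        if self.current is not None:
            self.status[self.current] = "Running"

    def kill(self, task):
        # Models `dvc queue kill <task>`: only a running task can be killed.
        if task != self.current:
            return
        self.status[task] = "Failed"  # killed tasks count as failed runs
        self._next()  # the worker itself keeps processing the queue

    def stop_kill(self):
        # Models `dvc queue stop --kill`: kill running tasks, stop the worker.
        if self.current is not None:
            self.status[self.current] = "Failed"
            self.current = None
        self.worker_alive = False


q = QueueModel(["a", "b", "c"])
q.kill("a")       # "a" fails, the worker moves on to "b"
q.stop_kill()     # "b" fails, the worker stops, "c" is never started
print(q.status, q.worker_alive)
```

In this model the killed tasks are never put back into `pending`, which mirrors the warning above that killed tasks are not re-added to the queue.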
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
## queue logs | ||
|
||
Show output logs for running and completed tasks in the | ||
[DVC Experiment](/doc/user-guide/experiment-management/experiments-overview) | ||
task queue. | ||
|
||
## Synopsis | ||
|
||
```usage | ||
usage: dvc queue logs [-h] [-q | -v] [-e <encoding>] [-f] <task> | ||
|
||
positional arguments: | ||
<task> Task to show. | ||
``` | ||
|
||
## Description | ||
|
||
Shows output logs for the specified running or completed experiment task. | ||
|
||
By default, this command will show any available log data and then exit. For | ||
tasks which are still running, the `--follow` option can be used to attach to | ||
the task and continuously show live log output, until the task has completed. | ||
|
||
When using the `--follow` option, it is safe to stop following output using | ||
`Ctrl+C` (or `SIGINT`). This will only cause the logs command to exit, and the | ||
experiment task will continue to be run in the background. | ||
|
||
## Options | ||
|
||
- `-e <encoding>`, `--encoding <encoding>` - text encoding for log output. | ||
Defaults to the system locale encoding. | ||
|
||
<admon type="warn"> | ||
|
||
Note that this option is used to specify the encoding of the experiment task | ||
output (i.e. the output of pipeline stage commands), which may not always | ||
match the encoding of your system terminal. | ||
|
||
</admon> | ||
|
||
- `-f`, `--follow` - attach to task and follow additional live output. Only | ||
applicable if the task is still running. | ||
|
||
- `-h`, `--help` - prints the usage/help message and exits. | ||
|
||
- `-q`, `--quiet` - do not write anything to standard output. | ||
|
||
- `-v`, `--verbose` - displays detailed tracing information. | ||
|
||
## Examples | ||
|
||
## Example: View logs for completed experiment tasks | ||
|
||
Let's say we have previously run some queued experiment tasks: | ||
|
||
```dvc | ||
$ dvc queue status | ||
Task Name Created Status | ||
192a13c 04:15 PM Failed | ||
753b005 04:01 PM Success | ||
0bbb118 04:01 PM Success | ||
1ae8b65 04:01 PM Success | ||
|
||
Worker status: 0 active, 0 idle | ||
``` | ||
|
||
We can view the output for both failed and successfully completed experiment | ||
tasks: | ||
|
||
```dvc | ||
$ dvc queue logs 192a13c | ||
'data/data.xml.dvc' didn't change, skipping | ||
Running stage 'prepare': | ||
> python src/prepare.py data/data.xml | ||
Traceback (most recent call last): | ||
File "/Users/pmrowla/git/example-get-started/.dvc/tmp/exps/tmp217n0tjv/src/prepare.py", line 10, in <module> | ||
raise AssertionError | ||
AssertionError | ||
ERROR: failed to reproduce 'prepare': failed to run: python src/prepare.py data/data.xml, exited with 1 | ||
``` | ||
|
||
```dvc | ||
$ dvc queue logs 0bbb118 | ||
'data/data.xml.dvc' didn't change, skipping | ||
Stage 'prepare' is cached - skipping run, checking out outputs | ||
Updating lock file 'dvc.lock' | ||
|
||
Stage 'featurize' is cached - skipping run, checking out outputs | ||
Updating lock file 'dvc.lock' | ||
|
||
Stage 'train' is cached - skipping run, checking out outputs | ||
Updating lock file 'dvc.lock' | ||
|
||
Stage 'evaluate' is cached - skipping run, checking out outputs | ||
Updating lock file 'dvc.lock' | ||
|
||
To track the changes with git, run: | ||
|
||
git add dvc.yaml scores.json roc.json params.yaml data/prepared data/data.xml prc.json src/featurization.py data/features src/evaluate.py model.pkl dvc.lock src/train.py src/prepare.py | ||
|
||
To enable auto staging, run: | ||
|
||
dvc config core.autostage true | ||
``` | ||
|
||
## Example: View logs for running experiment tasks | ||
|
||
Let's queue a new experiment and view the output while it is running: | ||
|
||
```dvc | ||
$ dvc exp run --queue -S prepare.split=0.40 -S featurize.max_features=4000 | ||
Queued experiment '93cfa70' for future execution. | ||
$ dvc queue start | ||
Started '1' new experiments task queue worker. | ||
$ dvc queue logs 93cfa70 | ||
'data/data.xml.dvc' didn't change, skipping | ||
Running stage 'prepare': | ||
> python src/prepare.py data/data.xml | ||
Updating lock file 'dvc.lock' | ||
|
||
Running stage 'featurize': | ||
> python src/featurization.py data/prepared data/features | ||
``` | ||
|
||
We can see that by default, `dvc queue logs` displays any available output and | ||
then exits. In this case, our `featurize` stage is still running, so no | ||
additional output is available at this time. | ||
|
||
If we wanted to continuously view live output from the running task (until it | ||
completes) we also could have used the `--follow` option. |
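
If you want to drive this from a script, a minimal Python sketch might look like the following. The helper `build_logs_command` is hypothetical (it is not part of DVC); it only assembles the command line from the `--follow` and `--encoding` options documented above, using the task name from this example.

```python
import shlex


def build_logs_command(task, follow=False, encoding=None):
    """Build the argv for `dvc queue logs` (hypothetical helper, not part of DVC)."""
    argv = ["dvc", "queue", "logs"]
    if encoding is not None:
        argv += ["--encoding", encoding]
    if follow:
        argv.append("--follow")
    argv.append(task)
    return argv


cmd = build_logs_command("93cfa70", follow=True)
print(shlex.join(cmd))

# To actually run it (requires DVC and an existing task in the queue):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Interrupting the process with `Ctrl+C` while following should only end the log viewer, as described in the Description section.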
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
## queue remove | ||
|
||
Remove queued and completed tasks from the | ||
[DVC Experiment](/doc/user-guide/experiment-management/experiments-overview) | ||
task queue. | ||
|
||
## Synopsis | ||
|
||
```usage | ||
usage: dvc queue remove [-h] [-q | -v] | ||
[--all] [--queued] [--success] [--failed] | ||
[<task> ...] | ||
|
||
positional arguments: | ||
<task> Tasks in queue to remove. | ||
``` | ||
|
||
## Description | ||
|
||
Removes the specified queued or completed experiment tasks from the queue. For | ||
completed tasks, this will also remove any associated output logs. | ||
|
||
<admon type="warn"> | ||
|
||
Note that for successfully completed tasks, this command is not the same as | ||
`dvc exp remove`. `dvc queue remove` does not remove any Git or DVC data | ||
associated with a successful DVC experiment. It only removes the task queue | ||
entry and any associated output logs for that task. | ||
|
||
</admon> | ||
|
||
## Options | ||
|
||
- `--all` - remove all (queued and completed) experiment tasks from the queue. | ||
|
||
- `--queued` - remove all queued experiment tasks from the queue. | ||
|
||
- `--success` - remove all successfully completed tasks (and associated output | ||
logs) from the queue. | ||
Comment on lines +36 to +39

Probably too late to discuss now but just adding a note for a possible design follow-up:

This one was just an idea for you @dberenbaum, no follow-up needed in #3894.

It's a good point but not a high priority for me. Feel free to open an issue and maybe we can change it or make |
||
|
||
- `--failed` - remove all failed tasks (and associated output logs) from the | ||
queue. | ||
|
||
- `-h`, `--help` - prints the usage/help message and exits. | ||
|
||
- `-q`, `--quiet` - do not write anything to standard output. | ||
|
||
- `-v`, `--verbose` - displays detailed tracing information. |
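
To make the selection rules of these flags concrete, here is a small Python sketch of which tasks each flag would pick. It is a toy model, not DVC's implementation: `select_for_removal`, the `Queued` and `Running` status names, and the sample task list are assumptions made for illustration. It follows the text above in treating only queued and completed tasks as removable.

```python
# Statuses that this page describes as removable (queued or completed).
REMOVABLE = {"Queued", "Success", "Failed"}


def select_for_removal(tasks, names=(), all_=False, queued=False,
                       success=False, failed=False):
    """Return the names of tasks the given flags would remove (toy model)."""
    wanted = set()
    if all_ or queued:
        wanted.add("Queued")
    if all_ or success:
        wanted.add("Success")
    if all_ or failed:
        wanted.add("Failed")

    selected = []
    for name, status in tasks:
        if status not in REMOVABLE:
            continue  # e.g. a running task is left alone in this model
        if name in names or status in wanted:
            selected.append(name)
    return selected


# Hypothetical snapshot of a queue as (task name, status) pairs.
tasks = [
    ("192a13c", "Failed"),
    ("753b005", "Success"),
    ("0bbb118", "Success"),
    ("93cfa70", "Running"),
]

print(select_for_removal(tasks, failed=True))
print(select_for_removal(tasks, names={"0bbb118", "93cfa70"}))
print(select_for_removal(tasks, all_=True))
```

Note that nothing in this model touches experiment data: as the warning above says, `dvc queue remove` only drops queue entries and their logs.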
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,67 @@ | ||
## queue start | ||
|
||
Start the | ||
[DVC experiments](/doc/user-guide/experiment-management/experiments-overview) | ||
task queue worker process. | ||
Comment on lines +3 to +5

Let's use more user-perspective descriptions. In this case something like what we have for "

@pmrowla What do you think? |
||
|
||
## Synopsis | ||
|
||
```usage | ||
usage: dvc queue start [-h] [-q | -v] [-j <number>] | ||
``` | ||
|
||
## Description | ||
|
||
Starts one or more task queue worker processes. Each worker process will consume | ||
and execute one queued experiment task at a time in the background, until either | ||
`dvc queue stop` is used or the queue is empty. | ||
|
||
|
||
<admon type="info"> | ||
|
||
Due to [implementation limitations], when the queue is empty a worker may be idle for | ||
up to 10 seconds before exiting. If new experiment tasks are added to the queue | ||
during this time, the idle worker will resume processing them instead. | ||
|
||
[implementation limitations]: | ||
/doc/user-guide/experiment-management/running-experiments#how-are-experiments-queued | ||
|
||
</admon> | ||
|
||
Queued experiment tasks are run sequentially by default, but can be run in | ||
parallel by using the `--jobs` option to start more than one worker. | ||
|
||
<admon type="warn"> | ||
|
||
Parallel runs are experimental and may be unstable. Make sure you're using a | ||
number of jobs that your environment can handle (no more than the number of CPU cores). | ||
|
||
Note that since queued experiments are run isolated from each other, common | ||
stages may sometimes be executed several times depending on the state of the | ||
[run-cache] at that time. | ||
|
||
</admon> | ||
|
||
## Options | ||
|
||
- `-j <number>`, `--jobs <number>` - start up to this number of workers in parallel. | ||
Defaults to 1 (the task queue is processed serially). | ||
|
||
<admon type="info"> | ||
|
||
Note that if any queue worker processes have already been started, this | ||
command will not start additional processes unless `number` is greater than | ||
the number of existing workers (`number` is treated as the maximum allowed | ||
concurrency value). | ||
|
||
If `number` is less than the number of existing worker processes, this command | ||
will not stop any existing worker processes. To reduce worker concurrency, | ||
`dvc queue stop` must first be used to stop queue processing, before running | ||
`dvc queue start` with the desired number of `--jobs`. | ||
|
||
</admon> | ||
|
||
- `-h`, `--help` - prints the usage/help message and exits. | ||
|
||
- `-q`, `--quiet` - do not write anything to standard output. | ||
|
||
- `-v`, `--verbose` - displays detailed tracing information. |
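
The "maximum allowed concurrency" rule for `--jobs` can be summarized with a short Python sketch. The function `workers_to_start` is hypothetical and only restates the rule from the note above; it is not taken from the DVC codebase.

```python
def workers_to_start(jobs, existing):
    """Number of new workers under the rule described above (toy model).

    `jobs` is the value passed to `--jobs`; `existing` is the number of
    workers that are already running.
    """
    # Never negative: existing workers are not stopped by `dvc queue start`.
    return max(0, jobs - existing)


print(workers_to_start(1, 0))  # nothing running, default --jobs
print(workers_to_start(4, 1))  # top up to four workers
print(workers_to_start(2, 3))  # already above the requested maximum
```

Under this reading, lowering concurrency requires `dvc queue stop` followed by a fresh `dvc queue start` with the smaller `--jobs` value.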
Maybe this should be under `--run-all`? Or split into 2 admonitions, even if they're similar.

@pmrowla What do you think about moving or splitting this one?

I don't really think it's necessary to repeat this block after 2 consecutive options - `run-all` and `jobs` will appear one after another and both options + the info block fit into a single screen on most devices

True. But this is much more about `--run-all` than about `--jobs` so I moved it up in 26d28e8.