experiments: support local parallel execution in temp directories #4257

pmrowla · 2020-07-22T08:04:59Z

❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Maybe: see experiments: support local parallel execution in temp directories #4257 (comment) below.
❌ I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

Related to #2799.

Adds a local temp directory executor and support for running experiments in parallel.
- Only `dvc repro` is currently supported
- Individual `dvc run` stages within a single repro command cannot be run in parallel
- `dvc repro --experiment --queue` can be used to add an experiment to the execution queue
- `dvc repro --run-all` will run all experiments currently in the queue
- `-j/--jobs` can be used to run experiments in parallel. It currently defaults to 1 (run sequentially); we still need to decide whether it should default to the CPU count.
- Queued experiments are listed in `dvc exp show` and denoted with `*`
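
To illustrate the queue/run-all idea, here is a rough sketch of running queued items in parallel, each in its own temp directory. This is not the code in this PR: `run_experiment`, `run_all`, and the `executor_cls` parameter are hypothetical names, and the experiment body is a stand-in that just writes a file. The demo uses threads to stay portable; the commits in this PR mention `ProcessPoolExecutor`, which should also be passable as `executor_cls`, provided the worker returns picklable data.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_experiment(name):
    """Run one queued experiment inside its own temporary directory (stand-in)."""
    with tempfile.TemporaryDirectory(prefix="exp-") as workdir:
        # Placeholder for "check out the experiment and reproduce it" in workdir.
        result_path = os.path.join(workdir, "metrics.txt")
        with open(result_path, "w") as fobj:
            fobj.write(f"{name}: done")
        with open(result_path) as fobj:
            # Return plain data only: with a process pool, results must be
            # picklable to cross the process boundary.
            return {"name": name, "output": fobj.read()}


def run_all(queue, jobs=1, executor_cls=ThreadPoolExecutor):
    """Run every queued experiment using up to `jobs` workers."""
    results = {}
    with executor_cls(max_workers=jobs) as pool:
        futures = {pool.submit(run_experiment, name): name for name in queue}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    # jobs=1 mirrors the sequential default described above.
    print(run_all(["exp-a", "exp-b", "exp-c"], jobs=2))
```

With `jobs=1` the experiments run one at a time, which matches the current default; higher values only change how many workers the pool may use.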

Next steps:

- Reorganize executor output collection (currently collection is done manually in the experiments classes rather than reading output from an executor's tree)
- Clean up executor classes so that they can be more easily extended to support other types of executors

pmrowla · 2020-07-28T09:20:42Z

pmrowla · 2020-07-28T14:45:25Z

Looks like there's a Windows issue with cleaning up the executor temp directories. I'll have to take a look at it tomorrow.

pmrowla · 2020-07-29T07:01:41Z

The Windows CI issue is resolved and this should be mergeable.
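
For context, the fix commit notes that on Windows a temp directory cannot be removed while the process is chdir'd into it. A minimal sketch of that pattern, assuming a simple context manager (`temp_workdir` is a hypothetical helper name, not the executor code in this PR):

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_workdir():
    """Create a temp directory, chdir into it, and clean it up afterwards."""
    original_cwd = os.getcwd()
    workdir = tempfile.mkdtemp(prefix="exp-")
    os.chdir(workdir)
    try:
        yield workdir
    finally:
        # Leave the directory before deleting it: on Windows, a directory that
        # is the current working directory cannot be removed.
        os.chdir(original_cwd)
        shutil.rmtree(workdir)
```

The important part is the order in the `finally` block: restore the original working directory first, then remove the tree.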

Not sure what's going on with the lint build step; check_patch/pylint passes for me locally and in the Travis build.
For some reason, pylint in the lint step is reporting errors for pytest imports that I think are normally ignored:

pylint...................................................................Failed
- hook id: pylint
- exit code: 2

Registered custom plugin. Some checks will be disabled for tests.
************* Module tests.unit.repo.test_repo
tests/unit/repo/test_repo.py:15:1: E1102: pytest.mark.parametrize is not callable (not-callable)
tests/unit/repo/test_repo.py:33:1: E1102: pytest.mark.parametrize is not callable (not-callable)
************* Module tests.unit.remote.ssh.test_connection
tests/unit/remote/ssh/test_connection.py:87:1: E1102: pytest.mark.skipif is not callable (not-callable)
tests/unit/remote/ssh/test_connection.py:100:1: E1102: pytest.mark.skipif is not callable (not-callable)
...

skshetry · 2020-07-29T07:12:08Z

@pmrowla, could be related to pytest-dev/pytest#7558.
Pinning pytest to <6.0.0 should do. Let me create a PR.

pytest-dev/pytest#7558: pylint throws error for `pytest.mark.*` functions in pytest6.

jorgeorpinel · 2020-08-06T02:09:01Z

Hi! Wow, this looks cool 👍 👍

Docs-wise though, we have this section in the repro cmd ref, https://dvc.org/doc/command-reference/repro#parallel-stage-execution, reading:

Currently, dvc repro is not able to parallelize stage execution automatically. If you need to do this, you can launch dvc repro multiple times manually. For example...

Sounds like it needs an update?

dvc repro --experiment --queue can be used to add an experiment to the execution queue

In fact, it seems like several new options have been added, so this def. needs a docs ticket, I think.

Thanks

pmrowla · 2020-08-06T03:01:44Z

@jorgeorpinel yeah there will be docs tickets once behavior is finalized, but for now since everything is expected to continue to change before the feature is released (and since it's currently an experimental feature and disabled by default) I haven't submitted any docs PRs yet.

jorgeorpinel · 2020-08-06T17:06:33Z

Kk

pmrowla self-assigned this Jul 22, 2020

pmrowla force-pushed the experiments-parallel branch from 02a5fb9 to 0fa523d Compare July 24, 2020 11:55

weekly-digest bot mentioned this pull request Jul 26, 2020

Weekly Digest (19 July, 2020 - 26 July, 2020) #4285

Closed

pmrowla force-pushed the experiments-parallel branch from 0fa523d to ea37620 Compare July 28, 2020 06:50

pmrowla marked this pull request as ready for review July 28, 2020 09:23

pmrowla changed the title ~~[WIP] experiments: support local parallel execution in temp directories~~ experiments: support local parallel execution in temp directories Jul 28, 2020

pmrowla requested a review from efiop July 28, 2020 09:24

pmrowla added the A: experiments Related to dvc exp label Jul 28, 2020

pmrowla added 17 commits July 29, 2020 15:31

experiments: add initial local tmpdir executor

bbca2e5

fix detached head pull

703275c

experiments: include unchanged (unrepro'd) stages in experiment hash

83a4bd0

experiments: repro single experiment using tmpdir executor by default

aa1e044

stage experiments as stash commits before running

48b74e0

include repro args/kwargs in stashed experiments

7a53923

support running arbitrary experiment commits (including stash commits)

41b53b4

experiments: use ProcessPoolExecutor

2571204

experiments: add dvc exp alias for dvc experiments

3569bf7

experiments: add dvc repro --experiment --queue

9abb0a0

* `--queue` can be used to stage an experiment for future execution

experiments: show queued (unexecuted) experiments in dvc exp show

0f3b7d3

revert ProcessPoolExecutor change

9214934

experiments: add dvc repro --run-all [--jobs]

4916263

* `--run-all` can be used to run all queued experiments in parallel

cleanup output

ccaa098

experiments: use ProcessPoolExecutor

bdf6356

* fix returning unpicklable objects error

update tests

2fc3b4a

fix windows cleanup issue

66df566

* on windows tempdir cannot be removed if we are chdir'd into that directory

pmrowla force-pushed the experiments-parallel branch from 6382f65 to 66df566 Compare July 29, 2020 06:33

skshetry mentioned this pull request Jul 29, 2020

Exclude pylint==6.0 #4300

Merged

2 tasks

efiop approved these changes Jul 29, 2020

efiop merged commit c8d2f4f into iterative:master Jul 29, 2020

pmrowla deleted the experiments-parallel branch July 29, 2020 12:15

weekly-digest bot mentioned this pull request Aug 2, 2020

Weekly Digest (26 July, 2020 - 2 August, 2020) #4315

Closed
