Closed
Labels: A: experiments (related to dvc exp), discussion (requires active participation to reach a conclusion), p1-important (important, aka current backlog of things to do), product: VSCode (integration with VSCode extension)
Bug Report
Description
After the recent changes to make exp show lockless, I am seeing intermittent issues with the data returned during experiment runs.
The issues that I have seen so far are as follows:
Running checkpoint experiments from the queue
- exp show fails with dulwich.porcelain.DivergedBranches errors when experiments are run from the queue.
- @pmrowla tried to patch the above, but the change led to this behaviour.
Please take a look at the above behaviour and let me know what you think. I do not anticipate an easy fix. IMO we should consider "dropping support" for live tracking of experiments run from the queue until the DVC mechanics have been updated.
Running checkpoint experiments in the workspace
- exp show intermittently returns a single dict for a set of checkpoints during an experiment, which breaks our "live experiment" tracking.
- exp show shows running experiments as not running mid-run.
Reproduce
Run a checkpoint experiment and monitor the output of exp show.
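Because both workspace symptoms are intermittent, it can help to poll exp show repeatedly and compare consecutive results instead of eyeballing single runs. The sketch below is a hypothetical helper, not part of DVC. It assumes the caller supplies a fetch() callable that runs exp show and normalizes its output into {experiment name: (is_running, checkpoint_count)}. That snapshot shape is an assumption for illustration and is not DVC's JSON schema. It then flags the two symptoms: a running experiment whose checkpoints collapse to a single entry, and an experiment reported as not running between polls where it is running.

```python
import time
from typing import Callable, Dict, List, Mapping, Tuple

# Hypothetical normalized view of one exp show poll (assumption, not DVC's schema):
# experiment name -> (is_running, number of checkpoints reported)
Snapshot = Mapping[str, Tuple[bool, int]]


def poll(fetch: Callable[[], Snapshot], count: int, interval: float = 1.0) -> List[Dict[str, Tuple[bool, int]]]:
    """Call fetch() `count` times, sleeping `interval` seconds after each call."""
    snapshots = []
    for _ in range(count):
        snapshots.append(dict(fetch()))
        time.sleep(interval)
    return snapshots


def find_anomalies(snapshots: List[Snapshot]) -> List[str]:
    """Flag the two workspace symptoms described in this report."""
    anomalies = []
    names = sorted({name for snap in snapshots for name in snap})
    for name in names:
        # (poll index, (is_running, checkpoints)) for every poll that saw this experiment
        seq = [(i, snap[name]) for i, snap in enumerate(snapshots) if name in snap]

        # Symptom 1: a running experiment with several checkpoints collapses to one.
        for k in range(1, len(seq)):
            i, (_, ckpts) = seq[k]
            _, (prev_running, prev_ckpts) = seq[k - 1]
            if prev_running and prev_ckpts > 1 and ckpts == 1:
                anomalies.append(f"poll {i}: {name} collapsed to a single checkpoint")

        # Symptom 2: reported as not running between polls where it is running.
        flags = [running for _, (running, _) in seq]
        for k in range(1, len(seq) - 1):
            if not flags[k] and any(flags[:k]) and any(flags[k + 1:]):
                anomalies.append(f"poll {seq[k][0]}: {name} reported as not running mid-run")
    return anomalies
```

In practice fetch() would wrap a subprocess call to exp show plus whatever parsing matches the installed DVC version; that part is left out here because the output format differs between releases.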
Expected
Environment information
Output of dvc doctor:
$ dvc doctor
DVC version: 2.10.2 (pip)
---------------------------------
Platform: Python 3.9.9 on macOS-12.3.1-x86_64-i386-64bit
Supports:
webhdfs (fsspec = 2022.3.0),
http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s5s1
Caches: local
Remotes: https
Workspace directory: apfs on /dev/disk1s5s1
Repo: dvc (subdir), git
Additional Information (if any):
I'll continue to add issues here as I find them. I have already discussed this with @pmrowla, but wanted to raise its visibility by opening an issue to discuss possible mitigations and the priority of fixes.
Thanks