dvc exp run: experiment metrics are not reported when metric files are on another device than training code #7863
Comments
Hi @AlexandreRozier! Is that the full traceback? Are there any other exceptions logged above the traceback you shared?
Hi @AlexandreRozier. It looks like the errors in the Could you share the output of
I also have had a similar issue, but all on the same machine, actually. When I run Rolling back to version 2.10.2 fixes this for me. Hopefully this provides some additional context that can help narrow down the issue. Let me know if you also want my logging information as well.
@daavoo You might be right, the experiments are indeed completing successfully, but unfortunately they don't show up in @dfossl I'm also working on a single machine, but across 2 different partitions. And
@AlexandreRozier is your metrics path inside your DVC repo? Or is it a completely external path? From your sample code:
Where is your DVC project located (i.e. where is the It sounds like So something like this would be allowed:
Since in this case,
But rather than nesting mount points like that, I think what you may be looking for is to configure see:
@pmrowla Thanks, I had not understood that all output paths had to be in the dvc repo.
In conclusion, it seems that my issue comes from the lack of support for output file paths outside of the DVC repo, and it can be closed if that's a DVC design choice :)
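For readers following along, the "output paths must be inside the repo" rule discussed above can be illustrated with a plain path check. This is only a minimal sketch and not DVC's actual validation code; the function name is_inside_repo and the example paths are hypothetical and were introduced here for illustration.

```python
from pathlib import Path


def is_inside_repo(path, repo_root):
    """Return True if `path` resolves to a location under `repo_root`.

    Hypothetical helper for illustration only; not part of DVC.
    """
    resolved_path = Path(path).resolve()
    resolved_root = Path(repo_root).resolve()
    try:
        # relative_to raises ValueError when the path is not under the root
        resolved_path.relative_to(resolved_root)
    except ValueError:
        return False
    return True


# Example paths are assumptions, not taken from the reporter's setup.
print(is_inside_repo("/repo/metrics/metrics.json", "/repo"))
print(is_inside_repo("/data/metrics.json", "/repo"))
```

Because the check only compares resolved paths, it says nothing about which device a path lives on, so a mount point nested inside the repo directory would still pass it.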
Right, normally the output paths are validated in commands like cc @daavoo
Not sure if this belongs in DVCLive though. I think that it might be better if Problem is that
I'm not up to date on when/how the stage Line 24 in 685a2d5
It seems to me that it shouldn't matter whether or not the output path is cached? Even uncached outputs still have to be inside the repo. Is there an actual use case where dvclive outputs would need to be external?
We currently promote using
What I mean is that if someone manually creates/edits the
This isn't an error case though. External outputs with local fs paths are valid, and in this case there is no separate cache to configure. Local external outs still use the regular local cache (which defaults to That's why I think that we need an explicit check for dvclive outputs. Assuming we do not want to support external outputs for dvclive metrics/plots, when a user runs
This will be fixed when we work on #3920. Closing in favour of that issue.
Bug Report
Issue name
dvc exp run runs but does not store metrics.
Description
I'm running my training script on /dev/mapper/system-home and it outputs data (model checkpoints, metrics) in /data/.cache, located on another partition (/dev/sdb1). /dev/sdb1 is a purposely large partition where we are supposed to store large files. Running dvc exp run works fine, but after completion dvc exp show does not show any metrics (and neither does dvc metrics show).
When outputting metrics to a folder on the same partition as the training script (/dev/mapper/system-home), dvc exp show works perfectly and shows metrics.
When using verbose mode, I get the following errors:
The full traceback can be found here:
trace.Log
The "Invalid cross-device link" part seems to show that dvc cannot handle cross-device operations.
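One way to confirm that two directories really sit on different devices is to compare the st_dev field returned by os.stat. The snippet below is a minimal sketch; same_device is a hypothetical helper name, and both paths must already exist.

```python
import os


def same_device(path_a, path_b):
    """Return True if both existing paths are on the same device.

    Hypothetical helper for illustration; compares os.stat(...).st_dev.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


# Comparing a directory with itself is trivially True.
print(same_device(".", "."))
```

For the setup described above, calling it with the project directory and /data/.cache would presumably return False, since they are reported to live on different partitions.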
Reproduce
/sda1/foo1 training and evaluating a model, writing metrics to another device /sdb1/foo2
ex of /data/metrics.json:
ex of /data/metrics/scalar/loss.tsv:
dvc exp show doesn't show any metrics column
Expected
dvc metrics show actually shows metrics columns.
Environment information
Python 3.8.13
Description: Ubuntu 20.04.3 LTS
Release: 20.04
dvclive 0.8.2
Output of dvc doctor:
Additional Information (if any):
I think the error comes from missing support for cross-device copying (see https://stackoverflow.com/questions/42392600/oserror-errno-18-invalid-cross-device-link). Do you have any ideas? Thanks for this nice piece of software 👍
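To illustrate what "support for cross-device copying" could look like in general, here is a minimal sketch of the usual workaround for errno 18 (EXDEV): try a hard link first and fall back to a regular copy when the link crosses devices. This is not how DVC handles it internally; link_or_copy is a hypothetical helper written only for this example.

```python
import errno
import os
import shutil


def link_or_copy(src, dst):
    """Hard-link src to dst, falling back to a copy across devices.

    Hypothetical helper for illustration; returns which strategy was used.
    """
    try:
        os.link(src, dst)
        return "hardlink"
    except OSError as exc:
        # EXDEV is the "Invalid cross-device link" error (errno 18 on Linux)
        if exc.errno != errno.EXDEV:
            raise
        shutil.copy2(src, dst)
        return "copy"
```

Any other OSError (permissions, missing source, existing destination) is re-raised unchanged, so only the cross-device case is silently handled.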