dump yaml: .dvc and dvc.lock in 1.1 version, dvc.yaml in 1.2 version #4380
* only dump dvc.lock in 1.1, dump dvc.yaml in 1.2
This sounds good, but I'm worried about users who have a very large number of dvc-files in their projects. We've tried to optimize collection before, but it is still not instant, and this change will make it slower. We've been talking about alternative approaches there that might even mitigate this change. Do you also suggest moving to 1.2 for metrics/params? Or would 1.1 stay there?
Looks good, though I wonder whether version in the parse and dump methods shouldn't be required. That way we could force choosing a proper version.
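One way to read the suggestion above is to make the version a keyword-only argument with no default, so every call site has to choose one. The sketch below is only an illustration: `dump_flat_mapping` and `SUPPORTED_VERSIONS` are hypothetical names, not DVC's API, and the toy dumper only handles a flat mapping of simple values.

```python
SUPPORTED_VERSIONS = ("1.1", "1.2")  # hypothetical constant, for illustration


def dump_flat_mapping(data: dict, *, version: str) -> str:
    """Toy dumper: `version` is keyword-only and has no default."""
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported YAML version: {version!r}")
    # Emit a %YAML directive so the chosen version is visible in the output.
    lines = [f"%YAML {version}", "---"]
    lines.extend(f"{key}: {value}" for key, value in data.items())
    return "\n".join(lines) + "\n"


text = dump_flat_mapping({"cmd": "echo"}, version="1.2")
```

Calling `dump_flat_mapping({"cmd": "echo"})` without `version` raises a `TypeError`, which is the "forced choice" the comment asks for.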
It's pretty ugly to be mixing versions like this, but it seems like there's not much we can do about it for now
From a discussion w/@efiop: since we are planning on supporting specifying params via command line for the experiments feature, we will eventually want to move to 1.2 for params as well. For this experiments feature we need to load/edit/dump params files and preserve comments. Preserving comments requires using ruamel, which defaults to 1.2
@pmrowla, I'd say we provide repo-wide config just in case as with this
@skshetry Repo config option to make dvc use 1.1 or 1.2 in dvcfiles/params/metrics?
Discussed with @efiop to use YAML 1.2 everywhere. Closing this PR, I'll re-create another one with some minor adjustments.
I have followed the Contributing to DVC checklist.
If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Requires documentation updates noting some of the things from this description.
The solution feels like a kludge, caused by two different parsers each of them with their own bugs/botched implementation.
For now, the effect of this PR is that any y/n keys will be changed to their string counterparts in dvc.lock and .dvc files. I have tried to check compatibility between the pre and post versions of this PR and have found them to be compatible (it's hard to make any guarantees given the differences, though). dvc.yaml should remain a compatible subset of 1.1 and 1.2 for the time being. This was partly done because we introduced y for plots (which luckily wasn't introduced in .dvc files), which is a boolean in YAML 1.1. Another thing is that, until now, we have had no reported issues for dvc.yaml, and we only use a subset of functionality there.
Alternatives to consider:
* I haven't looked, but we can change the "regex" that pyyaml uses for parsing floats. This way, we'll have partial 1.2 support, but it certainly won't be 1.1 compatible. Another advantage of this is that it would be a very minimal change and should fix the issue until the next one pops up.
* Deprecate now (should have a very minimal effect) and move to YAML 1.2. It's twice as slow, but that does not make much difference until the number of stages reaches four figures.
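To make the float "regex" alternative more concrete, here is a rough standard-library sketch comparing two float-resolution patterns. `FLOAT_1_2` follows the decimal float pattern of the YAML 1.2 core schema (without `.inf`/`.nan`); `FLOAT_1_1_STYLE` is a simplified assumption of a 1.1-style pattern that requires a dot and a signed exponent, and is not copied from pyyaml. The helper name `looks_like_float` is hypothetical.

```python
import re

# YAML 1.2 core schema float pattern (decimal forms only; .inf/.nan omitted).
# It also matches plain integers; a real resolver would try the int pattern first.
FLOAT_1_2 = re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?")

# Simplified 1.1-style pattern (an assumption for illustration):
# a dot is mandatory and the exponent must carry an explicit sign.
FLOAT_1_1_STYLE = re.compile(r"[-+]?[0-9][0-9_]*\.[0-9_]*([eE][-+][0-9]+)?")


def looks_like_float(text: str, pattern: re.Pattern) -> bool:
    """Return True if the whole scalar matches the given float pattern."""
    return pattern.fullmatch(text) is not None


# Scalars on which the two patterns disagree.
samples = ["1e-5", "1e5", "1.5e3", "1.0e-5", "1.5"]
differences = [
    s for s in samples
    if looks_like_float(s, FLOAT_1_2) != looks_like_float(s, FLOAT_1_1_STYLE)
]
```

Under these assumptions, scalars such as `1e-5` are floats only for the 1.2 pattern, which is the kind of gap that swapping the pattern would close, at the cost of no longer matching 1.1 behavior.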
Things to note:
dvc.lock file (we dump a stage without keeping its older comments in dvc.lock; however, other stages' comments do survive).
Fixes #4281
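As a rough illustration of the y/n point in the description, the sketch below contrasts the boolean scalar patterns from the YAML 1.1 type repository and the YAML 1.2 core schema, using only the standard library. The helpers `is_bool_scalar` and `quote_if_ambiguous` are hypothetical and do not reflect how DVC, pyyaml, or ruamel implement this; real parsers may recognize a different subset of these spellings.

```python
import re

# Boolean spellings listed by the YAML 1.1 bool type.
BOOL_1_1 = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)
# Boolean spellings in the YAML 1.2 core schema.
BOOL_1_2 = re.compile(r"true|True|TRUE|false|False|FALSE")


def is_bool_scalar(text: str, version: str) -> bool:
    """Return True if a plain scalar resolves to a boolean for this version."""
    pattern = BOOL_1_1 if version == "1.1" else BOOL_1_2
    return pattern.fullmatch(text) is not None


def quote_if_ambiguous(key: str) -> str:
    """Quote a key that either version could read as a boolean."""
    if is_bool_scalar(key, "1.1") or is_bool_scalar(key, "1.2"):
        return f"'{key}'"
    return key


quoted = [quote_if_ambiguous(k) for k in ("x", "y", "title")]
```

Here `y` is a boolean only under the 1.1 pattern, so writing it out quoted keeps it a string regardless of which version a reader assumes.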