QA: params/vars and loops in dvc.yaml (2.0) round 2 #5165
Not a bug. The ordering is based on the DAG. Other than that, it might be difficult to promise an order (not that it matters). The lock file has its own way of ordering things wherever it matters; it does not have to work the way you expect.
The reasoning for this is that we have two levels of vars: local and global. Three if you count the foreach's
I don't like this term at all. They are not a group in any way. If you are confusing it with
I brought that up in the discussion with @dmpetrov. The main reasoning was that most people will use And, if you don't want any parameterization, don't use it at all. If you only want to declare variables in
vars:
  grp:
    b: 3
This is useful in the top-level
We have decided that if someone wants to have an "overwrite" method, we'd introduce a new keyword as follows in the future:
overwrite_vars: true
vars:
  grp: 3
So if there are no DAG connections in a stage group, why use any other order than what is specified in dvc.yaml? We even assign 0...n indices to some lists given to
DVC should not work in a predictable/intuitive way? I probably didn't understand what you meant with that @skshetry.
I think we are assuming there. Or do we know that users actually prefer global vs. local vars?
I don't have a strong opinion on this since it's already implemented that way but I think it looks cleaner without
I see you fixed the "repro prints a generic ERROR" bug, great! 👍
No strong opinion either but those are workarounds. What if I want parameterization with
Is there a reason to keep that default behavior? Thanks
You're right! I confirmed this does work consistently 👍 (except that
Any additional guarantee reduces flexibility (let's say we decide to run them in parallel at some point and people already expect a certain order). I would wait for someone to ask for this, with a specific use case, before considering it at all.
I think @skshetry mentioned that
so, according to that screenshot I tend to agree with @skshetry - since they do not represent a unit in any way
hmm ... what about the explanation @skshetry provided, with global/local vars?
no. We might run them in parallel by default, nothing prevents us from doing this at the moment.
I'm not sure I understand this to be honest. Could you clarify please?
You kinda set a specific thing (execution order in this case). This by definition reduces flexibility - we have to maintain that specification. E.g. it makes it harder to run stuff in parallel by default.
Yes. We don't guarantee any order.
On the "group" term, @jorgeorpinel: it's a language construct to generate stages, to my mind. Not sure how else I would describe it. As I mentioned, I would analyze how users describe/ask about it in the first place and would use that language (it should be correct though, e.g. we don't like group, because group is a very specific term in math/languages, and in our case it's not a group; these stages are not different from any other stages). Also, you can take a look at some other tools that use loops in yaml. Writing this, I see where this confusion comes from. There are two languages now - I would btw be totally fine with using "group" in some informal way. What would be misleading is to make it formal (use it in the title, and everywhere else). At least we should think about some better term here.
My last comments on that
OK I think I see what your concern is now. We can def. find another term for the "official name" of the feature.
Agree on this. I'm just not sure where to find that info...
They're indirectly connected by the stage name prefixes e.g.
p.p.s. I'm not sure why this terminology discussion is happening here (it's not something I reported related to QA). Let's move this part to iterative/dvc.org#2052 (review) please!
@shcheklein answers to the other comments (which I just noticed):
TBH I didn't understand that explanation 😅. Here's an example with all 3 types of substitutions, and without do:
vars:
  - glb: foo
stages:
  mystages:
    vars:
      - loc: bar
    foreach:
      - 1
      - 2
    # do: needed?
    cmd: echo ${glb} ${loc} ${item}
I will create a separate issue on the order of stage execution and summarize/clarify there, since it seems to be more general than the scope of this QA ticket.
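To make the three kinds of substitution in the dvc.yaml example above concrete, here is a minimal Python sketch of how a global var, a stage-local var, and the foreach item could be combined into one context and substituted into cmd. This is not DVC's implementation: the names render and expand_foreach are hypothetical, only flat ${name} references are handled, and the sketch rejects a name defined in more than one scope instead of guessing a precedence rule.

```python
import re


def render(template, context):
    """Replace each ${name} in template with the matching value from context."""
    def lookup(match):
        name = match.group(1)
        if name not in context:
            raise KeyError(f"unresolved variable: {name}")
        return str(context[name])

    return re.sub(r"\$\{(\w+)\}", lookup, template)


def expand_foreach(cmd, global_vars, local_vars, items):
    """Build one rendered command per foreach item (hypothetical helper)."""
    commands = []
    for item in items:
        context = {}
        # Assumption of this sketch: a name may live in only one scope.
        for scope in (global_vars, local_vars, {"item": item}):
            for name, value in scope.items():
                if name in context:
                    raise ValueError(f"'{name}' is defined in more than one scope")
                context[name] = value
        commands.append(render(cmd, context))
    return commands


commands = expand_foreach(
    "echo ${glb} ${loc} ${item}",
    global_vars={"glb": "foo"},
    local_vars={"loc": "bar"},
    items=[1, 2],
)
for command in commands:
    print(command)
```

With the values from the example, the sketch produces one command per list item, each combining the global value, the local value, and the item.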
OK, created #5181 for the order thing. The only matter still in discussion here is about the
I'm also building a 3rd (hopefully last) QA ticket around 2.0 pipelines: #5180
I lied, there are 2 other questions still pending:
My proposal would be to address both (not necessarily a priority) with something like
vars:
  - none/null # or introduce a special comment like # dvc-skip-params.yaml
  - foo: bar # This can now "override" values in params.yaml
  - params.json # This will be the only params file loaded.
Or (for the latter one) perhaps if you specify params files then by default DVC shouldn't load params.yaml (so no
Closing in favor of #5312
Bugs?
dvc repro executes stage groups alphabetically, e.g. if my foreach list is foo, bar, baz (in that order), the stages still get reproed (and saved to dvc.lock) in this order: stg@bar, stg@baz, stg@foo.
Generalized and extracted to repro: fallback order of execution for independent stages #5181
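The difference between the declared order and the reported order can be shown with a small Python sketch. It only mimics the stage names from this report (stg@foo and so on); the helper generated_stage_names is hypothetical and says nothing about how DVC actually schedules stages.

```python
def generated_stage_names(group, items):
    """Expand a foreach list into stage names of the form group@item."""
    return [f"{group}@{item}" for item in items]


# Order in which the items are written in dvc.yaml.
declared = generated_stage_names("stg", ["foo", "bar", "baz"])

# Order observed in the report: the same names, sorted alphabetically.
alphabetical = sorted(declared)

print(declared)
print(alphabetical)
```

The two lists contain the same stages; only the position of stg@foo changes, which is the whole of the reported discrepancy.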
UI/UX
do: always contains all the regular stage fields, what about avoiding do and going straight to cmd, etc.? Like this:
When the file in vars is a directory, repro prints a generic ERROR: unexpected error - [Errno 21] Is a directory: '{path}'. which, while informative enough, seems like it could be better handled. (Fix on parametrization: fix error when the file in vars is a directory #5171)
Feature design / Side-effects
use: none.
params.json and avoid the default (params.yaml). I feel like the default behavior should be to skip params.yaml if any params files are specified.
vars cannot overwrite any values in params files, but params files accept repeated objects IF they can be merged without conflict. E.g.:
params.json
  {"grp": {"a": 1}}
params2.json
  {"grp": {"b": 2}}
can both be included even when grp is overloaded, but you can't define grp at all in vars.
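The merge rule described in the last item can be sketched in Python as follows. This is a model of the behavior as reported here, not DVC's code: merge_no_conflict and check_vars are hypothetical names, and treating two identical leaf values as a conflict is an assumption of the sketch.

```python
def merge_no_conflict(base, extra, path=""):
    """Merge two nested dicts, failing if they disagree on a non-dict key."""
    merged = dict(base)
    for key, value in extra.items():
        here = f"{path}.{key}" if path else key
        if key not in merged:
            merged[key] = value
        elif isinstance(merged[key], dict) and isinstance(value, dict):
            # Both sides are objects: merge them key by key.
            merged[key] = merge_no_conflict(merged[key], value, here)
        else:
            # Assumption: any repeated leaf counts as a conflict.
            raise ValueError(f"conflicting definitions for '{here}'")
    return merged


def check_vars(vars_dict, params):
    """Reject vars that reuse a top-level name already set by a params file."""
    reused = sorted(set(vars_dict) & set(params))
    if reused:
        raise ValueError(f"vars cannot redefine: {', '.join(reused)}")


params_json = {"grp": {"a": 1}}
params2_json = {"grp": {"b": 2}}

merged = merge_no_conflict(params_json, params2_json)
print(merged)

try:
    check_vars({"grp": {"c": 3}}, merged)
except ValueError as error:
    print(error)
```

The two params files merge because their grp objects do not overlap, while the same name in vars is rejected outright, which is the asymmetry the item points out.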