This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Extract checksums to a common state file #2940
Comments
An idea: it might happen that [...] However, there is a small disadvantage: it might lead to empty directories, which won't be reflected by Git.
We have several issues that discuss the way we organize the state in files. With this approach, it doesn't matter how we identify each block/object of the store (gathering a collection of stage files, querying a single file, or splitting it across several files: checksums, pipelines, artifacts).
Could you elaborate on what the best practice would be?
@dmpetrov, by the way, Terraform has an option to submit the state to a remote/shared space. It can't be done through GitHub because their state includes sensitive information (API keys?), but with DVC it is only checksums paired with files. It would be even simpler if we moved to a prefix-based approach instead of a path-based one, since directories wouldn't need special treatment.
Best practice: files under Git control are edited by humans only. No software writes to files that go under Git control. If software needs to write something and put it under Git control, it is better to localize the places where the modification happens.
@MrOutis are you talking about Terraform Cloud? If so, it seems like a different use case that can be implemented on top.
Is it only about an internal code redesign? Yeah, the separation is needed.
Now all the checksums are scattered among DVC-files. It was a design decision to simplify `git merge` for ML experiments, when changes to a single data-file/dvc-stage were localized. However, we learned that in many cases the `-X theirs` strategy is the best way to bring ML experiments to another branch without manual merging, and it is a good time to revisit this design decision.
There are two issues with checksums in many DVC-files:
`dvc repro`). The changes in the repo (changed DVC-files) need to be copied somewhere (e.g. GitLab artifacts).
To solve the issues above, it might be worth extracting all the checksums into a separate "State"-file. For example: `Dvc.state`, `<anyname>.dvcstate`, or `.dvc/state`.
Note, this is not the same as the current `.dvc/state`, which is an ephemeral (not committed to Git) DB file. The state file needs to be committed to Git.
Example: Terraform keeps all the infrastructure configuration in `*.tf` files but stores state in a single, separate file, `terraform.tfstate`.
Related issues: This FR might be related to the single DAG FR, #1871.
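To make the proposal more concrete, here is a minimal sketch of the extraction step in Python. It is not DVC code: the function name `extract_checksums` and the layout of the resulting state (a `stages` and a `files` mapping) are hypothetical. It assumes each DVC-file has already been parsed into a dict with optional `deps` and `outs` lists whose entries carry `path` and `md5`, and that all paths are relative to the repository root.

```python
import json


def extract_checksums(dvc_files):
    """Split parsed DVC-files into checksum-free stages and one state mapping.

    dvc_files maps a DVC-file name to its parsed content (hypothetical input;
    a real tool would load these from YAML on disk).
    """
    stripped = {}
    state = {"stages": {}, "files": {}}  # hypothetical state-file layout
    for name, stage in dvc_files.items():
        # Keep everything a human would edit (cmd, wdir, ...), drop checksums.
        clean = {k: v for k, v in stage.items() if k not in ("md5", "deps", "outs")}
        if "md5" in stage:
            state["stages"][name] = stage["md5"]
        for section in ("deps", "outs"):
            entries = stage.get(section)
            if entries is None:
                continue
            clean[section] = []
            for entry in entries:
                entry = dict(entry)  # copy so the input is not mutated
                checksum = entry.pop("md5", None)
                if checksum is not None:
                    state["files"][entry["path"]] = checksum
                clean[section].append(entry)
        stripped[name] = clean
    return stripped, state


dvc_files = {
    "train.dvc": {
        "md5": "aaa",
        "cmd": "python train.py",
        "deps": [{"path": "data.csv", "md5": "111"}],
        "outs": [{"path": "model.pkl", "md5": "222", "cache": True}],
    },
}
stripped, state = extract_checksums(dvc_files)
# The single state document that would be committed to Git.
print(json.dumps(state, indent=2, sort_keys=True))
```

With this split, `stripped` holds only what a human would edit, and `state` is the single machine-written document, similar in spirit to `terraform.tfstate`.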