add: support virtual operation #9389
Codecov Report
Patch coverage:
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #9389      +/-   ##
==========================================
- Coverage   91.59%   91.56%   -0.04%
==========================================
  Files         487      488       +1
  Lines       37857    38060     +203
  Branches     5443     5462      +19
==========================================
+ Hits        34677    34851     +174
- Misses       2624     2645      +21
- Partials      556      564       +8

☔ View full report in Codecov by Sentry.
506425a to 40a371b
@dberenbaum, I have removed prompts from |
445c657 to 46290ab
@skshetry Let me know when you want me to take a look. Only semi-related and not a blocker for this PR, but we should really have a summary of changes from |
32f6f49 to 1c19be3
c27641d to 22423fe
Just one edgecase regarding
Plain But what should Similarly, it feels like users should be able to do |
@dberenbaum, I decided not to implement support for external outputs, so this should be ready whenever you have time.
40224cb to 110bd2f
dvc/commands/commit.py
Outdated
    commit_parser.add_argument(
        "-f",
        "--force",
        action="store_true",
        default=False,
        help="Commit even if hash value for dependencies/outputs changed.",
    )
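For readers unfamiliar with `store_true` flags, here is a minimal standalone sketch of how an option declared this way parses. The parser name `commit-sketch` is made up for illustration; this is not DVC's actual parser setup, only the same `add_argument` call attached to a bare `argparse.ArgumentParser`.

```python
import argparse

# Hypothetical standalone parser; DVC builds commit_parser elsewhere.
parser = argparse.ArgumentParser(prog="commit-sketch")
parser.add_argument(
    "-f",
    "--force",
    action="store_true",
    default=False,
    help="Commit even if hash value for dependencies/outputs changed.",
)

# Without the flag, args.force falls back to the default.
print(parser.parse_args([]).force)
# With -f (or --force), args.force is set to True.
print(parser.parse_args(["-f"]).force)
```

Because the flag only flips a boolean, keeping it while changing the default behavior leaves existing command lines that pass `-f` parseable.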
In a way, removing prompts could be seen as a breaking change.
Removing the option itself is more breaking, right?
Also seems like a completely unrelated change
Good point, @daavoo. I will bring the flag back. But the new default behavior, where it no longer fails, is still a breaking change.
@skshetry Looks amazing! I did a little QA and everything worked as expected. I even like the trick to use dvc add missing_path to remove a file.
But what should dvc commit data do? Ideally, I'd argue that it should also update dependencies, not just outputs.
I'm not sure I follow this part. Is data something with a data.dvc file? What would be its dependencies?
yes,
stages:
  stage1:
    cmd: python train.py
    deps: [data]
    outs: model.pkl
I guess my question is, should
EDIT: related #2094.
We show a hint like the following in DVC uncommitted changes:
  (use "dvc commit <file>..." to track changes)
  (use "dvc checkout <file>..." to discard changes)
        deleted: dataset/train/0/00002.png
Now, the only thing that does not work is
I just had an interesting discussion with @efiop about The current behaviour of When you do
Also see Rapid Iterations. My understanding was that
And even though docs agree to a certain extent, it seems like the main motivation behind commit seems to be storing files to cache if the hashes in So, that's why prompts exist, and @efiop said that it'll be a major breaking change to the commit. So, now with granular commit, what does But if we go with the current implementation, it'll mean checking if the files for the whole output match with the hash in the |
Thank you, @skshetry ! 🙏 I also had a discussion with @dberenbaum and he also agrees so far that the way At the same time that But also So maybe we could indeed just break Maybe @dberenbaum could share additional thoughts as well. |
@efiop, if you are okay with this, do you mind reviewing this PR? Thanks. :)
No, I think it's important that it doesn't so that it's clear that |
Should we change that hint from @skshetry Is that the primary motivation for changing |
556ea1b to ec29a1c
I have split this PR and merged I find DVC does not have staging, so Even today, |
To try this out, make sure to re-install all dependencies.
pip install -e ".[dev]"
The add and commit commands take a path, which can be granular. If it's granular, the path can be a file or a directory. If the path does not exist, it is treated as a remove operation, and the command will try to commit without the files under that prefix. Otherwise, it'll try to update the dataset.
If the path is already being tracked and is not a granular path, commit will fail, while add will create a new stage as expected.
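The rules above can be sketched roughly as follows. This is only an illustration that follows the description literally, not DVC's implementation: `classify_operation`, `tracked_outputs`, `exists`, and the returned labels are hypothetical names, and "granular" is assumed here to mean a path that lies strictly inside a tracked output directory.

```python
from pathlib import PurePosixPath


def classify_operation(command, path, tracked_outputs, exists):
    """Label what `add`/`commit` would do with `path` (hypothetical helper).

    `tracked_outputs` is a set of tracked output paths and `exists`
    says whether `path` is present in the workspace.
    """
    target = PurePosixPath(path)
    # Assumption: granular means strictly inside some tracked output.
    is_granular = any(
        PurePosixPath(out) in target.parents for out in tracked_outputs
    )
    if is_granular:
        # A missing granular path is treated as a removal under that prefix.
        return "update" if exists else "remove"
    # Non-granular path: commit fails, add creates a new stage.
    return "fail" if command == "commit" else "new-stage"


print(classify_operation("add", "data/train/0.png", {"data"}, exists=True))
print(classify_operation("commit", "data/train/0.png", {"data"}, exists=False))
```

With `data` tracked, the first call is labeled as an update and the second as a removal, since the file is inside `data` but absent from the workspace.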
You can use dvc data status or dvc-data diff <old.dir hash> <new.dir hash> to find changes.