remote: use .dir checksum existence to infer file contents existence #3632
- `used_cache()`/`get_used_cache()` in repo/stage/output now return tuples of (dir_cache, file_cache) instead of one flat, merged cache
- affects all commands that use `cache_exists()` (i.e. remote status)
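To illustrate the tuple-returning shape, here is a minimal sketch. It assumes plain Python sets of checksum strings and a dict mapping each .dir checksum to the files it lists; `get_used_cache`'s signature, `flatten`, and these types are hypothetical stand-ins, not DVC's real objects. `flatten` also tolerates `None` as the dir cache, mirroring the `return None, NamedCache()` early-return path in the diff.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical shapes: a .dir checksum maps to the file checksums it lists.
DirEntries = Dict[str, Iterable[str]]
UsedCache = Tuple[Optional[Set[str]], Set[str]]


def get_used_cache(dir_entries: DirEntries, file_checksums: Iterable[str]) -> UsedCache:
    """Return (dir_cache, file_cache) instead of one flat, merged set."""
    dir_cache = set(dir_entries)
    file_cache = set(file_checksums)
    # Files listed inside each directory still belong to the file cache.
    for children in dir_entries.values():
        file_cache.update(children)
    return dir_cache, file_cache


def flatten(cache: UsedCache) -> Set[str]:
    """Rebuild the old merged view; dir_cache may be None (early-return path)."""
    dir_cache, file_cache = cache
    return (dir_cache or set()) | file_cache
```

Keeping the two halves separate lets a caller check the .dir checksums first and only then decide which file checksums still need a remote lookup.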
dvc/output/base.py
```diff
@@ -429,16 +436,14 @@ def get_used_cache(self, **kwargs):
             )
         )
         logger.warning(msg)
-        return NamedCache()
+        return None, NamedCache()
```
Maybe it would make sense to push dir into NamedCache, so that NamedCache could contain both dirs and files (e.g. `cache["local"].dirs`)? Just an idea; not sure it would look nicer in the end.
As of 02aaf42, NamedCache supports nesting cache items (for .dir checksums); the end result is a bit cleaner now.
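A minimal sketch of what "nesting cache items" under a .dir checksum could look like, following the reviewer's `cache["local"].dirs` idea. `NestedCache`, its methods, and the reliance on a `.dir` suffix to recognize directory checksums are all assumptions for illustration; this is not DVC's actual NamedCache API.

```python
from typing import Dict, Iterable, Iterator, Set


class NestedCache:
    """Hypothetical cache that nests file checksums under the .dir checksum
    that lists them. Not DVC's actual NamedCache API."""

    def __init__(self) -> None:
        # checksum -> child checksums (empty set for plain files)
        self._entries: Dict[str, Set[str]] = {}

    def add(self, checksum: str, children: Iterable[str] = ()) -> None:
        self._entries.setdefault(checksum, set()).update(children)

    def dirs(self) -> Set[str]:
        # Assumes directory checksums carry a ".dir" suffix.
        return {c for c in self._entries if c.endswith(".dir")}

    def files(self) -> Set[str]:
        """Plain file entries plus every child of a .dir entry."""
        result: Set[str] = set()
        for checksum, children in self._entries.items():
            if checksum.endswith(".dir"):
                result |= children
            else:
                result.add(checksum)
        return result

    def __iter__(self) -> Iterator[str]:
        # Flat view for callers that do not care about nesting.
        yield from self.dirs()
        yield from self.files()
```

Usage mirrors the suggestion: `caches = {"local": NestedCache()}` followed by `caches["local"].dirs()`. One object then carries both views, instead of every caller unpacking a `(dir_cache, file_cache)` tuple.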
@pmrowla Please check the tests; Travis is failing.
@efiop The test issue is resolved.
@pmrowla Please check DeepSource (DS) too; e.g. this https://deepsource.io/gh/iterative/dvc/run/b2b2c666-0af4-4fb4-aed2-0210d2946dfc/python/PYL-R1704/ doesn't look good.
Minor test mocker decision to be made.
Co-Authored-By: Saugat Pachhai <[email protected]>
```python
        )
        return dict(dir_status, **file_status)


    def _status(
```
A bit late to the party :) Excellent PR overall. I'll mention a few things I noticed while reviewing that might help later.
I'd say this function definitely needs to be split up: for example, extract the presentation logic, and the part that checks the dir and excludes it? Or maybe some other refactoring, but it's too long and complicated now.
Overall, "internal-client" functions should be easy to read, even if extraction feels pointless (e.g. helpers that are used in only a single place). Think about how developers will read this: it's easier to read a story-like main function and go into the details only when needed.
- I have followed the Contributing to DVC checklist.
- If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here. If the CLI API is changed, I have updated tab completion scripts.
- I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)

Thank you for the contribution - we'll try to review it as soon as possible.
`push`/`gc` behavior changes from #3600:
- `status` (also affects `push`/`pull`): when checking whether a .dir checksum exists on the remote, if it does, presume that the checksums for all files contained in that directory also exist on the remote.
- `push`: push .dir checksums after file checksums. If any file in the directory fails to upload, do not upload the .dir file.
- `gc`: remove .dir checksums before file checksums.
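The `push` and `gc` orderings protect the invariant that the `status` inference relies on: a .dir checksum on the remote implies that all of its files are there too. A minimal sketch, assuming `upload` and `remove` are hypothetical callables standing in for real remote operations (they are not DVC APIs) and that directory checksums end in `.dir`:

```python
from typing import Callable, Dict, List, Set

# Hypothetical shape: .dir checksum -> file checksums it lists.
Used = Dict[str, Set[str]]


def push(used: Used, upload: Callable[[str], bool]) -> List[str]:
    """Upload all file checksums first, then each .dir whose files all made it.

    Returns the .dir checksums that were skipped because a file failed.
    """
    failed: Set[str] = set()
    for checksum in sorted(set().union(*used.values())):
        if not upload(checksum):
            failed.add(checksum)
    skipped = []
    for dir_checksum in sorted(used):
        if used[dir_checksum] & failed:
            skipped.append(dir_checksum)  # never advertise an incomplete dir
            continue
        upload(dir_checksum)
    return skipped


def gc(unused: Set[str], remove: Callable[[str], None]) -> None:
    """Remove .dir checksums before file checksums."""
    dirs = sorted(c for c in unused if c.endswith(".dir"))
    files = sorted(c for c in unused if not c.endswith(".dir"))
    for checksum in dirs + files:
        remove(checksum)
```

If either operation is interrupted partway, the remote can end up with orphaned file checksums, but never with a .dir checksum whose contents are incomplete, so presuming the files exist stays safe.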