[WIP] cache/remote: drop dos2unix MD5 by default #5449
dvc/utils/__init__.py
```diff
@@ -41,8 +41,8 @@ def _fobj_md5(fobj, hash_md5, binary, progress_func=None):
             progress_func(len(data))


-def file_md5(fname, tree=None):
-    """ get the (md5 hexdigest, md5 digest) of a file """
+def file_md5(fname, tree=None, enable_d2u=False):
```
Let's create two functions, `file_md5` and `file_dos2unixmd5` (or something like `get_md5` and `get_dos2unixmd5`), just so we don't get tangled up in it ourselves :D
Though a flag is good too. Never mind.
dvc/objects.py
```python
if tree.PARAM_CHECKSUM == name:
    return tree.get_hash(path_info, **kwargs)

if name == "md5":
    from dvc.hash_info import HashInfo
    from dvc.odb.versions import ODB_VERSION
    from dvc.utils import file_md5

    if odb.version == ODB_VERSION.V1:
        kwargs["enable_d2u"] = True
    return HashInfo("md5", file_md5(path_info, tree, **kwargs))
```
The stuff under `if name == "md5"` is defunct; it was just added to clarify the intentions. We should actually do `tree.get_hash(path_info, name, **kwargs)`, so that, for example, `LocalTree.get_hash` sees that we are asking it for `md5` and uses `file_md5`, and if it sees `md5dos2unix`, it uses `file_md5dos2unix`.

Sorry for the noise, I understand that this is a WIP and needs to be adapted.
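The dispatch described above might look roughly like this. It is a sketch under assumptions. `LocalTree` here is a stand-in class, not DVC's. Its `get_hash` takes the hash name as a plain positional argument, and `HASH_FUNCS` is a hypothetical lookup table. The two hashers read whole files for brevity. Under this design, the ODB-version check from the diff moves to whoever picks `name`, and the tree only maps names to functions.

```python
import hashlib
from typing import Callable, Dict


def file_md5(path: str) -> str:
    # Raw MD5 of the file's bytes.
    with open(path, "rb") as fobj:
        return hashlib.md5(fobj.read()).hexdigest()


def file_md5dos2unix(path: str) -> str:
    # MD5 after normalizing CRLF to LF (whole file read for brevity).
    with open(path, "rb") as fobj:
        return hashlib.md5(fobj.read().replace(b"\r\n", b"\n")).hexdigest()


class LocalTree:
    # Hypothetical table mapping hash names to the function that computes them.
    HASH_FUNCS: Dict[str, Callable[[str], str]] = {
        "md5": file_md5,
        "md5dos2unix": file_md5dos2unix,
    }

    def get_hash(self, path_info: str, name: str) -> str:
        try:
            func = self.HASH_FUNCS[name]
        except KeyError:
            raise ValueError(f"unsupported hash name: {name!r}") from None
        return func(path_info)
```

A table keyed by hash name keeps `if name == ...` chains out of the callers. An unknown name fails loudly instead of silently falling back to raw MD5.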
```diff
@@ -7,6 +8,17 @@
 DirInfo = Dict[str, str]


+class HashName(str, enum.Enum):
+    MD5 = "md5"  # Raw MD5
+    MD5_D2U = "md5-d2u"  # DVC dos2unix MD5
```
As it's written right now, for users who have explicitly opted into using dos2unix, we will write a config file containing version `1.0`, and `dvc.lock` files will now contain hashes like `md5-d2u: abc123`. This seems more "correct" to me than continuing to write lock files with `md5: abc123`, but the downside is that anyone on DVC < 2.0 won't be able to read lock files with `md5-d2u` hash names.
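The compatibility trade-off can be made concrete with the `HashName` enum from the diff. Because the enum mixes in `str`, members compare equal to their plain string values, and `HashName("md5-d2u")` looks a member up by value. Lock-file keys can therefore be written and parsed as ordinary strings. `make_lock_entry` and `old_reader_md5` are hypothetical helpers. The second stands in for a pre-2.0 reader that only knows the `md5` key.

```python
import enum
from typing import Dict, Optional


class HashName(str, enum.Enum):
    MD5 = "md5"  # Raw MD5
    MD5_D2U = "md5-d2u"  # DVC dos2unix MD5


def make_lock_entry(path: str, name: HashName, value: str) -> Dict[str, str]:
    # Use .value so the key is a plain "md5" / "md5-d2u" string in the YAML.
    return {"path": path, name.value: value}


def old_reader_md5(entry: Dict[str, str]) -> Optional[str]:
    # A reader that predates HashName only looks for the "md5" key.
    return entry.get("md5")
```

Using `.value` rather than formatting the member directly is deliberate: f-string formatting of mixed-in enum members changed in Python 3.11, while `.value` is always the plain string.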
```python
with NamedTemporaryFile() as tmp:
    with modify_yaml(tmp.name) as data:
        data.update(config)
    tmp.seek(0)
    fs.upload_fobj(tmp, path_info)
```
This seems like a good use case for `MemoryFileSystem`, but for some reason the fsspec memory implementation of `fs.open()` does not play nicely with the edit-in-place `modify_yaml` functionality, and I didn't think it was worth digging into the issue at this point.
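One way to sidestep both the temporary file and `modify_yaml` is to serialize the config into an in-memory buffer and pass that buffer to the upload call. This also avoids reopening `tmp.name` while the `NamedTemporaryFile` is still open. The `tempfile` docs note that this reopen is not portable to Windows. The sketch is hypothetical. `upload_fobj` is any callable taking `(fobj, path_info)`, and `dump_flat_yaml` only handles a flat mapping of string, integer, or boolean values with simple keys. It borrows JSON quoting for the scalars, which YAML accepts. A real version would more likely reuse DVC's own YAML dumper.

```python
import io
import json
from typing import BinaryIO, Callable, Dict, Union

Scalar = Union[str, int, bool]


def dump_flat_yaml(config: Dict[str, Scalar]) -> bytes:
    # Emit "key: value" lines; json.dumps gives YAML-compatible scalar quoting,
    # so a version string like "1.0" stays a string instead of becoming a float.
    lines = [f"{key}: {json.dumps(value)}\n" for key, value in config.items()]
    return "".join(lines).encode("utf-8")


def upload_config(
    config: Dict[str, Scalar],
    upload_fobj: Callable[[BinaryIO, str], None],
    path_info: str,
) -> None:
    # No temporary file: the serialized config lives in a BytesIO.
    buf = io.BytesIO(dump_flat_yaml(config))
    upload_fobj(buf, path_info)
```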
Still need to investigate why the last two tests pass for me locally but fail in CI.
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏
Will close #4658 (replaces #5337)