Skip to content

[WIP] cache/remote: drop dos2unix MD5 by default #5449

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 26 commits into from

Conversation

pmrowla
Copy link
Contributor

@pmrowla pmrowla commented Feb 11, 2021

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

Will close #4658 (replaces #5337)

@pmrowla pmrowla self-assigned this Feb 11, 2021
@@ -41,8 +41,8 @@ def _fobj_md5(fobj, hash_md5, binary, progress_func=None):
progress_func(len(data))


def file_md5(fname, tree=None):
""" get the (md5 hexdigest, md5 digest) of a file """
def file_md5(fname, tree=None, enable_d2u=False):
Copy link
Contributor

@efiop efiop Feb 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create two functions:

file_md5
file_dos2unixmd5

or smth like

get_md5
get_dos2unixmd5

just so we don't get tangeled in it ourselves :D

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though a flag is good too. Nevermind πŸ™‚

dvc/objects.py Outdated
Comment on lines 23 to 33
if tree.PARAM_CHECKSUM == name:
return tree.get_hash(path_info, **kwargs)

if name == "md5":
from dvc.hash_info import HashInfo
from dvc.odb.versions import ODB_VERSION
from dvc.utils import file_md5

if odb.version == ODB_VERSION.V1:
kwargs["enable_d2u"] = True
return HashInfo("md5", file_md5(path_info, tree, **kwargs))
Copy link
Contributor

@efiop efiop Feb 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stuff under if name == "md5" is defunct, just added to clarify the intentions.

We should actually do tree.get_hash(path_info, name, **kwargs) and so then, for example, LocalTree.get_hash will see that we ask it about md5 and will use file_md5 and if it sees md5dos2unix - file_md5dos2unix

@efiop
Copy link
Contributor

efiop commented Feb 11, 2021

Sorry for the noise, I understand that this is a WIP and needs to be adapted.

@@ -7,6 +8,17 @@
DirInfo = Dict[str, str]


class HashName(str, enum.Enum):
MD5 = "md5" # Raw MD5
MD5_D2U = "md5-d2u" # DVC dos2unix MD5
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it's written right now, for users who have explicitly opted into using dos2unix, we will write a config file containing version 1.0, and dvc.lock files will now contain hashes like

md5-d2u: abc123

This seems more "correct" to me versus continuing to write lock files with md5: abc123, but the downside would be that anyone on DVC < 2.0 won't be able to read lock files with md5-d2u hash names.

Comment on lines +48 to +52
with NamedTemporaryFile() as tmp:
with modify_yaml(tmp.name) as data:
data.update(config)
tmp.seek(0)
fs.upload_fobj(tmp, path_info)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good use case for MemoryFileSystem, but for some reason the fsspec memory implementation of fs.open() does not play nicely with the edit-in place modify_yaml functionality and I didn't think it was worth digging into the issue at this point.

@pmrowla
Copy link
Contributor Author

pmrowla commented Feb 15, 2021

Still need to investigate why the last 2 tests pass for me locally but fail in CI

@pmrowla pmrowla closed this Feb 15, 2021
@pmrowla pmrowla deleted the odb-config branch February 15, 2021 13:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

drop dos2unix behaviour
2 participants