Skip to content

dvc: introduce merge-driver #4298

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 19, 2020
Merged

dvc: introduce merge-driver #4298

merged 1 commit into from
Aug 19, 2020

Conversation

efiop
Copy link
Contributor

@efiop efiop commented Jul 28, 2020

Related to #4162

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

TODO:

  • more tests

@efiop efiop force-pushed the fix-4162 branch 6 times, most recently from 81f360e to aabf1d7 Compare July 29, 2020 22:54
efiop added a commit to iterative/dvc.org that referenced this pull request Aug 10, 2020
efiop added a commit to iterative/dvc.org that referenced this pull request Aug 10, 2020
@efiop efiop force-pushed the fix-4162 branch 2 times, most recently from 9dbceb7 to 346f169 Compare August 10, 2020 17:59
@efiop efiop changed the title [WIP] dvc: introduce merge-driver dvc: introduce merge-driver Aug 11, 2020
@efiop efiop requested review from skshetry, pared and pmrowla and removed request for skshetry August 11, 2020 08:08
Comment on lines +262 to +259
def merge(self, ancestor, other):
raise NotImplementedError

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These stubs are for the future, as merging dvcfiles is potentially a handy thing to have.

Comment on lines 604 to 605
# Sorting the list by path to ensure reproducibility
return sorted(result, key=itemgetter(self.tree.PARAM_RELPATH))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a similar thing in _collect_dir in BaseTree's, but these will be unified as a part of #4144 , once dir hash computation is detached from cache.

@efiop efiop changed the title dvc: introduce merge-driver [WIP] dvc: introduce merge-driver Aug 11, 2020
efiop added a commit to iterative/dvc.org that referenced this pull request Aug 11, 2020
@pared
Copy link
Contributor

pared commented Aug 12, 2020

@efiop
What about conflicting files? I prepared sample script:

#!/bin/bash

rm -rf repo
mkdir repo

pushd repo

git init --quiet
dvc init --quiet

git add -A
git commit -am "init"

dvc install
echo "*.dvc merge=dvc" >> .gitattributes
git add -A
git commit -m "setup hooks"

git checkout -b one
mkdir data
echo foo >> data/foo

dvc add data
git add -A
git commit -m "add foo"

git checkout master

git checkout -b two
mkdir data
echo bar >> data/foo

dvc add data
git add -A
git commit -m "add foo but bar"

git checkout master

git merge one --no-edit
git merge two --no-edit
# I think we should get error now

Where I have 2 branches:
one that has dvc add-ed data/foo with foo content
two that has dvc add-ed data/foo with bar content

and then, on master I merge one and then two.
My data.dvc gets silently overridden so that data/foo contains bar - the two branch content.
Shouldn't we get some error?

@efiop
Copy link
Contributor Author

efiop commented Aug 15, 2020

@pared Sorry for the delay.

git checkout master
git merge one --no-edit
# I think we should get error now

Not really, because there is nothing in master, so there is no conflict.

git merge two --no-edit

And here you get a conflict, as expected.

@pared
Copy link
Contributor

pared commented Aug 17, 2020

@efiop sorry, I meant that I would expect error after the comment.
And yes, there is conflict, but my .dvc file gets Auto-merged. It is some indicator of failure, but it is easy to omit. I would expect that git will not try to auto-merge. Choosing which file is the proper one should be users responsibility.
Now the content of data/foo depends on the order of merge:

git merge one --no-edit
git merge two --no-edit
# cat data/foo == bar
git merge two --no-edit
git merge one --no-edit
# cat data/foo == foo

If we were doing same thing (tracking data/foo) in git, we would get unresolved conflict.

EDIT:
git equivalent:

#!/bin/bash

rm -rf repo
mkdir repo

pushd repo

git init --quiet
echo other_file >> other_file
git add other_file

git add -A
git commit -am "init"

git checkout -b one
mkdir data
echo foo >> data/foo

git add data
git add -A
git commit -m "add foo"

git checkout master

git checkout -b two
mkdir data
echo bar >> data/foo

git add data
git add -A
git commit -m "add foo but bar"

git checkout master

git merge two --no-edit
git merge one --no-edit
#merge conflict

@efiop
Copy link
Contributor Author

efiop commented Aug 18, 2020

@pared Ah, got it. Yes, that's a bug, thank you! πŸ™ Finally getting to creating additional tests for it, will be sure to add one for that as well.

@efiop efiop changed the title [WIP] dvc: introduce merge-driver dvc: introduce merge-driver Aug 18, 2020
@efiop efiop merged commit ac24b5c into iterative:master Aug 19, 2020
@efiop efiop deleted the fix-4162 branch August 19, 2020 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants