Skip to content

import: allow downloading regular files/dirs tracked by git #2889

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Dec 17, 2019

Conversation

Baranowski
Copy link
Contributor

@Baranowski Baranowski commented Dec 4, 2019

  • ❗ Have you followed the guidelines in the Contributing to DVC list?

  • πŸ“– Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.

  • ❌ Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addressed. Please review them carefully and fix those that actually improve code or fix bugs.

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

Closes #2862

@Baranowski
Copy link
Contributor Author

This is rebased on top of #2837 by danihodovic (for some reason I couldn't find his repo on the list of available forks when submitting the PR).

to.checkout()
except NoOutputInExternalRepoError as e:
try:
with self._make_repo(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what's an idiomatic way to avoid this duplication from fetch(), so I left it like this and I'm open to suggestions.

Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Baranowski ! Thanks for the PR πŸ™ We need to also test this case #2837 (comment) . Here is an example:

#!/bin/bash

set -x
set -e

rm -rf mytest
mkdir mytest
cd mytest

mkdir erepo
pushd erepo
git init
dvc init
dvc run -O foo 'echo foo > foo'  # note that foo is not cached by dvc, hence the "-O"
git add .
git commit -m "init"
popd

mkdir repo
pushd repo
git init
dvc init
dvc import ../erepo foo
popd

@Baranowski
Copy link
Contributor Author

Related dvc.org issue: iterative/dvc.org#835

@Baranowski
Copy link
Contributor Author

@efiop, sure I will see if danihodovic writes the test soonish so that I can copy it for import. If not, I will write it myself.

@efiop
Copy link
Contributor

efiop commented Dec 4, 2019

@Baranowski The tests would be different, so you could work async, I guess :) Looks like the only piece that you are reusing here is _copy_git_file, which is not that crucial.

@Baranowski
Copy link
Contributor Author

@efiop, added. The test passed without code modifications, somewhat to my surprise.

@efiop
Copy link
Contributor

efiop commented Dec 5, 2019

@Baranowski Btw, please note that get PR was revisited so you might want to adjust your PR accordingly. Sorry for this clash of PRs πŸ™

@Baranowski Baranowski force-pushed the import_regulars.squashed branch from 49707d4 to 0738afd Compare December 5, 2019 18:26
@jorgeorpinel jorgeorpinel changed the title WiP import: allow downloading regular files/dirs tracked by git [WIP] import: allow downloading regular files/dirs tracked by git Dec 5, 2019
@Baranowski Baranowski force-pushed the import_regulars.squashed branch from 0738afd to 7fb0a68 Compare December 5, 2019 19:26
@Baranowski
Copy link
Contributor Author

Thanks for pointing that out, @efiop. Rebased and updated.

@efiop
Copy link
Contributor

efiop commented Dec 10, 2019

@Baranowski Fixed get implementation, please rebase. Sorry for the delay πŸ™

@Baranowski Baranowski force-pushed the import_regulars.squashed branch from 7fb0a68 to e137887 Compare December 11, 2019 12:22
@Baranowski Baranowski changed the title [WIP] import: allow downloading regular files/dirs tracked by git import: allow downloading regular files/dirs tracked by git Dec 11, 2019
@Baranowski Baranowski marked this pull request as ready for review December 11, 2019 12:25
@Baranowski
Copy link
Contributor Author

@efiop, this is rebased and ready for a review

@efiop
Copy link
Contributor

efiop commented Dec 11, 2019

@Baranowski Please check the tests, they are failing on py2.

to.checkout()
try:
if self._copy_if_git_file(to.fspath):
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note: Not setting to.info as we do down below is fine, as git files are tiny and the hash will be computed later in the output itself.

@Baranowski Baranowski force-pushed the import_regulars.squashed branch from 9aa4c06 to 8985759 Compare December 14, 2019 12:43
def _git_status(self):
cache_dir = self.repo.cache.local.cache_dir
with self._make_repo(
cache_dir=os.path.join(cache_dir, "old")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to pass cache dir, as you are not pulling anything anyway

Suggested change
cache_dir=os.path.join(cache_dir, "old")

cache_dir=os.path.join(cache_dir, "old")
) as old_repo:
with self._make_repo(
cache_dir=os.path.join(cache_dir, "new"), rev_lock=None
Copy link
Contributor

@efiop efiop Dec 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Suggested change
cache_dir=os.path.join(cache_dir, "new"), rev_lock=None
rev_lock=None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without cache_dir specified explicitly, both repos seem to be pulled into the same directory, and so the update to the imported file is never detected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Baranowski But cache_dir has nothing to do with the update, right? So I suppose the reason might be a bug in the caching system in dvc/external_repo.py?

Copy link
Contributor Author

@Baranowski Baranowski Dec 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, @efiop, I think I can see the bug in external_repo.py. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Baranowski please see #2889 (comment) . Let's create a separate ticket.

Comment on lines 41 to 43
assert os.path.isfile(str(tmp_dir / dst))
assert filecmp.cmp(str(erepo_dir / src), str(tmp_dir / dst), shallow=False)
assert tmp_dir.scm.repo.git.check_ignore(str(tmp_dir / dst))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should only use str() on Path-like objects for presentation purposes, to get str file path one should use fspath(). Or fspath_py35() when passing to an util working with Path-likes in Python 3.6+.

Also, Path has .is_file() and .is_dir(). May also skip .exists() since it is covered with is_file() anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@efiop efiop requested review from pared and a user December 17, 2019 01:48
@efiop efiop merged commit 032d094 into iterative:master Dec 17, 2019
@efiop
Copy link
Contributor

efiop commented Dec 17, 2019

Thanks @Baranowski ! πŸ™

Baranowski added a commit to Baranowski/dvc.org that referenced this pull request Dec 17, 2019
@Baranowski
Copy link
Contributor Author

Other than the dvc status issue, there are also two things worth pointing out:

  • dvc update for Git-tracked import-ed files fails with the same error msg as dvc status,
  • You cannot dvc import a file from a Git repo that's not a DVC repo. There is still must be a DVC repo in the remote Git repo. It's sufficient if the DVC repo is empty.

Apologies for raising this only after merging the PR.

@shcheklein
Copy link
Member

@Baranowski @efiop create issue just to keep track of those?

@Baranowski
Copy link
Contributor Author

Created #2976 and #2977

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

import: allow downloading regular files/dirs tracked by Git?
4 participants