gdrive: add open #3916

Merged
merged 14 commits into from
Jun 13, 2020

casperdcl
Contributor

@casperdcl casperdcl commented May 30, 2020

@casperdcl casperdcl added enhancement Enhances DVC refactoring Factoring and re-factoring ui user interface / interaction research labels May 30, 2020
@casperdcl casperdcl requested review from shcheklein and efiop May 30, 2020 22:44
@casperdcl casperdcl self-assigned this May 30, 2020
@restyled-io restyled-io bot mentioned this pull request May 30, 2020
@@ -393,6 +396,23 @@ def _gdrive_download_file(
        ) as pbar:
            gdrive_file.GetContentFile(to_file, callback=pbar.update_to)

@contextmanager
Member

just to double check - do we properly pass close down to the PyDrive?

Contributor Author

pydrive2 doesn't expose any close() methods anywhere. I think the underlying httplib2 closes connections before raising errors; so no special handling needed.

Member

it should close if we just return from the block ... we should check the logic carefully here

Contributor Author

upstream issue: iterative/PyDrive2#44
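
The concern in this thread (is the stream closed when the caller simply returns from inside the `with` block?) can be illustrated with a minimal sketch. It does not use PyDrive2 or httplib2; `FakeStream`, `open_stream`, `read_header`, and the `opened` list are hypothetical names introduced only for the illustration. A generator-based context manager that wraps its `yield` in `try`/`finally` runs the cleanup on normal exit, early return, and exceptions alike:

```python
import io
from contextlib import contextmanager

# Hypothetical bookkeeping so the example can inspect streams afterwards.
opened = []


class FakeStream(io.BytesIO):
    """Hypothetical stand-in for a remote download stream."""


@contextmanager
def open_stream(data: bytes):
    stream = FakeStream(data)
    opened.append(stream)
    try:
        yield stream
    finally:
        # Runs on normal exit, early return, and exceptions alike.
        stream.close()


def read_header(data: bytes) -> bytes:
    with open_stream(data) as fd:
        return fd.read(4)  # early return from inside the block


header = read_header(b"abcdefgh")
print(header, opened[-1].closed)
```

Without the `try`/`finally`, an exception raised inside the block would skip the cleanup, which is the case the reviewers ask to check.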

@casperdcl
Contributor Author

casperdcl commented May 31, 2020

hmm tests failing with

ERROR: failed to push data to the cloud - GDrive remote auth failed with credentials in 'GDRIVE_CREDENTIALS_DATA'.
Backup first, remove of fix them, and run DVC again.
It should do auth again and refresh the credentials.

https://travis-ci.com/github/iterative/dvc/jobs/341956240

@codecov

codecov bot commented May 31, 2020

Codecov Report

Merging #3916 into master will decrease coverage by 0.18%.
The diff coverage is 92.00%.

@@            Coverage Diff             @@
##           master    #3916      +/-   ##
==========================================
- Coverage   92.32%   92.13%   -0.19%     
==========================================
  Files         160      161       +1     
  Lines       11107    11121      +14     
==========================================
- Hits        10254    10246       -8     
- Misses        853      875      +22     
Impacted Files                  Coverage Δ
dvc/utils/stream.py             85.71% <85.71%> (ø)
dvc/exceptions.py               96.99% <100.00%> (+0.02%) ⬆️
dvc/remote/gdrive.py            81.91% <100.00%> (+0.73%) ⬆️
dvc/utils/http.py               95.34% <100.00%> (+4.04%) ⬆️
dvc/system.py                   79.66% <0.00%> (-9.33%) ⬇️
dvc/remote/ssh/connection.py    83.17% <0.00%> (-4.33%) ⬇️
dvc/analytics.py                95.06% <0.00%> (-1.24%) ⬇️
dvc/remote/local.py             95.52% <0.00%> (-0.50%) ⬇️
dvc/repo/__init__.py            97.59% <0.00%> (+0.30%) ⬆️
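
The percentages in the Coverage Diff follow directly from the Hits and Lines rows. A small sketch of that arithmetic, using only the figures reported above (the `coverage` helper is a hypothetical name, not part of Codecov):

```python
def coverage(hits: int, lines: int) -> float:
    """Return line coverage as a percentage."""
    return 100.0 * hits / lines


master = coverage(10254, 11107)  # Hits / Lines on master
pr = coverage(10246, 11121)      # Hits / Lines with #3916 merged
delta = pr - master

print(f"{master:.2f}% {pr:.2f}% {delta:+.2f}%")  # 92.32% 92.13% -0.19%
```

The unrounded difference is roughly -0.188 percentage points, which is consistent with both the 0.18% quoted in the summary line and the -0.19% shown in the diff.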

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5efc3bb...8ae3016.

tests/remotes.py Outdated
@@ -141,10 +142,13 @@ class GDrive:
    def should_test():
        return os.getenv(GDriveRemote.GDRIVE_CREDENTIALS_DATA) is not None

    def get_url(self):
        if not getattr(self, "_remote_url", None):
Member

removing this will break TestGDriveRemoteCLI most likely

Member

in general I believe we should probably introduce caching of get_url for other remotes as well - it is strange to return different values within one test

Contributor

@shcheklein It actually shouldn't, because other remotes do that too and just reuse the results. Need to check maybe there is something wrong or special about gdrive...

Member

@efiop I think they just call it once, in case of GDrive it just happened that we need it in multiple places. And in general, it's def not a good and safe interface - this value should be cached one way or another and stay the same during the test.
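
The caching being argued for here can be sketched as follows. This is an illustration under stated assumptions, not the actual `tests/remotes.py` code: `FakeRemote` is a hypothetical class, and the `gdrive://root/...` URL is a placeholder value. Only the `getattr(self, "_remote_url", None)` check mirrors the line shown in the diff above.

```python
import uuid


class FakeRemote:
    """Hypothetical test helper, not DVC's real GDrive test class."""

    def get_url(self) -> str:
        # Generate the URL on first use, then reuse it so every caller
        # within one test sees the same value.
        if not getattr(self, "_remote_url", None):
            # Placeholder URL format, assumed for illustration only.
            self._remote_url = f"gdrive://root/{uuid.uuid4()}"
        return self._remote_url


remote = FakeRemote()
print(remote.get_url() == remote.get_url())
```

Each instance still gets its own URL, so separate tests stay isolated while repeated calls inside one test agree.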


- remote_params = [S3, GCP, Azure, OSS, SSH, HDFS]
+ remote_params = [S3, GCP, Azure, GDrive, OSS, SSH, HDFS]
Member

we have a lot of tests for functions besides open here - do we expect all of them to pass? we'll need to implement more stuff then

Contributor

@shcheklein All the tests should work, there is nothing new required from GdriveRemotes. There might be some issues with Gdrive itself though (creds or something), that differentiates it from other remotes.

Member

Even if it works out of the box, some things do not make much sense - e.g. what will dvc.api.url return for the object? Some URL that no tool (except DVC) can work with?

Re credentials - should be fine (we use an env var to set them, so I would expect external repo to pick it up as well). As I mentioned though, GDrive remote requires some special setup (e.g. it does not create a directory for you).

Member

regarding get_url, I think this is what the value should look like:

[Screenshot: Screen Shot 2020-06-04 at 5 00 38 PM]

(id= is the file ID that we can get via API)

Contributor Author

@shcheklein is this comment outdated since rebasing on master? Or does tests/remotes.py still need updating?

Contributor

@casperdcl Could do that, sure. Will require some work to make it do that without first passing it a dvc instance.

Contributor Author

well the only other way I see is to rework the current pytest fixture (currently it hides the GDrive object so there's no way to call create_dir)

Contributor

@casperdcl Both work. Choose the easiest one, we'll be revisiting this later anyway.

Contributor Author

tried a simple change along the same lines as setup_remote but still failing with the same credentials error https://travis-ci.com/github/iterative/dvc/jobs/348374056#L7952

Contributor

@efiop efiop Jun 13, 2020

@casperdcl does it work locally with your creds?

efiop added a commit that referenced this pull request Jun 3, 2020
This makes `GDrive.get_url` behaviour consistent with the rest of the
test remotes.

Related to #3916
efiop added a commit that referenced this pull request Jun 4, 2020
This makes `GDrive.get_url` behaviour consistent with the rest of the
test remotes.

Related to #3916
@efiop
Contributor

efiop commented Jun 4, 2020

Adjusted GDrive to have a create_dir method that is needed before you can use GDrive remote to push to it. Need to adjust those remote test classes to call it for the setup. @casperdcl could you take a look?

@@ -72,6 +122,7 @@ def test_open(remote_url, tmp_dir, dvc):
@pytest.mark.parametrize("remote_url", all_remote_params, indirect=True)
def test_open_external(remote_url, erepo_dir, setup_remote):
    setup_remote(erepo_dir.dvc, url=remote_url)
    ensure_dir_scm(erepo_dir.dvc, remote_url)
Contributor Author

@casperdcl casperdcl Jun 13, 2020

@efiop I've added ensure_dir_scm here based off setup_remote.

  • Using ensure_dir instead of ensure_dir_scm, there's
    ERROR: configuration error - config file error: remote 'upstream' doesn't exists.
  • Using run_dvc("remote", "add", ...) results in
    ERROR: configuration error - config file error: remote 'upstream' already exists. Use -f|--force to overwrite it.

@casperdcl casperdcl marked this pull request as ready for review June 13, 2020 18:33
@efiop efiop merged commit 4d140a9 into iterative:master Jun 13, 2020
@efiop
Contributor

efiop commented Jun 13, 2020

Thanks @casperdcl ! For the record: s3 test is failing for unrelated reasons.

@efiop
Contributor

efiop commented Jun 13, 2020

@casperdcl Please don't forget about the docs 🙂

Development

Successfully merging this pull request may close these issues.

api: support streaming from Google Drive remotes
3 participants