Running import-url on S3 URL gives error message "requires existing cache on 's3' remote" #4261

Closed
nimrand opened this issue Jul 22, 2020 · 2 comments
Labels
bug Did we break something?

Comments

@nimrand
nimrand commented Jul 22, 2020

Bug Report

We have a corpus (i.e., a directory of text documents) stored in a company-wide S3 data store at s3://my-company-research-data/data/corpus.

When I run dvc import-url s3://my-company-research-data/data/corpus ./local/path, I get an error:

ERROR: failed to import s3://duolingo-research-data/det/COCA. You could also try downloading it manually, and adding it with `dvc add`. - Current operation was unsuccessful because 's3://my-company-research-data/data/corpus' requires existing cache on 's3' remote. See <https://man.dvc.org/config#cache> for information on how to set up remote cache.

Per this thread, this appears to be a bug.

Please provide information about your setup

Output of dvc version:

$ dvc version

DVC version: 1.1.11
Python version: 3.7.3
Platform: Darwin-18.7.0-x86_64-i386-64bit
Binary: False
Package: pip
Supported remotes: http, https, s3
Repo: dvc, git

Additional Information (if any):

When I run the same command with --verbose, this is what I get:

2020-07-22 17:20:16,566 DEBUG: fetched: [(3,)]                                  
2020-07-22 17:20:16,935 DEBUG: Removing output 'challenge/common/data/COCA-corpus' of stage: 'COCA-corpus.dvc'.
Importing 's3://duolingo-research-data/det/COCA' -> 'challenge/common/data/COCA-corpus'
2020-07-22 17:20:16,936 DEBUG: Computed stage: 'COCA-corpus.dvc' md5: 'fda33f7e862514b4c924b2692aff808d'
2020-07-22 17:20:16,936 DEBUG: 'md5' of stage: 'COCA-corpus.dvc' changed.
2020-07-22 17:20:17,523 DEBUG: fetched: [(0,)]
2020-07-22 17:20:17,524 ERROR: failed to import s3://duolingo-research-data/det/COCA. You could also try downloading it manually, and adding it with `dvc add`. - Current operation was unsuccessful because 's3://duolingo-research-data/det/COCA' requires existing cache on 's3' remote. See <https://man.dvc.org/config#cache> for information on how to set up remote cache.
------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/command/imp_url.py", line 18, in run
    no_exec=self.args.no_exec,
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/repo/__init__.py", line 34, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/repo/scm_context.py", line 4, in run
    result = method(repo, *args, **kw)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/repo/imp_url.py", line 54, in imp_url
    stage.run()
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/funcy/decorators.py", line 39, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/stage/decorators.py", line 35, in rwlocked
    return call()
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/funcy/decorators.py", line 60, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/stage/__init__.py", line 424, in run
    sync_import(self, dry, force)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/stage/imports.py", line 29, in sync_import
    stage.save_deps()
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/stage/__init__.py", line 387, in save_deps
    dep.save()
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/output/base.py", line 267, in save
    self.info = self.save_info()
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/output/base.py", line 191, in save_info
    return self.tree.save_info(self.path_info)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/tree/base.py", line 314, in save_info
    self.PARAM_CHECKSUM: self.get_hash(path_info, tree=tree, **kwargs)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/tree/base.py", line 282, in get_hash
    hash_ = self.get_dir_hash(path_info, tree, **kwargs)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/tree/base.py", line 296, in get_dir_hash
    raise RemoteCacheRequiredError(path_info)
dvc.exceptions.RemoteCacheRequiredError: Current operation was unsuccessful because 's3://duolingo-research-data/det/COCA' requires existing cache on 's3' remote. See <https://man.dvc.org/config#cache> for information on how to set up remote cache.
------------------------------------------------------------
@jorgeorpinel
Contributor

Thanks! I don't think importing should have this condition: requires existing cache on 's3' remote.

jorgeorpinel added the bug (Did we break something?) label Jul 22, 2020
@efiop
Contributor

efiop commented Jul 22, 2020

Hi @nimrand !

Unfortunately, this is a known bug 🙁 (#4144). We'll try to get to it in the next sprint (starting next week). Thank you for the feedback! Closing this ticket in favor of #4144.
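
For anyone hitting this in the meantime, the error message's own suggestion (download manually, then `dvc add`) could look roughly like the sketch below. It assumes the AWS CLI is installed and has credentials for the bucket; `manual_import` and the `RUN` variable are hypothetical names made up for this example, not part of DVC or the AWS CLI.

```shell
# Sketch only: a manual stand-in for `dvc import-url` on an S3 directory.
# Set RUN=echo to print the commands instead of executing them.
manual_import() {
    src="$1"
    dest="$2"
    # Download the whole prefix with the AWS CLI (assumed installed and configured).
    ${RUN:-} aws s3 cp --recursive "$src" "$dest" || return 1
    # Track the local copy with DVC instead of using import-url.
    ${RUN:-} dvc add "$dest"
}

# Example: manual_import s3://my-company-research-data/data/corpus ./local/path
```

Note that a plain `dvc add` does not record the S3 URL as a dependency the way `dvc import-url` does, so the data would need to be re-downloaded by hand if the source changes.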
