get-url and import-url doesn't seem to work with S3 buckets anymore. #4144
Comments
I found out a couple of things:
That kind of solves the issue, but I don't get the logic behind this. Why would I need a cache in a separate bucket just to download the file from a completely different bucket? It seems weird, because I need to download the file to my local machine anyway in order to compute hashes.
@anotherbugmaster This is a well-known bug that became more intrusive once we adjusted the way we process inputs in get-url and import-url. It will be improved in the near future.
Currently we kinda assume that whatever is returned by `get_file_hash` is of type `self.PARAM_CHECKSUM`, which is not actually true. E.g. for http it might return `etag` or `md5`, but we don't distinguish between those and call both `etag`. This is becoming more relevant for dir hashes, which are computed in a few different ways (e.g. in-memory md5, or upload to remote and get the etag for the dir file). Prerequisite for iterative#4144 and iterative#3069
Related to iterative#4144, iterative#3069, iterative#1676
* dvc: use HashInfo
Related to #4144, #3069, #1676
* Update dvc/tree/s3.py
Co-authored-by: Saugat Pachhai <[email protected]>
By itself `self.info` is quite confusing, as it is not clear what it is about. Using `hash_info` makes much more sense and is required to support alternative hash types. Related to iterative#4144, iterative#3069, iterative#1676
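To illustrate why tagging a hash with its type helps, here is a rough sketch of what a `hash_info`-style object could look like. This is only an assumption for illustration, not DVC's actual implementation: the `HashInfo` fields, the `to_dict` method and the `same_content` helper are all hypothetical names. The point is that an `etag` and an `md5` with the same string value should never be treated as interchangeable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HashInfo:
    """Hypothetical sketch: a hash value tagged with the type that produced it."""

    name: Optional[str]   # hash type, e.g. "md5" or "etag"
    value: Optional[str]  # the digest or ETag string itself

    def __bool__(self) -> bool:
        # An empty HashInfo (no value) is falsy.
        return bool(self.value)

    def to_dict(self) -> dict:
        # Serialize as {type: value}; an empty hash serializes to {}.
        return {self.name: self.value} if self else {}


def same_content(a: HashInfo, b: HashInfo) -> bool:
    """Compare two hashes only when both are set and share the same type."""
    if not a or not b:
        return False
    if a.name != b.name:
        # An etag and an md5 are not comparable, even if the strings match.
        return False
    return a.value == b.value
```

With something like this, code that handles an http or S3 source can carry the hash type along with the value instead of assuming everything is a single checksum type.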
Bug Report
dvc init --no-scm
dvc import-url s3://some_bucket/some_target -v
dvc get-url s3://some_bucket/some_target -v
The same commands work with https://some_domain/some_target URLs, and I don't think an external cache was ever necessary to download files from S3.
Please provide information about your setup
Output of `dvc version`:
Additional Information (if any):
If applicable, please also provide a `--verbose` output of the command, e.g. `dvc add --verbose`.