Endpoint URL is not taken into account when adding an external file from Minio #4151
Comments
@lucasmaheo That happens because you use direct
but I would suggest something like:
and then just
🙂 The external workspaces scenario is admittedly not very polished right now and has some flaws, so we've created #3920 to discuss how it should be changed for the better.
Duplicate of #1280
Duplicate of #3441. Also, we have an outstanding docs issue: iterative/dvc.org#108
Thank you for the explanation @efiop. Indeed, this was a misunderstanding on my part. How do these instructions not end up on the
If anyone else ends up on this issue, the following command worked: dvc add remote://s3cache/somefile --external
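For later readers, here is a rough conceptual sketch of why the remote:// form can pick up the endpoint while a direct s3:// URL does not. It is not DVC's actual code. The REMOTES dict and the resolve helper are hypothetical, and the sketch assumes the s3cache remote carries the url and endpointurl values shown in the bug report.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the remote section of .dvc/config.
# Assumption: the "s3cache" remote holds the values quoted in the bug report.
REMOTES = {
    "s3cache": {
        "url": "s3://mybucket",
        "endpointurl": "http://127.0.0.1:9000",
    }
}


def resolve(target, remotes):
    """Return (concrete_url, settings) for a target path or URL.

    A remote://<name>/<path> target is expanded against the named remote
    and inherits its settings; any other target is returned unchanged
    with no remote settings attached.
    """
    parts = urlsplit(target)
    if parts.scheme != "remote":
        return target, {}
    settings = dict(remotes[parts.netloc])
    base = settings.pop("url").rstrip("/")
    return base + parts.path, settings


print(resolve("remote://s3cache/somefile", REMOTES))
print(resolve("s3://mybucket/somefile", REMOTES))
```

In this sketch only the first call returns the endpointurl setting, which matches the symptom reported in this issue: the direct URL was requested from the default AWS endpoint.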
@lucasmaheo The reason is that this functionality is considered very advanced and not polished enough, so it is hard even to describe it nicely in the docs :( But we do have a few tickets with |
My scenario is hypothetical at this point. The typical use case would be to version data files as well as ML models in scalable storage. Usually, our projects use cloud storage (or on-premise, cloud-like storage) to have a single reference for data. We are looking for a solution to efficiently version those large datasets, and DVC seems to fit the bill.
@lucasmaheo Sounds like that is indeed not the best approach. The first problem here is the isolation - if any user of your dvc repo runs |
Oh, thanks for the clarification. That is indeed not the behaviour I was looking for. So, as I understand it, DVC registries are meant to be used with local copies, in the same way as Git registries, with the added possibility of selecting only the required files. To circumvent that, we need to use the API. It all makes sense now.
At least I now know how to create the data registry. I expect that using dvc.api.open() with rev left blank should read the version of the data that was committed with the current revision of the local Git repository.
Now, is there a way to stream outputs to the remote registry? Suppose I were reading data from S3 and iteratively producing a transformed version of that data. If the outputs do not fit on disk, I would prefer to write them to another location in S3 and, once the whole process is done, push that version of the data to the registry. Is that a feature you are looking into (pushing from a remote location)?
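A minimal sketch of the streaming read described above, assuming DVC is installed and the file is tracked in the repository. The count_lines helper and its opener parameter are hypothetical names added for illustration. rev is left at its default of None, and whether that corresponds exactly to "the current revision" is the expectation stated in this comment, not something verified here.

```python
def count_lines(path, repo=None, rev=None, opener=None):
    """Stream a tracked file line by line and return the number of lines.

    `opener` is a hypothetical hook so the function can be exercised
    without DVC; by default it falls back to dvc.api.open.
    """
    if opener is None:
        import dvc.api  # requires the dvc package to be installed

        opener = dvc.api.open
    total = 0
    # dvc.api.open is used as a context manager and yields a file-like
    # object, so the data is read incrementally instead of being loaded
    # into memory in one go.
    with opener(path, repo=repo, rev=rev) as fd:
        for _ in fd:
            total += 1
    return total
```

A call might look like count_lines("data/file.txt"), where the path is a placeholder for a file tracked in the current project.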
@lucasmaheo You mean kinda like publish them there? Currently there is no way to push it back like that 🙁 But we've heard requests for that. In my mind, there should be some special operation that does a "straight to remote" action. So... something like
that would create |
I feel like that ^ covers most of the misuses that we've seen. Maybe it is even worth doing that by default when someone tries to feed a url to
would just create |
Bug Report
Please provide information about your setup
Output of dvc version:
Additional Information (if any):
I was trying out DVC and I cannot make it work with a local deployment of Minio. Minio is hosted at 127.0.0.1:9000 and works as expected; I tested it.
Contents of .dvc/config:
Logs:
After some investigation, DVC does seem to take the configuration and the endpointurl into account. However, on this specific boto3 request it does not. I did not dig further into the code to find out why the two S3 clients are created from different configurations.
Configuration for the failing request:
{'url_path': '/mybucket/textfile', 'query_string': {}, 'method': 'HEAD', 'headers': {'User-Agent': 'Boto3/1.14.14 Python/3.7.7 Linux/4.20.17-042017-generic Botocore/1.17.14'}, 'body': b'', 'url': 'https://s3.amazonaws.com/mybucket/textfile', 'context': {'client_region': 'us-east-1', 'client_config': <botocore.config.Config object at 0x7f4fd6aaf310>, 'has_streaming_input': False, 'auth_type': None, 'signing': {'bucket': 'mybucket'}, 'timestamp': '20200702T130555Z'}}
Configuration loaded by DVC at some point during the call:
{'url': 's3://mybucket', 'endpointurl': 'http://127.0.0.1:9000', 'use_ssl': True, 'listobjects': False}
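The mismatch between the two dumps above can be checked mechanically. This is only a small diagnostic sketch using the standard library. The endpoint_mismatch helper is a hypothetical name, and the two URLs are copied from the dumps.

```python
from urllib.parse import urlsplit


def endpoint_mismatch(request_url, endpointurl):
    """Return True if the request went to a different scheme or host
    than the configured endpoint."""
    req = urlsplit(request_url)
    cfg = urlsplit(endpointurl)
    return (req.scheme, req.netloc) != (cfg.scheme, cfg.netloc)


# 'url' from the failing request and 'endpointurl' from the loaded config.
request_url = "https://s3.amazonaws.com/mybucket/textfile"
endpointurl = "http://127.0.0.1:9000"

print(endpoint_mismatch(request_url, endpointurl))  # True
```

Here the failing HEAD request targets s3.amazonaws.com over HTTPS, while the loaded configuration points to the local Minio endpoint over HTTP.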
Any idea as to why this behaviour is happening?
Thanks,
Lucas