
Can't list buckets #38


Closed
jseabold opened this issue May 11, 2016 · 19 comments · Fixed by #46

Comments

@jseabold

I have some AWS credentials for which I can't list buckets. I almost never have bucket-level permissions, but I do have permissions on prefixes. So something like

fs = s3fs.S3FileSystem(profile_name='user')
fs.ls('s3://home/sseabold/prefix')

Won't work because I can't list any buckets (S3Client.list_buckets fails), and I also don't generally have permissions at the bucket level.
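
For concreteness, roughly the same thing with plain boto3 (just a sketch, using the bucket and prefix from the example above): the unscoped list fails, the prefix-scoped list succeeds.

import boto3

session = boto3.Session(profile_name='user')
s3 = session.client('s3')

# Denied: these credentials have no permission to list all buckets.
# s3.list_buckets()

# Allowed: listing scoped to a prefix the credentials do cover.
resp = s3.list_objects(Bucket='home', Prefix='sseabold/prefix/')
for obj in resp.get('Contents', []):
    print(obj['Key'])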

Similar issues to those here [1, 2] for the default (old) boto behavior.

[1] conda/conda#2126
[2] blaze/odo#448

@martindurant (Member)

I don't yet have a solution for this; we might indeed have to change the way that ls works.
In the meantime, you can still access details of individual files using info() and open them.

@jseabold (Author)

It looks like info has the same issue with bucket permissions.

@martindurant (Member)

Apologies, it seems we only implemented the direct info() for the S3File object, i.e., one you have already opened. Would info() on a given file, sidestepping the ls, solve this problem for you?
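
Something along these lines, i.e., open the file first and ask the open S3File for its details (a sketch only; the exact key name here is made up):

import s3fs

fs = s3fs.S3FileSystem(profile_name='user')
with fs.open('home/sseabold/prefix/data.csv') as f:
    # details come from the object itself, no bucket listing involved
    print(f.info())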

@jseabold (Author)

Ah, I suppose. I was just looking at info for the first time. My main use is listing. No worries. S3 ACLs are a pain.

@jseabold (Author)

jseabold commented May 18, 2016

It seems that all of my errors that I've dug into are coming down to things that assume I can list the bucket. The most recent dask error is because of the list_objects paginator [1]. It tries to list the objects on the bucket rather than the prefix that I asked for in dask.DataFrame.read_csv.

I don't have my head completely around things here yet and the paginator stuff is new to me, but my files are in a bucket that has a lot of objects and I'm not interested in any of them. I only want the keys in the prefix I'm asking for, which I have permissions for and there are only a few. The dask error comes because I called read_csv on an object from a single key and it's listing the whole bucket AFAICT.

[1] https://github.com/dask/s3fs/blob/master/s3fs/core.py#L256
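
In boto3 terms, the paginator does take a Prefix, so something like this (a sketch, same bucket/prefix as before) would only touch the keys I care about:

import boto3

s3 = boto3.Session(profile_name='user').client('s3')
paginator = s3.get_paginator('list_objects')

# Paginate only under the requested prefix instead of over the whole bucket.
for page in paginator.paginate(Bucket='home', Prefix='sseabold/prefix/'):
    for obj in page.get('Contents', []):
        print(obj['Key'])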

@jseabold (Author)

I wonder if you shouldn't also pop out the prefix from the path in the S3FileSystem._ls and pass that along for pagination. I assume this is why all of the AWS SDK stuff is very particular about trailing slashes to identify prefixes vs keys.
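
e.g., something along these lines (illustrative only, not the actual _ls code):

# Split the bucket from the prefix before listing.
path = 'home/sseabold/prefix'
bucket, _, prefix = path.partition('/')
# bucket == 'home', prefix == 'sseabold/prefix'
# ...then pass Prefix=prefix to the list_objects paginator instead of listing the bucket root.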

Happy to test any changes.

@martindurant (Member)

It is a valid model.
The initial reason for listing the whole bucket rather than using the prefix/delimiter model is that every list call is pretty slow, so caching once is much faster for nested key structures. Let me think it over; we could either make using prefixes optional, or make the caching scheme a bit smarter.
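
For reference, the prefix/delimiter style of listing looks roughly like this in boto3 (a sketch: one call per directory level, rather than one big cached listing):

import boto3

s3 = boto3.client('s3')

# Contents are the keys directly under the prefix;
# CommonPrefixes are the sub-"directories" one level down.
resp = s3.list_objects(Bucket='home', Prefix='sseabold/', Delimiter='/')
for obj in resp.get('Contents', []):
    print('key:', obj['Key'])
for cp in resp.get('CommonPrefixes', []):
    print('dir:', cp['Prefix'])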

@jseabold (Author)

Sure that makes sense.

Just as another data point: as you noted, the list call is slow, and a typical bucket I use could take a very, very long time to list, assuming I even could. I've run into this before just on a prefix in this bucket, and I ended up killing the operation I was doing. I worry a bit about the cost of trying to make this key-value store too filesystem-like, but that's up to you. I think users will understand that S3 is not a filesystem and that there are limitations here.

Sorry to bother you about all of this. I generally get pretty far with boto(3) myself, but now that dask is using s3fs and pandas is going to, I'm having to rewrite utility functions to make things work for me, and I want to raise the issues I'm going to run into.

@martindurant (Member)

Your help in identifying issues is totally appreciated, no apology required!

@kennethreitz

I am also running into this, trying to use s3contents with a restricted bucket and an IAM policy that doesn't allow bucket listing.

@martindurant (Member)

@kennethreitz, s3fs has been able to list a bucket with a given prefix for a long time now. Can you restrict your listing just to areas of the bucket that you do have access to? Even if that doesn't work, you should still be able to read specific paths, since the fallback is to call head_object rather than listing any part of the bucket.
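
i.e., something like this should still work with only GetObject permission on the key, since it goes through head_object rather than any listing (a sketch; the bucket and key names here are made up):

import s3fs

fs = s3fs.S3FileSystem()
with fs.open('restricted-bucket/path/to/notebook.ipynb', 'rb') as f:
    data = f.read()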

@kennethreitz

Let me show you the exception.

@kennethreitz

Traceback (most recent call last):
  File "/Volumes/KR/.local/share/virtualenvs/jupyter-test-A6nGQDyR/bin/jupyter-lab", line 11, in <module>
    sys.exit(main())
  File "/Volumes/KR/.local/share/virtualenvs/jupyter-test-A6nGQDyR/lib/python3.6/site-packages/jupyter_core/application.py", line 266, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/Volumes/KR/.local/share/virtualenvs/jupyter-test-A6nGQDyR/lib/python3.6/site-packages/traitlets/config/application.py", line 657, in launch_instance
    app.initialize(argv)
  File "<decorator-gen-7>", line 2, in initialize
  File "/Volumes/KR/.local/share/virtualenvs/jupyter-test-A6nGQDyR/lib/python3.6/site-packages/traitlets/config/application.py", line 87, in catch_config_error
    return method(app, *args, **kwargs)
  File "/Volumes/KR/.local/share/virtualenvs/jupyter-test-A6nGQDyR/lib/python3.6/site-packages/notebook/notebookapp.py", line 1505, in initialize
    self.init_configurables()
  File "/Volumes/KR/.local/share/virtualenvs/jupyter-test-A6nGQDyR/lib/python3.6/site-packages/notebook/notebookapp.py", line 1214, in init_configurables
    log=self.log,
  File "/Volumes/KR/.local/share/virtualenvs/jupyter-test-A6nGQDyR/lib/python3.6/site-packages/s3contents/s3manager.py", line 41, in __init__
    delimiter=self.delimiter)
  File "/Volumes/KR/.local/share/virtualenvs/jupyter-test-A6nGQDyR/lib/python3.6/site-packages/s3contents/s3_fs.py", line 54, in __init__
    self.init()
  File "/Volumes/KR/.local/share/virtualenvs/jupyter-test-A6nGQDyR/lib/python3.6/site-packages/s3contents/s3_fs.py", line 59, in init
    self.isdir("")
  File "/Volumes/KR/.local/share/virtualenvs/jupyter-test-A6nGQDyR/lib/python3.6/site-packages/s3contents/s3_fs.py", line 91, in isdir
    exists = self.fs.exists(path_)
  File "/Volumes/KR/.local/share/virtualenvs/jupyter-test-A6nGQDyR/lib/python3.6/site-packages/s3fs/core.py", line 575, in exists
    if key or bucket not in self.ls(''):
  File "/Volumes/KR/.local/share/virtualenvs/jupyter-test-A6nGQDyR/lib/python3.6/site-packages/s3fs/core.py", line 369, in ls
    files = self._ls(path, refresh=refresh)
  File "/Volumes/KR/.local/share/virtualenvs/jupyter-test-A6nGQDyR/lib/python3.6/site-packages/s3fs/core.py", line 360, in _ls
    return self._lsbuckets(refresh)
  File "/Volumes/KR/.local/share/virtualenvs/jupyter-test-A6nGQDyR/lib/python3.6/site-packages/s3fs/core.py", line 328, in _lsbuckets
    files = self.s3.list_buckets()['Buckets']
  File "/Volumes/KR/.local/share/virtualenvs/jupyter-test-A6nGQDyR/lib/python3.6/site-packages/botocore/client.py", line 324, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Volumes/KR/.local/share/virtualenvs/jupyter-test-A6nGQDyR/lib/python3.6/site-packages/botocore/client.py", line 622, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListBuckets operation: Access Denied

@kennethreitz

kennethreitz commented Feb 23, 2018

ListBuckets is being called, even though I only specified a specific bucket. This is likely an implementation detail of s3contents, but I experimented with the code a bit, and the problem seems to lie within s3fs.

@martindurant (Member)

OK, I see what you are doing.
Yes, I think this is something for s3contents to sort out: figuring out whether a given prefix is directory-like involves listing the parent prefix, which may not be allowed. An alternative would be to do an ls on the prefix in question and see what comes back.
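
Roughly, list the prefix itself rather than its parent and see whether anything comes back (a sketch only; this isdir helper is illustrative, not s3fs or s3contents API):

def isdir(fs, path):
    # A prefix is directory-like if listing it returns keys other than
    # the path itself; this only needs list permission on that prefix.
    try:
        entries = fs.ls(path)
    except Exception:
        return False
    path = path.rstrip('/')
    return any(entry.rstrip('/') != path for entry in entries)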

@kennethreitz

Interesting, I'll give that a try.

@xiaoyu-wu

https://github.com/dask/s3fs/blob/bfd5de29270a0063935889ce089f84b3f803012b/s3fs/core.py#L791
Would it make more sense to change this line to self.ls(bucket)? Some credentials have ACLs too limited to see other buckets.

@martindurant (Member)

Yes, agreed: we should check in ls('') first, and if not found, try ls(bucket); if it succeeds, the bucket exists.
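
Sketched against the line linked above (illustrative only, not a tested patch; bucket_exists is just an illustrative name):

def bucket_exists(self, bucket):
    # Try the cheap global listing first, then fall back to listing the
    # bucket itself for credentials without ListBuckets permission.
    try:
        if bucket in self.ls(''):
            return True
    except Exception:
        pass
    try:
        self.ls(bucket)   # if we can list inside it, it exists
        return True
    except Exception:
        return False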

@ClaytonJY

Chiming in as another user bitten by this. I was able to get ListBuckets permission in this case, but it would be nice not to need it. At the same time, I'm told people run into this issue with plain ol' boto3 all the time too, so ¯\_(ツ)_/¯
