PR: Rewrite ls to use prefixes #46


Merged: 9 commits, May 23, 2016

Conversation

martindurant
Member

Designed to get around the case where some parts of a bucket are not accessible

Fixes #38

@jseabold , please see if this fixes things for you.
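Roughly, the rewrite means ls asks S3 to filter server-side instead of listing the whole bucket and filtering locally. A minimal sketch of the idea (bucket and prefix names are placeholders, not the actual implementation):

import boto3

s3 = boto3.Session().client('s3')
# Prefix restricts the listing to keys under the accessible "directory";
# Delimiter='/' groups deeper levels into CommonPrefixes, like subdirectories.
out = s3.list_objects(Bucket='mybucket', Prefix='accessible/path/', Delimiter='/')
files = [o['Key'] for o in out.get('Contents', [])]
dirs = [p['Prefix'] for p in out.get('CommonPrefixes', [])]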

@mrocklin
Collaborator

@jseabold can you recommend a few tests to ensure that the behavior you're looking for checks out?

@jseabold

Will have a look. Working on getting the ACLs for you that would allow you to make a role with these settings. It sounds like our typical settings were best practices from AWS, but I'm still trying to chase that down.

Martin Durant added 2 commits May 20, 2016 13:27
ls('') should list buckets, but [] if no buckets or anon.
ls('nonexistent') should raise if it can't be accessed (whether or not it actually exists)
path = path[len('s3://'):]          # strip the URL scheme
path = path.rstrip('/')             # drop any trailing slash
bucket, key = split_path(path)      # "bucket/a/b" -> ("bucket", "a/b")
key = key + '/' if key else ""      # the key acts as a prefix, so re-add the slash


Maybe raise an informative error here if the path doesn't end in a forward slash? Also, key is really a prefix here, if things are called correctly. Maybe use that nomenclature?

Member Author


I would rather not raise here; I think it's reasonable to do ls('bucket/path') and expect to get keys below bucket/path/.
OK with calling it prefix.
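For reference, split_path separates the bucket name from the rest of the path, roughly like this (a sketch, not the exact s3fs implementation):

def split_path(path):
    # "bucket/a/b" -> ("bucket", "a/b"); bucket-only paths get an empty key
    if '/' not in path:
        return path, ""
    bucket, key = path.split('/', 1)
    return bucket, key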

Martin Durant added 2 commits May 20, 2016 16:21
In list_objects call, these are not really keys
@martindurant
Member Author

@jseabold , I'm sorry, I still can't reproduce your situation, where you can't even list some bucket/prefix. The link is about user-level access policies, but I don't have the privileges to create those. Is there any chance we can set up a little bucket which encapsulates this problem for testing? Have you checked whether my changes above solve the problem for you?

@jseabold

This didn't work on the first thing I tried, but there's a lot going on here, so I'm trying to pin down why. Fundamentally, there's something I don't yet understand about how you're instantiating the client. This works:

import boto3
session = boto3.Session(profile_name='account')
client = session.client('s3')
client.get_object(Bucket=BUCKET, Key=KEY)

This doesn't:

fs = s3fs.S3FileSystem(profile_name='account')
fs.s3.get_object(Bucket=BUCKET, Key=KEY)

@jseabold

It's because you're calling ls('') in the instantiation to make sure that everything worked, I guess. I can't do this, but I still need to sign my requests.
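In pseudocode, the check being described looks something like this (a hypothetical sketch of the behaviour, not the actual s3fs code):

class S3FileSystem:
    def __init__(self, anon=None, **kwargs):
        self.s3 = ...  # build the boto3 client
        if anon is None:
            # Probe credentials by listing buckets; this is what fails under a
            # restrictive IAM policy, even though signed get_object calls work.
            self.ls('')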

@jseabold

jseabold commented May 21, 2016

Does this list and cache the entire bucket on instantiation? Again, I'm still trying to understand what's going on here, but if so, that can be a super expensive operation. Assuming that's right, I would start from the premise that everyone has a key-value store with hundreds of thousands of keys (if not more) and err on the side of avoiding listing those at all costs.
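For scale: list_objects returns at most 1,000 keys per request, so enumerating a big bucket means many sequential round trips (a sketch; the bucket name is a placeholder):

import boto3

client = boto3.Session().client('s3')
# A bucket with 500k keys needs ~500 paginated requests just to enumerate.
paginator = client.get_paginator('list_objects')
total = 0
for page in paginator.paginate(Bucket='some-big-bucket'):
    total += len(page.get('Contents', []))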

@jseabold

Other than that, yes, this seems to fix my issues in some quick checks. Thanks!

Will let you know if I run into anything else.

@martindurant
Member Author

Oh! Thank you for finding the root cause (ls("")) - so this is an IAM policy thing, not a bucket permission thing...

That can certainly be avoided - it exists for the anon=None default case, to see if the credentials are valid. I can't immediately see another way to check this (but the check itself could be optional?).

@jseabold

jseabold commented May 21, 2016

so this is an IAM policy thing, not a bucket permission thing...

I don't follow this.

Seems like a heavy way to check whether credentials are valid. Given that it will happen later anyway, why check at all? If you really want to handhold, you could fail gracefully later and tell users to check their credentials or use anon=True, rather than trying to determine this automatically, given that there isn't a straightforward way to do it. You could try using the sts service to generate a session token, since that's fairly light, but I don't know the full permissions and failure modes around this for users, and it's probably not OK to assume that everyone can do this.

If you attach the session, users can call S3FileSystem.session.get_credentials() and look themselves. I don't see how to get these from the service clients.
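For example (a sketch; the profile name is a placeholder, and get_session_token may not be available to every credential type):

import boto3

session = boto3.Session(profile_name='account')
# Lightweight credential check: raises if the keys can't sign a request.
session.client('sts').get_session_token()

# With the session attached, users can inspect credentials themselves:
creds = session.get_credentials()
print(creds.access_key)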

@martindurant
Member Author

martindurant commented May 21, 2016

Excellent idea!
The check is now done via sts, and I've attached the session too, as you suggested.
Is your problem finally fixed?

Also, you may have seen in #48 that a simpler way to access single files one at a time may be coming (you don't get filesystem methods, but you do get file-like behaviour, buffering, etc.).
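So, after this change, something like the following should work (a sketch, reusing the BUCKET and KEY placeholders from the snippets above):

import s3fs

fs = s3fs.S3FileSystem(profile_name='account')  # no ls('') probe at startup
fs.s3.get_object(Bucket=BUCKET, Key=KEY)        # signed requests still work
creds = fs.session.get_credentials()            # boto3 session now exposed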

@jseabold

lgtm
