Skip to content

FSMap: Strange behavior for check and create parameters on S3. #733

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
tasansal opened this issue Aug 23, 2021 · 4 comments
Open

FSMap: Strange behavior for check and create parameters on S3. #733

tasansal opened this issue Aug 23, 2021 · 4 comments

Comments

@tasansal
Copy link

tasansal commented Aug 23, 2021

Description

We were using the check and create parameters to make sure we have write permission to the bucket before we do a lot of time-consuming work.

When using s3fs backend, and potentially any object store, we are running into this issue. Did some debugging and after some discussions with other contributors, it seems like fsspec is the right place to handle this behavior.

Workflow:

  1. We have a bucket, and want to create a zarr file. As you may already know a zarr root is a directory.
  2. Before we scan a big binary file and start writing to zarr we want to check if we can access the bucket and write to it.
  3. Hence we were trying to use check and create for convenience. In this case the path we want to create is my_bucket/my_file.zarr
  4. However, even though we pass check and create when this file (folder) doesn't exist, it still fails.

This is because (Also see related discussions at the end):
create doesn't "really" create a directory, as we would normally create in the cloud console.
check calls fs.exists which in this case is a S3FileSystem.exists(root) instance and it returns False, even though we ran the mkdir before that.

Are these checks necessary anymore since backends can raise exceptions related to write permissions?
Or should we handle it differently in fsspec to work as expected on cloud backends?
Or should the users handle it on their own code?

https://github.com/intake/filesystem_spec/blob/06d1c7c9668377d37137e11dc7db84fee14d2d1b/fsspec/mapping.py#L47-L57

Related Issue/Discussions:
zarr-developers/zarr-python#814
fsspec/s3fs#401 (comment)

@martindurant

@martindurant
Copy link
Member

I recommend removing the exists part of the check and only including the write/rm part. In fact, I don't think check=True is used often, which is why this wasn't picked up before.

@tasansal
Copy link
Author

tasansal commented Aug 23, 2021

I recommend removing the exists part of the check and only including the write/rm part. In fact, I don't think check=True is used often, which is why this wasn't picked up before.

Sounds good to me. I will fork and get to work on it.

Two questions:

  1. Could the proposed changes cause any major issues with local file systems?
  2. What do you suggest we do with the create part (maybe remove it completely)? That is also redundant for s3. Maybe still needed for local?

@martindurant
Copy link
Member

  1. I don't see why, no
  2. create is not redundant for s3, it just has no effect; you could see it as "ensure a writable space", which in s3 already was the case.

@martindurant
Copy link
Member

This seems to have become lots. @telamonian , were you intending to work on fixing this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants