Writing to a new file in an existing bucket throws "unspecified location constraint" error as it incorrectly tries to create new bucket #1404


Closed
DanielTsiang opened this issue Oct 24, 2023 · 6 comments


@DanielTsiang
Contributor

DanielTsiang commented Oct 24, 2023


Problem

Using the current latest versions of fsspec and s3fs (2023.10.0), when I tried to write to a new file inside a "subdirectory" in S3 that didn't exist yet via:

with fsspec.open("S3://existing-bucket-name/existing-key/new-key/new-file", "w") as file:
    df.to_csv(file, index=False)  # df is a pandas DataFrame

I got this error, even though the S3 bucket already exists:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/s3fs/core.py", line 113, in _error_wrapper
    return await func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/aiobotocore/client.py", line 383, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (IllegalLocationConstraintException) when calling the CreateBucket operation: The unspecified location constraint is incompatible for the region specific endpoint this request was sent to.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  ...
    with fsspec.open("S3://existing-bucket-name/existing-key/new-key/new-file", "w") as file:
  File "/usr/local/lib/python3.10/site-packages/fsspec/core.py", line 452, in open
    out = open_files(
  File "/usr/local/lib/python3.10/site-packages/fsspec/core.py", line 293, in open_files
    [fs.makedirs(parent, exist_ok=True) for parent in parents]
  File "/usr/local/lib/python3.10/site-packages/fsspec/core.py", line 293, in <listcomp>
    [fs.makedirs(parent, exist_ok=True) for parent in parents]
  File "/usr/local/lib/python3.10/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/usr/local/lib/python3.10/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
  File "/usr/local/lib/python3.10/site-packages/s3fs/core.py", line 914, in _makedirs
    await self._mkdir(path, create_parents=True)
  File "/usr/local/lib/python3.10/site-packages/s3fs/core.py", line 899, in _mkdir
    await self._call_s3("create_bucket", **params)
  File "/usr/local/lib/python3.10/site-packages/s3fs/core.py", line 348, in _call_s3
    return await _error_wrapper(
  File "/usr/local/lib/python3.10/site-packages/s3fs/core.py", line 140, in _error_wrapper
    raise err
PermissionError: The unspecified location constraint is incompatible for the region specific endpoint this request was sent to.

This happens because the auto_mkdir parameter defaults to True in fsspec's open_files() function.
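For illustration, the relevant logic can be sketched roughly like this (a simplified toy version, not the actual fsspec source; FakeFS and open_files_sketch are hypothetical names):

```python
class FakeFS:
    """Hypothetical in-memory stand-in for an fsspec filesystem."""
    def __init__(self):
        self.made_dirs = []
        self.opened = []

    def makedirs(self, path, exist_ok=False):
        self.made_dirs.append(path)

    def open(self, path, mode):
        self.opened.append((path, mode))
        return path  # placeholder handle

def open_files_sketch(fs, paths, mode, auto_mkdir=True):
    # Simplified view of fsspec.core.open_files: when writing with
    # auto_mkdir=True, parent "directories" are pre-created first.
    if "w" in mode and auto_mkdir:
        parents = {p.rsplit("/", 1)[0] for p in paths}
        for parent in parents:
            # On s3fs, makedirs on a top-level path can end up calling
            # create_bucket, which triggers the error in this issue.
            fs.makedirs(parent, exist_ok=True)
    return [fs.open(p, mode) for p in paths]

fs = FakeFS()
open_files_sketch(fs, ["bucket/key/new-file"], "w")
```

With auto_mkdir=False, the makedirs step is skipped entirely, which is why the workaround below avoids the error.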

Workaround

I managed to resolve this error by changing the command to the following, i.e. adding auto_mkdir=False as a kwarg:

with fsspec.open("S3://existing-bucket-name/existing-key/new-key/new-file", "w", auto_mkdir=False) as file:
    df.to_csv(file, index=False)
@martindurant
Member

martindurant commented Oct 26, 2023

This has come up in previous issues. Essentially, it is right to try to create the bucket in this case (respecting auto_mkdir), since S3 doesn't provide a guaranteed way to know whether a bucket exists. I suggest that makedirs should:

  • first check if the bucket is in dircache, in which case pass
  • see if we can determine that our create bucket call fails due to bucket already existing, and pass if yes
  • maybe ignore all errors when exist_ok=True

Alternatively, in fsspec.core, we could write the file whether or not makedirs succeeds, and provide an informative message if the write then fails in turn.
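The third bullet above could look roughly like this (a sketch against a toy filesystem; ToyBucketFS and makedirs_tolerant are hypothetical names, not the actual s3fs code):

```python
class ToyBucketFS:
    """Hypothetical stand-in for s3fs where the caller lacks the
    CreateBucket permission, as in the traceback above."""
    def __init__(self, existing):
        self.buckets = set(existing)

    def mkdir(self, bucket):
        # Mirrors the PermissionError s3fs raises for the
        # IllegalLocationConstraintException / access-denied case.
        raise PermissionError("not allowed to call CreateBucket")

def makedirs_tolerant(fs, bucket, exist_ok=True):
    # Swallow the failure when exist_ok=True: if the bucket truly
    # is missing, the subsequent write will fail with a clearer error.
    try:
        fs.mkdir(bucket)
    except PermissionError:
        if not exist_ok:
            raise

# Writing to an existing bucket no longer aborts at the mkdir step:
makedirs_tolerant(ToyBucketFS({"existing-bucket"}), "existing-bucket")
```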

@DanielTsiang
Contributor Author

DanielTsiang commented Oct 26, 2023

see if we can determine that our create bucket call fails due to bucket already existing

The problem with this approach is that in a production environment, the service account is unlikely to have been granted the CreateBucket permission, only permission to write new files.

From the following Stack Overflow threads, there are a few alternatives suggested to check whether an S3 bucket already exists:

Perhaps one of these alternatives could be used to check whether the S3 bucket already exists, to avoid errors from trying to create a bucket we already know exists?

@martindurant
Member

Indeed, an HTTP HEAD call to f"https://{bucket}.s3.amazonaws.com" is enough to ascertain whether it exists (not going to use boto3!). But why would this be any faster than trying to create the bucket and failing?
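Such a HEAD probe might look like this (a hypothetical helper, assuming the conventional S3 responses: 200 or 403 for an existing bucket, 404 for a missing one):

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError

def status_means_exists(status):
    # 200: bucket exists and is readable; 403: bucket exists but we
    # lack permission; 404: no such bucket.
    if status in (200, 403):
        return True
    if status == 404:
        return False
    raise ValueError(f"unexpected status {status}")

def bucket_exists(bucket, timeout=10):
    # Bare HTTP HEAD against the bucket's virtual-hosted endpoint,
    # no boto3 involved. Network call, so not exercised here.
    req = Request(f"https://{bucket}.s3.amazonaws.com", method="HEAD")
    try:
        status = urlopen(req, timeout=timeout).status
    except HTTPError as err:
        status = err.code
    return status_means_exists(status)
```

Note that 403 has to count as "exists": a production service account with write-only permissions would typically get exactly that response for its own bucket.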

@DanielTsiang
Contributor Author

You're right that it wouldn't be any faster. What I'm suggesting is that if I only want to write a file to an S3 bucket that already exists, the operation should not fail when I have permission to write a file but not to create a bucket.

Whether we use an alternative way to check for an existing bucket, or try to create it and catch permission-related errors, is fine by me. In the latter case, though, we would still need a way to check whether the bucket exists, unless we just proceed with attempting to write the file anyway.

Alternatively, there could be a different flag which controls whether to check if the bucket exists before attempting to write to a file in it.

@martindurant
Member

we just proceed with attempting to write the file anyway

Yes, this was one of my suggestions. Would you like to contribute the change?

there could be a different flag

You can already opt out of the behaviour with auto_mkdir=False

@DanielTsiang
Contributor Author

Thanks Martin. I've attempted to address this issue by opening the following PR:

Comments on how to improve the PR will be most welcome 🙂
