Forbidden 403 on missing chunk in remote Zarr #342
This was a deliberate decision, so that (intermittent) connection errors do not show up as false FileNotFound errors, which would mean chunks being filled with the default fill value. I can't think of an easy way to make our implementation more flexible in this regard: not being able to access a file is not the same as the file not existing. There could be a flag on the mapper to translate any fetch error to KeyError at the user's choice, but that would risk the zarr problems from before.
Hmmm..... thanks, @martindurant. cc'ing @alimanfoo while I do some more reading. Edit: In most of the text of #255, Alistair mentions "transient exceptions" except for in the one sentence
You go on to mention permissions errors, @martindurant. I think this is a request for such permission errors not to be handled as transient errors.
FWIW I think fsspec is doing the right thing in principle here, in the sense that it should only raise a KeyError when it definitely knows that a file does not exist. I.e., it can translate FileNotFound -> KeyError, but any other type of exception should be propagated as-is. But it seems weird that the service is responding with a 403 here: you don't need to list the bucket, you're just attempting to read an object that doesn't exist, so surely 404 would be the right response.
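A minimal sketch of the rule described above, assuming a hypothetical `fetch(key)` callable that returns bytes or raises an `OSError` subclass. This is an illustration, not fsspec's actual `FSMap` code: only `FileNotFoundError` becomes `KeyError`, while `PermissionError`, connection errors, timeouts, etc. propagate unchanged.

```python
from collections.abc import Mapping
from typing import Callable, Iterator


class StrictChunkMapping(Mapping):
    """Hypothetical read-only mapping over a fetch function.

    Only a definite "does not exist" (FileNotFoundError) is reported as a
    missing key; any other failure is re-raised so the caller never mistakes
    an access problem for an absent chunk.
    """

    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch

    def __getitem__(self, key: str) -> bytes:
        try:
            return self._fetch(key)
        except FileNotFoundError as exc:
            # The backend is sure the object is absent: safe to signal "missing".
            raise KeyError(key) from exc
        # PermissionError, ConnectionError, TimeoutError, ... are not caught.

    def __iter__(self) -> Iterator[str]:
        # Restrictive servers often forbid listing; not supported in this sketch.
        raise TypeError("listing not supported")

    def __len__(self) -> int:
        raise TypeError("listing not supported")


_DEMO = {"0.0": b"\x01\x02"}


def demo_fetch(key: str) -> bytes:
    """Stand-in backend: one real chunk, one definite miss, everything else denied."""
    if key in _DEMO:
        return _DEMO[key]
    if key == "0.1":
        raise FileNotFoundError(key)
    raise PermissionError(key)
```

With `m = StrictChunkMapping(demo_fetch)`, `"0.1" in m` is simply `False`, but `m["9.9"]` raises `PermissionError`, which is exactly the case this issue is about.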
I'll look into the "spec" (or at least compare how other implementations handle this), but to some degree, I'd concur with the 403: by telling someone a key doesn't exist, the server would be giving away some knowledge of the underlying system, but it has no idea who I am and therefore shouldn't tell me anything.
FWIW you can argue it the other way around. E.g., GitHub gives you a 404 if you try to access a resource that you don't have permission to.
Oh, granted. Guess that's my point: there are arguments both ways. Current status of research:
However, I do wonder if it doesn't suffice that 403 is not transient, since that is what you were concerned about. (https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html certainly lists 403, but when it is returned is less clear...)
Yeah, I guess this means that, whether you receive a 403 or 404, in both cases the true error could be either file not found or bad permissions. Ugh. Makes it hard to figure out what fsspec and zarr should do. The thing I'm most concerned about is, when is it safe for zarr to assume a chunk doesn't exist and fill it with missing values.
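To make the stakes concrete, here is a simplified sketch (not zarr's actual code) of how an array reader might load a chunk. A `KeyError` from the store is the only signal that triggers filling with the fill value, so whatever the store translates into `KeyError` silently becomes fill values. Chunks are modeled as raw bytes decoded to a list of ints; `read_chunk` and its parameters are hypothetical.

```python
from typing import List, Mapping


def read_chunk(store: Mapping[str, bytes], key: str,
               chunk_len: int, fill_value: int = 0) -> List[int]:
    """Return the chunk's values, or a chunk full of fill_value if it is missing."""
    try:
        raw = store[key]
    except KeyError:
        # Treated as "never written": synthesize the chunk from the fill value.
        return [fill_value] * chunk_len
    # Any other exception (e.g. PermissionError) has already propagated to the caller.
    return list(raw)
```

If the store mapped every fetch error to `KeyError`, a transient timeout would go down the first branch and produce plausible-looking but wrong data, which is why the translation rule matters.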
For what it's worth,
at https://github.com/intake/filesystem_spec/blob/c1c1176bf53eedc7d3d0e0ff1ecf07eb2e53d86a/fsspec/implementations/http.py#L108 lets the dask array computation continue. cc: @will-moore
Ping @martindurant @alimanfoo: sorry, I let this slip over the summer slump (and was just cc: @jni
Also, an update I just realized: in @jni's bug report, the error is a 404. I imagine the change from 403 to 404 stems from this bucket recently being made public.
code:
stack trace:
Primarily driven by masks with missing chunks which currently lead to failures (see issue below) but also by a larger number of masks leading to performance issues, default to having mask labels added by set to invisible. see: fsspec/filesystem_spec#342 (comment)
A 404 should definitely become FileNotFound.
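A sketch of translating HTTP status codes into Python exceptions along the lines discussed: 404 becomes `FileNotFoundError`, 401/403 become `PermissionError`, and anything else becomes a plain `OSError` that callers should treat as possibly transient. The helper `raise_for_status_code` is hypothetical; fsspec's HTTP implementation has its own logic.

```python
def raise_for_status_code(status: int, url: str) -> None:
    """Raise an OSError subclass describing a failed HTTP response (hypothetical helper)."""
    if 200 <= status < 300:
        return
    if status == 404:
        # Definitely absent: a mapper may safely turn this into KeyError.
        raise FileNotFoundError(url)
    if status in (401, 403):
        # Access denied, or (on some servers) a disguised "not found".
        raise PermissionError(f"HTTP {status} for {url}")
    # 5xx, 429, etc.: possibly transient, must not be read as "missing".
    raise OSError(f"HTTP {status} for {url}")
```

Because `FileNotFoundError` and `PermissionError` are both `OSError` subclasses, a caller can catch `OSError` broadly while a mapper still singles out `FileNotFoundError` for `KeyError`.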
Note that zarr-developers/zarr-python#546 proposes making the list of exceptions to be regarded as "missing" (i.e., KeyError) configurable in the storage driver.
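If the set of "missing" exceptions were configurable, a user facing a server like the one in this issue could opt in to treating `PermissionError` as missing for that one store. A minimal sketch as a wrapper around any mapping; the `MissingAsKeyError` class and its `treat_as_missing` parameter are illustrative assumptions, not the API proposed in that PR. The trade-off described earlier in the thread still applies: a genuine permissions problem on that store would then be silently filled.

```python
from collections.abc import Mapping
from typing import Iterator, Tuple, Type


class MissingAsKeyError(Mapping):
    """Hypothetical wrapper: re-raise selected exceptions from `inner` as KeyError."""

    def __init__(self, inner: Mapping,
                 treat_as_missing: Tuple[Type[BaseException], ...] = (FileNotFoundError,)):
        self._inner = inner
        self._missing = treat_as_missing

    def __getitem__(self, key):
        try:
            return self._inner[key]
        except KeyError:
            raise  # already means "missing"
        except self._missing as exc:
            # Opted-in exception types are reported as a missing key.
            raise KeyError(key) from exc

    def __iter__(self) -> Iterator:
        return iter(self._inner)

    def __len__(self) -> int:
        return len(self._inner)
```

By default only `FileNotFoundError` is translated; passing `treat_as_missing=(FileNotFoundError, PermissionError)` makes this wrapper behave the way the reporter's restrictive server would need.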
Both direct HTTP access and S3FileSystem access of an S3 store fail with a PermissionError if a Zarr chunk does not exist.
s3fs details
http details
The server is known to be quite restrictive, disallowing directory listings, etc. The only workaround I can think of is to create the missing chunks with the fill value, which I'd like to avoid since this would have to be repeated for millions of images.