Shutdown in asynchronous context leaves session and connector open #943
I believe there may already be an issue referencing this, although I can't immediately see it. The documentation recommends explicitly stashing and closing the async session yourself, since automatic cleanup is indeed tricky: https://s3fs.readthedocs.io/en/latest/#async
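For reference, the pattern from those docs looks roughly like this (a paraphrased sketch; `"some-bucket"` is a placeholder):

```python
import asyncio

from s3fs import S3FileSystem


async def main():
    s3 = S3FileSystem(asynchronous=True)
    session = await s3.set_session()    # start the client on the running loop
    print(await s3._ls("some-bucket"))  # async method variants are underscore-prefixed
    await session.close()               # explicit cleanup while the loop is still alive


asyncio.run(main())
```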
Thank you for your answer! I do something like this:

```python
import fsspec
import xarray
import zarr

s3_endpoint = "http://localhost:19900"
fs = fsspec.filesystem(
    "s3",
    key="ACCESS",
    secret="S3CRET",
    endpoint_url=s3_endpoint,
    asynchronous=True,
)
store = zarr.storage.FsspecStore(fs, path="/mybucket/myzarr.zarr")
dataset = xarray.open_zarr(store, zarr_format=3)
```

I'll try to make a context manager to perform resource cleanup, as this is for a library and I don't want to put this burden on the final user.

EDIT: Note that I do not wish to perform async operations at the moment; however, if the filesystem is not created with `asynchronous=True`, the following warning is raised:

```
UserWarning: fs (<s3fs.core.S3FileSystem object at 0x7f654795dfd0>) was not created with `asynchronous=True`, this may lead to surprising behavior
```
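A minimal sketch of what such a context manager could look like, assuming the documented `set_session()`/`session.close()` pair is the right cleanup hook (`open_s3_store` is a hypothetical name, not an existing API):

```python
import contextlib

import fsspec
import zarr


@contextlib.asynccontextmanager
async def open_s3_store(endpoint_url, path, **s3_kwargs):
    # Create the filesystem inside a running event loop, so cleanup
    # can happen on that same loop.
    fs = fsspec.filesystem(
        "s3", endpoint_url=endpoint_url, asynchronous=True, **s3_kwargs
    )
    session = await fs.set_session()  # documented s3fs hook for async use
    try:
        yield zarr.storage.FsspecStore(fs, path=path)
    finally:
        await session.close()  # close the aiohttp session before the loop goes away
```

Usage would then be `async with open_s3_store(s3_endpoint, "/mybucket/myzarr.zarr") as store: ...`, keeping the cleanup out of the end user's hands.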
Zarr should be able to do this for you automatically, I think. The problem is that the filesystem you make is created outside of an asynchronous context. This means that by the time of cleanup (probably at interpreter shutdown), there may be no event loop running anymore.
I had a similar issue, but using async S3FS directly with this sort of pattern: the cleanup call had to be made inside the async function, otherwise the loop would potentially not be running. I'm wondering if the cleanup could instead run on a different loop, or after the original one has closed.
I'm afraid not: running on a different loop would be an error; closed loops cannot be restarted; new loops cannot be created at shutdown time.
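The "closed loops cannot be restarted" point can be illustrated with the standard library alone (the shutdown-time restriction is harder to show in a snippet, since it only manifests while the interpreter is finalizing):

```python
import asyncio

loop = asyncio.new_event_loop()
loop.run_until_complete(asyncio.sleep(0))  # works: loop is open
loop.close()

# A closed loop refuses further work:
loop.run_until_complete(asyncio.sleep(0))  # RuntimeError: Event loop is closed
```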
Thanks @martindurant. Maybe another naive question, but I'm curious why we don't see such warnings when using something like the HTTP or, I think, even the GCS file systems, which are both async as well. Is it because this is a problem that's unique to the use of aiobotocore (vs. just aiohttp)? No need for a deep investigation, it's just something I think I observed. HTTPFileSystem seems to follow the more context-manager-like route when dealing with aio sessions, so perhaps they always get properly closed.
http and gcsfs both depend on aiohttp, so they have the same behaviour, and it proves relatively easy to find and close the underlying connection even in non-async code. aiobotocore is better at hiding its internals.
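In the spirit of the gcsfs patch, "closing from non-async code" boils down to registering a synchronous finalizer that pushes the async close onto fsspec's dedicated IO loop while it is still alive. A rough sketch, assuming the filesystem exposes its aiohttp session on a private `_session` attribute as HTTPFileSystem does (an implementation detail, not a public API):

```python
import weakref

from fsspec.asyn import sync  # synchronously run a coroutine on a given loop


def register_cleanup(fs):
    # Close the aiohttp session on fsspec's IO loop when the filesystem
    # is garbage collected. Only works while that loop is still running;
    # the finalizer must not hold a strong reference to fs itself.
    def _close(loop, session):
        if session is not None and not loop.is_closed():
            sync(loop, session.close)

    weakref.finalize(fs, _close, fs.loop, fs._session)
```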
I wonder if the upstream libraries could expose to the end user an explicit way to close the underlying session.
@alessio-locatelli, that is exactly what is suggested in https://filesystem-spec.readthedocs.io/en/latest/async.html#using-from-async. There are two issues: the explicit close must run while the original event loop is still alive, and downstream libraries would have to pass that burden on to the end user.
When opening an asynchronous S3FileSystem, it seems that some aiohttp resources are not cleaned up properly.
I can provide an example that uses xarray, fsspec, s3fs and zarr to reproduce this issue; however, I could not find any free zarr dataset. Warnings about the unclosed client session and connector are raised upon leaving the Python interpreter.
It seems that a similar issue existed with gcsfs; here is the PR that patched it: https://github.com/fsspec/gcsfs/pull/657/files. HTTPFileSystem might have a similar issue: zarr-developers/zarr-python#2674 (comment). I tried to implement a similar thing, but after trying for a few hours I gave up because I am not familiar with Python async programming :(