Skip to content

Improve async handling for s3fs #2901

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
gbip opened this issue Mar 10, 2025 · 5 comments
Open

Improve async handling for s3fs #2901

gbip opened this issue Mar 10, 2025 · 5 comments
Labels
bug Potential issues with the zarr-python library

Comments

@gbip
Copy link

gbip commented Mar 10, 2025

Zarr version

v3.X

Numcodecs version

v0.15.1

Python Version

3.12

Operating System

Linux

Installation

pip into venv

Description

When opening a zarr using s3fs, some errors related to aiohttp objects are raised when leaving the Python interpreter. From what I understand, it seems that cleaning asynchronous objects should be performed by the filestorage implementation, as in #2674 (comment).

However it is unclear to me if this should be performed by zarr_python or the s3fs implementation, as discussed here : fsspec/s3fs#943.

What do you think ?

Steps to reproduce

import xarray
import fsspec
import zarr

s3_endpoint = "http://localhost:19900"

fs = fsspec.filesystem(
    "s3",
    key="ACCESS",
    secret="S3CRET",
    endpoint_url=s3_endpoint,
    asynchronous=True,
)
store = zarr.storage.FsspecStore(fs, path="/mybucket/myzarr.zarr")
dataset = xarray.open_zarr(store, zarr_format=3)

Should raise the following error :

Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7fe2071f6c60>
Unclosed connector
connections: ['deque([(<aiohttp.client_proto.ResponseHandler object at 0x7fe2071ca6f0>, 24766.009516954), (<aiohttp.client_proto.ResponseHandler object at 0x7fe2071cac30>, 24766.015895832)])']
connector: <aiohttp.connector.TCPConnector object at 0x7fe2071ab800>

Additional output

No response

@gbip gbip added the bug Potential issues with the zarr-python library label Mar 10, 2025
@TomAugspurger
Copy link
Contributor

What happens if your application code closes the filesystem (does s3fs offer an async with context? I don't remember)?

async with fs:
    store = zarr.storage.FsspecStore(fs, path="/mybucket/myzarr.zarr")
    dataset = xarray.open_zarr(store, zarr_format=3)

The tricky bit here is that zarr-python doesn't really know who owns the filesystem passed into FsspecStore. We don't necessarily want to close it, since the user might be using it outside of zarr.

Maybe we could make FsspecStore (and the related stores) async context managers and have then close the store upon __aexit__?

@rabernat
Copy link
Contributor

s3fs actually has a custom finalizer that should close these sessions in some cases

https://github.com/fsspec/s3fs/blob/01b9c4b838b81375093ae1d78562edf6bdc616ea/s3fs/core.py#L575-L578

Although it looks like that is not used for asynchronous filesystems.

@christine-e-smit
Copy link
Contributor

I'm guessing this is the same error I'm seeing, even though I'm using s3fs.S3FileSystem? E.g. -

import zarr
import s3fs

s3 = s3fs.S3FileSystem(profile="PROFILE", asynchronous=True)
store = zarr.storage.FsspecStore(s3, path=f"/mybucket/myzarr.zarr")
z = zarr.open(store, mode="r")

which produces:

Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x11bd89940>
Unclosed connector
connections: ['deque([(<aiohttp.client_proto.ResponseHandler object at 0x11bdd0ad0>, 30591.100072875), (<aiohttp.client_proto.ResponseHandler object at 0x11bdd02f0>, 30591.103170083), (<aiohttp.client_proto.ResponseHandler object at 0x11bdd0890>, 30591.104944541), (<aiohttp.client_proto.ResponseHandler object at 0x11bdd1610>, 30591.295504583)])']
connector: <aiohttp.connector.TCPConnector object at 0x11bd89010>

@gbip
Copy link
Author

gbip commented Apr 24, 2025

I'm guessing this is the same error I'm seeing, even though I'm using s3fs.S3FileSystem? E.g. -

import zarr
import s3fs

s3 = s3fs.S3FileSystem(profile="PROFILE", asynchronous=True)
store = zarr.storage.FsspecStore(s3, path=f"/mybucket/myzarr.zarr")
z = zarr.open(store, mode="r")

which produces:

Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x11bd89940>
Unclosed connector
connections: ['deque([(<aiohttp.client_proto.ResponseHandler object at 0x11bdd0ad0>, 30591.100072875), (<aiohttp.client_proto.ResponseHandler object at 0x11bdd02f0>, 30591.103170083), (<aiohttp.client_proto.ResponseHandler object at 0x11bdd0890>, 30591.104944541), (<aiohttp.client_proto.ResponseHandler object at 0x11bdd1610>, 30591.295504583)])']
connector: <aiohttp.connector.TCPConnector object at 0x11bd89010>

Yep to me it's the same error.

@mark-boer
Copy link

I have a similar issue, but mine only seems to pop up, when I use a role to authenticate, when my credentials are saved in ~/.aws/credentials I don't see the same error.

All I need to do to reproduce the errors is:

group = zarr.open_group("s3://path-to-zarr")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Potential issues with the zarr-python library
Projects
None yet
Development

No branches or pull requests

5 participants