Description
In v3, since the storage API is asynchronous, we can open multiple arrays or groups concurrently. This would be broadly useful, but we don't have a good template from zarr-python v2 to extrapolate from, so we have to invent something new here (new relative to zarr-python, that is).
Over in #1804 @martindurant brought this up, and I suggested something like this:
def open_nodes(store: Store, paths: tuple[str, ...], options: dict[Literal["array", "group"], dict[str, Any]]) -> Array | Group:
    ...
def open_arrays(store: Store, paths: tuple[str, ...], options: dict[str, Any]) -> Array:
    ...
def open_groups(store: Store, paths: tuple[str, ...], options: dict[str, Any]) -> Group:
    ...
I was imagining that the arguments to these functions would be the paths of arrays / groups anywhere in a Zarr hierarchy; we could also have a group.open_groups() method which can only "see" sub-groups, and similarly for group.open_arrays().
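To make the first idea concrete, here is a rough sketch of how a batch open could fan out over an async store with `asyncio.gather`. Everything in it is an assumption for illustration: `FakeStore`, `FakeNode`, `get_kind` and `open_node` are hypothetical stand-ins, not zarr-python classes, and the `options` argument is left out. I also return a tuple with one node per path here, which is a guess at the intended return shape.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeNode:
    """Hypothetical stand-in for an Array or Group (not a zarr-python class)."""
    path: str
    kind: str  # "array" or "group"


class FakeStore:
    """Hypothetical async store that maps a path to the kind of node stored there."""

    def __init__(self, contents: dict[str, str]) -> None:
        self._contents = contents

    async def get_kind(self, path: str) -> str:
        await asyncio.sleep(0.01)  # simulate metadata I/O latency
        return self._contents[path]


async def open_node(store: FakeStore, path: str) -> FakeNode:
    # One metadata read per node.
    kind = await store.get_kind(path)
    return FakeNode(path=path, kind=kind)


async def open_nodes(store: FakeStore, paths: tuple[str, ...]) -> tuple[FakeNode, ...]:
    # gather() runs all reads concurrently and returns results in input order.
    return tuple(await asyncio.gather(*(open_node(store, p) for p in paths)))
```

With this shape, `open_arrays` and `open_groups` could be thin wrappers that check the kind of each result, and `asyncio.run(open_nodes(store, ("a", "g")))` would give a synchronous entry point.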
An alternative would be to use a more general transactional context manager:
with transaction(store) as tx:
    a1_maybe = tx.open_array(...)
    a2_maybe = tx.open_array(...)
    # IO gets run concurrently in `__aexit__`
a1 = a1_maybe.result()
a2 = a2_maybe.result()
I'm a lot less sure of this second design, since I have never implemented anything like it. For example, should we use futures for the results of tx.open_array()?
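For what it's worth, one way the futures variant might look is sketched below, assuming an `async with` block and plain `asyncio.Future` objects as the deferred results. The `Transaction` class, its `_load` helper, and the dict used as a store are all hypothetical; this is only meant to show the mechanics of deferring I/O to `__aexit__`, not a proposed implementation.

```python
import asyncio


class Transaction:
    """Hypothetical batching context manager (not part of zarr-python)."""

    def __init__(self, store: dict[str, str]) -> None:
        self._store = store  # a plain dict stands in for a real store
        self._pending: list[tuple[str, asyncio.Future]] = []

    async def __aenter__(self) -> "Transaction":
        return self

    def open_array(self, path: str) -> asyncio.Future:
        # Only record the request here; no I/O happens yet.
        fut = asyncio.get_running_loop().create_future()
        self._pending.append((path, fut))
        return fut

    async def _load(self, path: str, fut: asyncio.Future) -> None:
        await asyncio.sleep(0.01)  # simulate I/O latency
        try:
            fut.set_result(self._store[path])
        except KeyError as exc:
            # A missing path fails only its own future, not the whole batch.
            fut.set_exception(exc)

    async def __aexit__(self, exc_type, exc, tb) -> None:
        if exc_type is None:
            # Run every queued request concurrently on exit.
            await asyncio.gather(*(self._load(p, f) for p, f in self._pending))
        else:
            # The block raised, so abandon the queued requests.
            for _, fut in self._pending:
                fut.cancel()


async def main() -> tuple[str, str]:
    store = {"a1": "array a1", "a2": "array a2"}
    async with Transaction(store) as tx:
        a1_maybe = tx.open_array("a1")
        a2_maybe = tx.open_array("a2")
    # The futures are resolved once the block has exited.
    return a1_maybe.result(), a2_maybe.result()
```

One consequence of this design is that calling `.result()` inside the block raises `asyncio.InvalidStateError`, because nothing has been loaded yet. That may or may not be the behaviour we want.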
Are there other ideas, or examples from other implementations / domains we could draw from?