Add asynchronous load method #10327

Draft: wants to merge 38 commits into main

TomNicholas (Member) commented May 16, 2025

Adds a .load_async() method to Variable, which works by plumbing an async get_duck_array all the way down until it reaches the async methods that zarr v3 exposes.

Needs a lot of refactoring before it could be merged, but it works.

API:

  • Variable.load_async
  • DataArray.load_async
  • Dataset.load_async
  • DataTree.load_async
  • load_dataset?
  • load_dataarray?
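
To illustrate the "plumbing async all the way down" idea, here is a minimal, stdlib-only sketch. It does not use xarray or zarr: ToyAsyncStore, ToyLazyArray, ToyVariable, and load_all are hypothetical names invented for this example, and the real implementation in this PR may be structured differently.

```python
import asyncio


class ToyAsyncStore:
    """Hypothetical stand-in for a store with native async reads."""

    def __init__(self, chunks):
        self._chunks = chunks

    async def get(self, name):
        await asyncio.sleep(0.01)  # simulate I/O latency
        return self._chunks[name]


class ToyLazyArray:
    """Hypothetical lazy array that wraps one key in the store."""

    def __init__(self, store, name):
        self.store = store
        self.name = name

    async def async_get_duck_array(self):
        # The async call is awaited at every layer, down to the store.
        return await self.store.get(self.name)


class ToyVariable:
    """Hypothetical container; not xarray's Variable."""

    def __init__(self, data):
        self._data = data

    async def load_async(self):
        # Replace the lazy wrapper with the loaded values, in place.
        if isinstance(self._data, ToyLazyArray):
            self._data = await self._data.async_get_duck_array()
        return self


async def load_all(variables):
    # Because every layer is awaitable, the reads can overlap.
    await asyncio.gather(*(v.load_async() for v in variables))
    return [v._data for v in variables]


store = ToyAsyncStore({"a": [1, 2], "b": [3, 4]})
variables = [ToyVariable(ToyLazyArray(store, name)) for name in ("a", "b")]
loaded = asyncio.run(load_all(variables))
```

The point of the sketch is that a single blocking call anywhere in the chain would serialize the reads, which is presumably why the async variant has to be threaded through each layer.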

github-actions bot added the CI and dependencies labels May 16, 2025
TomNicholas (Member, Author) commented May 19, 2025

These failing tests from the CI do not fail when I run them locally, which is interesting.

FAILED xarray/tests/test_backends.py::TestH5NetCDFViaDaskData::test_outer_indexing_reversed - ValueError: dimensions ('t', 'y', 'x') must have the same length as the number of data dimensions, ndim=4
FAILED xarray/tests/test_backends.py::TestNetCDF4ViaDaskData::test_outer_indexing_reversed - ValueError: dimensions ('t', 'y', 'x') must have the same length as the number of data dimensions, ndim=4
FAILED xarray/tests/test_backends.py::TestDask::test_outer_indexing_reversed - ValueError: dimensions ('t', 'y', 'x') must have the same length as the number of data dimensions, ndim=4
= 3 failed, 18235 passed, 1269 skipped, 77 xfailed, 15 xpassed, 2555 warnings in 487.15s (0:08:07) =
Error: Process completed with exit code 1.

@@ -267,13 +268,23 @@ def robust_getitem(array, key, catch=Exception, max_retries=6, initial_delay=500
time.sleep(1e-3 * next_delay)


-class BackendArray(NdimSizeLenMixin, indexing.ExplicitlyIndexed):
+class BackendArray(ABC, NdimSizeLenMixin, indexing.ExplicitlyIndexed):
TomNicholas (Member, Author) commented:

As __getitem__ is required, I feel like BackendArray should always have been an ABC.
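
As a small illustration of what making the base class an ABC can buy, the sketch below uses hypothetical toy classes (ToyBackendArray, IncompleteArray, ListArray), not xarray's own. It assumes __getitem__ is also decorated with @abstractmethod, which the diff shown here does not confirm; inheriting from ABC alone does not enforce anything.

```python
from abc import ABC, abstractmethod


class ToyBackendArray(ABC):
    """Hypothetical base class; assumes __getitem__ is marked abstract."""

    @abstractmethod
    def __getitem__(self, key):
        ...


class IncompleteArray(ToyBackendArray):
    """Forgets to implement __getitem__."""


class ListArray(ToyBackendArray):
    """Implements the required method on top of a plain list."""

    def __init__(self, values):
        self.values = values

    def __getitem__(self, key):
        return self.values[key]


# Instantiating a subclass that lacks __getitem__ fails immediately,
# rather than at the first indexing call.
try:
    IncompleteArray()
    instantiated = True
except TypeError:
    instantiated = False

first = ListArray([10, 20, 30])[0]
```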

Comment on lines +277 to +278
    async def async_getitem(self, key: indexing.ExplicitIndexer) -> np.typing.ArrayLike:
        raise NotImplementedError("Backend does not support asynchronous loading")
TomNicholas (Member, Author) commented:

I've implemented this for the ZarrArray class but in theory it could be supported by other backends too.

TomNicholas (Member, Author) commented:

This might not be the desired behaviour, though: currently, if you open a dataset from netCDF and call ds.load_async, you get a NotImplementedError. Would it be better to quietly fall back to blocking instead?
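
For comparison, the "quietly block" alternative could look roughly like the sketch below. The classes are hypothetical toys (ToyBackendArray, ToySyncOnlyArray, ToyAsyncArray), not xarray's backends; only the method name async_getitem is taken from the diff above.

```python
import asyncio


class ToyBackendArray:
    """Hypothetical base class; not xarray's BackendArray."""

    def __getitem__(self, key):
        raise NotImplementedError

    async def async_getitem(self, key):
        # Fallback: reuse the synchronous path. This blocks the event
        # loop for the duration of the read, so nothing overlaps with it.
        return self[key]


class ToySyncOnlyArray(ToyBackendArray):
    """Backend with no async support; inherits the blocking fallback."""

    def __init__(self, values):
        self.values = values

    def __getitem__(self, key):
        return self.values[key]


class ToyAsyncArray(ToySyncOnlyArray):
    """Backend that overrides the fallback with a real coroutine."""

    async def async_getitem(self, key):
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return self.values[key]


async def main():
    return await asyncio.gather(
        ToySyncOnlyArray([1, 2, 3]).async_getitem(slice(0, 2)),
        ToyAsyncArray([4, 5, 6]).async_getitem(1),
    )


results = asyncio.run(main())
```

With this design every backend works under load_async, at the cost of hiding the fact that some reads gain no concurrency. Raising NotImplementedError, as the diff currently does, makes that limitation explicit instead.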

Labels: CI, dependencies, enhancement, io, topic-backends, topic-documentation, topic-indexing, topic-NamedArray, topic-zarr
Development

Successfully merging this pull request may close these issues.

Add an asynchronous load method?
1 participant