Skip to content

Ensure KeyError raised for zarr datasets missing dim names #10025

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Mar 7, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 12 additions & 1 deletion xarray/backends/zarr.py
Original file line number Diff line number Diff line change
Expand Up @@ -375,6 +375,11 @@ def _get_zarr_dims_and_attrs(zarr_obj, dimension_key, try_nczarr):
pass
else:
attributes = dict(zarr_obj.attrs)
if len(zarr_obj.shape) != len(dimensions):
raise KeyError(
"Zarr object is missing the `dimension_names` metadata which is "
"required for xarray to determine variable dimensions."
)
return dimensions, attributes

# Zarr arrays do not have dimensions. To get around this problem, we add
Expand All @@ -393,7 +398,13 @@ def _get_zarr_dims_and_attrs(zarr_obj, dimension_key, try_nczarr):

# NCZarr defines dimensions through metadata in .zarray
zarray_path = os.path.join(zarr_obj.path, ".zarray")
zarray = json.loads(zarr_obj.store[zarray_path])
if _zarr_v3():
import asyncio

zarray_str = asyncio.run(zarr_obj.store.get(zarray_path)).to_bytes()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's no check previously guaranteeing that this is zarr_format=2, an additional check for zarray_paths ending with zarr.json will likely be needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the first try/except block in this function is intended to ensure we have zarr_format=2 here:

https://github.com/oliverwm1/xarray/blob/c1a1c70310f256a1125b08f089e5688075f89d78/xarray/backends/zarr.py#L368-L375

else:
zarray_str = zarr_obj.store.get(zarray_path)
zarray = json.loads(zarray_str)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you find it useful if zarr let you do this with a single function? because we could totally add that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not needed to access the 'zarray' info before, so that wouldn't make a big difference to me. But it seems nczarr does use it and I was just trying to get this existing code to work with zarr-python 3.

try:
# NCZarr uses Fully Qualified Names
dimensions = [
Expand Down
12 changes: 12 additions & 0 deletions xarray/tests/test_backends.py
Original file line number Diff line number Diff line change
Expand Up @@ -4185,7 +4185,7 @@
fx.create_dataset(k, data=v)
with pytest.warns(UserWarning, match="The 'phony_dims' kwarg"):
with xr.open_dataset(tmp_file, engine="h5netcdf", group="bar") as ds:
assert ds.dims == {

Check warning on line 4188 in xarray/tests/test_backends.py

View workflow job for this annotation

GitHub Actions / ubuntu-latest py3.10

The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.

Check warning on line 4188 in xarray/tests/test_backends.py

View workflow job for this annotation

GitHub Actions / ubuntu-latest py3.10

The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.::warning file=/home/runner/work/xarray/xarray/xarray/tests/test_backends.py,line=4188::The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.

Check warning on line 4188 in xarray/tests/test_backends.py

View workflow job for this annotation

GitHub Actions / macos-latest py3.10

The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.

Check warning on line 4188 in xarray/tests/test_backends.py

View workflow job for this annotation

GitHub Actions / macos-latest py3.10

The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.

Check warning on line 4188 in xarray/tests/test_backends.py

View workflow job for this annotation

GitHub Actions / windows-latest py3.10

The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.

Check warning on line 4188 in xarray/tests/test_backends.py

View workflow job for this annotation

GitHub Actions / windows-latest py3.10

The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.

Check warning on line 4188 in xarray/tests/test_backends.py

View workflow job for this annotation

GitHub Actions / windows-latest py3.10

The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.

Check warning on line 4188 in xarray/tests/test_backends.py

View workflow job for this annotation

GitHub Actions / windows-latest py3.10

The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.

Check warning on line 4188 in xarray/tests/test_backends.py

View workflow job for this annotation

GitHub Actions / windows-latest py3.10

The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.

Check warning on line 4188 in xarray/tests/test_backends.py

View workflow job for this annotation

GitHub Actions / windows-latest py3.10

The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
"phony_dim_0": 5,
"phony_dim_1": 5,
"phony_dim_2": 5,
Expand Down Expand Up @@ -6153,6 +6153,18 @@
assert_identical(original_da, loaded_da)


@requires_zarr
@pytest.mark.usefixtures("default_zarr_format")
def test_raises_key_error_on_invalid_zarr_store(tmp_path):
root = zarr.open_group(tmp_path / "tmp.zarr")
if Version(zarr.__version__) < Version("3.0.0"):
root.create_dataset("bar", shape=(3, 5), dtype=np.float32)
else:
root.create_array("bar", shape=(3, 5), dtype=np.float32)
with pytest.raises(KeyError, match=r"xarray to determine variable dimensions"):
xr.open_zarr(tmp_path / "tmp.zarr", consolidated=False)


@requires_zarr
@pytest.mark.usefixtures("default_zarr_format")
class TestZarrRegionAuto:
Expand Down
Loading