Unable to open zarr group created using xarray without consolidated metadata. #2984

Closed
oj-tooth opened this issue Apr 14, 2025 · 6 comments
Labels
bug Potential issues with the zarr-python library

Comments

@oj-tooth

Zarr version

v3.0.6

Numcodecs version

v0.15.1

Python Version

3.12.7

Operating System

Linux

Installation

using pip into a virtual environment

Description

Following xarray issue #10209:

After successfully creating a zarr v3 store without consolidated metadata in an S3-compatible object store, calling zarr.open_group(url, mode='r', zarr_format=3, use_consolidated=False) returns an empty group. The problem propagates into xarray: because xr.open_zarr() calls zarr.open_group() internally, xr.open_zarr(url, zarr_format=3, consolidated=False) with a read-only url also returns an empty dataset.

My current work-around is to pass storage_options as follows:

zarr.open_group(url, mode='r', zarr_format=3, use_consolidated=False, storage_options={"anon":True, "asynchronous":True, "client_kwargs":{"endpoint_url":"https://my_endpoint_url"}})

Note that this does not occur when the zarr v3 store is created with consolidated metadata (although I'm conscious this is no longer included in the zarr v3 spec) by passing consolidated=None to xarray's .to_zarr() function. In this case, our end-users can simply open the group as a Dataset using xr.open_zarr(url, zarr_format=3, consolidated=True) as expected.

Thanks in advance for your help!

Steps to reproduce

Create a zarr v3 store without consolidated metadata using xarray [v2025.3.1]:

import s3fs
import xarray as xr
import zarr

# ds is an existing xarray.Dataset to be written to the object store
filesystem = s3fs.S3FileSystem(...)
store = zarr.storage.FsspecStore(fs=filesystem, path="path/to/object")
ds.to_zarr(store=store, mode="w", zarr_format=3, consolidated=False)

Open the zarr group created at url (in our case this allows public read-only access) using zarr or equivalently using xr.open_zarr():

import zarr

zarr_group = zarr.open_group(url, mode='r', zarr_format=3, use_consolidated=False)

print(zarr_group.members())

Returns: ()

Additional output

No response

oj-tooth added the bug label (Potential issues with the zarr-python library) on Apr 14, 2025
@rabernat
Contributor

Is url an HTTP url or an S3 one?

@oj-tooth
Author

Apologies, I should have been clearer on this! It's an HTTP url (e.g., "https://noc-msm-o.s3-ext.jc.rl.ac.uk/test/eorca1").

The workaround I mentioned uses an S3 url with the above endpoint url specified in storage_options.

@rabernat
Contributor

rabernat commented Apr 14, 2025

Opening a Zarr group over http will never work without consolidated metadata because the http protocol does not have the concept of directory listing (like S3 does). This is not a bug.
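
To illustrate the point above, here is a toy model (not zarr-python's actual implementation) of why member discovery needs either a listing operation or consolidated metadata. The names KeyOnlyStore, ListableStore and discover_members, and the array names, are hypothetical, and the consolidated_metadata layout in the root zarr.json is simplified for illustration.

```python
import json


class KeyOnlyStore:
    """Toy stand-in for a plain HTTP endpoint: known keys can be fetched, nothing can be listed."""

    def __init__(self, objects):
        self._objects = dict(objects)

    def get(self, key):
        # Returns the stored document, or None if the key does not exist.
        return self._objects.get(key)

    def list_dir(self, prefix):
        # Plain HTTP has no listing operation, so nothing can be enumerated.
        return []


class ListableStore(KeyOnlyStore):
    """Toy stand-in for an S3-style store, which can enumerate keys under a prefix."""

    def list_dir(self, prefix):
        prefix = prefix.rstrip("/") + "/" if prefix else ""
        children = {
            key[len(prefix):].split("/")[0]
            for key in self._objects
            if key.startswith(prefix)
        }
        return sorted(children)


def discover_members(store):
    """Return child node names, using consolidated metadata if present, else a listing."""
    root = json.loads(store.get("zarr.json"))
    consolidated = root.get("consolidated_metadata")
    if consolidated is not None:
        # Everything needed is in the one key whose name is known in advance.
        return sorted(consolidated["metadata"])
    # Otherwise the children have to be found by listing the store.
    return sorted(
        name for name in store.list_dir("")
        if store.get(f"{name}/zarr.json") is not None
    )


array_doc = json.dumps({"zarr_format": 3, "node_type": "array"})
plain_root = json.dumps({"zarr_format": 3, "node_type": "group"})
consolidated_root = json.dumps({
    "zarr_format": 3,
    "node_type": "group",
    "consolidated_metadata": {
        "metadata": {"temperature": {"node_type": "array"}, "salinity": {"node_type": "array"}}
    },
})
children = {"temperature/zarr.json": array_doc, "salinity/zarr.json": array_doc}

s3_like = ListableStore({"zarr.json": plain_root, **children})
http_like = KeyOnlyStore({"zarr.json": plain_root, **children})
http_like_consolidated = KeyOnlyStore({"zarr.json": consolidated_root, **children})

print(discover_members(s3_like))                 # ['salinity', 'temperature']
print(discover_members(http_like))               # []
print(discover_members(http_like_consolidated))  # ['salinity', 'temperature']
```

In this model the child arrays exist in all three stores; only the key-only store without consolidated metadata reports an empty group, which matches the behaviour described in the issue.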

(However, we could consider delivering a more useful error message rather than just returning an empty group.)

@oj-tooth
Author

Thanks for the clarification @rabernat, that's a complete oversight on my part!
Great for us to know this when distributing our data via url - thankfully this will no longer be a concern when we can adopt Icechunk.

@rabernat
Contributor

> when we can adopt Icechunk.

Is that still blocked on earth-mover/icechunk#743?

@oj-tooth
Author

oj-tooth commented Apr 14, 2025

Sadly, the issue is still ongoing. I've followed up several times with JASMIN, but adding support for conditional put operations still remains with the vendor. Thanks again for all your help on this - I'm hoping we'll get there soon!
