Skip to content

Make the default zarr_format infer when reading files? #2175

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
TomAugspurger opened this issue Sep 11, 2024 · 2 comments · Fixed by #2183
Closed

Make the default zarr_format infer when reading files? #2175

TomAugspurger opened this issue Sep 11, 2024 · 2 comments · Fixed by #2183
Labels
bug Potential issues with the zarr-python library
Milestone

Comments

@TomAugspurger
Copy link
Contributor

Zarr version

v3

Numcodecs version

n/a

Python Version

n/a

Operating System

n/a

Installation

n/a

Description

While working on consolidated metadata, I got myself confused when I did

store = zarr.store.LocalStore(root="../zarr-v2/air-v2.zarr")
zarr.open_group(store=store)
...
File ~/gh/zarr-developers/zarr-python/src/zarr/abc/store.py:74, in Store._check_writable(self)
     72 def _check_writable(self) -> None:
     73     if self.mode.readonly:
---> 74         raise ValueError("store mode does not support writing")

ValueError: store mode does not support writing

I didn't include zarr_format=2 in the call to open_group, so we defaulted to V3. When we failed to find the V3 zarr.json in the store, we fell back to trying to create the group, hence the "store mode does not support writing."

How can we make this nicer? The tricky part is that inferring the Zarr version requires I/O to check for the presence of a .zgroup or .zarray file in the store.

I think my preference would be to temporarily change the default zarr_format in AsyncGroup.open and maybe AsyncArray.open to be "infer". We'll continue to try and load the V3 metadata first. If that fails we'll fall back to trying to parse the V2 metadata.

If we fail to load V2 metadata, we'd continue to fall back to the current behavior of switching over to create mode.

If we do successfully load V2 metadata, we'll use that (as if the user had passed zarr_format=2. I think we might want to also emit a warning that we'll require the user to explicitly provide zarr_format=2 in the future, assuming we don't want to pay the cost of an extra store lookup when we fail to load V3 metadata long term.

To disable inference (and save store lookup you might know will fail) you can pass zarr_format=3 or zarr_format=2.

As an alternative to that "try v3, fallback to v2" approach, we could concurrently try to load V3 and V2 metadata. This would be faster for users relying on inference, but would be slightly slower for the someday common(?) case of trying to load V3 metadata (and logs will get polluted with 404s / key errors when we failed to load the v2 metadata).

Steps to reproduce

zarr.open_group(store=store) on a store with Zarr V2 metadata.

Additional output

No response

@TomAugspurger TomAugspurger added the bug Potential issues with the zarr-python library label Sep 11, 2024
@jhamman
Copy link
Member

jhamman commented Sep 11, 2024

We already handle the fallback option with zarr_format is None in Group.open()

elif zarr_format is None:
zarr_json_bytes, zgroup_bytes, zattrs_bytes = await asyncio.gather(
(store_path / ZARR_JSON).get(),
(store_path / ZGROUP_JSON).get(),
(store_path / ZATTRS_JSON).get(),
)

The problem is that we are doing too much in the top level API. Rather than opting for _default_zarr_version() here:

or _default_zarr_version()

we should only apply the default version if opening the group does not succeed.

@jhamman
Copy link
Member

jhamman commented Sep 13, 2024

Fix up for this in #2183

@jhamman jhamman added this to the 3.0.0.beta milestone Sep 13, 2024
@jhamman jhamman moved this to In Progress in Zarr-Python - 3.0 Sep 13, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Zarr-Python - 3.0 Sep 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Potential issues with the zarr-python library
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants