Make the default `zarr_format` infer when reading files? #2175

TomAugspurger · 2024-09-11T19:36:46Z

Zarr version

v3

Numcodecs version

n/a

Python Version

n/a

Operating System

n/a

Installation

n/a

Description

While working on consolidated metadata, I got myself confused when I did

store = zarr.store.LocalStore(root="../zarr-v2/air-v2.zarr")
zarr.open_group(store=store)
...
File ~/gh/zarr-developers/zarr-python/src/zarr/abc/store.py:74, in Store._check_writable(self)
     72 def _check_writable(self) -> None:
     73     if self.mode.readonly:
---> 74         raise ValueError("store mode does not support writing")

ValueError: store mode does not support writing

I didn't include zarr_format=2 in the call to open_group, so we defaulted to V3. When we failed to find the V3 zarr.json in the store, we fell back to trying to create the group, hence the "store mode does not support writing."

How can we make this nicer? The tricky part is that inferring the Zarr version requires I/O to check for the presence of a .zgroup or .zarray file in the store.

I think my preference would be to temporarily change the default zarr_format in AsyncGroup.open and maybe AsyncArray.open to be "infer". We'll continue to try and load the V3 metadata first. If that fails we'll fall back to trying to parse the V2 metadata.

If we fail to load V2 metadata, we'd continue to fall back to the current behavior of switching over to create mode.

If we do successfully load V2 metadata, we'll use that (as if the user had passed zarr_format=2. I think we might want to also emit a warning that we'll require the user to explicitly provide zarr_format=2 in the future, assuming we don't want to pay the cost of an extra store lookup when we fail to load V3 metadata long term.

To disable inference (and save store lookup you might know will fail) you can pass zarr_format=3 or zarr_format=2.

As an alternative to that "try v3, fallback to v2" approach, we could concurrently try to load V3 and V2 metadata. This would be faster for users relying on inference, but would be slightly slower for the someday common(?) case of trying to load V3 metadata (and logs will get polluted with 404s / key errors when we failed to load the v2 metadata).

Steps to reproduce

zarr.open_group(store=store) on a store with Zarr V2 metadata.

Additional output

No response

The text was updated successfully, but these errors were encountered:

jhamman · 2024-09-11T22:56:59Z

We already handle the fallback option with zarr_format is None in Group.open()

zarr-python/src/zarr/core/group.py

Lines 165 to 170 in 726fdfb

    
           elif zarr_format is None: 
        
               zarr_json_bytes, zgroup_bytes, zattrs_bytes = await asyncio.gather( 
        
                   (store_path / ZARR_JSON).get(), 
        
                   (store_path / ZGROUP_JSON).get(), 
        
                   (store_path / ZATTRS_JSON).get(), 
        
               )

The problem is that we are doing too much in the top level API. Rather than opting for _default_zarr_version() here:

zarr-python/src/zarr/api/asynchronous.py

Line 532 in 726fdfb

or _default_zarr_version()

we should only apply the default version if opening the group does not succeed.

jhamman · 2024-09-13T23:15:49Z

Fix up for this in #2183

TomAugspurger added the bug Potential issues with the zarr-python library label Sep 11, 2024

jhamman mentioned this issue Sep 13, 2024

fix: opening a group with unspecified format finds either v2 or v3 #2183

Merged

6 tasks

jhamman added this to the 3.0.0.beta milestone Sep 13, 2024

jhamman added this to Zarr-Python - 3.0 Sep 13, 2024

jhamman moved this to In Progress in Zarr-Python - 3.0 Sep 13, 2024

jhamman closed this as completed in #2183 Sep 14, 2024

github-project-automation bot moved this from In Progress to Done in Zarr-Python - 3.0 Sep 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make the default `zarr_format` infer when reading files? #2175

Make the default `zarr_format` infer when reading files? #2175

TomAugspurger commented Sep 11, 2024

jhamman commented Sep 11, 2024

jhamman commented Sep 13, 2024

Make the default zarr_format infer when reading files? #2175

Make the default zarr_format infer when reading files? #2175

Comments

TomAugspurger commented Sep 11, 2024

Zarr version

Numcodecs version

Python Version

Operating System

Installation

Description

Steps to reproduce

Additional output

jhamman commented Sep 11, 2024

jhamman commented Sep 13, 2024

Make the default `zarr_format` infer when reading files? #2175

Make the default `zarr_format` infer when reading files? #2175