
Backwards compatibility reading "old" consolidated dataset without attributes #2694


Closed
mannreis opened this issue Jan 13, 2025 · 1 comment · Fixed by #2695
Labels
bug Potential issues with the zarr-python library

Comments

@mannreis
Contributor

Zarr version

3.0.0

Numcodecs version

0.14.1

Python Version

3.12.8

Operating System

Linux - Ubuntu

Installation

pip into virtual environment

Description

Hello,

I bumped into a misleading error when reading a simple consolidated dataset (zarr_format=2) with the zarr 3 implementation.

Traceback (most recent call last):
  File "/home/reis/debug-zarr3.py", line 3, in <module>
    zarr.open('/home/reis/test.zarr',zarr_format=2, mode='r', use_consolidated=True)
  File "/home/reis/miniconda3/envs/zarr3/lib/python3.12/site-packages/zarr/_compat.py", line 43, in inner_f
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/reis/miniconda3/envs/zarr3/lib/python3.12/site-packages/zarr/api/synchronous.py", line 190, in open
    obj = sync(
          ^^^^^
  File "/home/reis/miniconda3/envs/zarr3/lib/python3.12/site-packages/zarr/core/sync.py", line 142, in sync
    raise return_result
  File "/home/reis/miniconda3/envs/zarr3/lib/python3.12/site-packages/zarr/core/sync.py", line 98, in _runner
    return await coro
           ^^^^^^^^^^
  File "/home/reis/miniconda3/envs/zarr3/lib/python3.12/site-packages/zarr/api/asynchronous.py", line 332, in open
    return await open_group(store=store_path, zarr_format=zarr_format, mode=mode, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/reis/miniconda3/envs/zarr3/lib/python3.12/site-packages/zarr/api/asynchronous.py", line 832, in open_group
    raise FileNotFoundError(f"Unable to find group: {store_path}")

The problem is that a dataset written and consolidated with zarr=2.18.4 may have a .zmetadata file that contains no .zattrs keys, whereas doing the same thing with zarr 3.0.0 (zarr_format=2) always writes an empty dict for .zattrs. The zarr 3 reader assumes those keys are present, which breaks backwards compatibility with older datasets.

I was able to work around this by not raising an exception when .zattrs is missing from the .zmetadata file:

diff --git a/src/zarr/core/group.py b/src/zarr/core/group.py
index b1447a85..2a533272 100644
--- a/src/zarr/core/group.py
+++ b/src/zarr/core/group.py
@@ -574,8 +574,8 @@ class AsyncGroup:
             v2_consolidated_metadata = v2_consolidated_metadata["metadata"]
             # We already read zattrs and zgroup. Should we ignore these?
             print("   DEBUG:", v2_consolidated_metadata)
-            v2_consolidated_metadata.pop(".zattrs")
-            v2_consolidated_metadata.pop(".zgroup")
+            v2_consolidated_metadata.pop(".zattrs", None)
+            v2_consolidated_metadata.pop(".zgroup", None)
 
             consolidated_metadata: defaultdict[str, dict[str, Any]] = defaultdict(dict)
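
For anyone following along, here is a minimal standalone sketch of the behaviour the patch relies on. It does not import zarr; the helper name `strip_root_entries` and the two sample dicts are hypothetical, abbreviated stand-ins for the "metadata" mapping inside a .zmetadata file. It only shows that `dict.pop(key)` raises `KeyError` for a missing key, while `dict.pop(key, None)` tolerates it.

```python
from typing import Any


def strip_root_entries(metadata: dict[str, Any]) -> dict[str, Any]:
    """Return a copy without the root-level .zattrs/.zgroup entries.

    Hypothetical helper: pop() is given a default so that a missing
    key (as in older consolidated metadata) is simply ignored.
    """
    remaining = dict(metadata)  # copy, leave the caller's dict untouched
    remaining.pop(".zattrs", None)
    remaining.pop(".zgroup", None)
    return remaining


# Abbreviated, illustrative entries (not the full .zarray contents).
old_style = {
    ".zgroup": {"zarr_format": 2},
    "myvar/.zarray": {"shape": [2, 3], "dtype": "|u1"},
}
new_style = {
    ".zattrs": {},
    ".zgroup": {"zarr_format": 2},
    "myvar/.zarray": {"shape": [2, 3], "dtype": "|u1"},
    "myvar/.zattrs": {},
}

# pop() without a default fails on the old-style mapping.
try:
    dict(old_style).pop(".zattrs")
    plain_pop_failed = False
except KeyError:
    plain_pop_failed = True

print("plain pop raised KeyError:", plain_pop_failed)
print(sorted(strip_root_entries(old_style)))
print(sorted(strip_root_entries(new_style)))
```

The traceback above ends in a FileNotFoundError rather than a KeyError, so the missing-key failure is presumably translated somewhere along the way; that part is not visible in the output I have.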

Steps to reproduce

I produced the sample dataset with python = 3.10 and zarr = 2.18.4 as follows:

import zarr
z=zarr.open('/tmp/test.zarr', mode='w')
z.create('myvar',shape=(2,3),dtype='uint8')
zarr.consolidate_metadata(z.store)

But creating the equivalent with python = 3.12.8, zarr = 3.0.0 produces different results:

import zarr
import numcodecs
z=zarr.open('/tmp/test-new.zarr', zarr_format=2,mode='w')
z.create(name='myvar',shape=(2,3),dtype='uint8',compressor=numcodecs.Blosc())
zarr.consolidate_metadata(z.store,zarr_format=2)

The difference between the two cases:

$ diff <(jq --sort-keys < /tmp/test.zarr/.zmetadata) <(jq --sort-keys < /tmp/test-new.zarr/.zmetadata)
2a3
>     ".zattrs": {},
17a19
>       "dimension_separator": ".",
27c29,30
<     }
---
>     },
>     "myvar/.zattrs": {}
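
The same comparison can be sketched in Python for anyone without jq. This is only an illustration: `load_entries` and `added_keys` are hypothetical helper names, and the in-memory dicts are abbreviated stand-ins for the two .zmetadata files (the `dimension_separator` difference inside `myvar/.zarray` is not covered, since only top-level keys of the "metadata" mapping are compared).

```python
import json
from pathlib import Path
from typing import Any


def load_entries(path: str) -> dict[str, Any]:
    """Read a .zmetadata file and return its "metadata" mapping."""
    return json.loads(Path(path).read_text())["metadata"]


def added_keys(old_entries: dict[str, Any], new_entries: dict[str, Any]) -> list[str]:
    """Keys present in new_entries but absent from old_entries."""
    return sorted(set(new_entries) - set(old_entries))


# Abbreviated, illustrative stand-ins for the two files compared above.
old_entries = {
    ".zgroup": {"zarr_format": 2},
    "myvar/.zarray": {"shape": [2, 3]},
}
new_entries = {
    ".zattrs": {},
    ".zgroup": {"zarr_format": 2},
    "myvar/.zarray": {"shape": [2, 3], "dimension_separator": "."},
    "myvar/.zattrs": {},
}

extra = added_keys(old_entries, new_entries)
print(extra)  # ['.zattrs', 'myvar/.zattrs']
```

With the real files this would be called as `added_keys(load_entries('/tmp/test.zarr/.zmetadata'), load_entries('/tmp/test-new.zarr/.zmetadata'))`.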

Additional output

No response

@mannreis mannreis added the bug Potential issues with the zarr-python library label Jan 13, 2025
@d-v-b
Contributor

d-v-b commented Jan 13, 2025

thanks for the report, and for your fix. looks like it should be pretty simple to resolve
