Order of keys created by ArrayV2Metadata.to_dict has changed between versions #3254

@TomNicholas

Zarr version

v3.1.0 vs v3.0.9

Numcodecs version

n/a

Python Version

3.12

Operating System

linux

Installation

uv pip

Description

In v3.0.9:

In [5]: ArrayV2Metadata(
   ...:             chunks=(10,),
   ...:             shape=(5,),
   ...:             dtype=np.dtype("int32"),
   ...:             order="C",
   ...:             fill_value=None,
   ...:         ).to_dict()
Out[5]: 
{'shape': (5,),
 'chunks': (10,),
 'fill_value': None,
 'order': 'C',
 'filters': None,
 'dimension_separator': '.',
 'compressor': None,
 'attributes': {},
 'zarr_format': 2,
 'dtype': '<i4'}

but in v3.1.0:

In [6]: ArrayV2Metadata(
   ...:             chunks=(10,),
   ...:             shape=(5,),
   ...:             dtype=parse_data_type(np.dtype("int32"), zarr_format=2),
   ...:             order="C",
   ...:             fill_value=None,
   ...:         ).to_dict()
Out[6]: 
{'shape': (5,),
 'chunks': (10,),
 'dtype': '<i4',
 'fill_value': None,
 'order': 'C',
 'filters': None,
 'dimension_separator': '.',
 'compressor': None,
 'attributes': {},
 'zarr_format': 2}

All the fields are the same, but their order is different.
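A minimal sketch of why this matters, using only the standard library: the two dictionaries below are copied from the outputs above. They compare equal as dicts, but their key order and their default JSON text differ. The variable names are illustrative and not part of zarr.

```python
import json

# Copied from the v3.0.9 output above.
v309 = {
    "shape": (5,),
    "chunks": (10,),
    "fill_value": None,
    "order": "C",
    "filters": None,
    "dimension_separator": ".",
    "compressor": None,
    "attributes": {},
    "zarr_format": 2,
    "dtype": "<i4",
}

# Copied from the v3.1.0 output above ('dtype' moved up to third position).
v310 = {
    "shape": (5,),
    "chunks": (10,),
    "dtype": "<i4",
    "fill_value": None,
    "order": "C",
    "filters": None,
    "dimension_separator": ".",
    "compressor": None,
    "attributes": {},
    "zarr_format": 2,
}

same_content = v309 == v310                             # dict equality ignores insertion order
same_key_order = list(v309) == list(v310)               # key order is preserved by dicts
same_json_text = json.dumps(v309) == json.dumps(v310)   # json.dumps follows key order

print(same_content, same_key_order, same_json_text)  # True False False
```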

I don't know if this is meant to be preserved between versions or not (it's private API, and I'm unsure if anything in zarr relies on this), but it was enough of a difference to break VirtualiZarr (one of a few problems in zarr-developers/VirtualiZarr#677). In our case we care about the order because we take the result and serialize it as JSON as part of kerchunk reference files, and we had tests checking against the result of older versions.
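One possible way for a downstream writer to avoid depending on key order is to serialize with sorted keys. This is only a sketch under that assumption: `canonical_json` is a hypothetical helper, not something VirtualiZarr, kerchunk, or zarr is known to provide, and the dictionaries are a trimmed subset of the fields shown above.

```python
import json


def canonical_json(metadata: dict) -> str:
    """Hypothetical helper: serialize with sorted keys so insertion order is irrelevant."""
    return json.dumps(metadata, sort_keys=True)


# Same fields, different insertion order (subset of the metadata above).
old_order = {"shape": [5], "zarr_format": 2, "dtype": "<i4"}
new_order = {"shape": [5], "dtype": "<i4", "zarr_format": 2}

plain_old = json.dumps(old_order)
plain_new = json.dumps(new_order)
canon_old = canonical_json(old_order)
canon_new = canonical_json(new_order)

print(plain_old == plain_new)  # False: default output follows insertion order
print(canon_old == canon_new)  # True: sorted keys give identical text
```

The trade-off is that sorted output no longer matches reference files written earlier in insertion order, so stored fixtures would need regenerating once.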

Maybe you could argue that if it still round-trips via JSON then this shouldn't matter, and VirtualiZarr / Kerchunk are in the wrong?
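The round-trip argument can be sketched as a test that compares parsed JSON instead of raw strings. `json_equivalent` is a hypothetical helper introduced for illustration, and the two strings stand in for a stored reference and freshly generated output.

```python
import json


def json_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: True if two JSON documents parse to equal values."""
    return json.loads(a) == json.loads(b)


# Stand-ins for a stored reference (old key order) and new output (new key order).
stored = '{"shape": [5], "chunks": [10], "zarr_format": 2, "dtype": "<i4"}'
fresh = '{"shape": [5], "chunks": [10], "dtype": "<i4", "zarr_format": 2}'

strings_match = stored == fresh
values_match = json_equivalent(stored, fresh)

print(strings_match)  # False: the text differs
print(values_match)   # True: the parsed content is the same
```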

Steps to reproduce

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
# ]
# ///
#
# This script automatically imports the development branch of zarr to check for issues

import zarr
# your reproducer code
# zarr.print_debug_info()

Additional output

No response

Labels

bug