Migrate to zarr-python 3 #49

Open: wants to merge 18 commits into base `main`.
5 changes: 2 additions & 3 deletions README.md
@@ -4,9 +4,6 @@

[Pydantic](https://docs.pydantic.dev/latest/) models for [Zarr](https://zarr.readthedocs.io/en/stable/index.html).

## ⚠️ Disclaimer ⚠️
This project is under flux -- I want to add [zarr version 3](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html) support to this project, but the [reference python implementation](https://github.com/zarr-developers/zarr-python) doesn't support version 3 yet. As the ecosystem evolves things will break so be advised!

## Installation

`pip install -U pydantic-zarr`
@@ -56,5 +53,7 @@ print(spec.model_dump())
}
"""
```

## History

This project was developed at [HHMI / Janelia Research Campus](https://www.janelia.org/). It was originally written by Davis Bennett to solve problems he encountered while working on the [Cellmap Project team](https://www.janelia.org/project-team/cellmap/members). In December of 2024 this project was migrated from the [`janelia-cellmap`](https://github.com/janelia-cellmap) GitHub organization to the [`zarr-developers`](https://github.com/zarr-developers) organization.
24 changes: 10 additions & 14 deletions docs/index.md
@@ -8,15 +8,17 @@ Static typing and runtime validation for Zarr hierarchies.

`pydantic-zarr` expresses data stored in the [Zarr](https://zarr.readthedocs.io/en/stable/) format with [Pydantic](https://docs.pydantic.dev/1.10/). Specifically, `pydantic-zarr` encodes Zarr groups and arrays as [Pydantic models](https://docs.pydantic.dev/1.10/usage/models/). These models are useful for formalizing the structure of Zarr hierarchies, type-checking Zarr hierarchies, and runtime validation for Zarr-based data.


```python
import zarr

from pydantic_zarr.v2 import GroupSpec

# create a Zarr group
group = zarr.group(path='foo')
group = zarr.group(path='foo', zarr_format=2)
# put an array inside the group
array = zarr.create(store = group.store, path='foo/bar', shape=10, dtype='uint8')
array = zarr.create(
    store=group.store, path='foo/bar', shape=10, dtype='uint8', zarr_format=2
)
array.attrs.put({'metadata': 'hello'})

# create a pydantic model to model the Zarr group
@@ -37,13 +39,7 @@ print(spec.model_dump())
'order': 'C',
'filters': None,
'dimension_separator': '.',
'compressor': {
'id': 'blosc',
'cname': 'lz4',
'clevel': 5,
'shuffle': 1,
'blocksize': 0,
},
'compressor': {'id': 'zstd', 'level': 0, 'checksum': False},
}
},
}
@@ -56,11 +52,11 @@ More examples can be found in the [usage guide](usage_zarr_v2.md).

`pip install -U pydantic-zarr`


### Limitations

#### No array data operations
This library only provides tools to represent the *layout* of Zarr groups and arrays, and the structure of their attributes. `pydantic-zarr` performs no type checking or runtime validation of the multidimensional array data contained *inside* Zarr arrays, and `pydantic-zarr` does not contain any tools for efficiently reading or writing Zarr arrays.

This library only provides tools to represent the _layout_ of Zarr groups and arrays, and the structure of their attributes. `pydantic-zarr` performs no type checking or runtime validation of the multidimensional array data contained _inside_ Zarr arrays, and `pydantic-zarr` does not contain any tools for efficiently reading or writing Zarr arrays.

#### Supported Zarr versions

@@ -84,7 +80,7 @@ In `pydantic-zarr`, Zarr groups are modeled by the `GroupSpec` class, which is a

Zarr arrays are represented by the `ArraySpec` class, which has a similar `attributes` field, as well as fields for all the Zarr array properties (`dtype`, `shape`, `chunks`, etc).

`GroupSpec` and `ArraySpec` are both [generic models](https://docs.pydantic.dev/1.10/usage/models/#generic-models). `GroupSpec` takes two type parameters, the first specializing the type of `GroupSpec.attributes`, and the second specializing the type of the *values* of `GroupSpec.members` (the keys of `GroupSpec.members` are always strings). `ArraySpec` only takes one type parameter, which specializes the type of `ArraySpec.attributes`.
`GroupSpec` and `ArraySpec` are both [generic models](https://docs.pydantic.dev/1.10/usage/models/#generic-models). `GroupSpec` takes two type parameters, the first specializing the type of `GroupSpec.attributes`, and the second specializing the type of the _values_ of `GroupSpec.members` (the keys of `GroupSpec.members` are always strings). `ArraySpec` only takes one type parameter, which specializes the type of `ArraySpec.attributes`.

Examples using this generic typing functionality can be found in the [usage guide](usage_zarr_v2.md#using-generic-types).

@@ -100,4 +96,4 @@ To handle these cases, `pydantic-zarr` allows the `members` attribute of a `Grou

## Standardization

The Zarr specifications do not define a model of the Zarr hierarchy. `pydantic-zarr` is an implementation of a particular model that can be found formalized in this [specification document](https://github.com/d-v-b/zeps/blob/zom/draft/ZEP0006.md), which has been proposed for inclusion in the Zarr specifications. You can find the discussion of that proposal in [this pull request](https://github.com/zarr-developers/zeps/pull/46).
The Zarr specifications do not define a model of the Zarr hierarchy. `pydantic-zarr` is an implementation of a particular model that can be found formalized in this [specification document](https://github.com/d-v-b/zeps/blob/zom/draft/ZEP0006.md), which has been proposed for inclusion in the Zarr specifications. You can find the discussion of that proposal in [this pull request](https://github.com/zarr-developers/zeps/pull/46).
69 changes: 38 additions & 31 deletions docs/usage_zarr_v2.md
@@ -6,26 +6,27 @@

The `GroupSpec` and `ArraySpec` classes represent Zarr v2 groups and arrays, respectively. To create an instance of a `GroupSpec` or `ArraySpec` from an existing Zarr group or array, pass the Zarr group / array to the `.from_zarr` method defined on the `GroupSpec` / `ArraySpec` classes. This will result in a `pydantic-zarr` model of the Zarr object.

> By default `GroupSpec.from_zarr(zarr_group)` will traverse the entire hierarchy under `zarr_group`. This can be extremely slow if used on an extensive Zarr group on high latency storage. To limit the depth of traversal to a specific depth, use the `depth` keyword argument, e.g. `GroupSpec.from_zarr(zarr_group, depth=1)`
> By default `GroupSpec.from_zarr(zarr_group)` will traverse the entire hierarchy under `zarr_group`. This can be extremely slow if used on an extensive Zarr group on high latency storage. To limit the depth of traversal to a specific depth, use the `depth` keyword argument, e.g. `GroupSpec.from_zarr(zarr_group, depth=1)`
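
The effect of a depth limit can be sketched in plain Python, without touching Zarr at all. The snippet below is only an illustration of the general idea: the `snapshot` helper, the nested-`dict` layout, and the choice of `None` for unvisited children are assumptions made for this sketch, not a description of how `from_zarr` is implemented.

```python
from typing import Any


def snapshot(node: dict[str, Any], depth: int = -1) -> dict[str, Any]:
    """Copy a nested hierarchy, descending at most `depth` levels.

    In this sketch a negative depth means "no limit" (an assumption).
    """
    copy = {k: v for k, v in node.items() if k != "members"}
    members = node.get("members")
    if members is None:
        # leaf node (for example an array): nothing to descend into
        return copy
    if depth == 0:
        # children exist, but the depth budget is used up: do not visit them
        copy["members"] = None
        return copy
    copy["members"] = {
        name: snapshot(child, depth - 1) for name, child in members.items()
    }
    return copy


hierarchy = {"members": {"a": {"members": {"b": {"shape": (10,)}}}}}

print(snapshot(hierarchy, depth=1))
#> {'members': {'a': {'members': None}}}
```

With a limit of 1, only the direct children of the root are visited, which is what keeps the traversal cheap on high-latency storage.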

Note that `from_zarr` will *not* read the data inside an array.
Note that `from_zarr` will _not_ read the data inside an array.

### Writing

To write a hierarchy to some zarr-compatible storage backend, `GroupSpec` and `ArraySpec` have `to_zarr` methods that take a Zarr store and a path and return a Zarr array or group created in the store at the given path.

Note that `to_zarr` will *not* write any array data. You have to do this separately.
Note that `to_zarr` will _not_ write any array data. You have to do this separately.

```python
from zarr import group
from zarr.creation import create
from zarr.storage import MemoryStore
from zarr import create, group

from pydantic_zarr.v2 import GroupSpec

# create an in-memory Zarr group + array with attributes
grp = group(path='foo')
grp = group(path='foo', zarr_format=2)
grp.attrs.put({'group_metadata': 10})
arr = create(path='foo/bar', store=grp.store, shape=(10,), compressor=None)
arr = create(
    path='foo/bar', store=grp.store, shape=(10,), compressor=None, zarr_format=2
)
arr.attrs.put({'array_metadata': True})

spec = GroupSpec.from_zarr(grp)
@@ -63,15 +64,9 @@ spec_dict2['members']['bar']['shape'] = (100,)
# serialize the spec to the store
group2 = GroupSpec(**spec_dict2).to_zarr(grp.store, path='foo2')

print(group2)
#> <zarr.hierarchy.Group '/foo2'>

print(dict(group2.attrs))
#> {'a': 100, 'b': 'metadata'}

print(group2['bar'])
#> <zarr.core.Array '/foo2/bar' (100,) float64>

print(dict(group2['bar'].attrs))
#> {'array_metadata': True}
```
@@ -81,9 +76,10 @@ print(dict(group2['bar'].attrs))
The `ArraySpec` class has a `from_array` static method that takes an array-like object and returns an `ArraySpec` with `shape` and `dtype` fields matching those of the array-like object.

```python
from pydantic_zarr.v2 import ArraySpec
import numpy as np

from pydantic_zarr.v2 import ArraySpec

print(ArraySpec.from_array(np.arange(10)).model_dump())
"""
{
@@ -100,6 +96,7 @@ print(ArraySpec.from_array(np.arange(10)).model_dump())
}
"""
```

### Flattening and unflattening Zarr hierarchies

In the previous section we built a model of a Zarr hierarchy by defining `GroupSpec` and `ArraySpec`
@@ -117,15 +114,16 @@ methods to convert to / from these dictionaries.
This example demonstrates how to create a `GroupSpec` from a `dict` representation of a Zarr hierarchy.

```python
from pydantic_zarr.v2 import GroupSpec, ArraySpec
from pydantic_zarr.v2 import ArraySpec, GroupSpec

# other than the key representing the root path "",
# the keys must be valid paths in the Zarr storage hierarchy
# note that the `members` attribute is `None` for the `GroupSpec` instances in this `dict`.
tree = {
    "": GroupSpec(members=None, attributes={"root": True}),
    "/a": GroupSpec(members=None, attributes={"root": False}),
    "/a/b": ArraySpec(shape=(10,10), dtype="uint8", chunks=(1,1))
}
    "/a/b": ArraySpec(shape=(10, 10), dtype="uint8", chunks=(1, 1)),
}

print(GroupSpec.from_flat(tree).model_dump())
"""
@@ -162,12 +160,13 @@ This is similar to the example above, except that we are working in reverse -- w
flat `dict` from the `GroupSpec` object.
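
The flattening step itself can be sketched with ordinary nested `dict` objects. This is a simplified illustration, not the library's code: the `flatten` helper and the `dict` layout are hypothetical, and the only thing borrowed from the text above is the convention that the root is keyed by `""` and every other key is a `/`-separated path.

```python
from typing import Any


def flatten(node: dict[str, Any], path: str = "") -> dict[str, dict[str, Any]]:
    """Flatten a nested hierarchy into a {path: node} dict."""
    # store the node without its children, so each flat entry stands alone
    flat = {path: {k: v for k, v in node.items() if k != "members"}}
    for name, child in (node.get("members") or {}).items():
        flat.update(flatten(child, f"{path}/{name}"))
    return flat


nested = {
    "attributes": {"root": True},
    "members": {
        "a": {
            "attributes": {"root": False},
            "members": {"b": {"shape": (10, 10), "dtype": "uint8"}},
        }
    },
}

flat = flatten(nested)
print(sorted(flat))
#> ['', '/a', '/a/b']
```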

```python
from pydantic_zarr.v2 import GroupSpec, ArraySpec
from pydantic_zarr.v2 import ArraySpec, GroupSpec

# other than the key representing the root path "",
# the keys must be valid paths in the Zarr storage hierarchy
# note that the `members` attribute is `None` for the `GroupSpec` instances in this `dict`.

a_b = ArraySpec(shape=(10,10), dtype="uint8", chunks=(1,1))
a_b = ArraySpec(shape=(10, 10), dtype="uint8", chunks=(1, 1))
a = GroupSpec(members={'b': a_b}, attributes={"root": False})
root = GroupSpec(members={'a': a}, attributes={"root": True})

@@ -193,12 +192,14 @@ print(root.to_flat())
```

#### Implicit groups

`zarr-python` supports creating Zarr arrays or groups deep in the
hierarchy without explicitly creating the intermediate groups first.
`from_flat` models this behavior. For example, `{'/a/b/c': ArraySpec(...)}` implicitly defines the existence of groups named `a` and `b` (where `b` is contained in `a`). `from_flat` will create the expected `GroupSpec` object from such `dict` instances.
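
As a rough illustration of how intermediate groups can be inferred from a path, the sketch below rebuilds a nested structure from a flat `dict` of plain Python objects. The `unflatten` helper and the `"kind"` / `"members"` keys are hypothetical names chosen for this example, and the sketch assumes every key points at an array leaf.

```python
from typing import Any


def unflatten(flat: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Build a nested hierarchy from {path: node}, creating missing parents."""
    root: dict[str, Any] = {"kind": "group", "members": {}}
    for path, node in flat.items():
        *parents, leaf = path.strip("/").split("/")
        current = root
        for name in parents:
            # create the intermediate group only if it does not exist yet
            current = current["members"].setdefault(
                name, {"kind": "group", "members": {}}
            )
        current["members"][leaf] = node
    return root


tree = unflatten({"/a/b/c": {"kind": "array", "shape": (1,)}})
print(tree["members"]["a"]["members"]["b"]["members"]["c"])
#> {'kind': 'array', 'shape': (1,)}
```

Neither `a` nor `b` appears as a key in the input, yet both exist as groups in the result.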

```python
from pydantic_zarr.v2 import GroupSpec, ArraySpec
from pydantic_zarr.v2 import ArraySpec, GroupSpec

tree = {'/a/b/c': ArraySpec(shape=(1,), dtype='uint8', chunks=(1,))}
print(GroupSpec.from_flat(tree).model_dump())
"""
@@ -244,8 +245,11 @@ The `like` method works by converting both input models to `dict` via `pydantic.
The `like` method takes keyword arguments `include` and `exclude`, which determine the attributes included or excluded from the model comparison. So it's possible to use `like` to check if two `ArraySpec` instances have the same `shape`, `dtype` and `chunks` by calling `array_a.like(array_b, include={'shape', 'dtype', 'chunks'})`. This is useful if you don't care about the compressor or filters and just want to ensure that you can safely write an in-memory array to a Zarr array, which depends just on the two arrays having matching `shape`, `dtype`, and `chunks` attributes.
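
The comparison logic can be approximated with plain dictionaries. This is a sketch under stated assumptions, not the actual `like` implementation: `like_dicts` and `_MISSING` are hypothetical names, and the inputs are hand-written `dict` objects standing in for dumped models.

```python
from typing import Any, Optional

_MISSING = object()  # sentinel for keys present in only one of the two dicts


def like_dicts(
    a: dict[str, Any],
    b: dict[str, Any],
    include: Optional[set[str]] = None,
    exclude: Optional[set[str]] = None,
) -> bool:
    """Compare two dicts on a chosen subset of their keys."""
    keys = set(a) | set(b)
    if include is not None:
        keys &= include  # keep only the requested keys
    if exclude is not None:
        keys -= exclude  # drop the keys we do not care about
    return all(a.get(k, _MISSING) == b.get(k, _MISSING) for k in keys)


arr_a = {"shape": (1,), "dtype": "uint8", "chunks": (1,)}
arr_b = {"shape": (2,), "dtype": "uint8", "chunks": (1,)}

print(like_dicts(arr_a, arr_b))
#> False
print(like_dicts(arr_a, arr_b, exclude={"shape"}))
#> True
print(like_dicts(arr_a, arr_b, include={"dtype", "chunks"}))
#> True
```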

```python
from pydantic_zarr.v2 import ArraySpec, GroupSpec
import zarr
import zarr.storage

from pydantic_zarr.v2 import ArraySpec, GroupSpec

arr_a = ArraySpec(shape=(1,), dtype='uint8', chunks=(1,))
# make an array with a different shape
arr_b = ArraySpec(shape=(2,), dtype='uint8', chunks=(1,))
@@ -259,7 +263,7 @@ print(arr_a.like(arr_b, exclude={'shape'}))
#> True

# `ArraySpec.like` will convert a zarr.Array to ArraySpec
store = zarr.MemoryStore()
store = zarr.storage.MemoryStore()
# This is a zarr.Array
arr_a_stored = arr_a.to_zarr(store, path='arr_a')

@@ -302,25 +306,28 @@ This example shows how to specialize `GroupSpec` and `ArraySpec` with type param

```python
import sys
from pydantic_zarr.v2 import GroupSpec, ArraySpec, TItem, TAttr

from pydantic import ValidationError
from typing import Any

from pydantic_zarr.v2 import ArraySpec, GroupSpec, TAttr, TItem

if sys.version_info < (3, 12):
    from typing_extensions import TypedDict
else:
    from typing import TypedDict


# a Pydantic BaseModel would also work here
class GroupAttrs(TypedDict):
    a: int
    b: int


# a Zarr group with attributes consistent with GroupAttrs
SpecificAttrsGroup = GroupSpec[GroupAttrs, TItem]

try:
    SpecificAttrsGroup(attributes={'a' : 10, 'b': 'foo'})
    SpecificAttrsGroup(attributes={'a': 10, 'b': 'foo'})
except ValidationError as exc:
    print(exc)
"""
@@ -350,11 +357,11 @@ except ValidationError as exc:
"""

# this passes validation
items = {'foo': ArraySpec(attributes={},
shape=(1,),
dtype='uint8',
chunks=(1,),
compressor=None)}
items = {
    'foo': ArraySpec(
        attributes={}, shape=(1,), dtype='uint8', chunks=(1,), compressor=None
    )
}
print(ArraysOnlyGroup(attributes={}, members=items).model_dump())
"""
{
11 changes: 4 additions & 7 deletions docs/usage_zarr_v3.md
@@ -12,7 +12,8 @@ the backend for that doesn't exist.
## Defining Zarr v3 hierarchies

```python
from pydantic_zarr.v3 import GroupSpec, ArraySpec, NamedConfig
from pydantic_zarr.v3 import ArraySpec, GroupSpec, NamedConfig

array_attributes = {"baz": [1, 2, 3]}
group_attributes = {"foo": 42, "bar": False}

@@ -21,12 +22,8 @@ array_spec = ArraySpec(
    shape=[1000, 1000],
    dimension_names=["rows", "columns"],
    data_type="uint8",
    chunk_grid=NamedConfig(
        name="regular", configuration={"chunk_shape": [1000, 100]}
    ),
    chunk_key_encoding=NamedConfig(
        name="default", configuration={"separator": "/"}
    ),
    chunk_grid=NamedConfig(name="regular", configuration={"chunk_shape": [1000, 100]}),
    chunk_key_encoding=NamedConfig(name="default", configuration={"separator": "/"}),
    codecs=[NamedConfig(name="GZip", configuration={"level": 1})],
    fill_value=0,
)
13 changes: 9 additions & 4 deletions pyproject.toml
@@ -24,9 +24,9 @@ classifiers = [
"Programming Language :: Python :: Implementation :: CPython",
]
dependencies = [
"zarr<3",
"pydantic>2.0.0"
]
"zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
"pydantic>2.0.0",
]

[project.urls]
Documentation = "https://zarr.dev/pydantic-zarr/"
@@ -52,6 +52,9 @@ docs = [
version.source = "vcs"
build.hooks.vcs.version-file = "src/pydantic_zarr/_version.py"

[tool.hatch.metadata]
allow-direct-references = true

[tool.hatch.envs.test]
features = ["test"]

@@ -194,7 +197,9 @@ addopts = [
"--durations=10", "-ra", "--strict-config", "--strict-markers",
]
filterwarnings = [
"error"
"error",
# https://github.com/zarr-developers/zarr-python/issues/2948
"ignore:The `order` keyword argument has no effect for Zarr format 3 arrays:RuntimeWarning",
]

[tool.repo-review]