create_array() with shards and data parameter only partially writes data #3479

@dtonagel

Zarr version

v3.1.2

Numcodecs version

v0.16.2

Python Version

3.13.3

Operating System

Windows

Installation

pip install in venv

Description

When data is passed directly to a create_array call via the data parameter together with the shards parameter, only part of the data is written to the store. The problem disappears when the shards parameter is omitted. It also disappears when the data is assigned after the array has been created.

Steps to reproduce

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
# ]
# ///
#
# This script automatically imports the development branch of zarr to check for issues

import zarr
import numpy as np

na = np.random.random((2000,2000))

store = zarr.storage.MemoryStore()  # Can also be LocalStore, doesn't matter
root = zarr.group(store)

za_no_shard = root.create_array("noshard", data=na, chunks=(1000,1000), fill_value=np.nan, overwrite=True)
za_shard = root.create_array("shard", data=na, chunks=(1000,1000), shards=(2000,1000), fill_value=np.nan, overwrite=True)

print(np.isnan(na).sum().sum())  # 0 as expected
print(np.isnan(za_no_shard[:]).sum().sum())  # 0 as expected
print(f"{np.isnan(za_shard[:]).sum().sum()} should be 0!")  # 2,000,000 (half the chunks are missing)

# The problem occurs only when using the "data" parameter of create_array. Direct assignment works:
za_shard[:] = na
print(np.isnan(za_shard[:]).sum().sum())  # 0 as expected
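
To narrow down which chunks are affected, the NaN count can be broken down per chunk-sized block of the array that was read back. The following is a minimal sketch using only NumPy; `nan_count_per_chunk` is a hypothetical helper (not part of zarr), and the `demo` array only simulates a half-written result, since the report does not say which chunks go missing. In practice `za_shard[:]` would be passed in place of `demo`.

```python
import numpy as np


def nan_count_per_chunk(values, chunks):
    """Return a grid holding the number of NaN entries in each chunk-sized block."""
    values = np.asarray(values)
    # Number of chunks along each axis (ceil division covers partial edge chunks)
    grid = tuple(-(-size // chunk) for size, chunk in zip(values.shape, chunks))
    counts = np.zeros(grid, dtype=np.int64)
    for idx in np.ndindex(*grid):
        # Slice out the block that corresponds to chunk position idx
        block = tuple(slice(i * c, (i + 1) * c) for i, c in zip(idx, chunks))
        counts[idx] = np.isnan(values[block]).sum()
    return counts


# Stand-in for za_shard[:] (assumption: the lower row of chunks was not written)
demo = np.random.random((2000, 2000))
demo[1000:, :] = np.nan

counts = nan_count_per_chunk(demo, (1000, 1000))
print(counts)        # one entry per 1000x1000 chunk
print(counts.sum())  # total number of missing values
```

A chunk whose count equals its full size (1,000,000 here) was never written and is filled entirely with the fill value, which would show whether the missing half follows shard boundaries or chunk positions within each shard.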

Additional output

No response

Labels: bug, help wanted
