Revert change to default write_empty_chunks. #1001

Merged
8 changes: 8 additions & 0 deletions docs/release.rst
@@ -6,6 +6,14 @@ Release notes
Unreleased
----------

+Bug fixes
+~~~~~~~~~
+
+* Changes the default value of ``write_empty_chunks`` to ``True`` to prevent
+  unanticipated data loss when empty chunks are read back in for data types
+  that do not have a proper default value.
+  By :user:`Vyas Ramasubramani <vyasr>`; :issue:`965`.
+
.. _release_2.11.1:

2.11.1
4 changes: 2 additions & 2 deletions docs/tutorial.rst
@@ -1309,7 +1309,7 @@ Empty chunks

As of version 2.11, it is possible to configure how Zarr handles the storage of
chunks that are "empty" (i.e., every element in the chunk is equal to the array's fill value).
-When creating an array with ``write_empty_chunks=False`` (the default),
+When creating an array with ``write_empty_chunks=False``,
Zarr will check whether a chunk is empty before compression and storage. If a chunk is empty,
then Zarr does not store it, and instead deletes the chunk from storage
if the chunk had been previously stored.
@@ -1318,7 +1318,7 @@ This optimization prevents storing redundant objects and can speed up reads, but
added computation during array writes, since the contents of
each chunk must be compared to the fill value, and these advantages are contingent on the content of the array.
If you know that your data will form chunks that are almost always non-empty, then there is no advantage to the optimization described above.
-In this case, creating an array with ``write_empty_chunks=True`` will instruct Zarr to write every chunk without checking for emptiness.
+In this case, creating an array with ``write_empty_chunks=True`` (the default) will instruct Zarr to write every chunk without checking for emptiness.

The following example illustrates the effect of the ``write_empty_chunks`` flag on
the time required to write an array with different values.::
2 changes: 1 addition & 1 deletion zarr/core.py
@@ -161,7 +161,7 @@ def __init__(
cache_metadata=True,
cache_attrs=True,
partial_decompress=False,
-write_empty_chunks=False,
+write_empty_chunks=True,
zarr_version=None,
):
# N.B., expect at this point store is fully initialized with all
22 changes: 11 additions & 11 deletions zarr/creation.py
@@ -74,11 +74,11 @@ def create(shape, chunks=True, dtype=None, compressor='default',
.. versionadded:: 2.8

write_empty_chunks : bool, optional
-If True, all chunks will be stored regardless of their contents. If
-False (default), each chunk is compared to the array's fill value prior
-to storing. If a chunk is uniformly equal to the fill value, then that
-chunk is not stored, and the store entry for that chunk's key is
-deleted. This setting enables sparser storage, as only chunks with
+If True (default), all chunks will be stored regardless of their
+contents. If False, each chunk is compared to the array's fill value
+prior to storing. If a chunk is uniformly equal to the fill value, then
+that chunk is not stored, and the store entry for that chunk's key
+is deleted. This setting enables sparser storage, as only chunks with
non-fill-value data are stored, at the expense of overhead associated
with checking the data of each chunk.

@@ -403,7 +403,7 @@ def open_array(
chunk_store=None,
storage_options=None,
partial_decompress=False,
-write_empty_chunks=False,
+write_empty_chunks=True,
*,
zarr_version=None,
dimension_separator=None,
@@ -462,11 +462,11 @@ def open_array(
is Blosc, when getting data from the array chunks will be partially
read and decompressed when possible.
write_empty_chunks : bool, optional
-If True, all chunks will be stored regardless of their contents. If
-False (default), each chunk is compared to the array's fill value prior
-to storing. If a chunk is uniformly equal to the fill value, then that
-chunk is not stored, and the store entry for that chunk's key is
-deleted. This setting enables sparser storage, as only chunks with
+If True (default), all chunks will be stored regardless of their
+contents. If False, each chunk is compared to the array's fill value
+prior to storing. If a chunk is uniformly equal to the fill value, then
+that chunk is not stored, and the store entry for that chunk's key
+is deleted. This setting enables sparser storage, as only chunks with
non-fill-value data are stored, at the expense of overhead associated
with checking the data of each chunk.
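
For readers of this diff, the behaviour described in the docstring above can be sketched in plain Python. This is only an illustration under stated assumptions, not Zarr's implementation: `write_chunk`, `read_chunk`, and `write_array` are hypothetical helpers, the store is an ordinary dict, and chunks are plain lists rather than compressed buffers.

```python
def write_chunk(store, key, chunk, fill_value, write_empty_chunks=True):
    """Store `chunk` under `key`; optionally skip chunks that are all fill value."""
    if not write_empty_chunks and all(v == fill_value for v in chunk):
        # Empty chunk: do not store it, and delete any previously stored copy.
        store.pop(key, None)
        return
    store[key] = list(chunk)


def read_chunk(store, key, chunk_len, fill_value):
    """Return the stored chunk, or a chunk of fill values if the key is absent."""
    if key in store:
        return list(store[key])
    return [fill_value] * chunk_len


def write_array(data, chunk_len, fill_value, write_empty_chunks):
    """Split `data` into fixed-size chunks and write each one to a new dict store."""
    store = {}
    for start in range(0, len(data), chunk_len):
        key = start // chunk_len
        write_chunk(store, key, data[start:start + chunk_len],
                    fill_value, write_empty_chunks)
    return store


data = [0, 0, 0, 0, 5, 6, 0, 0]  # four chunks of length 2, fill value 0
dense = write_array(data, 2, 0, write_empty_chunks=True)
sparse = write_array(data, 2, 0, write_empty_chunks=False)

print(sorted(dense))   # [0, 1, 2, 3]
print(sorted(sparse))  # [2]

# Reading back the sparse store rebuilds missing chunks from the fill value.
restored = [v for k in range(4) for v in read_chunk(sparse, k, 2, 0)]
print(restored == data)  # True
```

In this sketch a missing chunk is reconstructed from the fill value alone, so the sparse mode only round-trips when the fill value is meaningful for the data type. That appears to be the concern behind switching the default back to ``True``.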
