Skip to content

[v3]: TypeError when updating Array.attrs, thanks to incorrect inferred fill_value #2153

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
TomAugspurger opened this issue Sep 6, 2024 · 6 comments
Labels
bug Potential issues with the zarr-python library
Milestone

Comments

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Sep 6, 2024

Zarr version

v3

Numcodecs version

n/a

Python Version

n/a

Operating System

n/a

Installation

v3

Description

Discovered while looking at the xarray unit tests for Zarr.

Steps to reproduce

import zarr
import numpy as np
from zarr.api import asynchronous

store = zarr.store.MemoryStore({}, mode="w")
arr = zarr.open_array(store=store, path="a", shape=(), dtype=np.dtype("S6"))
arr.update_attributes({"key": "value"})

The .put call raises with

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[1], line 7
      5 store = zarr.store.MemoryStore({}, mode="w")
      6 arr = zarr.open_array(store=store, path="a", shape=(), dtype=np.dtype("S6"))
----> 7 arr.update_attributes({"k": "v"})

File ~/gh/zarr-developers/zarr-python/src/zarr/core/array.py:2055, in Array.update_attributes(self, new_attributes)
   2053 def update_attributes(self, new_attributes: dict[str, JSON]) -> Array:
   2054     return type(self)(
-> 2055         sync(
   2056             self._async_array.update_attributes(new_attributes),
   2057         )
   2058     )

File ~/gh/zarr-developers/zarr-python/src/zarr/core/sync.py:91, in sync(coro, loop, timeout)
     88 return_result = next(iter(finished)).result()
     90 if isinstance(return_result, BaseException):
---> 91     raise return_result
     92 else:
     93     return return_result

File ~/gh/zarr-developers/zarr-python/src/zarr/core/sync.py:50, in _runner(coro)
     45 """
     46 Await a coroutine and return the result of running it. If awaiting the coroutine raises an
     47 exception, the exception will be returned.
     48 """
     49 try:
---> 50     return await coro
     51 except Exception as ex:
     52     return ex

File ~/gh/zarr-developers/zarr-python/src/zarr/core/array.py:601, in AsyncArray.update_attributes(self, new_attributes)
    600 async def update_attributes(self, new_attributes: dict[str, JSON]) -> AsyncArray:
--> 601     new_metadata = self.metadata.update_attributes(new_attributes)
    603     # Write new metadata
    604     await self._save_metadata(new_metadata)

File ~/gh/zarr-developers/zarr-python/src/zarr/core/metadata.py:323, in ArrayV3Metadata.update_attributes(self, attributes)
    322 def update_attributes(self, attributes: dict[str, JSON]) -> Self:
--> 323     return replace(self, attributes=attributes)

File ~/mambaforge/lib/python3.10/dataclasses.py:1454, in replace(obj, **changes)
   1447         changes[f.name] = getattr(obj, f.name)
   1449 # Create the new object, which calls __init__() and
   1450 # __post_init__() (if defined), using all of the init fields we've
   1451 # added and/or left in 'changes'.  If there are values supplied in
   1452 # changes that aren't fields, this will correctly raise a
   1453 # TypeError.
-> 1454 return obj.__class__(**changes)

File ~/gh/zarr-developers/zarr-python/src/zarr/core/metadata.py:189, in ArrayV3Metadata.__init__(self, shape, data_type, chunk_grid, chunk_key_encoding, fill_value, codecs, attributes, dimension_names)
    187 chunk_key_encoding_parsed = ChunkKeyEncoding.from_dict(chunk_key_encoding)
    188 dimension_names_parsed = parse_dimension_names(dimension_names)
--> 189 fill_value_parsed = parse_fill_value_v3(fill_value, dtype=data_type_parsed)
    190 attributes_parsed = parse_attributes(attributes)
    191 codecs_parsed_partial = parse_codecs(codecs)

File ~/gh/zarr-developers/zarr-python/src/zarr/core/metadata.py:656, in parse_fill_value_v3(fill_value, dtype)
    654             raise ValueError(msg)
    655     msg = f"Cannot parse non-string sequence {fill_value} as a scalar with type {dtype}."
--> 656     raise TypeError(msg)
    657 return dtype.type(fill_value)

TypeError: Cannot parse non-string sequence b'' as a scalar with type |S6.

Additional output

The Array.update_attributes call is a bit of a red herring. That just happens to go through Metadata.__init__, which validates the fill value for a given dtype. Only this type we use fill_value=np.bytes(b""), which is incompatible with the dtype np.dtype("S6").

So I think the shorter reproducer is something like

arr = zarr.open_array(store=store, path="a", shape=(), dtype=np.dtype("S6"))
assert arr.metadata.fill_value != np.bytes_("b"))

in other words, maybe(?) our inferred fill value is incorrect for this specific dtype.

@TomAugspurger TomAugspurger added the bug Potential issues with the zarr-python library label Sep 6, 2024
@TomAugspurger TomAugspurger changed the title [v3]: TypeError when updating Array.attrs, thanks to incorrect inferred dtype [v3]: TypeError when updating Array.attrs, thanks to incorrect inferred fill_value Sep 6, 2024
@TomAugspurger
Copy link
Contributor Author

Or maybe the issue is just that zarr-python v3 (and maybe the spec?) don't support this data type yet?

Looking through the spec, I suppose we could support it with the r* type. I'll look into this a bit.

@TomAugspurger
Copy link
Contributor Author

As a short-term fix, I think we can update

if isinstance(fill_value, Sequence) and not isinstance(fill_value, str):
to skip by bytes and str, not just str.

I do want to spend some type figuring out what the appropriate fill_value is for np.dtype("S6") (both in practice and according to the spec). Right now we use np.dtype("S6").type(0) to get np.bytes_(b"") as the fill value. Unfortunately, that has a dtype of S, not S6. Is fill_value.dtype == dtype an invariant we want to hold? So we would need to pad the fill_value to be length 6 in this case?

@jhamman
Copy link
Member

jhamman commented Sep 24, 2024

This is now erroring much earlier with a reasonable error:

arr = zarr.open_array(store=store, path="a", shape=(), dtype=np.dtype("S6"))

ValueError: Invalid V3 data_type: |S6

@rabernat
Copy link
Contributor

rabernat commented Oct 8, 2024

This appears to have been fixed by #2036

@rabernat rabernat closed this as completed Oct 8, 2024
@github-project-automation github-project-automation bot moved this from Todo to Done in Zarr-Python - 3.0 Oct 8, 2024
@rabernat
Copy link
Contributor

rabernat commented Oct 8, 2024

There is another issue here, which is the fact that I can't figure out how to actually read arr!

 arr[:]
File ~/gh/zarr-developers/zarr-python/src/zarr/core/indexing.py:76, in err_too_many_indices(selection, shape)
     75 def err_too_many_indices(selection: Any, shape: ChunkCoords) -> None:
---> 76     raise IndexError(f"too many indices for array; expected {len(shape)}, got {len(selection)}")

IndexError: too many indices for array; expected 0, got 1

@jhamman
Copy link
Member

jhamman commented Oct 8, 2024

👇

In [8]: np.asarray(arr)
Out[8]: array(b'0', dtype='|S6')

In [9]: np.asarray(arr).item()
Out[9]: b'0'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Potential issues with the zarr-python library
Projects
Status: Done
Development

No branches or pull requests

3 participants