-
-
Notifications
You must be signed in to change notification settings - Fork 329
serializing not a time (NaT) to metadata #3028
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@moradology I think this was added in #2802, was it intentional that we deviate from zarr-python 2 here? |
It was intentional, though this was among the cases that I wasn't entirely certain about and which only came up after I implemented the property-based tests (it was one of the edge cases not handled as things were at the time). I'm afraid I wasn't sure of the best path here, but hopefully the explicitness of expression in the tests will still serve as a single point of reference on these kinds of questions going forward #2802 (comment) Definitely reasonable to revert to the int-based representation to preserve compatibility with v2 though, if the slight improvement of representation is less important (as it seems to me is the case) |
@moradology no worries, I didn't know until very recently what zarr-python 2 was actually doing in this case, and the real problem is that zarr-python's behavior was never documented properly. I agree that the string literal |
OK, I've got a PR up that solves this problem by relaxing the deserialization behavior. Let me know what you think |
good news @moradology -- I did some testing and it seems like zarr-python 2 works with either the int form of Here's a script that demonstrates this: # /// script
# requires-python = ">=3.11"
# dependencies = [
# "zarr==2.18",
# "numcodecs < 0.16"
# ]
# ///
import zarr
import json
import numpy as np
store = {}
meta_a = {
'chunks': [2],
'compressor': {'blocksize': 0, 'clevel': 5, 'cname': 'lz4', 'id': 'blosc', 'shuffle': 1},
'dtype': '<M8[s]',
'fill_value': -9223372036854775808,
'filters': None,
'order': 'C',
'shape': [2],
'zarr_format': 2
}
# replace the integer form of nat with the string NaT
meta_b = meta_a | {"fill_value": "NaT"}
store['a/.zarray'] = json.dumps(meta_a).encode()
store['b/.zarray'] = json.dumps(meta_b).encode()
a = zarr.open(store, path='a', mode='a')
b = zarr.open(store, path='b', mode='a')
assert all(np.isnat(a[:]))
assert all(np.isnat(b[:]))
a[0] = np.datetime64(10, 's')
b[0] = np.datetime64(10, 's')
assert a[0] == b[0]
assert json.loads(store['a/.zarray'])['fill_value'] == meta_a['fill_value']
assert json.loads(store['b/.zarray'])["fill_value"] == meta_b['fill_value']
print('success') |
we do still need to ensure that zarr-python 3 remains compatible with the int-encoded fill value, hence your PR. |
Numpy datetime and timedelta dtypes have a nan-like value called
NaT
. TheNaT
value internally represented by the 64 bit signed integer -9223372036854775808:The zarr v2 spec does not define a JSON-serializable fill value encoding for datetimes or timedeltas. In lieu of a spec, we can go by the implementation of zarr-python 2.x, which used the integer form of the datetime / timedelta for all fill value, including
NaT
:But we have a property test that checks if the numpy
NaT
value is serialized to metadata as the string"NaT"
. This goes against the behavior of v2, according to which It should be serialized as an int.The text was updated successfully, but these errors were encountered: