feat: metadata-only support for storage transformers metadata #2180
If a metadata object is created from a JSON file that specifies a storage transformer list, then reading from or writing to that array hinges on applying the transformer pipeline, since it may modify the keys and/or values of that array node. I think ignoring it and issuing a warning is not enough, since reading from said array using the metadata could result in the wrong byte sequence being passed to its codec pipeline (as a result of skipping the transformation pipeline). To me this looks like a programming error. I think parsing a metadata document with a specified The case where a
I think this is valid, but we should raise when constructing an
src/zarr/core/metadata/v3.py
Outdated
"The storage transformer(s) will be retained in array metadata but will not "
"influence storage routines"
)
warnings.warn(msg, UserWarning, stacklevel=1)
after thinking about this some more, I believe we should raise an error here instead of warning.
We allow:
- metadata w/o the `storage_transformers` key
- metadata with `storage_transformers == []` or `None`
We error for:
- metadata w/ `len(storage_transformers) > 0`
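For illustration only, the allow/error rules above could be sketched as follows. The function name `check_storage_transformers` and the error messages are hypothetical and are not taken from this PR's diff; a missing key is modeled by passing the result of `dict.get`, which is `None`.

```python
from collections.abc import Iterable
from typing import Any


def check_storage_transformers(value: Any) -> tuple[Any, ...]:
    """Hypothetical validator for the rules in this comment (not the PR's code)."""
    # A missing key or an explicit None is treated as "no transformers".
    if value is None:
        return ()
    # Strings are iterable but are never a valid transformer list.
    if isinstance(value, (str, bytes)) or not isinstance(value, Iterable):
        raise TypeError(
            f"storage_transformers must be an iterable or None, got {type(value).__name__}"
        )
    transformers = tuple(value)
    # Any non-empty list is rejected, since nothing would apply the transformers.
    if len(transformers) > 0:
        raise ValueError(
            "storage transformers are not supported; "
            f"got {len(transformers)} transformer(s)"
        )
    return transformers


# Allowed: missing key, None, or an empty list.
metadata_doc = {"shape": [4, 4]}
print(check_storage_transformers(metadata_doc.get("storage_transformers")))
print(check_storage_transformers([]))
```

Under these assumptions, passing a non-empty list such as `[{"name": "example"}]` raises `ValueError` rather than emitting a warning.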
why not error when a user tries to create an array with the metadata? I'm thinking about how someone would develop a storage transformer with zarr-python as a dependency. If the metadata parsing functionality supports storage transformers, then they can use that immediately, and then subclass / implement their own array class that can use the storage transformer. Seems like a better workflow than forcing them to also subclass the metadata class?
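A rough sketch of the workflow I have in mind, with made-up class names (`SketchArrayMetadata`, `SketchArray`, `TransformingArray`); none of these exist in zarr-python, and this is only meant to show where the error would move to:

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class SketchArrayMetadata:
    """Hypothetical metadata class: parses and retains storage transformers."""

    shape: tuple[int, ...]
    storage_transformers: tuple[Any, ...] = ()

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "SketchArrayMetadata":
        # Parsing never errors on a non-empty transformer list.
        return cls(
            shape=tuple(data["shape"]),
            storage_transformers=tuple(data.get("storage_transformers") or ()),
        )

    def to_dict(self) -> dict[str, Any]:
        return {
            "shape": list(self.shape),
            "storage_transformers": list(self.storage_transformers),
        }


class SketchArray:
    """Hypothetical array class: refuses metadata it cannot honor."""

    def __init__(self, metadata: SketchArrayMetadata) -> None:
        if metadata.storage_transformers:
            raise NotImplementedError("this array class does not apply storage transformers")
        self.metadata = metadata


class TransformingArray(SketchArray):
    """Hypothetical downstream subclass that implements the transformers itself."""

    def __init__(self, metadata: SketchArrayMetadata) -> None:
        # Skips the base-class check; applying the transformers is this class's job.
        self.metadata = metadata


doc = {"shape": [4, 4], "storage_transformers": [{"name": "example"}]}
meta = SketchArrayMetadata.from_dict(doc)  # parsing succeeds and round-trips
arr = TransformingArray(meta)              # downstream class can use it
```

With this layout, `SketchArray(meta)` raises for the same document, so the error surfaces at array creation rather than at metadata parsing.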
…nto feat/metadata-support-storage-transformers
The changes in this PR now result in errors if an
Thanks @d-v-b
…nto feat/metadata-support-storage-transformers
* v3: (21 commits)
  Default zarr.open to open_group if shape is not provided (zarr-developers#2158)
  feat: metadata-only support for storage transformers metadata (zarr-developers#2180)
  fix(async): set default concurrency to 10 tasks (zarr-developers#2256)
  chore(deps): drop support for python 3.10 and numpy 1.24 (zarr-developers#2217)
  feature(store): add LoggingStore wrapper (zarr-developers#2231)
  Apply assorted ruff/flake8-simplify rules (SIM) (zarr-developers#2259)
  Add array storage helpers (zarr-developers#2065)
  Apply ruff/flake8-annotations rule ANN204 (zarr-developers#2258)
  No need to run DeepSource any more - we use ruff (zarr-developers#2261)
  Remove unnecessary lambda expression (zarr-developers#2260)
  Enforce ruff/flake8-comprehensions rules (C4) (zarr-developers#2239)
  Use `map(str, *)` in `test_accessed_chunks` (zarr-developers#2229)
  Replace Gitter with Zulip (zarr-developers#2254)
  Enforce ruff/flake8-pytest-style rules (PT) (zarr-developers#2236)
  Fix multiple identical imports (zarr-developers#2241)
  Enforce ruff/flake8-return rules (RET) (zarr-developers#2237)
  Enforce ruff/flynt rules (FLY) (zarr-developers#2240)
  Fix fill_value handling for complex dtypes (zarr-developers#2200)
  Update V2 codec pipeline to use concrete classes (zarr-developers#2244)
  Apply and enforce more ruff rules (zarr-developers#2053)
  ...
Addresses #2178
This adds metadata-only support for the `storage_transformers` key in array metadata. By "metadata only" I mean that creating an array metadata document from JSON or a dict with a `storage_transformers` keyword will not error, and some very mild validation will be run (just ensuring that the value is an iterable or None), but the value is not used for any storage transforming, because we don't have any code for that yet.
But at least the metadata can be constructed and the `storage_transformers` value should round-trip through metadata properly. This should resolve #2178.
TODO: