
feat: metadata-only support for storage transformers metadata #2180


Merged


d-v-b
Contributor

@d-v-b d-v-b commented Sep 12, 2024

Addresses #2178

This adds metadata-only support for the storage_transformers key in array metadata. By "metadata only" I mean that creating an array metadata document from JSON or a dict with a storage_transformers key will not raise an error, and some very mild validation will run (it only checks that the value is an iterable or None). However, the value is not used for any storage transformation, because we don't have any code for that yet.

But at least the metadata can be constructed, and the storage_transformers value should round-trip through the metadata properly. This should resolve #2178.
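A minimal sketch of what "metadata only" support amounts to, assuming a simplified metadata model. `parse_storage_transformers`, `ArrayMetadataSketch`, and the transformer name `example_transformer` are hypothetical names for illustration, not zarr-python's `ArrayV3Metadata` API. The validation only distinguishes `None` from an iterable, and the value is stored and serialized back out without being used for any storage transformation.

```python
from __future__ import annotations

import json
from collections.abc import Iterable
from dataclasses import dataclass
from typing import Any


def parse_storage_transformers(data: object) -> tuple[Any, ...]:
    """Very mild validation: accept None or any iterable, reject everything else."""
    if data is None:
        return ()
    if isinstance(data, Iterable):
        return tuple(data)
    raise TypeError(f"storage_transformers must be an iterable or None, got {type(data).__name__}")


@dataclass(frozen=True)
class ArrayMetadataSketch:
    """Hypothetical stand-in for array metadata; only two fields are modelled."""

    shape: tuple[int, ...]
    # Stored and serialized, but never used to transform reads or writes.
    storage_transformers: tuple[Any, ...] = ()

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> ArrayMetadataSketch:
        return cls(
            shape=tuple(data["shape"]),
            storage_transformers=parse_storage_transformers(data.get("storage_transformers")),
        )

    def to_dict(self) -> dict[str, Any]:
        return {"shape": list(self.shape), "storage_transformers": list(self.storage_transformers)}


# Round trip through JSON: the transformer list comes back unchanged.
doc = {"shape": [10], "storage_transformers": [{"name": "example_transformer", "configuration": {}}]}
meta = ArrayMetadataSketch.from_dict(json.loads(json.dumps(doc)))
assert meta.to_dict() == doc
```

In this sketch a missing key and `None` both normalize to an empty tuple, so they serialize back as an empty list rather than as an absent key.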

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@zoj613

zoj613 commented Sep 13, 2024

If a metadata object is created from a JSON file that specifies a list of storage transformers, then reading from or writing to that array depends on applying the transformer pipeline, since that pipeline may modify the keys and/or values of the array node. I think ignoring it and issuing a warning is not enough: reading from such an array using the metadata could pass the wrong byte sequence to its codec pipeline, because the transformation pipeline was skipped.

To me this appears as a programming error. I think parsing a metadata document with a specified storage_transformers field should raise an exception to indicate that zarr-python cannot reliably read such an array node since it was clearly created by an implementation that used a storage transformer pipeline.

The storage_transformers field can safely be ignored when parsing JSON if it is an empty list, as indicated by the spec.

@d-v-b
Contributor Author

d-v-b commented Sep 13, 2024

> To me this appears as a programming error. I think parsing a metadata document with a specified storage_transformers field should raise an exception to indicate that zarr-python cannot reliably read such an array node since it was clearly created by an implementation that used a storage transformer pipeline.

I think this is valid, but we should raise when constructing an Array from metadata that references a storage transformer, rather than when constructing the array metadata itself. I will implement this.

Diff context for the review thread below:

```python
import warnings

msg = (
    # ... (earlier part of the message is not shown in the review context)
    "The storage transformer(s) will be retained in array metadata but will not "
    "influence storage routines"
)
warnings.warn(msg, UserWarning, stacklevel=1)
```
Member


After thinking about this some more, I believe we should raise an error here instead of warning.

We allow:

  • metadata w/o the storage_transformers key
  • metadata with storage_transformers == [] or None

We error for:

  • metadata w/ len(storage_transformers) > 0
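The proposed rule set can be expressed as a single check over a metadata dictionary. This is a hypothetical helper (`check_storage_transformers` is not a zarr-python function), assuming the metadata has already been decoded from JSON into a `dict`.

```python
from typing import Any


def check_storage_transformers(metadata: dict[str, Any]) -> None:
    """Allow a missing key, None, or []; raise for any non-empty list."""
    transformers = metadata.get("storage_transformers")
    if transformers is None:
        return  # key absent, or explicitly None
    if len(transformers) > 0:
        raise ValueError(
            f"Array metadata declares {len(transformers)} storage transformer(s), "
            "which this implementation cannot apply."
        )


# The three allowed shapes of metadata pass silently.
for doc in (
    {"shape": [4]},
    {"shape": [4], "storage_transformers": []},
    {"shape": [4], "storage_transformers": None},
):
    check_storage_transformers(doc)

# A non-empty list is rejected.
try:
    check_storage_transformers({"shape": [4], "storage_transformers": [{"name": "example"}]})
except ValueError as err:
    print(err)
```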

Contributor Author


Why not raise an error when a user tries to create an array with the metadata? I'm thinking about how someone would develop a storage transformer with zarr-python as a dependency. If the metadata parsing functionality supports storage transformers, they can use it immediately and then subclass or implement their own array class that applies the storage transformer. That seems like a better workflow than forcing them to also subclass the metadata class.

@jhamman jhamman added this to the 3.0.0 milestone Sep 13, 2024
@jhamman jhamman added the V3 label Sep 13, 2024
@d-v-b
Contributor Author

d-v-b commented Sep 25, 2024

The changes in this PR now result in errors if an AsyncArray is created from ArrayV3Metadata that contains a non-zero number of storage transformers. However, you can create metadata with storage transformers without any error or warnings.
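A sketch of this split, using two simplified stand-in classes: `MetadataSketch` and `ArraySketch` are hypothetical and are not zarr-python's `ArrayV3Metadata` and `AsyncArray`. Metadata construction accepts any number of storage transformers, while array construction rejects a non-empty list. The overridable `_check_storage_transformers` hook is also hypothetical; it illustrates the downstream subclassing workflow described earlier in the thread, where a third-party array class opts in to transformers without redefining the metadata class.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class MetadataSketch:
    # Metadata construction never rejects storage transformers.
    storage_transformers: tuple[Any, ...] = ()


class ArraySketch:
    """Hypothetical array class that refuses metadata declaring storage transformers."""

    def __init__(self, metadata: MetadataSketch) -> None:
        self._check_storage_transformers(metadata)
        self.metadata = metadata

    def _check_storage_transformers(self, metadata: MetadataSketch) -> None:
        if len(metadata.storage_transformers) > 0:
            raise ValueError(
                "Storage transformers are not supported; "
                f"got {len(metadata.storage_transformers)}."
            )


class TransformingArraySketch(ArraySketch):
    """Hypothetical downstream subclass that handles transformers itself."""

    def _check_storage_transformers(self, metadata: MetadataSketch) -> None:
        pass  # a real implementation would set up its transformer pipeline here


meta = MetadataSketch(storage_transformers=({"name": "example_transformer"},))  # no error
try:
    ArraySketch(meta)
except ValueError as err:
    print(err)
TransformingArraySketch(meta)  # accepted by the subclass
```

Putting the check on the array rather than the metadata keeps parsing lossless, so the transformer list still round-trips even when this implementation cannot apply it.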

@d-v-b d-v-b requested a review from jhamman September 25, 2024 19:59
@d-v-b d-v-b changed the title feat: meager support for storage transformers metadata feat: metadata-only support for storage transformers metadata Sep 25, 2024
Member

@jhamman jhamman left a comment


Thanks @d-v-b

@d-v-b d-v-b merged commit 5ca080d into zarr-developers:v3 Sep 27, 2024
20 checks passed
dcherian added a commit to dcherian/zarr-python that referenced this pull request Sep 27, 2024
* v3: (21 commits)
  Default zarr.open to open_group if shape is not provided (zarr-developers#2158)
  feat: metadata-only support for storage transformers metadata (zarr-developers#2180)
  fix(async): set default concurrency to 10 tasks (zarr-developers#2256)
  chore(deps): drop support for python 3.10 and numpy 1.24 (zarr-developers#2217)
  feature(store): add LoggingStore wrapper (zarr-developers#2231)
  Apply assorted ruff/flake8-simplify rules (SIM) (zarr-developers#2259)
  Add array storage helpers (zarr-developers#2065)
  Apply ruff/flake8-annotations rule ANN204 (zarr-developers#2258)
  No need to run DeepSource any more - we use ruff (zarr-developers#2261)
  Remove unnecessary lambda expression (zarr-developers#2260)
  Enforce ruff/flake8-comprehensions rules (C4) (zarr-developers#2239)
  Use `map(str, *)` in `test_accessed_chunks` (zarr-developers#2229)
  Replace Gitter with Zulip (zarr-developers#2254)
  Enforce ruff/flake8-pytest-style rules (PT) (zarr-developers#2236)
  Fix multiple identical imports (zarr-developers#2241)
  Enforce ruff/flake8-return rules (RET) (zarr-developers#2237)
  Enforce ruff/flynt rules (FLY) (zarr-developers#2240)
  Fix fill_value handling for complex dtypes (zarr-developers#2200)
  Update V2 codec pipeline to use concrete classes (zarr-developers#2244)
  Apply and enforce more ruff rules (zarr-developers#2053)
  ...
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

storage_transformers are not an accepted keyword for ArrayV3Metadata
3 participants