chore/handle numcodecs codecs #3376

Open
wants to merge 11 commits into
base: main

@d-v-b (Contributor) commented Aug 13, 2025

This PR brings in all the codecs defined in numcodecs.zarr3. After this PR is merged, we can safely replace the numcodecs.zarr3 module with re-exports from zarr-python, or remove numcodecs.zarr3 entirely, thereby fixing our circular dependency problem.

This PR also changes the default config to ensure that the locally-defined codecs take priority over the same codec found in the numcodecs registry.
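
To illustrate the priority rule described above, here is a minimal sketch. It is not the zarr-python registry code: `REGISTRY`, `DEFAULT_CONFIG`, `resolve_codec`, and both codec classes are hypothetical stand-ins, and the fallback behaviour when no preference is configured is an assumption made for the example.

```python
from __future__ import annotations


class LocalCRC32:
    """Hypothetical stand-in for a codec defined locally."""


class NumcodecsCRC32:
    """Hypothetical stand-in for the same codec found in the numcodecs registry."""


# Hypothetical registry: codec name -> {qualified class name: class}.
# Insertion order models registration order.
REGISTRY: dict[str, dict[str, type]] = {
    "numcodecs.crc32": {
        "local_pkg.CRC32": LocalCRC32,
        "numcodecs_pkg.CRC32": NumcodecsCRC32,
    }
}

# Hypothetical default config: codec name -> preferred qualified class name.
DEFAULT_CONFIG: dict[str, str] = {"numcodecs.crc32": "local_pkg.CRC32"}


def resolve_codec(
    name: str,
    registry: dict[str, dict[str, type]] = REGISTRY,
    config: dict[str, str] = DEFAULT_CONFIG,
) -> type:
    """Pick one implementation when several are registered under one name."""
    candidates = registry[name]
    preferred = config.get(name)
    if preferred is not None:
        # An explicit config entry always wins over registration order.
        return candidates[preferred]
    # Assumption for this sketch: with no preference, use the last registered class.
    return list(candidates.values())[-1]
```

With the default config in place, `resolve_codec("numcodecs.crc32")` returns the local class even though the other implementation was registered later; passing an empty config falls back to registration order.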

@github-actions bot added the "needs release notes" label on Aug 13, 2025

codecov bot commented Aug 13, 2025

Codecov Report

❌ Patch coverage is 96.93878% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.60%. Comparing base (4b26501) to head (0e10d8e).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| `src/zarr/codecs/numcodecs/_codecs.py` | 96.44% | 6 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3376      +/-   ##
==========================================
+ Coverage   94.55%   94.60%   +0.04%     
==========================================
  Files          79       81       +2     
  Lines        9447     9641     +194     
==========================================
+ Hits         8933     9121     +188     
- Misses        514      520       +6     

| Files with missing lines | Coverage Δ |
|---|---|
| `src/zarr/codecs/numcodecs/__init__.py` | 100.00% <100.00%> (ø) |
| `src/zarr/core/config.py` | 83.33% <ø> (ø) |
| `src/zarr/registry.py` | 88.81% <100.00%> (ø) |
| `src/zarr/codecs/numcodecs/_codecs.py` | 96.44% <96.44%> (ø) |

def _encode(self, chunk_data: Buffer, prototype: BufferPrototype) -> Buffer:
    encoded = self._codec.encode(chunk_data.as_array_like())
    if isinstance(encoded, np.ndarray):  # Required for checksum codecs
Contributor

Can we know statically which are checksum codecs without the isinstance check?

Contributor Author

n.b., this was copy + pasted from numcodecs, but I think the answer is "no"

codec_name: str
codec_config: dict[str, JSON]

def __init_subclass__(cls, *, codec_name: str | None = None, **kwargs: Any) -> None:
Contributor

What would a codec definition look like without this magic? I'd be fine with repeating a few things if it meant we could avoid this (and IIUC some of the complexity in __repr__ and __init__ would go away too?).
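
For comparison, here is a minimal sketch of the two styles being discussed. All class names are hypothetical, and the `numcodecs.` prefix applied in `__init_subclass__` is an assumption made for illustration, not a statement about what the PR's base class does.

```python
from __future__ import annotations

from typing import Any


class MagicBase:
    """Hypothetical base class that sets codec_name via a class keyword."""

    codec_name: str

    def __init_subclass__(cls, *, codec_name: str | None = None, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        if codec_name is not None:
            # Assumed naming convention, used here only for illustration.
            cls.codec_name = f"numcodecs.{codec_name}"

    def __init__(self, **codec_config: Any) -> None:
        self.codec_config = codec_config


class MagicShuffle(MagicBase, codec_name="shuffle"):
    """The name is injected by __init_subclass__ at class creation time."""


class ExplicitBase:
    """Hypothetical base class with no class-creation hook."""

    codec_name: str

    def __init__(self, **codec_config: Any) -> None:
        self.codec_config = codec_config


class ExplicitShuffle(ExplicitBase):
    # The subclass repeats the full name instead of relying on the hook.
    codec_name = "numcodecs.shuffle"
```

Both subclasses end up with the same `codec_name`; the explicit version costs one repeated line per codec and leaves the base class without a hook.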

Comment on lines 254 to 255
def __init__(self, **codec_config: JSON) -> None:
    super().__init__(**codec_config)
Contributor

What's this do?


@pytest.mark.parametrize("codec_class", [_numcodecs.PCodec, _numcodecs.ZFPY])
def test_generic_bytes_codec(codec_class: type[_numcodecs._NumcodecsArrayBytesCodec]) -> None:
    try:
Contributor

Can you explain the different cases here? Do the pcodec and zfpy codecs depend on optional numcodecs dependencies? And if so, is that the only reason we might not be able to run this test?

If so, can we maybe do pytest.importorskip(dependency_name) and then assume we have it and avoid the xfails?
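
A minimal sketch of what that suggestion could look like. The test name is hypothetical, and the module names `pcodec` and `zfpy` are assumed to be the optional dependencies in question; the real test would go on to construct the codec rather than only checking the import.

```python
import pytest


@pytest.mark.parametrize("dependency_name", ["pcodec", "zfpy"])
def test_optional_codec_dependency(dependency_name: str) -> None:
    # Skips this test case (rather than failing or xfailing) when the
    # optional dependency cannot be imported.
    module = pytest.importorskip(dependency_name)
    # From here on, the test body can assume the dependency is available.
    assert module.__name__ == dependency_name
```

`pytest.importorskip` returns the imported module on success, so the rest of the test body no longer needs a `try`/`except` around the import.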

Contributor

Oh, I guess we (or numcodecs?) raise a ValueError...? If that's numcodecs, then whatever. But I don't think we should raise a ValueError if a dependency is missing.

@d-v-b (Contributor Author) commented Aug 17, 2025

> Can you explain the different cases here?

Spinning this question out into the main thread -- from me, the general answer to questions like this will be "no", since I am only copy+pasting stuff from numcodecs. I haven't spent too much time figuring out what this code is doing. I do think @normanrz and @TomNicholas might be able to answer some of these questions though.

@TomAugspurger (Contributor) commented

Ah, I didn't realize this was mostly from numcodecs. I think that moots most of my comments aside from where in the public API we put these.

@d-v-b (Contributor Author) commented Aug 17, 2025

yeah I should have made more clear that this is nearly all directly copy + pasted from numcodecs.zarr3
