(feat): `typesize` declared with constructor for `Blosc` #713

ilan-gold · 2025-03-03T13:23:58Z

See zarr-developers/zarr-python#2766 and zarr-developers/zarr-python#2171. I locally checked the compression-ratio reproducer from the latter, and we are now getting the old performance back (1383 bytes).

I think a good place to start is to declare the itemsize information as `typesize`, which the zarr-python `BloscCodec` stores anyway.
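As a minimal sketch of what "declaring" the typesize could mean, assume a codec object that carries an optional `typesize` and otherwise falls back to the element size of the incoming buffer. `SketchBloscConfig` and `effective_typesize` are hypothetical names, not the numcodecs API, and `memoryview.itemsize` stands in for a NumPy dtype's itemsize.

```python
import array
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchBloscConfig:
    """Hypothetical Blosc-style codec settings with an optional, declared typesize."""

    cname: str = "lz4"
    clevel: int = 5
    typesize: Optional[int] = None  # None means "not declared"

    def __post_init__(self) -> None:
        # Reject nonsensical values up front; any positive size is accepted.
        if self.typesize is not None and self.typesize < 1:
            raise ValueError("typesize must be >= 1")

    def effective_typesize(self, buf) -> int:
        """Prefer the declared typesize, else the buffer's element size."""
        if self.typesize is not None:
            return self.typesize
        # memoryview works for any buffer-protocol object (bytes, array.array,
        # NumPy arrays); plain bytes report an itemsize of 1.
        return memoryview(buf).itemsize


doubles = array.array("d", [0.0, 1.0, 2.0])
print(SketchBloscConfig().effective_typesize(doubles))            # 8
print(SketchBloscConfig(typesize=4).effective_typesize(doubles))  # 4
print(SketchBloscConfig().effective_typesize(b"raw bytes"))       # 1
```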

This PR does no validation of typesize, although it could in theory when the incoming buffer is a NumPy array. I'm not sure that's desirable, though, since this is purely a performance setting: a "wrong" typesize doesn't actually break compression or decompression. If you're working at a low enough level to change this while using arrays and you get it wrong, you'd probably also notice that your files have poor compression ratios.
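The reason a mismatched typesize only costs compression ratio is that Blosc's shuffle filter is a reversible byte transpose. The sketch below reproduces that idea in pure Python with `zlib` standing in for the Blosc compressor; it illustrates the technique and is not Blosc's implementation. Leaving trailing bytes that do not fill a whole element in place is an assumption modeled on how Blosc treats leftover bytes, and unlike real Blosc (which records the typesize in each chunk header) the caller here must pass the same typesize to both functions.

```python
import struct
import zlib


def shuffle(buf: bytes, typesize: int) -> bytes:
    """Group byte 0 of every element, then byte 1, and so on (a byte transpose)."""
    n = len(buf) // typesize * typesize
    body, tail = buf[:n], buf[n:]
    return b"".join(body[i::typesize] for i in range(typesize)) + tail


def unshuffle(buf: bytes, typesize: int) -> bytes:
    """Invert shuffle(): scatter each byte plane back to its element positions."""
    n = len(buf) // typesize * typesize
    body, tail = buf[:n], buf[n:]
    count = n // typesize
    out = bytearray(n)
    for i in range(typesize):
        out[i::typesize] = body[i * count:(i + 1) * count]
    return bytes(out) + tail


def compress(buf: bytes, typesize: int) -> bytes:
    return zlib.compress(shuffle(buf, typesize), 5)


def decompress(buf: bytes, typesize: int) -> bytes:
    return unshuffle(zlib.decompress(buf), typesize)


# 10,000 little-endian int64 values: the "right" typesize is 8.
data = struct.pack("<10000q", *range(10000))
for ts in (1, 2, 4, 8):
    packed = compress(data, ts)
    assert decompress(packed, ts) == data  # every typesize round-trips
    print(ts, len(packed))                 # only the compressed size changes
```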

TODO:

Unit tests and/or doctests in docstrings
Tests pass locally
Docstrings and API docs for any new/modified user-facing classes and functions
Changes documented in docs/release.rst
Docs build locally
GitHub Actions CI passes
Test coverage to 100% (Codecov passes)

d-v-b · 2025-03-03T13:30:15Z

This is a good idea, but we should be sure that we don't cause backwards-compatibility problems with this, since it adds a key to the zarr metadata. How do various zarr clients like zarr-python or tensorstore handle extra keys in the blosc JSON definition?

ilan-gold · 2025-03-03T13:37:03Z

> This is a good idea, but we should be sure that we don't cause backwards-compatibility problems with this, since it adds a key to the zarr metadata.

Why does it add a key? One of the reasons I feel so secure doing this is precisely that I thought we were adding the key anyway: https://github.com/zarr-developers/zarr-python/blob/680142f6f370d862f305a9b37f3c0ce2ce802e76/src/zarr/codecs/blosc.py#L122-L136. Maybe I'm misunderstanding.

ilan-gold · 2025-03-03T13:38:32Z

This feels similar to the Zstd levels business: it's encode-only and can be safely ignored when reading back in (i.e., nothing breaks). I understand why we serialize it, but people sometimes ignore it anyway (I'm thinking of NVComp's Zstd) with no effect, though the merit of such a decision is probably debatable.
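A minimal sketch of what "encode-only" means, using the compression level of `zlib` as the stand-in parameter: it changes the bytes produced, but the decoder never needs it. Blosc behaves analogously for typesize because it records the typesize in each compressed chunk's header. The `encode`/`decode` helpers and the `config` dict are hypothetical illustrations, not a zarr or numcodecs API.

```python
import zlib


def encode(data: bytes, config: dict) -> bytes:
    # "level" only steers the encoder; it is not needed to decode.
    return zlib.compress(data, config.get("level", 6))


def decode(frame: bytes, config: dict) -> bytes:
    # `config` is deliberately unused: the compressed stream is self-describing.
    return zlib.decompress(frame)


payload = b"zarr chunk " * 1000
frame = encode(payload, {"level": 9})

# A reader that drops or never knew about the level still recovers the data.
assert decode(frame, {}) == payload
assert decode(frame, {"level": 1}) == payload
```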

d-v-b · 2025-03-03T13:43:44Z

> > This is a good idea, but we should be sure that we don't cause backwards-compatibility problems with this, since it adds a key to the zarr metadata.
>
> Why does it add a key? One of the reasons I feel so secure doing this is precisely that I thought we were adding the key anyway: https://github.com/zarr-developers/zarr-python/blob/680142f6f370d862f305a9b37f3c0ce2ce802e76/src/zarr/codecs/blosc.py#L122-L136. Maybe I'm misunderstanding.

Aha, I didn't see that we were already serializing this. So there are no backwards-compatibility issues to worry about, which is great.

codecov · 2025-03-03T13:44:45Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.96%. Comparing base (8168e15) to head (80ff632).
Report is 1 commits behind head on main.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #713   +/-   ##
=======================================
  Coverage   99.96%   99.96%
=======================================
  Files          63       63
  Lines        2754     2771   +17
=======================================
+ Hits         2753     2770   +17
  Misses          1        1
```

| Files with missing lines | Coverage Δ |
|---|---|
| numcodecs/tests/test_blosc.py | `100.00% <100.00%> (ø)` |

numcodecs/blosc.pyx


LDeakin · 2025-04-11T01:57:06Z

> This is a good idea, but we should be sure that we don't cause backwards-compatibility problems with this, since it adds a key to the zarr metadata. How do various zarr clients like zarr-python or tensorstore handle extra keys in the blosc JSON definition?

Indeed! This was a breaking change after all for implementations that reject unknown keys, as typesize seemingly was not serialised in the past. See this diff in https://github.com/LDeakin/zarrs/pull/174/files.
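To make the failure mode concrete, here is a minimal sketch contrasting a strict reader (reject unknown keys) with a lenient one (ignore them). `parse_blosc_v2` is a hypothetical helper, the key set is an assumption about what a pre-0.16 numcodecs Blosc config contained, and the config values are illustrative.

```python
from typing import Any, Dict

# Assumed keys a pre-0.16 reader expects for a numcodecs-style "blosc" entry.
KNOWN_BLOSC_KEYS = {"id", "cname", "clevel", "shuffle", "blocksize"}


def parse_blosc_v2(config: Dict[str, Any], strict: bool) -> Dict[str, Any]:
    unknown = sorted(set(config) - KNOWN_BLOSC_KEYS)
    if unknown and strict:
        # Strict readers (deny-unknown-fields style) refuse the whole array.
        raise ValueError(f"unknown blosc keys: {unknown}")
    # Lenient readers keep what they understand and ignore the rest.
    return {k: v for k, v in config.items() if k in KNOWN_BLOSC_KEYS}


old_style = {"id": "blosc", "cname": "lz4", "clevel": 5, "shuffle": 1, "blocksize": 0}
new_style = {**old_style, "typesize": 8}  # illustrative values

try:
    parse_blosc_v2(new_style, strict=True)
    strict_rejected = False
except ValueError:
    strict_rejected = True

print(strict_rejected)                                       # True
print(parse_blosc_v2(new_style, strict=False) == old_style)  # True
```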

ilan-gold · 2025-04-11T07:44:48Z

@LDeakin Looking back, I misunderstood @d-v-b's point. I thought he was referring to https://github.com/zarr-developers/zarr-python/blame/018f61d93207112f68eba06ea8c2560a489767f6/src/zarr/codecs/blosc.py#L127-L136 (which has not changed), not to the fact that this codec's config is used directly by the Zarr v2 format.
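The distinction can be sketched as two metadata shapes (values are illustrative). In Zarr v3, `typesize` is a documented field of the `blosc` codec configuration, so the zarr-python `BloscCodec` emitting it changes nothing. In Zarr v2, the `compressor` entry in `.zarray` is the numcodecs config dict itself, so a new key from the codec's `get_config()` lands directly in v2 metadata.

```python
import json

# Zarr v3: "blosc" is a codec defined by the v3 spec, and `typesize`
# is one of its documented configuration fields.
v3_codec = {
    "name": "blosc",
    "configuration": {
        "cname": "zstd",
        "clevel": 5,
        "shuffle": "shuffle",
        "typesize": 4,
        "blocksize": 0,
    },
}

# Zarr v2: the "compressor" entry in .zarray is the numcodecs config dict,
# so any key a codec's get_config() starts emitting is written here as-is.
v2_compressor = {"id": "blosc", "cname": "zstd", "clevel": 5, "shuffle": 1, "blocksize": 0}

print(json.dumps(v3_codec, sort_keys=True))
print(json.dumps({"compressor": v2_compressor}, sort_keys=True))
```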

d-v-b · 2025-04-11T09:46:52Z

OK, so this was actually a breaking change; that's my bad. I definitely should have reviewed this more carefully. But we also really need to be catching this with tests.

@jbms @bogovicj sorry for the inconvenience but you might have to change how your implementations handle the blosc codec in zarr v2 data.

dstansby · 2025-04-11T11:40:19Z

Sorry, I don't have time to follow in detail, but this isn't released yet (right?), so we could in theory revert it if it's too much of an issue?

d-v-b · 2025-04-11T11:43:53Z

it was released in 0.16

dstansby · 2025-04-11T11:58:20Z

Sorry, for some reason I thought this was zarr-python and not numcodecs 🤦

jbms · 2025-04-11T16:08:03Z

Please revert/fix this to minimize breakage, e.g. avoid serializing the typesize parameter when it is None.
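A minimal sketch of the suggested mitigation, serializing `typesize` only when it is not `None`. `SketchBlosc` is a hypothetical class illustrating the idea, not the numcodecs `Blosc` implementation.

```python
from typing import Any, Dict, Optional


class SketchBlosc:
    """Hypothetical codec illustrating the suggestion; not the numcodecs class."""

    codec_id = "blosc"

    def __init__(self, cname: str = "lz4", clevel: int = 5, shuffle: int = 1,
                 blocksize: int = 0, typesize: Optional[int] = None) -> None:
        self.cname = cname
        self.clevel = clevel
        self.shuffle = shuffle
        self.blocksize = blocksize
        self.typesize = typesize

    def get_config(self) -> Dict[str, Any]:
        config: Dict[str, Any] = {
            "id": self.codec_id,
            "cname": self.cname,
            "clevel": self.clevel,
            "shuffle": self.shuffle,
            "blocksize": self.blocksize,
        }
        # Emit the new key only when the user set it, so metadata written with
        # default settings keeps exactly the keys older readers already accept.
        if self.typesize is not None:
            config["typesize"] = self.typesize
        return config


print(SketchBlosc().get_config())            # no "typesize" key
print(SketchBlosc(typesize=8).get_config())  # includes "typesize": 8
```

This limits rather than removes the breakage: an array written with an explicit typesize would still carry a key that strict or older readers reject.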

zarr-python/numcodecs also fail on unknown keys, so this also breaks compatibility with previous versions of numcodecs/zarr-python.
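Why an unknown key fails on older readers can be sketched as follows, assuming the reader builds a codec by passing the stored config as keyword arguments (a `cls(**config)` pattern, consistent with the `TypeError` reported in zarr-developers/zarr-python#3016). `PreTypesizeBlosc` is a hypothetical stand-in for a codec class from before `typesize` existed.

```python
from typing import Any, Dict


class PreTypesizeBlosc:
    """Hypothetical stand-in for a Blosc codec from before typesize existed."""

    def __init__(self, cname: str = "lz4", clevel: int = 5, shuffle: int = 1,
                 blocksize: int = 0) -> None:
        self.cname, self.clevel = cname, clevel
        self.shuffle, self.blocksize = shuffle, blocksize

    @classmethod
    def from_config(cls, config: Dict[str, Any]) -> "PreTypesizeBlosc":
        config = dict(config)   # copy so the caller's dict is untouched
        config.pop("id", None)  # the codec id is not a constructor argument
        # Every remaining key becomes a keyword argument, so an unknown key
        # such as "typesize" raises TypeError instead of being ignored.
        return cls(**config)


written_by_new_version = {"id": "blosc", "cname": "lz4", "clevel": 5,
                          "shuffle": 1, "blocksize": 0, "typesize": 8}
try:
    PreTypesizeBlosc.from_config(written_by_new_version)
    old_reader_failed = False
except TypeError as exc:
    old_reader_failed = True
    print("old reader fails:", exc)
```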

In general backwards compatibility is tricky so it would be great if you can take care when making any changes to the metadata.

ilan-gold added 4 commits March 3, 2025 14:13

(feat): typesize declared with constructor

ce19571

(chore): add docstring

1c10d3f

(chore): relnote

bf6e4e5

Merge branch 'main' into ig/itemsize_blosc

28015ae

(chore): format

04f775f

dstansby reviewed Mar 3, 2025


numcodecs/blosc.pyx (review thread resolved)

ilan-gold added 3 commits March 3, 2025 15:50

(fix): add check for typesize<1

22d7f00

(chore): no cover for internal ValueError

a421c5b

(fix): test internal compress error

7fe0dd8

ilan-gold marked this pull request as ready for review March 3, 2025 15:39

ilan-gold requested a review from dstansby March 3, 2025 15:40

ilan-gold changed the title ~~(feat): typesize declared with constructor~~ (feat): typesize declared with constructor for Blosc Mar 4, 2025

ilan-gold added 2 commits March 4, 2025 10:46

Merge branch 'main' into ig/itemsize_blosc

cb76cdb

Merge branch 'main' into ig/itemsize_blosc

80ff632

dstansby approved these changes Mar 4, 2025


dstansby enabled auto-merge (squash) March 4, 2025 12:46

dstansby merged commit 3c933cf into zarr-developers:main Mar 4, 2025
27 of 28 checks passed

ilan-gold deleted the ig/itemsize_blosc branch March 4, 2025 12:54

ilan-gold mentioned this pull request Mar 28, 2025

(feat): allow zarr v3 writing scverse/anndata#1892

Merged


This was referenced Apr 9, 2025

(fix): cache partial decoder ilan-gold/zarrs-python#93

Merged

(fix): blosc v2 typesize in config ilan-gold/zarrs-python#94

Merged

LDeakin added a commit to zarrs/zarrs that referenced this pull request Apr 11, 2025

fix: permit (and ignore) typesize in blosc codec in Zarr V2 arrays

54fed4c

Changed in `numcodecs` 0.16 zarr-developers/numcodecs#713

LDeakin mentioned this pull request Apr 11, 2025

fix: permit (and ignore) typesize in blosc codec in Zarr V2 arrays [zarrs_metadata 0.3.x] zarrs/zarrs#175

Merged

LDeakin added a commit to zarrs/zarrs that referenced this pull request Apr 11, 2025

fix: permit (and ignore) typesize in blosc codec in Zarr V2 arrays (#175)

1719b40

Changed in `numcodecs` 0.16 zarr-developers/numcodecs#713

LDeakin mentioned this pull request Apr 11, 2025

fix: use typesize if present in numcodecs blosc V2 arrays zarrs/zarrs#174

Closed

ilan-gold mentioned this pull request Apr 13, 2025

(fix): ensure no typesize in the Blosc config #739

Merged


d-v-b mentioned this pull request Apr 22, 2025

relationship with zarr #742

Open

Metamess mentioned this pull request Apr 24, 2025

zarr-format v2 stores written by zarr-python v3 can no longer be opened by zarr-python v2 due to a numcodecs TypeError zarr-developers/zarr-python#3016

Open
