(feat): `typesize` declared with constructor for `Blosc` #713
Conversation
This is a good idea, but we should be sure that we don't cause backwards compatibility problems with this, since it adds a key to zarr metadata. How do various zarr clients like zarr-python or tensorstore handle extra keys in the blosc JSON definition?
Why does it add a key? One of the reasons I feel so secure doing this is precisely that I thought we were adding the key anyway: https://github.com/zarr-developers/zarr-python/blob/680142f6f370d862f305a9b37f3c0ce2ce802e76/src/zarr/codecs/blosc.py#L122-L136 so I felt extremely confident making this change. Maybe I'm misunderstanding.
This feels similar to the Zstd levels business: it's encode-only and can be safely (i.e., nothing breaks) ignored when reading back in. I understand why we serialize it, but it seems people sometimes ignore it anyway (thinking of the NVComp Zstd) with no effect (the utility/quality of such a decision is probably debatable).
Aha, I didn't see that we were serializing this already. So there are no backwards compatibility issues to worry about, which is great.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #713   +/-   ##
=======================================
  Coverage   99.96%   99.96%
=======================================
  Files          63       63
  Lines        2754     2771   +17
=======================================
+ Hits         2753     2770   +17
  Misses          1        1
Changed in `numcodecs` 0.16 zarr-developers/numcodecs#713
Indeed! This was a breaking change after all for implementations that reject unknown keys, as it seemingly was not serialised in the past. See this diff in https://github.com/LDeakin/zarrs/pull/174/files.
@LDeakin looking back, I misunderstood @d-v-b's point. I thought he was referring to https://github.com/zarr-developers/zarr-python/blame/018f61d93207112f68eba06ea8c2560a489767f6/src/zarr/codecs/blosc.py#L127-L136 (which has no change) and not to the fact that this codec is used directly by the zarr v2 file format.
OK, so this was actually a breaking change. That's my bad; I definitely should have reviewed this more carefully. But we also really need to be catching this with tests. @jbms @bogovicj sorry for the inconvenience, but you might have to change how your implementations handle the blosc codec in zarr v2 data.
Sorry, I don't have time to follow in detail, but this isn't released yet (right?), so we could in theory revert it if it's too much of an issue?
It was released in 0.16.
Sorry, for some reason I thought this was zarr-python and not numcodecs 🤦
Please revert/fix this to minimize breakage, e.g. avoid serializing the typesize parameter when it is None. Zarr-python/numcodecs also fails on unknown keys, so this also breaks compatibility with previous versions of numcodecs/zarr-python. In general, backwards compatibility is tricky, so it would be great if you can take care when making any changes to the metadata.
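To make the suggestion above concrete, here is a minimal sketch of a config that leaves out `typesize` when it is `None`, next to a reader that rejects unknown keys. Both `blosc_config` and `strict_parse` are hypothetical helpers written for illustration, not the numcodecs or zarr-python API, and the set of known keys is an assumption about what the config contained before this change.

```python
from typing import Any, Dict, Optional

# Assumption: the keys a pre-change reader expects in the blosc config.
KNOWN_KEYS = {"id", "cname", "clevel", "shuffle", "blocksize"}


def blosc_config(
    cname: str = "lz4",
    clevel: int = 5,
    shuffle: int = 1,
    blocksize: int = 0,
    typesize: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical serializer: only emit `typesize` when it was set."""
    config: Dict[str, Any] = {
        "id": "blosc",
        "cname": cname,
        "clevel": clevel,
        "shuffle": shuffle,
        "blocksize": blocksize,
    }
    if typesize is not None:
        # The extra key only appears when the user opted in.
        config["typesize"] = typesize
    return config


def strict_parse(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical strict reader: fail on any key it does not know."""
    unknown = set(config) - KNOWN_KEYS
    if unknown:
        raise ValueError(f"unknown keys in blosc config: {sorted(unknown)}")
    return dict(config)
```

With this shape, metadata written with default arguments stays readable by a strict reader, and only data written with an explicit `typesize` would trip it.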
See zarr-developers/zarr-python#2766 and zarr-developers/zarr-python#2171. I checked locally the reproducer in the latter for the compression ratio, and we are now getting the old performance with 1383 bytes.
I think a good place to start is to declare the itemsize information, which is stored anyway on the `BloscCodec`.
This PR does no validation of `typesize`, although it could in theory if the incoming buffer is a numpy array. I'm not sure that's desirable, though, as this is purely a performance thing (i.e., a "wrong" `typesize` doesn't actually break compression/decompression). If you're so low-level that you're changing this while using arrays, and you get it wrong, you'd probably also notice that your files have bad compression ratios.