Description
numcodecs currently treats a return value of 0 from ZSTD_getDecompressedSize as an input error. A value of zero could mean any of the following:
- empty
- unknown
- error
Lines 151 to 153 in 366318f:

    dest_size = ZSTD_getDecompressedSize(source_ptr, source_size)
    if dest_size == 0:
        raise RuntimeError('Zstd decompression error: invalid input data')

Rather, numcodecs should use ZSTD_getFrameContentSize, whose return value distinguishes these cases:
- 0 means empty
- 0xffffffffffffffff, ZSTD_CONTENTSIZE_UNKNOWN, means unknown
- 0xfffffffffffffffe, ZSTD_CONTENTSIZE_ERROR, means error
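
To make the three outcomes concrete, here is a minimal pure-Python sketch that reads the Frame_Content_Size field from a Zstandard frame header and returns the same sentinel values. It is an illustration only, not the libzstd implementation: the function name `frame_content_size` is hypothetical, and skippable frames (for which libzstd reports 0) are simply treated as errors here.

```python
ZSTD_MAGIC = 0xFD2FB528
ZSTD_CONTENTSIZE_UNKNOWN = 0xFFFFFFFFFFFFFFFF
ZSTD_CONTENTSIZE_ERROR = 0xFFFFFFFFFFFFFFFE


def frame_content_size(data: bytes) -> int:
    """Hypothetical helper mirroring the semantics of ZSTD_getFrameContentSize."""
    # A frame starts with a 4-byte little-endian magic number,
    # followed by a 1-byte Frame_Header_Descriptor.
    if len(data) < 5 or int.from_bytes(data[:4], "little") != ZSTD_MAGIC:
        return ZSTD_CONTENTSIZE_ERROR

    descriptor = data[4]
    if descriptor & 0x08:  # reserved bit must be zero
        return ZSTD_CONTENTSIZE_ERROR
    fcs_flag = descriptor >> 6                # Frame_Content_Size_flag
    single_segment = (descriptor >> 5) & 1    # Single_Segment_flag
    dict_id_flag = descriptor & 3             # Dictionary_ID_flag

    # Widths of the optional header fields, in bytes.
    window_bytes = 0 if single_segment else 1
    dict_bytes = (0, 1, 2, 4)[dict_id_flag]
    fcs_bytes = (1 if single_segment else 0, 2, 4, 8)[fcs_flag]

    header_len = 5 + window_bytes + dict_bytes + fcs_bytes
    if len(data) < header_len:
        return ZSTD_CONTENTSIZE_ERROR  # truncated header

    if fcs_bytes == 0:
        # The frame is valid, but the compressor did not record the size.
        return ZSTD_CONTENTSIZE_UNKNOWN

    size = int.from_bytes(data[header_len - fcs_bytes:header_len], "little")
    if fcs_bytes == 2:
        size += 256  # the 2-byte encoding stores the value minus 256
    return size
```

With this distinction, a header that records a size of 0 yields 0 (empty), a valid header without the size field yields ZSTD_CONTENTSIZE_UNKNOWN, and bytes that are not a Zstandard frame yield ZSTD_CONTENTSIZE_ERROR.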
See zstd.h or the manual for a reference.
https://github.com/facebook/zstd/blob/7cf62bc274105f5332bf2d28c57cb6e5669da4d8/lib/zstd.h#L195-L203
https://facebook.github.io/zstd/zstd_manual.html
This error arose during the implementation of Zstandard in n5-zarr:
saalfeldlab/n5-zarr#35
There the compressor was producing blocks for which ZSTD_getFrameContentSize would return ZSTD_CONTENTSIZE_UNKNOWN. ZSTD_getDecompressedSize would return 0 for them, and numcodecs would incorrectly interpret this as an error.
Handling ZSTD_CONTENTSIZE_UNKNOWN may be difficult.
- If a dest buffer is provided, then perhaps its size should be set as the expected decompressed size, and an error should occur if the decompressed size differs.
- If a dest buffer is not provided, we may need to either use a default size or use the streaming API to build a growing buffer until all the data is decompressed.
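
The two options above can be sketched as a small dispatch function. This is only one possible reading of the proposal, not numcodecs code: the name `plan_decompression`, the strategy labels, and the tuple return shape are all hypothetical, and the actual decompression calls are left out.

```python
ZSTD_CONTENTSIZE_UNKNOWN = 0xFFFFFFFFFFFFFFFF
ZSTD_CONTENTSIZE_ERROR = 0xFFFFFFFFFFFFFFFE


def plan_decompression(content_size, dest_len=None):
    """Hypothetical helper: choose a strategy from the frame content size.

    Returns a (strategy, expected_size) tuple, where strategy is
    "one_shot" or "streaming".
    """
    if content_size == ZSTD_CONTENTSIZE_ERROR:
        # Only a genuine error is reported as invalid input.
        raise RuntimeError("Zstd decompression error: invalid input data")

    if content_size == ZSTD_CONTENTSIZE_UNKNOWN:
        if dest_len is not None:
            # Treat the dest buffer size as the expected decompressed size;
            # the caller should raise if the actual size differs.
            return ("one_shot", dest_len)
        # No size hint at all: decompress with the streaming API
        # into a growing buffer.
        return ("streaming", None)

    # Known size, including 0 for an empty frame.
    return ("one_shot", content_size)
```

Under this sketch an empty frame is no longer rejected, since a content size of 0 maps to a one-shot decompression that is expected to produce zero bytes.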