Zstd: ZSTD_getDecompressedSize is obsolete and used incorrectly. #499

@mkitti

numcodecs currently treats a return value of 0 from ZSTD_getDecompressedSize as an input error. However, a return value of 0 is ambiguous and can mean any one of the following:

  1. empty
  2. unknown
  3. error

dest_size = ZSTD_getDecompressedSize(source_ptr, source_size)
if dest_size == 0:
    raise RuntimeError('Zstd decompression error: invalid input data')

Instead, numcodecs should use ZSTD_getFrameContentSize, whose return value distinguishes these three cases:

  1. 0 means empty
  2. 0xffffffffffffffff, ZSTD_CONTENTSIZE_UNKNOWN, means unknown
  3. 0xfffffffffffffffe, ZSTD_CONTENTSIZE_ERROR, means error
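The three cases above can be told apart with a plain comparison against the two sentinel values. The following is a minimal sketch in pure Python, not the numcodecs implementation: the helper name `classify_content_size` is hypothetical, and the constants are copied from the values listed above rather than imported from a zstd binding.

```python
# Sentinel values as listed above (zstd.h defines them as 0ULL - 1 and 0ULL - 2).
ZSTD_CONTENTSIZE_UNKNOWN = 0xFFFFFFFFFFFFFFFF
ZSTD_CONTENTSIZE_ERROR = 0xFFFFFFFFFFFFFFFE


def classify_content_size(content_size: int) -> str:
    """Hypothetical helper: interpret a ZSTD_getFrameContentSize return value.

    `content_size` is assumed to be the unsigned 64-bit value returned by
    ZSTD_getFrameContentSize, already converted to a Python int.
    """
    if content_size == ZSTD_CONTENTSIZE_ERROR:
        return "error"    # invalid input, e.g. not a valid zstd frame header
    if content_size == ZSTD_CONTENTSIZE_UNKNOWN:
        return "unknown"  # valid frame, but the size is not stored in the header
    if content_size == 0:
        return "empty"    # valid frame with zero bytes of content
    return "known"        # valid frame with a recorded, non-zero size


for value in (0, ZSTD_CONTENTSIZE_UNKNOWN, ZSTD_CONTENTSIZE_ERROR, 1024):
    print(hex(value), classify_content_size(value))
```

Only the "error" case should raise; the "empty" and "unknown" cases describe valid input.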

See zstd.h or the manual for a reference.
https://github.com/facebook/zstd/blob/7cf62bc274105f5332bf2d28c57cb6e5669da4d8/lib/zstd.h#L195-L203
https://facebook.github.io/zstd/zstd_manual.html

This error arose during the implementation of Zstandard in n5-zarr:
saalfeldlab/n5-zarr#35

There, the compressor was producing blocks for which ZSTD_getFrameContentSize returns ZSTD_CONTENTSIZE_UNKNOWN. For the same blocks, ZSTD_getDecompressedSize returns 0, which numcodecs incorrectly interprets as an error.

Handling ZSTD_CONTENTSIZE_UNKNOWN may be difficult.

  1. If a dest buffer is provided, then perhaps its size should be used as the expected decompressed size, and an error should be raised if the actual decompressed size differs.
  2. If a dest buffer is not provided, we may need to either use a default size or use the streaming API to build a growing buffer until all the data is decompressed.
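One way to read the two options above is as a small decision function that runs before any decompression is attempted. The sketch below is only an illustration of that logic under stated assumptions: `plan_dest_size` is a hypothetical name, it returns `None` to signal "fall back to the streaming API", and the extra check that a provided buffer is large enough for a known size is an assumption that the issue does not discuss.

```python
from typing import Optional

# Sentinel values returned by ZSTD_getFrameContentSize (see zstd.h).
ZSTD_CONTENTSIZE_UNKNOWN = 0xFFFFFFFFFFFFFFFF
ZSTD_CONTENTSIZE_ERROR = 0xFFFFFFFFFFFFFFFE


def plan_dest_size(content_size: int, dest_len: Optional[int] = None) -> Optional[int]:
    """Hypothetical helper: choose how many output bytes to expect.

    Returns the expected decompressed size in bytes, or None when the size
    is unknown and no dest buffer was given, meaning the caller would need
    a streaming decompression loop with a growing buffer.
    """
    if content_size == ZSTD_CONTENTSIZE_ERROR:
        # Only this sentinel indicates invalid input.
        raise RuntimeError("Zstd decompression error: invalid input data")

    if content_size == ZSTD_CONTENTSIZE_UNKNOWN:
        if dest_len is not None:
            # Option 1: treat the dest buffer length as the expected size;
            # the caller should verify the actual decompressed size matches.
            return dest_len
        # Option 2: no size information at all, so stream into a growing buffer.
        return None

    # Size is recorded in the frame header (0 is a valid, empty frame).
    if dest_len is not None and dest_len < content_size:
        # Assumption: a too-small dest buffer is reported as an error.
        raise ValueError("dest buffer is smaller than the frame content size")
    return content_size


print(plan_dest_size(0))                               # empty frame
print(plan_dest_size(ZSTD_CONTENTSIZE_UNKNOWN, 4096))  # use dest length
print(plan_dest_size(ZSTD_CONTENTSIZE_UNKNOWN))        # stream instead
```

Returning `None` keeps the "unknown size, no buffer" case distinct from an empty frame, which is the same ambiguity the single zero return value of ZSTD_getDecompressedSize fails to resolve.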
