tiff_to_zarr for geotiff with compression: zarr reads strange values #317
Comments
Using a different json file in each case, e.g.,
Switching to
Thanks! That already helped a lot! It seems for me this is a combination of several issues. I now get:

edit: this is the trick:

```python
from imagecodecs.numcodecs import register_codecs

register_codecs()
```
What jumped out at me here is the compression level: I think this should be 9, or am I wrong?
Source: https://gdal.org/drivers/raster/gtiff.html. It seems the compression level is not recorded in the json? 🤔 Then again, for deflate it should have been 6, but
The compression level is an option of the compression type, but all streams should be decompressible without specifying it again (zstd must record it in the compressed block header, or maybe it's more subtle, like the size of the dictionary that is saved).
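That point is easy to verify with deflate from the standard library: the level only shapes how hard the compressor works, and decompression recovers the stream without ever being told which level produced it.

```python
import zlib

data = b"repetitive payload " * 64

# Compress the same input at two different levels.
fast = zlib.compress(data, 1)  # fastest, least compression
best = zlib.compress(data, 9)  # slowest, best compression

# Decompression needs no knowledge of the level used for compression.
assert zlib.decompress(fast) == data
assert zlib.decompress(best) == data
```

The same holds for zstd: the level affects the encoder's effort, not the format of the compressed stream.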
@cgohlke, thanks for commenting here; I didn't realise that imagecodecs included such non-image codecs :). Do you think you can build a reproducer of the numcodecs zstd issue?
@croth1, it may be that your fsspec instances are being cached when you call it multiple times with identical arguments (i.e., the same references file name, but with different content). You can always pass
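The caching behaviour is easy to see directly. A small sketch using fsspec's in-memory filesystem; `skip_instance_cache` is fsspec's documented keyword for bypassing the instance cache:

```python
import fsspec

# fsspec caches filesystem instances keyed by protocol and constructor
# arguments, so identical calls return the very same object.
fs1 = fsspec.filesystem("memory")
fs2 = fsspec.filesystem("memory")
assert fs1 is fs2

# skip_instance_cache=True forces a fresh instance each time.
fs3 = fsspec.filesystem("memory", skip_instance_cache=True)
assert fs3 is not fs1
```

So calling with the same reference file name but changed contents can silently reuse the stale instance unless the cache is skipped.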
I'll try. Looks like the issues are: why does the function fail on this input? And since zarr knows the size of the decompressed chunks, why is it not passed on to the numcodecs codec, e.g. as an output buffer of the correct size?
I don't suppose it necessarily knows the output buffer size, since there may be another codec in the chain before forming the final array. I wonder, does cramjam's zstd decompressor work here? Or standard zstandard? I don't know why numcodecs needs to build directly against source when those options exist (there is a similar discussion going on around blosc).
GDAL/libtiff use the zstd streaming API and apparently do not write a header with content size. |
Ah, so this is a "framed" versus "unframed" thing. In parquet usage (which prompted cramjam), the buffer size is kept in other metadata.
According to zstd.h, note 2 "decompressed size is an optional field, it may not be present, typically in streaming mode." |
I have made zarr-developers/numcodecs#424. Let's see if it resolves; else we can replace with imagecodecs_zstd for now (requiring an extra install).
Tifffile 2023.3.15 now writes fsspec reference files with
@croth1 , can you reinstall and give it a go? |
Thanks a lot to both of you. With latest tifffile and
Great! Closing this for now. If numcodecs fixes itself, we may revisit. |
When I translate geotiffs created by rasterio with tiff_to_zarr, it only seems to work for uncompressed files. As soon as I choose a compression, the values seem off: the values it reads are much larger than 512*512*5 = 1310720, which should be the largest value in the array. Also, for some reason the array does not seem to have a compressor set, although the kerchunk-generated files do mention a compressor. Any ideas what I am doing wrong?