Data for variable is not inlined despite size below threshold #394
Comments
It does seem that the buffer is really that size. Perhaps the chunk size is as it is because that makes it possible to append to the HDF5 file later? In any case, inlining looks at the buffer size and includes the whole buffer if appropriate. In this case, we know the array size is much smaller than that, so we should instead consider the whole size of the array and inline that (taking care to change the chunk size to match). So: doable, but needs a little work.
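To make the difference concrete, here is a minimal sketch of the two decision rules described above: comparing the stored chunk buffer against the threshold versus comparing the array's own size. The function names are hypothetical and are not kerchunk's API. The 500-byte threshold is an assumption chosen for illustration, and the chunk is assumed to be stored uncompressed.

```python
from math import prod

# Assumed threshold for illustration only; check the value your kerchunk version uses.
THRESHOLD = 500


def buffer_nbytes(chunk_shape, itemsize):
    """Bytes occupied by one uncompressed chunk buffer on disk."""
    return prod(chunk_shape) * itemsize


def array_nbytes(shape, itemsize):
    """Bytes occupied by the logical array values."""
    return prod(shape) * itemsize


def inline_by_buffer(chunk_shape, itemsize, threshold=THRESHOLD):
    """Behaviour described in this thread: decide from the stored buffer size."""
    return buffer_nbytes(chunk_shape, itemsize) < threshold


def inline_by_array(shape, itemsize, threshold=THRESHOLD):
    """Suggested improvement: decide from the size of the whole array."""
    return array_nbytes(shape, itemsize) < threshold


# The "time" variable from this issue: shape (1,), float64, chunk size (512,).
time_shape, time_chunks, itemsize = (1,), (512,), 8

print(buffer_nbytes(time_chunks, itemsize))     # 512 * 8 bytes in the buffer
print(inline_by_buffer(time_chunks, itemsize))  # buffer is over the threshold
print(inline_by_array(time_shape, itemsize))    # the array itself is tiny
```

Under the second rule the chunk size recorded in the Zarr metadata would also have to be changed to match the array shape, as noted above.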
Thanks for looking! To confirm, I think you are saying that ...
means it really is using 4096 bytes on disk. So yes, seems like inlining could get smarter, but is getting the right value.
Yes. Note that if you combine over these values later, you will get only the actual values not the whole buffers.
I've further examined my problem, @martindurant, and would very much appreciate a suggestion on whether I should create an issue at zarr on the behavior of My actual problem is that
You get the right value if you pass good credentials with
I am not at all sure how that might be happening...
You do get
Well... it really should be "return", and then filter what comes back for things that are in self.exceptions, else raise (i.e., get in a batch should produce the same as serially getting each using
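The suggestion above can be sketched as follows. This is an illustration of the proposed semantics only, not zarr's or fsspec's actual code: `batch_get`, `fetch_one`, and the default `exceptions` tuple are hypothetical names introduced here. The batch step keeps the exception for each failed key (a "return" style), then omits keys whose error is one of the expected exceptions and re-raises anything else, so the result matches what a serial loop would give.

```python
from typing import Callable, Dict, Iterable, Tuple, Type


def batch_get(
    fetch_one: Callable[[str], bytes],
    keys: Iterable[str],
    exceptions: Tuple[Type[BaseException], ...] = (KeyError,),
) -> Dict[str, bytes]:
    """Hypothetical batch getter that behaves like getting each key serially."""
    # Step 1: fetch everything, keeping the exception object for failed keys
    # instead of silently dropping them.
    results = {}
    for key in keys:
        try:
            results[key] = fetch_one(key)
        except Exception as exc:
            results[key] = exc

    # Step 2: omit keys whose error is an expected "missing" exception,
    # and re-raise any other error (for example, bad credentials).
    out = {}
    for key, value in results.items():
        if isinstance(value, Exception):
            if isinstance(value, exceptions):
                continue
            raise value
        out[key] = value
    return out
```

With this rule a missing chunk is still treated as absent, while a permissions failure surfaces as an error and is not mistaken for missing data.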
Closing original issue, which was not a bug. Moving the secondary issue discovered to zarr as zarr-developers/zarr-python#1578.
I am trying to build a MultiZarrToZarr from the GLDAS series distributed by GES DISC, and concatenation on time is not coming out right. I think I have traced the problem to something about the chunking of the "time" variable in the individual NetCDF4 files. A symptom of something being wrong is that the data for this variable does not get inlined and the ref is suspicious.
The "time" variable in each file has shape (1,) and dtype float64, but invoking SingleHdf5ToZarr using the default value for inline_threshold does not embed the data. Rather than a reference to a byte range that is far too large, I'm expecting a base64 string representing the single value. The only clue I can offer is that xarray reports the encoding on "time" as having a chunk size of (512,) ... why, I have no idea.
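For illustration, here is a small sketch of the size difference between the two representations, using only the standard library. The value `3.0` is made up, the buffer is assumed to be uncompressed little-endian float64 padded with zeros, and the `base64:` prefix is assumed to follow the convention kerchunk uses for inlined binary data.

```python
import base64
import struct

ITEMSIZE = 8     # bytes per float64
CHUNK_LEN = 512  # chunk size reported for "time"


def inline_first_value(buffer: bytes) -> str:
    """Encode only the first float64 of a chunk buffer as an inline string."""
    first = buffer[:ITEMSIZE]
    return "base64:" + base64.b64encode(first).decode("ascii")


# Hypothetical chunk: one real value followed by padding up to the chunk size.
value = 3.0
buffer = struct.pack("<d", value) + bytes(ITEMSIZE * (CHUNK_LEN - 1))

encoded = inline_first_value(buffer)
decoded = struct.unpack("<d", base64.b64decode(encoded[len("base64:"):]))[0]

print(len(buffer))  # size of the whole chunk buffer in bytes
print(encoded)      # short inline string for the single value
print(decoded)      # round-trips to the original value
```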
GLDAS_NOAH025_3H.A20000101.0300.021.nc4.zip