Change default fill values #2265
Comments
I'm a big +1 on changing the defaults. I don't really care about the values but will note NetCDF4 has also defined defaults:

In [1]: from netCDF4 import default_fillvals
In [2]: default_fillvals
Out[2]:
{'S1': '\x00',
'i1': -127,
'u1': 255,
'i2': -32767,
'u2': 65535,
'i4': -2147483647,
'u4': 4294967295,
'i8': -9223372036854775806,
'u8': 18446744073709551614,
'f4': 9.969209968386869e+36,
'f8': 9.969209968386869e+36}
I'm not opposed to moving the defaults, but it's probably worth hearing from people in other domains. In my experience in bioimaging, for both raw images and segmentations, 0 is conventionally used as a background label and people very often rely on application default values (which is probably where the convention came from in the first place). cc @jni

Maybe in zarr v4 we can have proper support for nullable types to avoid this problem altogether ;)
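To make the concern concrete, here is a minimal NumPy-only sketch of what happens when a reader treats every value equal to the fill value as missing. The helper `mask_fill` is hypothetical and only mimics that behaviour; it is not xarray's or zarr-python's actual code.

```python
import numpy as np


def mask_fill(values, fill_value):
    """Hypothetical reader: cast to float and replace fill_value with NaN."""
    out = values.astype("float64")
    out[values == fill_value] = np.nan
    return out


# A tiny segmentation-style array where 0 is a real background label.
labels = np.array([0, 0, 3, 7, 0], dtype="int32")

# With a fill value of 0, every background pixel is turned into NaN.
masked_zero = mask_fill(labels, 0)

# With the dtype minimum as the fill value, no real label is affected.
masked_min = mask_fill(labels, np.iinfo(labels.dtype).min)
```

Under these assumptions, a fill value of 0 silently erases the background label, while a fill value at the edge of the dtype's range leaves the data untouched.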
xarray was able to implement a workaround, so I'll close this.
oof. Thanks @d-v-b for the ping. There's definitely a lot of code out there that assumes that the default is 0. Much of it mine. 😂 But, fwiw, I'm not opposed here, as it seems there's an older community of practice using different values. But it should be a long deprecation cycle.
Zarr version
v3
Numcodecs version
na
Python Version
na
Operating System
na
Installation
na
Description
Over in pydata/xarray#5475, we've been discussing an issue that's affecting xarray with Zarr v3. xarray currently interprets values equal to fill_value as "missing" and casts them to NaN (apparently in NetCDF (or CF conventions?) there's some understanding that its _FillValue is understood to be outside the "valid range" of the data).

There's lots of discussion there, but one thing zarr-python could do to help would be to choose default fill values that are less likely to overlap with valid data. Exactly what's valid is domain / application / dataset specific, but I think that 0 (or the equivalent for some dtype) is slightly more likely to be valid than many others, and so might be a worse default.

What do people think about the following kinds of rules?
intmin / np.iinfo(dtype).min
np.iinfo(dtype).max
nan
nan+nan0j
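The rules above could be sketched roughly as follows. The pairing of each rule with a dtype kind (minimum for signed integers, maximum for unsigned integers, NaN for floats, NaN+NaNj for complex) is my assumption, since the list does not say which rule applies to which dtype. `proposed_default_fill` is a hypothetical name, not a zarr-python function.

```python
import numpy as np


def proposed_default_fill(dtype):
    """Hypothetical helper: pick a default fill value unlikely to be valid data.

    The dtype-kind mapping is an assumption, not zarr-python behaviour.
    """
    dt = np.dtype(dtype)
    if dt.kind == "i":  # signed integers: smallest representable value
        return dt.type(np.iinfo(dt).min)
    if dt.kind == "u":  # unsigned integers: largest representable value
        return dt.type(np.iinfo(dt).max)
    if dt.kind == "f":  # floats: NaN
        return dt.type(np.nan)
    if dt.kind == "c":  # complex: NaN in both real and imaginary parts
        return dt.type(complex(np.nan, np.nan))
    raise TypeError(f"no proposed default fill value for dtype {dt}")


fill_i1 = proposed_default_fill("int8")
fill_u2 = proposed_default_fill("uint16")
fill_f4 = proposed_default_fill("float32")
fill_c16 = proposed_default_fill("complex128")
```

Anything outside these four kinds (strings, booleans, datetimes) is deliberately left out of the sketch, because the rules above do not cover them.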
Steps to reproduce
na
Additional output
No response