apply_ufunc doesn't inherit encoding #10297
Comments
Thanks @jpdehollain for raising this. Please see #6323 for more details. Closing in light of #5336 and #5082.
Thanks @kmuehlbauer, and apologies for not finding that issue before posting (I searched for "inherit" instead of "propagate" 😆). To add some context in case it aids the discussion: I have a dataset with data arrays that can contain either float or str (dtype=object) types. The arrays get updated one coordinate value at a time (which means that only some data values are filled at any point), and updates happen across sessions, so I need to store the Dataset. I use encoding on the str arrays only, to set the fill value, because otherwise they get converted to an incompatible type when I load the Dataset from file. Setting the encoding for the entire Dataset in the .to_netcdf call feels inefficient because I only want it on the str-type arrays, so in this particular case it would be inconvenient to drop the encoding property altogether from the variables.
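For readers following along, one way to limit write-time encoding to the str arrays is to build the per-variable encoding dictionary from the dtypes. The following is only a sketch under assumptions: the dataset, the variable names, and the placeholder fill value `STR_FILL` are hypothetical and not taken from this thread, and whether a given fill value round-trips correctly depends on the backend in use.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset: one float array and one object-dtype (str) array.
ds = xr.Dataset(
    {
        "value": ("x", np.array([1.0, np.nan, 3.0])),
        "label": ("x", np.array(["a", None, "c"], dtype=object)),
    }
)

# Placeholder fill value (an assumption, not the value used in this thread).
STR_FILL = "N/A"

# Build encoding entries only for the object-dtype variables.
encoding = {
    name: {"_FillValue": STR_FILL}
    for name, var in ds.data_vars.items()
    if var.dtype == object
}

# The dict can then be passed at write time, e.g.
# ds.to_netcdf("data.nc", encoding=encoding)
```

Because the dictionary is rebuilt from the data at write time, it does not rely on the encoding property having survived earlier operations.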
@jpdehollain Thanks for the additional context. Yes, this is a somewhat unsatisfactory situation. I have issues like this in my workflows, too. Usually I'm wrapping
You could also just stick the information in attrs. I believe the encoding logic will look there too.
Thanks for the suggestion @dcherian. I just tried that but it didn't work for me, e.g., if I create:
and then save it and load it:
The string array gets loaded with empty strings. If I instead swap in the commented line above, the dataset loads correctly.
What happened?
Encoding on data arrays is lost after applying any ufunc. There should be an argument for encoding in apply_ufunc, similar to keep_attrs.
What did you expect to happen?
The encoding property should be inherited (or resolved, if more than one data array is involved) when applying a ufunc.
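The behaviour described above, and one possible workaround, can be sketched as follows. This is not the original example from the issue: the array, the fill value, and the helper name `apply_ufunc_keep_encoding` are hypothetical, and the helper handles only the single-input case by copying the input's encoding onto the result.

```python
import numpy as np
import xarray as xr

# Hypothetical input array with an encoding entry set by hand.
da = xr.DataArray(np.arange(4.0), dims="x", name="signal")
da.encoding["_FillValue"] = -9999.0

# Plain apply_ufunc call; per the report above, the result does not
# carry over the input's encoding.
result = xr.apply_ufunc(np.negative, da)
print(result.encoding)


def apply_ufunc_keep_encoding(func, source, **kwargs):
    """Hypothetical wrapper: run apply_ufunc on one DataArray and copy
    the input's encoding onto the output."""
    out = xr.apply_ufunc(func, source, **kwargs)
    out.encoding = dict(source.encoding)  # copy, so the dicts stay independent
    return out


kept = apply_ufunc_keep_encoding(np.negative, da)
print(kept.encoding)
```

With more than one input, a wrapper like this would need a rule for resolving conflicting encodings, which is the open design question raised in this issue.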
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.13.3 | packaged by conda-forge | (main, Apr 14 2025, 20:31:24) [MSC v.1943 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 11
machine: AMD64
processor: Intel64 Family 6 Model 170 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_Australia', '1252')
libhdf5: 1.14.6
libnetcdf: 4.9.2
xarray: 0.1.dev5937+g070af11
pandas: 2.2.3
numpy: 2.2.5
scipy: 1.15.2
netCDF4: 1.7.2
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: None
pip: 25.1.1
conda: None
pytest: None
mypy: None
IPython: 9.2.0
sphinx: None