Unicode strings unexpectedly transformed to byte strings upon open_dataset
#4859
Comments
Yes, there was a behaviour change in h5py. A fix is on the way but not yet released: h5netcdf/h5netcdf#81. For the time being you can downgrade h5py: |
A possible solution to my problem is:
test_ds.coords["coils"] = test_ds.coils.values.astype(np.unicode_)
test_ds |
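The workaround above relies on NumPy to turn byte strings back into unicode. As a minimal sketch of the same idea, the helper below decodes labels explicitly and also handles object arrays that hold Python `bytes`. The name `decode_labels` and the default `utf-8` encoding are assumptions made for illustration; they are not part of xarray, h5netcdf, or h5py.

```python
import numpy as np


def decode_labels(values, encoding="utf-8"):
    """Return a unicode ('U' kind) array built from str and/or bytes labels.

    Hypothetical helper for illustration only.
    """
    arr = np.asarray(values)
    if arr.dtype.kind == "U":
        # Already unicode, nothing to do.
        return arr
    if arr.dtype.kind == "S":
        # Fixed-width byte strings: decode elementwise.
        return np.char.decode(arr, encoding)
    # Object (or other) arrays: decode bytes, stringify everything else.
    decoded = [
        v.decode(encoding) if isinstance(v, bytes) else str(v)
        for v in arr.ravel()
    ]
    return np.array(decoded, dtype=str).reshape(arr.shape)
```

On a dataset like the one in this thread, the result could be assigned back in the same way as the workaround, for example `test_ds.coords["coils"] = decode_labels(test_ds.coils.values)`.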
Thank you! |
@kripnerl The actual fix is in h5netcdf/h5netcdf#82. @mathause Would you mind having a look at the proposed changes, also with respect to the implications for xarray? |
@kripnerl You can test with #4893. This should fix the But It seems that converting to string objects is by design (using "netcdf4").
import numpy as np
import xarray as xr

coils = np.array(["A", "B", "C", "D", "E"])
test_ds = xr.Dataset()
test_ds.coords["coils"] = coils
print(test_ds)
> <xarray.Dataset>
> Dimensions: (coils: 5)
> Coordinates:
> * coils (coils) <U1 'A' 'B' 'C' 'D' 'E'
> Data variables:
> *empty*
test_ds.to_netcdf("test_netcdf4.nc", engine="netcdf4")
del test_ds
test_ds = xr.open_dataset("test_netcdf4.nc", engine="netcdf4")
print(test_ds)
> <xarray.Dataset>
> Dimensions: (coils: 5)
> Coordinates:
> * coils (coils) object 'A' 'B' 'C' 'D' 'E'
> Data variables:
> *empty*
|
@kmuehlbauer Thanks a lot, I will check it ASAP. Yes, conversion from U4 to object is, I believe, normal behaviour. It has not caused any trouble for me so far. |
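A short sketch of why the object conversion is harmless while the bytes conversion is not: Python `str` values stored in an object array still compare equal to `str` keys, whereas `bytes` values never do. The two arrays below are stand-ins written by hand; that the affected h5netcdf result is an object array of `bytes` is an assumption, and `matches` is a hypothetical helper.

```python
import numpy as np

# Stand-in for the netcdf4 round trip: object array holding Python str.
str_labels = np.array(["A", "B", "C"], dtype=object)
# Assumed stand-in for the affected h5netcdf round trip: Python bytes.
byte_labels = np.array([b"A", b"B", b"C"], dtype=object)


def matches(labels, key):
    """Compare each label with key using plain Python equality."""
    return [bool(v == key) for v in labels]


print(matches(str_labels, "A"))    # [True, False, False]
print(matches(byte_labels, "A"))   # [False, False, False]
print(matches(byte_labels, b"A"))  # [True, False, False]
```

A lookup with a `str` key therefore only succeeds once the labels are decoded back to `str`.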
@mathause This can be closed. |
What happened:
Unicode coordinates are converted to bytes after saving and loading with the h5netcdf backend. This makes the dataset practically unusable (bytes != string).
What you expected to happen:
Load the string as a string.
Minimal Complete Verifiable Example:
Anything else we need to know?:
The issue may be related to #1638.
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-65-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: cs_CZ.UTF-8
LOCALE: cs_CZ.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.3
xarray: 0.16.2
pandas: 1.1.5
numpy: 1.19.4
scipy: 1.5.3
netCDF4: 1.5.5
pydap: None
h5netcdf: 0.8.1
h5py: 3.1.0
Nio: None
zarr: None
cftime: 1.3.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2020.12.0
distributed: 2020.12.0
matplotlib: 3.3.3
cartopy: None
seaborn: 0.11.0
numbagg: None
pint: 0.16.1
setuptools: 49.6.0.post20201009
pip: 20.3.3
conda: None
pytest: 6.2.1
IPython: 7.19.0
sphinx: None