Description
What happened?
Trying to save a dataset to NetCDF with ds.to_netcdf() fails when one of the coordinates is a MultiIndex. The error message suggests using .reset_index() to remove the MultiIndex. However, saving still fails after resetting the index, and even after moving the offending coordinates to data variables with .reset_coords().
What did you expect to happen?
After calling .reset_index(), and especially after calling .reset_coords(), the save should succeed.
As shown in the example below, a dataset that xr.testing.assert_identical() asserts is identical to the failing dataset saves without a problem (this also points to a current workaround: recreate the Dataset from scratch).
Minimal Complete Verifiable Example
import xarray as xr
import numpy as np
# Create random dataset
ds = xr.Dataset({'test': (['lat', 'lon'], np.random.rand(2, 3))},
                coords={'lat': (['lat'], [0, 1]),
                        'lon': (['lon'], [0, 1, 2])})
# Create multiindex by stacking
ds = ds.stack(locv=('lat','lon'))
# The index shows up as a MultiIndex
print(ds.indexes)
# Try to export (this fails as expected, since multiindex)
#ds.to_netcdf('test.nc')
# Now, get rid of the MultiIndex by resetting the index, then turn
# the remaining coordinates into data variables
ds = ds.reset_index('locv').reset_coords()
# The index is no longer a MultiIndex
print(ds.indexes)
# Try to export - this also fails!
#ds.to_netcdf('test.nc')
# A reference comparison dataset that is successfully asserted
# as identical
ds_compare = xr.Dataset({k: (['locv'], ds[k].values) for k in ds})
xr.testing.assert_identical(ds_compare, ds)
# Try exporting (this succeeds)
ds_compare.to_netcdf('test.nc')
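As a possible additional workaround, here is a minimal sketch (untested against the affected version; the loop over ['lat', 'lon'] is specific to this example) that rebuilds the former MultiIndex level variables from plain numpy arrays before saving:
# Hypothetical workaround: re-create the former level variables from plain
# numpy arrays so no MultiIndex-backed data remains, then save
for name in ['lat', 'lon']:
    ds[name] = (ds[name].dims, np.asarray(ds[name].values))
ds.to_netcdf('test.nc')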
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
Cell In[109], line 1
----> 1 ds.to_netcdf('test.nc')
File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/core/dataset.py:2303, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
2300 encoding = {}
2301 from xarray.backends.api import to_netcdf
-> 2303 return to_netcdf( # type: ignore # mypy cannot resolve the overloads:(
2304 self,
2305 path,
2306 mode=mode,
2307 format=format,
2308 group=group,
2309 engine=engine,
2310 encoding=encoding,
2311 unlimited_dims=unlimited_dims,
2312 compute=compute,
2313 multifile=False,
2314 invalid_netcdf=invalid_netcdf,
2315 )
File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/backends/api.py:1315, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
1310 # TODO: figure out how to refactor this logic (here and in save_mfdataset)
1311 # to avoid this mess of conditionals
1312 try:
1313 # TODO: allow this work (setting up the file for writing array data)
1314 # to be parallelized with dask
-> 1315 dump_to_store(
1316 dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
1317 )
1318 if autoclose:
1319 store.close()
File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/backends/api.py:1362, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
1359 if encoder:
1360 variables, attrs = encoder(variables, attrs)
-> 1362 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/backends/common.py:352, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
349 if writer is None:
350 writer = ArrayWriter()
--> 352 variables, attributes = self.encode(variables, attributes)
354 self.set_attributes(attributes)
355 self.set_dimensions(variables, unlimited_dims=unlimited_dims)
File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/backends/common.py:441, in WritableCFDataStore.encode(self, variables, attributes)
438 def encode(self, variables, attributes):
439 # All NetCDF files get CF encoded by default, without this attempting
440 # to write times, for example, would fail.
--> 441 variables, attributes = cf_encoder(variables, attributes)
442 variables = {k: self.encode_variable(v) for k, v in variables.items()}
443 attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}
File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/conventions.py:791, in cf_encoder(variables, attributes)
788 # add encoding for time bounds variables if present.
789 _update_bounds_encoding(variables)
--> 791 new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
793 # Remove attrs from bounds variables (issue #2921)
794 for var in new_vars.values():
File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/conventions.py:179, in encode_cf_variable(var, needs_copy, name)
157 def encode_cf_variable(
158 var: Variable, needs_copy: bool = True, name: T_Name = None
159 ) -> Variable:
160 """
161 Converts a Variable into a Variable which follows some
162 of the CF conventions:
(...)
177 A variable which has been encoded as described above.
178 """
--> 179 ensure_not_multiindex(var, name=name)
181 for coder in [
182 times.CFDatetimeCoder(),
183 times.CFTimedeltaCoder(),
(...)
190 variables.BooleanCoder(),
191 ]:
192 var = coder.encode(var, name=name)
File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/conventions.py:88, in ensure_not_multiindex(var, name)
86 def ensure_not_multiindex(var: Variable, name: T_Name = None) -> None:
87 if isinstance(var._data, indexing.PandasMultiIndexingAdapter):
---> 88 raise NotImplementedError(
89 f"variable {name!r} is a MultiIndex, which cannot yet be "
90 "serialized. Instead, either use reset_index() "
91 "to convert MultiIndex levels into coordinate variables instead "
92 "or use https://cf-xarray.readthedocs.io/en/latest/coding.html."
93 )
NotImplementedError: variable 'lat' is a MultiIndex, which cannot yet be serialized. Instead, either use reset_index() to convert MultiIndex levels into coordinate variables instead or use https://cf-xarray.readthedocs.io/en/latest/coding.html.
Anything else we need to know?
This is a recent error that came up in some automated tests; an older environment still passes, so xarray v2023.1.0 does not have this issue.
Given that saving works with a dataset that xr.testing.assert_identical() asserts is identical to the failing dataset, and that ds.indexes no longer shows a MultiIndex on the failing dataset, perhaps the issue is in the check itself, i.e., in xarray.conventions.ensure_not_multiindex?
It looks like this check was added recently in f9f4c73 to address another bug.
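To help with diagnosis, here is a minimal sketch (it pokes at the private ._data attribute and the adapter class referenced in the traceback, so it may break between versions) that checks whether the reset variables are still backed by the MultiIndex adapter:
from xarray.core import indexing
# After reset_index()/reset_coords(), inspect the (private) backing data of a
# former MultiIndex level variable. If it is still a PandasMultiIndexingAdapter,
# ensure_not_multiindex will raise on save.
print(type(ds['lat'].variable._data))
print(isinstance(ds['lat'].variable._data, indexing.PandasMultiIndexingAdapter))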
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.1 | packaged by conda-forge | (main, Dec 23 2023, 08:05:03) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2024.1.1
pandas: 2.2.0
numpy: 1.26.3
scipy: 1.12.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.8.2
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: 0.15.1
flox: None
numpy_groupies: None
setuptools: 69.0.3
pip: 24.0
conda: None
pytest: 7.4.0
mypy: None
IPython: 8.21.0
sphinx: None