
.reset_index()/.reset_coords() maintain MultiIndex status  #8743

@ks905383

Description


What happened?

Saving a dataset to NetCDF with ds.to_netcdf() fails when one of the coordinates is a MultiIndex. The error message suggests using .reset_index() to remove the MultiIndex. However, saving still fails after resetting the index, and even after converting the offending coordinates into data variables with .reset_coords().

What did you expect to happen?

After calling .reset_index(), and especially after calling .reset_coords(), the save should succeed.

As shown in the example below, a dataset that xr.testing.assert_identical() considers identical to the failing dataset saves without a problem. (This also points to a current workaround: recreate the Dataset from scratch.)

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np

# Create random dataset
ds = xr.Dataset({'test': (['lat', 'lon'], np.random.rand(2, 3))},
                coords={'lat': (['lat'], [0, 1]),
                        'lon': (['lon'], [0, 1, 2])})

# Create multiindex by stacking
ds = ds.stack(locv=('lat','lon'))
# The index shows up as a MultiIndex
print(ds.indexes)

# Try to export (this fails as expected, since multiindex)
#ds.to_netcdf('test.nc')

# Now, get rid of multiindex by resetting coords (i.e., 
# turning coordinates into data variables)
ds = ds.reset_index('locv').reset_coords()

# The index is no longer a MultiIndex
print(ds.indexes)

# Try to export - this also fails! 
#ds.to_netcdf('test.nc')

# A reference dataset, rebuilt from scratch, that asserts
# identical to the failing dataset
ds_compare = xr.Dataset({k: (['locv'], ds[k].values) for k in ds})
xr.testing.assert_identical(ds_compare, ds)

# Try exporting (this succeeds)
ds_compare.to_netcdf('test.nc')

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[109], line 1
----> 1 ds.to_netcdf('test.nc')

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/core/dataset.py:2303, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   2300     encoding = {}
   2301 from xarray.backends.api import to_netcdf
-> 2303 return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
   2304     self,
   2305     path,
   2306     mode=mode,
   2307     format=format,
   2308     group=group,
   2309     engine=engine,
   2310     encoding=encoding,
   2311     unlimited_dims=unlimited_dims,
   2312     compute=compute,
   2313     multifile=False,
   2314     invalid_netcdf=invalid_netcdf,
   2315 )

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/backends/api.py:1315, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1310 # TODO: figure out how to refactor this logic (here and in save_mfdataset)
   1311 # to avoid this mess of conditionals
   1312 try:
   1313     # TODO: allow this work (setting up the file for writing array data)
   1314     # to be parallelized with dask
-> 1315     dump_to_store(
   1316         dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1317     )
   1318     if autoclose:
   1319         store.close()

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/backends/api.py:1362, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1359 if encoder:
   1360     variables, attrs = encoder(variables, attrs)
-> 1362 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/backends/common.py:352, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    349 if writer is None:
    350     writer = ArrayWriter()
--> 352 variables, attributes = self.encode(variables, attributes)
    354 self.set_attributes(attributes)
    355 self.set_dimensions(variables, unlimited_dims=unlimited_dims)

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/backends/common.py:441, in WritableCFDataStore.encode(self, variables, attributes)
    438 def encode(self, variables, attributes):
    439     # All NetCDF files get CF encoded by default, without this attempting
    440     # to write times, for example, would fail.
--> 441     variables, attributes = cf_encoder(variables, attributes)
    442     variables = {k: self.encode_variable(v) for k, v in variables.items()}
    443     attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/conventions.py:791, in cf_encoder(variables, attributes)
    788 # add encoding for time bounds variables if present.
    789 _update_bounds_encoding(variables)
--> 791 new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
    793 # Remove attrs from bounds variables (issue #2921)
    794 for var in new_vars.values():

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/conventions.py:179, in encode_cf_variable(var, needs_copy, name)
    157 def encode_cf_variable(
    158     var: Variable, needs_copy: bool = True, name: T_Name = None
    159 ) -> Variable:
    160     """
    161     Converts a Variable into a Variable which follows some
    162     of the CF conventions:
   (...)
    177         A variable which has been encoded as described above.
    178     """
--> 179     ensure_not_multiindex(var, name=name)
    181     for coder in [
    182         times.CFDatetimeCoder(),
    183         times.CFTimedeltaCoder(),
   (...)
    190         variables.BooleanCoder(),
    191     ]:
    192         var = coder.encode(var, name=name)

File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/conventions.py:88, in ensure_not_multiindex(var, name)
     86 def ensure_not_multiindex(var: Variable, name: T_Name = None) -> None:
     87     if isinstance(var._data, indexing.PandasMultiIndexingAdapter):
---> 88         raise NotImplementedError(
     89             f"variable {name!r} is a MultiIndex, which cannot yet be "
     90             "serialized. Instead, either use reset_index() "
     91             "to convert MultiIndex levels into coordinate variables instead "
     92             "or use https://cf-xarray.readthedocs.io/en/latest/coding.html."
     93         )

NotImplementedError: variable 'lat' is a MultiIndex, which cannot yet be serialized. Instead, either use reset_index() to convert MultiIndex levels into coordinate variables instead or use https://cf-xarray.readthedocs.io/en/latest/coding.html.

Anything else we need to know?

This is a recent error that came up in some automated tests; an older pinned environment still works, so xarray v2023.1.0 does not have this issue.

Given that saving works with a dataset that xr.testing.assert_identical() considers identical to the failing one, and that ds.indexes no longer shows a MultiIndex on the failing dataset, perhaps the issue is in the check itself - i.e., in xarray.conventions.ensure_not_multiindex?
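If the leftover MultiIndex-backed array data is indeed what trips the check (an assumption, not confirmed here), a sketch of a workaround is to rebuild every variable from a plain NumPy array, which should drop whatever index-adapter backing survived .reset_index()/.reset_coords():

```python
import numpy as np
import xarray as xr

# Reproduce the failing dataset from the MVCE above.
ds = xr.Dataset({'test': (['lat', 'lon'], np.random.rand(2, 3))},
                coords={'lat': (['lat'], [0, 1]),
                        'lon': (['lon'], [0, 1, 2])})
ds = ds.stack(locv=('lat', 'lon')).reset_index('locv').reset_coords()

# Rebuild each variable from a plain numpy array, discarding any
# pandas-index-backed wrapper around the underlying data.
clean = xr.Dataset(
    {k: (v.dims, np.asarray(v.values)) for k, v in ds.variables.items()},
    attrs=ds.attrs,
)

# Still identical to the dataset that raised NotImplementedError...
xr.testing.assert_identical(clean, ds)
# ...but, like ds_compare above, it should now write cleanly
# (requires a netCDF backend such as scipy or netCDF4):
# clean.to_netcdf('test.nc')
```

This is essentially a generalization of the ds_compare workaround above that does not hard-code the variable names or dimensions.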

It looks like this check was added recently in f9f4c73 to address another bug.

Environment

INSTALLED VERSIONS

commit: None
python: 3.12.1 | packaged by conda-forge | (main, Dec 23 2023, 08:05:03) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2024.1.1
pandas: 2.2.0
numpy: 1.26.3
scipy: 1.12.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.8.2
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: 0.15.1
flox: None
numpy_groupies: None
setuptools: 69.0.3
pip: 24.0
conda: None
pytest: 7.4.0
mypy: None
IPython: 8.21.0
sphinx: None
