Skip to content

Broken state when using assign_coords with multiindex #7097

@znichollscr

Description

@znichollscr

What happened?

I was trying to assign coordinates on a dataset that had been created by using stack. After assigning the coordinates, the dataset was in a state where its length was coming out as less than zero, which caused all sorts of issues.

What did you expect to happen?

I think the issue is with the updating of _coord_names, perhaps in

def _maybe_drop_multiindex_coords(self, coords: set[Hashable]) -> None:
.

I expected to just be able to assign the coords and then print the array to see the result.

Minimal Complete Verifiable Example

import xarray as xr


ds = xr.DataArray(
    [[[1, 1], [0, 0]], [[2, 2], [1, 1]]],
    dims=("lat", "year", "month"),
    coords={"lat": [-60, 60], "year": [2010, 2020], "month": [3, 6]},
    name="test",
).to_dataset()

stacked = ds.stack(time=("year", "month"))
stacked = stacked.assign_coords(
    {"time": [y + m / 12 for y, m in stacked["time"].values]}
)

# Both these fail with ValueError: __len__() should return >= 0
len(stacked)
print(stacked)

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

Traceback (most recent call last):
  File "mre.py", line 17, in <module>
    len(stacked)
  File ".../xarray-tests/xarray/core/dataset.py", line 1364, in __len__
    return len(self.data_vars)
ValueError: __len__() should return >= 0

Anything else we need to know?

Here's a test (I put it in test_dataarray.py but maybe there is a better spot)

def test_assign_coords_drop_coord_names(self) -> None:
        ds = DataArray(
            [[[1, 1], [0, 0]], [[2, 2], [1, 1]]],
            dims=("lat", "year", "month"),
            coords={"lat": [-60, 60], "year": [2010, 2020], "month": [3, 6]},
            name="test",
        ).to_dataset()

        stacked = ds.stack(time=("year", "month"))
        stacked = stacked.assign_coords(
            {"time": [y + m / 12 for y, m in stacked["time"].values]}
        )

        # this seems to be handled correctly
        assert set(stacked._variables.keys()) == {"test", "time", "lat"}
        # however, _coord_names doesn't seem to update as expected
        # the below fails
        assert set(stacked._coord_names) == {"time", "lat"}

        # the incorrect value of _coord_names means that all the below fails too
        # The failure is because the length of a dataset is calculated as (via len(data_vars))
        # len(dataset._variables) - len(dataset._coord_names). For the situation
        # above, where len(dataset._coord_names) is greater than len(dataset._variables),
        # you get a length less than zero which then fails because length must return
        # a value greater than zero

        # Both these fail with ValueError: __len__() should return >= 0
        len(stacked)
        print(stacked)

Environment

INSTALLED VERSIONS

commit: e678a1d
python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14)
[Clang 12.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 21.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: ('en_AU', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1

xarray: 0.1.dev4312+ge678a1d.d20220928
pandas: 1.5.0
numpy: 1.22.4
scipy: 1.9.1
netCDF4: 1.6.1
pydap: installed
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.13.2
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: 3.2.2
rasterio: 1.3.1
cfgrib: 0.9.10.1
iris: 3.3.0
bottleneck: 1.3.5
dask: 2022.9.1
distributed: 2022.9.1
matplotlib: 3.6.0
cartopy: 0.21.0
seaborn: 0.12.0
numbagg: 0.2.1
fsspec: 2022.8.2
cupy: None
pint: 0.19.2
sparse: 0.13.0
flox: 0.5.9
numpy_groupies: 0.9.19
setuptools: 65.4.0
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: None
sphinx: None

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions