-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Description
What happened?
I was trying to assign coordinates on a dataset that had been created by using stack. After assigning the coordinates, the dataset was in a state where its length was coming out as less than zero, which caused all sorts of issues.
What did you expect to happen?
I think the issue is with the updating of _coord_names
, perhaps in
xarray/xarray/core/coordinates.py
Line 389 in 18454c2
def _maybe_drop_multiindex_coords(self, coords: set[Hashable]) -> None: |
I expected to just be able to assign the coords and then print the array to see the result.
Minimal Complete Verifiable Example
import xarray as xr
ds = xr.DataArray(
[[[1, 1], [0, 0]], [[2, 2], [1, 1]]],
dims=("lat", "year", "month"),
coords={"lat": [-60, 60], "year": [2010, 2020], "month": [3, 6]},
name="test",
).to_dataset()
stacked = ds.stack(time=("year", "month"))
stacked = stacked.assign_coords(
{"time": [y + m / 12 for y, m in stacked["time"].values]}
)
# Both these fail with ValueError: __len__() should return >= 0
len(stacked)
print(stacked)
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
Relevant log output
Traceback (most recent call last):
File "mre.py", line 17, in <module>
len(stacked)
File ".../xarray-tests/xarray/core/dataset.py", line 1364, in __len__
return len(self.data_vars)
ValueError: __len__() should return >= 0
Anything else we need to know?
Here's a test (I put it in test_dataarray.py
but maybe there is a better spot)
def test_assign_coords_drop_coord_names(self) -> None:
ds = DataArray(
[[[1, 1], [0, 0]], [[2, 2], [1, 1]]],
dims=("lat", "year", "month"),
coords={"lat": [-60, 60], "year": [2010, 2020], "month": [3, 6]},
name="test",
).to_dataset()
stacked = ds.stack(time=("year", "month"))
stacked = stacked.assign_coords(
{"time": [y + m / 12 for y, m in stacked["time"].values]}
)
# this seems to be handled correctly
assert set(stacked._variables.keys()) == {"test", "time", "lat"}
# however, _coord_names doesn't seem to update as expected
# the below fails
assert set(stacked._coord_names) == {"time", "lat"}
# the incorrect value of _coord_names means that all the below fails too
# The failure is because the length of a dataset is calculated as (via len(data_vars))
# len(dataset._variables) - len(dataset._coord_names). For the situation
# above, where len(dataset._coord_names) is greater than len(dataset._variables),
# you get a length less than zero which then fails because length must return
# a value greater than zero
# Both these fail with ValueError: __len__() should return >= 0
len(stacked)
print(stacked)
Environment
INSTALLED VERSIONS
commit: e678a1d
python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:14)
[Clang 12.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 21.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: ('en_AU', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 0.1.dev4312+ge678a1d.d20220928
pandas: 1.5.0
numpy: 1.22.4
scipy: 1.9.1
netCDF4: 1.6.1
pydap: installed
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.13.2
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: 3.2.2
rasterio: 1.3.1
cfgrib: 0.9.10.1
iris: 3.3.0
bottleneck: 1.3.5
dask: 2022.9.1
distributed: 2022.9.1
matplotlib: 3.6.0
cartopy: 0.21.0
seaborn: 0.12.0
numbagg: 0.2.1
fsspec: 2022.8.2
cupy: None
pint: 0.19.2
sparse: 0.13.0
flox: 0.5.9
numpy_groupies: 0.9.19
setuptools: 65.4.0
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: None
sphinx: None