groupby().apply() on variable with NaNs raises IndexError #2383

Huite · 2018-08-27T12:27:06Z

Code Sample

import xarray as xr
import numpy as np

def standardize(x):
      return (x - x.mean()) / x.std()

ds = xr.Dataset()
ds["variable"] = xr.DataArray(np.random.rand(4,3,5), 
                               {"lat":np.arange(4), "lon":np.arange(3), "time":np.arange(5)}, 
                               ("lat", "lon", "time"),
                              )

ds["id"] = xr.DataArray(np.arange(12.0).reshape((4,3)),
                         {"lat": np.arange(4), "lon":np.arange(3)},
                         ("lat", "lon"),
                        )

ds["id"].values[0,0] = np.nan

ds.groupby("id").apply(standardize)

Problem description

This results in an IndexError. This is mildly confusing, it took me a little while to figure out the NaN's were to blame. I'm guessing the NaN doesn't get filtered out everywhere.

The traceback:


---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-2-267ba57bc264> in <module>()
     15 ds["id"].values[0,0] = np.nan
     16
---> 17 ds.groupby("id").apply(standardize)

C:\Miniconda3\envs\main\lib\site-packages\xarray\core\groupby.py in apply(self, func, **kwargs)
    607         kwargs.pop('shortcut', None)  # ignore shortcut if set (for now)
    608         applied = (func(ds, **kwargs) for ds in self._iter_grouped())
--> 609         return self._combine(applied)
    610
    611     def _combine(self, applied):

C:\Miniconda3\envs\main\lib\site-packages\xarray\core\groupby.py in _combine(self, applied)
    614         coord, dim, positions = self._infer_concat_args(applied_example)
    615         combined = concat(applied, dim)
--> 616         combined = _maybe_reorder(combined, dim, positions)
    617         if coord is not None:
    618             combined[coord.name] = coord

C:\Miniconda3\envs\main\lib\site-packages\xarray\core\groupby.py in _maybe_reorder(xarray_obj, dim, positions)
    428
    429 def _maybe_reorder(xarray_obj, dim, positions):
--> 430     order = _inverse_permutation_indices(positions)
    431
    432     if order is None:

C:\Miniconda3\envs\main\lib\site-packages\xarray\core\groupby.py in _inverse_permutation_indices(positions)
    109         positions = [np.arange(sl.start, sl.stop, sl.step) for sl in positions]
    110
--> 111     indices = nputils.inverse_permutation(np.concatenate(positions))
    112     return indices
    113

C:\Miniconda3\envs\main\lib\site-packages\xarray\core\nputils.py in inverse_permutation(indices)
     52     # use intp instead of int64 because of windows :(
     53     inverse_permutation = np.empty(len(indices), dtype=np.intp)
---> 54     inverse_permutation[indices] = np.arange(len(indices), dtype=np.intp)
     55     return inverse_permutation
     56

IndexError: index 11 is out of bounds for axis 0 with size 11

Expected Output

My assumption was that it would throw out the values that fall within the NaN group, likepandas:

import pandas as pd
import numpy as np

df = pd.DataFrame()
df["var"] = np.random.rand(10)
df["id"] = np.arange(10)
df["id"].iloc[0:2] = np.nan
df.groupby("id").mean()

Out:

          var
id
2.0  0.565366
3.0  0.744443
4.0  0.190983
5.0  0.196922
6.0  0.377282
7.0  0.141419
8.0  0.957526
9.0  0.207360

Output of `xr.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 45 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

xarray: 0.10.8
pandas: 0.23.3
numpy: 1.15.0
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: 0.6.1
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.18.2
distributed: 1.22.0
matplotlib: 2.2.2
cartopy: None
seaborn: None
setuptools: 40.0.0
pip: 18.0
conda: None
pytest: 3.7.1
IPython: 6.4.0
sphinx: 1.7.5

The text was updated successfully, but these errors were encountered:

shoyer · 2018-08-27T23:31:07Z

I agree, this is definitely confusing. We should probably drop these groups automatically, like pandas.

shoyer added the bug label Aug 27, 2018

dcherian added the topic-groupby label Oct 26, 2018

dcherian mentioned this issue Oct 16, 2019

Drop groups associated with nans in group variable #3406

Merged

4 tasks

max-sixty closed this as completed in #3406 Oct 28, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

groupby().apply() on variable with NaNs raises IndexError #2383

groupby().apply() on variable with NaNs raises IndexError #2383

Huite commented Aug 27, 2018

shoyer commented Aug 27, 2018

Uh oh!

Uh oh!

groupby().apply() on variable with NaNs raises IndexError #2383

groupby().apply() on variable with NaNs raises IndexError #2383

Comments

Huite commented Aug 27, 2018

Code Sample

Problem description

Expected Output

Output of xr.show_versions()

shoyer commented Aug 27, 2018

Uh oh!

Output of `xr.show_versions()`