Skip to content

groupby().apply() on variable with NaNs raises IndexError #2383

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Huite opened this issue Aug 27, 2018 · 1 comment · Fixed by #3406
Closed

groupby().apply() on variable with NaNs raises IndexError #2383

Huite opened this issue Aug 27, 2018 · 1 comment · Fixed by #3406

Comments

@Huite
Copy link
Contributor

Huite commented Aug 27, 2018

Code Sample

import xarray as xr
import numpy as np

def standardize(x):
      return (x - x.mean()) / x.std()

ds = xr.Dataset()
ds["variable"] = xr.DataArray(np.random.rand(4,3,5), 
                               {"lat":np.arange(4), "lon":np.arange(3), "time":np.arange(5)}, 
                               ("lat", "lon", "time"),
                              )

ds["id"] = xr.DataArray(np.arange(12.0).reshape((4,3)),
                         {"lat": np.arange(4), "lon":np.arange(3)},
                         ("lat", "lon"),
                        )

ds["id"].values[0,0] = np.nan

ds.groupby("id").apply(standardize)

Problem description

This results in an IndexError. This is mildly confusing, it took me a little while to figure out the NaN's were to blame. I'm guessing the NaN doesn't get filtered out everywhere.

The traceback:


---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-2-267ba57bc264> in <module>()
     15 ds["id"].values[0,0] = np.nan
     16
---> 17 ds.groupby("id").apply(standardize)

C:\Miniconda3\envs\main\lib\site-packages\xarray\core\groupby.py in apply(self, func, **kwargs)
    607         kwargs.pop('shortcut', None)  # ignore shortcut if set (for now)
    608         applied = (func(ds, **kwargs) for ds in self._iter_grouped())
--> 609         return self._combine(applied)
    610
    611     def _combine(self, applied):

C:\Miniconda3\envs\main\lib\site-packages\xarray\core\groupby.py in _combine(self, applied)
    614         coord, dim, positions = self._infer_concat_args(applied_example)
    615         combined = concat(applied, dim)
--> 616         combined = _maybe_reorder(combined, dim, positions)
    617         if coord is not None:
    618             combined[coord.name] = coord

C:\Miniconda3\envs\main\lib\site-packages\xarray\core\groupby.py in _maybe_reorder(xarray_obj, dim, positions)
    428
    429 def _maybe_reorder(xarray_obj, dim, positions):
--> 430     order = _inverse_permutation_indices(positions)
    431
    432     if order is None:

C:\Miniconda3\envs\main\lib\site-packages\xarray\core\groupby.py in _inverse_permutation_indices(positions)
    109         positions = [np.arange(sl.start, sl.stop, sl.step) for sl in positions]
    110
--> 111     indices = nputils.inverse_permutation(np.concatenate(positions))
    112     return indices
    113

C:\Miniconda3\envs\main\lib\site-packages\xarray\core\nputils.py in inverse_permutation(indices)
     52     # use intp instead of int64 because of windows :(
     53     inverse_permutation = np.empty(len(indices), dtype=np.intp)
---> 54     inverse_permutation[indices] = np.arange(len(indices), dtype=np.intp)
     55     return inverse_permutation
     56

IndexError: index 11 is out of bounds for axis 0 with size 11

Expected Output

My assumption was that it would throw out the values that fall within the NaN group, likepandas:

import pandas as pd
import numpy as np

df = pd.DataFrame()
df["var"] = np.random.rand(10)
df["id"] = np.arange(10)
df["id"].iloc[0:2] = np.nan
df.groupby("id").mean()

Out:

          var
id
2.0  0.565366
3.0  0.744443
4.0  0.190983
5.0  0.196922
6.0  0.377282
7.0  0.141419
8.0  0.957526
9.0  0.207360

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 45 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

xarray: 0.10.8
pandas: 0.23.3
numpy: 1.15.0
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: 0.6.1
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.18.2
distributed: 1.22.0
matplotlib: 2.2.2
cartopy: None
seaborn: None
setuptools: 40.0.0
pip: 18.0
conda: None
pytest: 3.7.1
IPython: 6.4.0
sphinx: 1.7.5
@shoyer
Copy link
Member

shoyer commented Aug 27, 2018

I agree, this is definitely confusing. We should probably drop these groups automatically, like pandas.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants