Skip to content

Pointwise indexing #3768

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ivirshup opened this issue Feb 13, 2020 · 6 comments · Fixed by #4711
Closed

Pointwise indexing #3768

ivirshup opened this issue Feb 13, 2020 · 6 comments · Fixed by #4711

Comments

@ivirshup
Copy link

MCVE Code Sample

import xarray as xr
import numpy as np

da = xr.DataArray(
    np.arange(56).reshape((7, 8)),
    coords={
        'x': list('abcdefg'),
        'y': 10 * np.arange(8)
        },
    dims=['x', 'y']
)

# Shouldn't this be (2,)?
assert da.isel(x=[0, 1], y=[0, 1]).shape == (2, 2)

Expected Output

I had expected da.isel(x=[0, 1], y=[0, 1]) to have shape (2,). I had generally expected indexing with isel to behave more like numpy indexing. It's very possible I'm just missing something, or that this is more of a documentation issue more than a behavior issue.

Problem Description

Going off this example in #507:

In [3]: da.isel_points(x=[0, 1, 6], y=[0, 1, 0], dim='points')
Out[3]:
<xray.DataArray (points: 3)>
array([ 0,  9, 48])
Coordinates:
    y        (points) int64 0 10 0
    x        (points) |S1 'a' 'b' 'g'
  * points   (points) int64 0 1 2

and the deprecation of isel_points with isel, I had expected to get numpy-like coordinate indexing using isel.

This was made a little bit more confusing by the documentation for setting values by index. In particular the example:

In [68]: da[ind_x, ind_y] = -2  # assign -2 to (ix, iy) = (0, 0) and (1, 1)

In [69]: da
Out[69]: 
<xarray.DataArray (x: 3, y: 4)>
array([[-2, -2, -1, -1],
       [-2, -2,  6,  7],
       [ 8,  9, 10, 11]])

To me, the comment # assign -2 to (ix, iy) = (0, 0) and (1, 1) makes it sound like values will be assigned at the coordinates (0, 0) and (1, 1), not (0, 0), (0, 1), (1, 0), and (1, 1).

All in all, I'm not sure if this is a bug, or an issue with documentation. If isel is not meant to behave like isel_points, it would be nice to see that in the documentation. If it is possible to get and set points by coordinate (without looping over single coordinates) it would be nice to see an example in the documentation where that's shown.

Output of xr.show_versions()

# Paste the output here xr.show_versions() here

INSTALLED VERSIONS

commit: None
python: 3.7.6 (default, Jan 4 2020, 12:18:30)
[Clang 11.0.0 (clang-1100.0.33.16)]
python-bits: 64
OS: Darwin
OS-release: 19.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.6.3

xarray: 0.15.0
pandas: 1.0.1
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.9.2
distributed: 2.9.3
matplotlib: 3.1.3
cartopy: None
seaborn: 0.10.0
numbagg: None
setuptools: 45.2.0
pip: 20.0.2
conda: None
pytest: 5.3.4
IPython: 7.11.1
sphinx: 2.3.1

@dcherian
Copy link
Contributor

The documentation for what you want is here: https://xarray.pydata.org/en/stable/indexing.html#more-advanced-indexing. Basically you need to provide DataArrays with a new dimension instead of lists.

PRs to improve the documentation are very welcome (ref #1552). We actually have a nice schematic here: https://xarray.pydata.org/en/stable/interpolation.html#advanced-interpolation

@ivirshup
Copy link
Author

The documentation for what you want is here: https://xarray.pydata.org/en/stable/indexing.html#more-advanced-indexing. Basically you need to provide DataArrays with a new dimension instead of lists.

Thanks! I must have missed this, I suspect since my use was actually setting the values at some coordinates. Is there an efficient way to do that?

I'd be happy to add some notes to the documentation about that. The .sel and isel methods could definitely use longer doc strings. It would be useful to have a better understanding of the concept this implements though. Why do DataArrays have different behavior than other array-likes for these methods?

@max-sixty
Copy link
Collaborator

Why do DataArrays have different behavior than other array-likes for these methods?

Could you expand a bit? Do you mean re sel etc rather than positional indexing?

@ivirshup
Copy link
Author

Why do DataArrays have different behavior than other array-likes for these methods?

For sel and isel. What the reasoning for making these statements behave differently?

Setup
import xarray as xr
import numpy as np

da = xr.DataArray( 
    np.arange(56).reshape((7, 8)), 
    coords={ 
        'x': list('abcdefg'), 
        'y': 10 * np.arange(8) 
        }, 
    dims=['x', 'y'] 
)
xidx = np.array([1, 2, 3])
yidx = np.array([1, 2, 3])

da.isel(x=xidx, y=yidx)
# <xarray.DataArray (x: 3, y: 3)>
# array([[ 9, 10, 11],
#        [17, 18, 19],
#        [25, 26, 27]])
# Coordinates:
#   * x        (x) <U1 'b' 'c' 'd'
#   * y        (y) int64 10 20 30

da.isel(x=xr.DataArray(xidx), y=xr.DataArray(yidx))  
# <xarray.DataArray (dim_0: 3)>
# array([ 9, 18, 27])
# Coordinates:
#     x        (dim_0) <U1 'b' 'c' 'd'
#     y        (dim_0) int64 10 20 30
# Dimensions without coordinates: dim_0

@max-sixty
Copy link
Collaborator

max-sixty commented Feb 14, 2020

Thanks for the clear question.

Having array and xr.DataArray(array) behave differently as arguments is (fairly) rare, so I can understand that it might be surprising at first.

One advantage of this is it allowing pointwise indexing over a new dimension. Generally that would be a usefully named dimension (e.g. z in the docs), rather than dim_0 though...

I'll defer to @shoyer on commenting on his design choices if he sees this thread

I do think it's complicated though (I find myself re-reading the docs and trying to remember how it works depending on the type & order of the arguments!), and a nicely presented table with the possible indexing methods and their results would be great.

@shoyer
Copy link
Member

shoyer commented Feb 14, 2020

The model of how indexing with non-DataArray objects is described here under vectorized indexing: “Slices or sequences/arrays without named-dimensions are treated as if they have the same dimension which is indexed along”

For better or worse, xarray doesn’t have any way to distinguish between “meaningful” and “default” dimension names. This means that a DataArray without explicitly named dimensions will indeed broadcast differently (e.g., in either arithmetic or indexing) than unlabeled NumPy arrays.

These were intentional design choices: we like our name based broadcasting rules better than NumPy’s positional rules. And for the most part, the default dimension names like dim_0 are just there so a user can see how they need to change their data instead of getting an error message.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants