-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Description
Is your feature request related to a problem?
xr.Dataset.where
and xr.DataArray.where
currently do not allow providing both other
and drop=True
, stating that they are mutually exclusive. Conceptually, this doesn't strike me as true. Drop does not drop all points where cond
is False, only those where an entire coordinate label evaluates to False. This is most important when trying to avoid type promotion, such as in the below example, where an integer FillValue might be preferred over dtypes.NA
.
In [2]: da = xr.DataArray(np.arange(16).reshape(4, 4), dims=['x', 'y'])
In [3]: da
Out[3]:
<xarray.DataArray (x: 4, y: 4)>
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
Dimensions without coordinates: x, y
If other
is not provided, the array is promoted to float64
to accommodate the default,dtypes.NA
:
In [4]: da.where(da > 6, drop=True)
Out[4]:
<xarray.DataArray (x: 3, y: 4)>
array([[nan, nan, nan, 7.],
[ 8., 9., 10., 11.],
[12., 13., 14., 15.]])
Dimensions without coordinates: x, y
However, the combination of other
and drop=True
is not allowed:
In [5]: da.where(da > 6, -1, drop=True)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [5], in <module>
----> 1 da.where(da > 6, -1, drop=True)
File ~/miniconda3/envs/rhodium-env/lib/python3.10/site-packages/xarray/core/common.py:1268, in DataWithCoords.where(self, cond, other, drop)
1266 if drop:
1267 if other is not dtypes.NA:
-> 1268 raise ValueError("cannot set `other` if drop=True")
1270 if not isinstance(cond, (Dataset, DataArray)):
1271 raise TypeError(
1272 f"cond argument is {cond!r} but must be a {Dataset!r} or {DataArray!r}"
1273 )
ValueError: cannot set `other` if drop=True
Describe the solution you'd like
Current implementation
The current behavior is enforced within the block handling the drop
argument (currently https://github.com/pydata/xarray/blob/main/xarray/core/common.py#L1266-L1268):
if drop:
if other is not dtypes.NA:
raise ValueError("cannot set `other` if drop=True")
Proposed fix
I just removed the above if statement on a fork, and the example now works!
>>> import xarray as xr, numpy as np
>>> da = xr.DataArray(np.arange(16).reshape(4, 4), dims=['x', 'y'])
>>> da
<xarray.DataArray (x: 4, y: 4)>
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
Dimensions without coordinates: x, y
>>> da.where(da > 6, drop=True)
<xarray.DataArray (x: 3, y: 4)>
array([[nan, nan, nan, 7.],
[ 8., 9., 10., 11.],
[12., 13., 14., 15.]])
Dimensions without coordinates: x, y
>>> da.where(da > 6, -1, drop=True)
<xarray.DataArray (x: 3, y: 4)>
array([[-1, -1, -1, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
Dimensions without coordinates: x, y
Making this change fails a single test in https://github.com/pydata/xarray/blob/main/xarray/tests/test_dataset.py#L4548-L4549, which explicitly checks for this behavior:
with pytest.raises(ValueError, match=r"cannot set"):
ds.where(ds > 1, other=0, drop=True)
This could easily be reworked to check for valid handling of both arguments.
Describe alternatives you've considered
No response
Additional context
I haven't yet investigated what would happen with chunked, sparse, or other complex arrays, or if it's compatible with trees and other things on the roadmap. It's possible this breaks things I'm not imagining. Currently, where(cond, other)
and where(cond, drop=True)
are well-tested, flexible operations, and I don't see why allowing their union would break anything, but I'll wait to hear from the experts on that front!
I'm definitely open to creating a pull request (and have the simple implementation I've outlined here ready to go).