Skip to content

xr.full_like (often) fails when other is chunked and fill_value is non-scalar #3977

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Huite opened this issue Apr 16, 2020 · 1 comment · Fixed by #3979
Closed

xr.full_like (often) fails when other is chunked and fill_value is non-scalar #3977

Huite opened this issue Apr 16, 2020 · 1 comment · Fixed by #3979

Comments

@Huite
Copy link
Contributor

Huite commented Apr 16, 2020

I've been running into some issues when using xr.full_like, when my other.data is a chunked dask array, and the fill_value is a numpy array.

Now, I just checked, full_like mentions only scalar in the signature. However, this is a very convenient way to get all the coordinates and dimensions attached to an array like this, so it feels like desirable functionality. And as I mention below, both numpy and dask function similary, taking much more than just scalars.
https://xarray.pydata.org/en/stable/generated/xarray.full_like.html

MCVE Code Sample

x = [1, 2, 3, 4]
y = [1, 2, 3]
da1 = xr.DataArray(dask.array.ones((3, 4), chunks=(1, 4)), {"y": y, "x": x}, ("y", "x"))
da2 = xr.full_like(da1, np.ones((3, 4)))
print(da2.values)

This results in an error:
ValueError: could not broadcast input array from shape (1,3) into shape (1,4)

Expected Output

Expected is a DataArray with the dimensions and coords of other, and the numpy array of fill_value as its data.

Problem Description

The issue lies here:

xarray/xarray/core/common.py

Lines 1420 to 1436 in 2c77eb5

def _full_like_variable(other, fill_value, dtype: DTypeLike = None):
"""Inner function of full_like, where other must be a variable
"""
from .variable import Variable
if isinstance(other.data, dask_array_type):
import dask.array
if dtype is None:
dtype = other.dtype
data = dask.array.full(
other.shape, fill_value, dtype=dtype, chunks=other.data.chunks
)
else:
data = np.full_like(other, fill_value, dtype=dtype)
return Variable(dims=other.dims, data=data, attrs=other.attrs)

Calling dask.array.full with the given number of chunks results in it trying to to apply the fill_value for every individual chunk.

As one would expect, if I set fill_value to the size of a single chunk it doesn't error:

da2 = xr.full_like(da1, np.ones((1, 4)))
print(da2.values)

It does fail on a similarly chunked dask array (since it's applying it for every chunk):

da2 = xr.full_like(da1, dask.array.ones((3, 4)))
print(da2.values)

The most obvious solution would be to force it down the np.full_like route, since all the values already exist in memory anyway. So maybe another type check does the trick. However, full() accepts quite a variety of arguments for the fill value (scalars, numpy arrays, lists, tuples, ranges). The dask docs mention only a scalar in the signature for dask.array.full:
https://docs.dask.org/en/latest/array-api.html#dask.array.full
As does numpy.full:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.full.html

However, in all cases, they still broadcast automatically...

a = np.full((2, 2), [1, 2]
>>> array([[1, 2],
       [1, 2]])

So kind of undefined behavior of a blocked full?

Versions

Output of `xr.show_versions()` INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 21:48:41) [MSC v.1916 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel byteorder: little LC_ALL: None LANG: en LOCALE: None.None libhdf5: 1.10.5 libnetcdf: 4.7.3

xarray: 0.15.1
pandas: 0.25.3
numpy: 1.17.5
scipy: 1.3.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.2
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.9.2
distributed: 2.10.0
matplotlib: 3.1.2
cartopy: None
seaborn: 0.10.0
numbagg: None
setuptools: 46.1.3.post20200325
pip: 20.0.2
conda: None
pytest: 5.3.4
IPython: 7.13.0
sphinx: 2.3.1

@Huite Huite changed the title xr.full_like fails when other is chunked xr.full_like (often) fails when other is chunked and fill_value is non-scalar Apr 16, 2020
@dcherian
Copy link
Contributor

da2 = da1.copy(data=np.ones((3,4))) should be what you want.

We should raise an error when the input to full_like is not scalar

shoyer pushed a commit that referenced this issue Apr 24, 2020
* Avoid multiplication DeprecationWarning in rasterio backend

* full_like: error on non-scalar fill_value

Fixes #3977

* Added test

* Updated what's new

* core.utils.is_scalar instead of numpy.is_scalar

* More informative error message

* raises_regex for error test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants