-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Sum and prod with min_count forces evaluation #4898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
grepping the code, the only other function that calls import numpy as np
import xarray as xr
def worker(da):
if da.shape == (0, 0):
return da
raise RuntimeError("I was evaluated")
da = xr.DataArray(
np.random.normal(size=(20, 500)),
dims=("x", "y"),
coords=(np.arange(20), np.arange(500)),
)
da = da.chunk(dict(x=5))
lazy = da.map_blocks(worker)
result1 = lazy.prod("x", skipna=True)
result2 = lazy.prod("x", skipna=True, min_count=5) |
Can we use |
A quick check with the debugger and it is the I think I've found another problem with import numpy as np
import xarray as xr
def worker(da):
if da.shape == (0, 0):
return da
res = xr.full_like(da, np.nan)
res[0, 0] = 1
return res
da = xr.DataArray(
np.random.normal(size=(20, 500)),
dims=("x", "y"),
coords=(np.arange(20), np.arange(500)),
)
da = da.chunk(dict(x=5))
lazy = da.map_blocks(worker)
result_allaxes = lazy.sum(skipna=True, min_count=5)
result_allaxes.load() I would expect The problem seems to be the dtype check: Line 39 in 5296ed1
The test returns True for float64 and so the block isn't run. Another MCVE: import numpy as np
from xarray.core import dtypes
print(dtypes.NAT_TYPES)
print(np.dtype("float64") in dtypes.NAT_TYPES) Output: (numpy.datetime64('NaT'), numpy.timedelta64('NaT'))
True where I think False would be expected. Should I open a separate issue for this or can we track it here too? |
@dcherian it looks like that works. A better test script: import numpy as np
import xarray as xr
from xarray.tests import raise_if_dask_computes
def worker(da):
if da.shape == (0, 0):
return da
return da.where(da > 1)
np.random.seed(1023)
da = xr.DataArray(
np.random.normal(size=(20, 500)),
dims=("x", "y"),
coords=(np.arange(20), np.arange(500)),
)
da = da.chunk(dict(x=5))
lazy = da.map_blocks(worker)
with raise_if_dask_computes():
result = lazy.sum("x", skipna=True, min_count=5)
result.load()
assert np.isnan(result[0])
assert not np.isnan(result[6]) If I then remove the dtype, fill_value = dtypes.maybe_promote(result.dtype)
result = result.astype(dtype)
result = np.where(null_mask, fill_value, result) it passes. I can start working on a pull request with these tests and changes if that looks acceptable to you.
|
If I use the
sum
method on a lazy array withmin_count != None
then evaluation is forced. If there is some limitation of the implementation which means it cannot be added to the computation graph for lazy evaluation then this should be mentioned in the docs.Minimal Complete Verifiable Example:
What happened:
RuntimeError: I was evaluated
What you expected to happen: No output or exceptions, as the result1 and result2 arrays are not printed or saved.
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.9.1 (default, Feb 6 2021, 06:49:13)
[GCC 10.2.0]
python-bits: 64
OS: Linux
OS-release: 5.10.15-arch1-1
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_NZ.UTF-8
LOCALE: en_NZ.UTF-8
libhdf5: 1.12.0
libnetcdf: 4.7.4
xarray: 0.16.2
pandas: 1.2.1
numpy: 1.20.0
scipy: 1.6.0
netCDF4: 1.5.5.1
pydap: None
h5netcdf: 0.9.0
h5py: 3.1.0
Nio: None
zarr: None
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.0
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2020.12.0
distributed: 2020.12.0
matplotlib: 3.3.4
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: None
setuptools: 53.0.0
pip: 20.3.1
conda: None
pytest: 6.2.1
IPython: 7.19.0
sphinx: 3.4.3
The text was updated successfully, but these errors were encountered: