Sum and prod with min_count forces evaluation #4898

Closed · bcbnz opened this issue Feb 12, 2021 · 5 comments · Fixed by #4911
Comments

@bcbnz (Contributor) commented Feb 12, 2021

Calling the sum method on a lazy (dask-backed) array with min_count != None forces evaluation. If some limitation of the implementation means this reduction cannot be added to the computation graph for lazy evaluation, that should be mentioned in the docs.

Minimal Complete Verifiable Example:

import numpy as np
import xarray as xr


def worker(da):
    # map_blocks first runs the function on a 0-sized mock-up of the
    # data to infer the output metadata; let that probe pass through.
    if da.shape == (0, 0):
        return da

    # Any real block reaching this point means evaluation was forced.
    raise RuntimeError("I was evaluated")


da = xr.DataArray(
    np.random.normal(size=(20, 500)),
    dims=("x", "y"),
    coords=(np.arange(20), np.arange(500)),
)

da = da.chunk(dict(x=5))
lazy = da.map_blocks(worker)
result1 = lazy.sum("x", skipna=True)
result2 = lazy.sum("x", skipna=True, min_count=5)

What happened: RuntimeError: I was evaluated

What you expected to happen: No output or exceptions, as the result1 and result2 arrays are not printed or saved.

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.9.1 (default, Feb 6 2021, 06:49:13)
[GCC 10.2.0]
python-bits: 64
OS: Linux
OS-release: 5.10.15-arch1-1
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_NZ.UTF-8
LOCALE: en_NZ.UTF-8
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.16.2
pandas: 1.2.1
numpy: 1.20.0
scipy: 1.6.0
netCDF4: 1.5.5.1
pydap: None
h5netcdf: 0.9.0
h5py: 3.1.0
Nio: None
zarr: None
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.0
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2020.12.0
distributed: 2020.12.0
matplotlib: 3.3.4
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: None
setuptools: 53.0.0
pip: 20.3.1
conda: None
pytest: 6.2.1
IPython: 7.19.0
sphinx: 3.4.3

@mathause (Collaborator)

Thanks for the report; the culprit is likely:

if null_mask.any():

We fixed a similar problem in weighted.
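
For reference, a minimal sketch with a plain dask array (assuming dask is installed) of why that line forces evaluation: the if statement has to coerce the lazy reduction to a concrete Python bool, which computes the whole graph.

import dask.array as darray

mask = darray.zeros((4, 4), chunks=2, dtype=bool)

# mask.any() is itself a lazy 0-d array, but the if statement calls
# bool() on it, which triggers a compute of the underlying graph.
if mask.any():
    pass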

@bcbnz changed the title from "Sum with min_count forces evaluation" to "Sum and prod with min_count forces evaluation" on Feb 12, 2021
@bcbnz (Contributor, Author) commented Feb 12, 2021

Grepping the code, the only other function that calls _maybe_null_out is prod, and I can confirm the problem exists there too. I've updated the title. MCVE for prod:

import numpy as np
import xarray as xr


def worker(da):
    if da.shape == (0, 0):
        return da

    raise RuntimeError("I was evaluated")


da = xr.DataArray(
    np.random.normal(size=(20, 500)),
    dims=("x", "y"),
    coords=(np.arange(20), np.arange(500)),
)

da = da.chunk(dict(x=5))
lazy = da.map_blocks(worker)
result1 = lazy.prod("x", skipna=True)
result2 = lazy.prod("x", skipna=True, min_count=5)

@dcherian (Contributor)

Can we use np.where instead of this if condition?
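
A minimal sketch of the suggested pattern with plain dask arrays (hypothetical mask and values, not the actual _maybe_null_out internals): np.where dispatches to dask and builds a lazy elementwise graph, so no data-dependent branch is needed.

import dask.array as darray
import numpy as np

mask = darray.zeros((4,), chunks=2, dtype=bool)
values = darray.ones((4,), chunks=2)

# Instead of branching on bool(mask.any()), select fill values lazily.
result = np.where(mask, np.nan, values)
print(type(result))  # dask.array.core.Array -- nothing computed yet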

@bcbnz (Contributor, Author) commented Feb 12, 2021

A quick check with the debugger confirms that the null_mask.any() call is what forces the compute.

I think I've found another problem with _maybe_null_out when it reduces over all dimensions. With this altered MCVE:

import numpy as np
import xarray as xr

def worker(da):
    # Let the 0-sized metadata-inference probe pass through untouched.
    if da.shape == (0, 0):
        return da

    # Return a block that is NaN everywhere except a single element.
    res = xr.full_like(da, np.nan)
    res[0, 0] = 1
    return res


da = xr.DataArray(
    np.random.normal(size=(20, 500)),
    dims=("x", "y"),
    coords=(np.arange(20), np.arange(500)),
)

da = da.chunk(dict(x=5))
lazy = da.map_blocks(worker)
result_allaxes = lazy.sum(skipna=True, min_count=5)
result_allaxes.load()

I would expect result_allaxes to be NaN: there are four blocks and therefore only four non-NaN values, fewer than min_count=5. Instead it is 4.

The problem seems to be the dtype check:

elif getattr(result, "dtype", None) not in dtypes.NAT_TYPES:

The test returns True for float64, so the block isn't run. Another MCVE:

import numpy as np
from xarray.core import dtypes

print(dtypes.NAT_TYPES)
print(np.dtype("float64") in dtypes.NAT_TYPES)

Output:

(numpy.datetime64('NaT'), numpy.timedelta64('NaT'))
True

where I think False would be expected. Should I open a separate issue for this or can we track it here too?
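
In the meantime, a sketch of a dtype-kind check (hypothetical _is_nat_dtype helper, not existing xarray API) that would not match float64:

import numpy as np

def _is_nat_dtype(dtype):
    # Match datetime64/timedelta64 of any unit by dtype kind, rather
    # than comparing against NaT scalar values via `in`.
    return np.issubdtype(dtype, np.datetime64) or np.issubdtype(
        dtype, np.timedelta64
    )

print(_is_nat_dtype(np.dtype("float64")))         # False
print(_is_nat_dtype(np.dtype("datetime64[ns]")))  # True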

@bcbnz (Contributor, Author) commented Feb 12, 2021

@dcherian it looks like that works. A better test script:

import numpy as np
import xarray as xr
from xarray.tests import raise_if_dask_computes


def worker(da):
    # Let the 0-sized metadata-inference probe pass through untouched.
    if da.shape == (0, 0):
        return da

    # Keep values > 1 and turn the rest into NaN.
    return da.where(da > 1)


np.random.seed(1023)
da = xr.DataArray(
    np.random.normal(size=(20, 500)),
    dims=("x", "y"),
    coords=(np.arange(20), np.arange(500)),
)

da = da.chunk(dict(x=5))
lazy = da.map_blocks(worker)

with raise_if_dask_computes():
    result = lazy.sum("x", skipna=True, min_count=5)

result.load()

assert np.isnan(result[0])
assert not np.isnan(result[6])

If I then remove the if null_mask.any() check and the block under it, and replace them with:

dtype, fill_value = dtypes.maybe_promote(result.dtype)
result = result.astype(dtype)
result = np.where(null_mask, fill_value, result)

it passes. I can start working on a pull request with these tests and changes if that looks acceptable to you.

How would you suggest handling the possible type promotion from the current dtype, fill_value = dtypes.maybe_promote(result.dtype) line? Currently it only tries promoting if the mask is True anywhere. Always promote, or just use the fill value and hope it works out?
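
For context, a quick illustration of what maybe_promote returns, if I read the internal API right (a promoted dtype plus a fill value able to represent missing data):

import numpy as np
from xarray.core import dtypes

# int64 cannot hold NaN, so it is promoted to float64 with fill NaN;
# float32 can already hold NaN and is left unchanged.
print(dtypes.maybe_promote(np.dtype("int64")))    # (dtype('float64'), nan)
print(dtypes.maybe_promote(np.dtype("float32")))  # (dtype('float32'), nan)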
