.reduce() on a DataArray with Dask distributed immediately executes the preceding portions of the computational graph #3161
Comments
Yikes, this is pretty bad! Thanks for the clear code to reproduce it.
This ends up being because xarray/xarray/core/variable.py Lines 1412 to 1460 in 3f9069b
I don't see why we need this kwarg or why it shouldn't be Lines 3997 to 4029 in 3f9069b
and invisible in xarray/xarray/core/dataarray.py Lines 2106 to 2139 in 3f9069b
I don't remember exactly why I added the We do something similar in At this point, I think we would probably just remove the argument and always default to |
This seems best.
MCVE Code Sample
`.mean()` on a `DataArray` pointing to a Dask array returns a Dask array-containing `DataArray`, as expected.
Calling `.compute()` on this result produces the expected result.
The `.reduce()` method immediately executes all of the previously queued computations leading up to the new reduce method before even calling the supplied function.
Expected Output
A Dask array when `.reduce(func)` isn't followed up by `.compute()`.
Problem Description
When using Dask distributed, the computational graph you are constructing is immediately executed if you call `.reduce()` instead of adding that function as another node in the DAG. This graph execution happens before the function you pass to reduce is called.
Output of `xr.show_versions()`
xarray: 0.12.3
pandas: 0.25.0
numpy: 1.16.4
scipy: 1.3.0
netCDF4: 1.5.1.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.1.0
distributed: 2.1.0
matplotlib: 3.1.1
cartopy: None
seaborn: None
numbagg: None
setuptools: 41.0.1
pip: 19.2.1
conda: None
pytest: 5.0.1
IPython: 7.6.1
sphinx: 2.1.2
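The laziness check described under Problem Description can be sketched roughly as follows. This is not the reporter's original sample, which did not survive on this page. It is a minimal illustration under stated assumptions: the small array of ones, the dimension names, and the `is_lazy` helper are all hypothetical, and the local Dask scheduler stands in for Dask distributed. Inspecting the type of `.data` does not trigger computation. On the versions listed above, the `.reduce()` result would be expected to come back already computed, while on releases where the behaviour has been changed it should still be backed by a Dask array.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Hypothetical stand-in data: a small chunked array of ones.
arr = xr.DataArray(da.ones((4, 6), chunks=(2, 3)), dims=("x", "y"))


def is_lazy(obj):
    """Return True if the DataArray is still backed by a Dask array."""
    return isinstance(obj.data, da.Array)


# The built-in reduction adds a node to the graph and stays lazy.
mean_result = arr.mean(dim="x")

# The generic reduction with a user-supplied function; whether this stays
# lazy depends on the installed xarray version (this is the reported bug).
reduce_result = arr.reduce(np.mean, dim="x")

print("mean lazy:  ", is_lazy(mean_result))
print("reduce lazy:", is_lazy(reduce_result))

# Either way, the computed values should agree.
np.testing.assert_allclose(
    mean_result.compute().values, reduce_result.compute().values
)
```

If the second line prints `False`, the graph was executed inside `.reduce()` rather than deferred until `.compute()`.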