
Upgrading dask-core and distributed packages to 2023.1.1 breaks tests #7483

Closed
1 of 4 tasks
JoelJaeschke opened this issue Jan 29, 2023 · 1 comment · Fixed by #7489
Labels
bug · needs triage (Issue that has not been reviewed by xarray team member) · topic-dask

Comments

@JoelJaeschke
Contributor

JoelJaeschke commented Jan 29, 2023

What happened?

Creating a fresh testing environment and running the test suite causes the tests in test_distributed.py to fail.

What did you expect to happen?

All tests should pass.

Minimal Complete Verifiable Example

The environment was set up as follows (following the contribution guidelines):

# Environment setup
conda create -c conda-forge -n xarray-tests python=3.10
conda env update -f ci/requirements/environment.yml
conda activate xarray-tests
pip install -e .

# Run the distributed tests
pytest xarray/tests/test_distributed.py

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

This can be remedied by manually downgrading the dask-core and distributed packages from 2023.1.1 to 2023.1.0. Pinning their versions should work around the error until the underlying cause is figured out.
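
A minimal sanity check (assuming both packages import cleanly) to confirm the downgrade took effect inside the xarray-tests environment before re-running the tests:

import dask
import distributed

# Both packages should report the pinned 2023.1.0 release after the downgrade;
# 2023.1.1 is the release that triggers the failures described above.
print(dask.__version__, distributed.__version__)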

Environment

INSTALLED VERSIONS
------------------
commit: d385e20
python: 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:26:04) [GCC 10.4.0]
python-bits: 64
OS: Linux
OS-release: 6.1.6-200.fc37.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1

xarray: 2023.1.1.dev14+gd385e206
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.10.0
netCDF4: 1.6.0
pydap: installed
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.13.3
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: 3.2.2
rasterio: 1.3.4
cfgrib: 0.9.10.3
iris: 3.4.0
bottleneck: 1.3.6
dask: 2023.1.1
distributed: 2023.1.1
matplotlib: 3.6.3
cartopy: 0.21.1
seaborn: 0.12.2
numbagg: 0.2.2
fsspec: 2023.1.0
cupy: None
pint: 0.20.1
sparse: 0.13.0
flox: 0.6.7
numpy_groupies: 0.9.20
setuptools: 66.1.1
pip: 22.3.1
conda: 22.11.1
pytest: 7.2.1
mypy: None
IPython: None
sphinx: None

@JoelJaeschke added the bug and needs triage (Issue that has not been reviewed by xarray team member) labels on Jan 29, 2023
@dcherian
Contributor

@jrbourbeau This seems to have fallen out of dask/distributed#7482. What do you suggest we do?

Our detection of the scheduler type is now failing:

def _get_scheduler(get=None, collection=None) -> str | None:
    """Determine the dask scheduler that is being used.

    None is returned if no dask scheduler is active.

    See Also
    --------
    dask.base.get_scheduler
    """
    try:
        # Fix for bug caused by dask installation that doesn't involve the toolz library
        # Issue: 4164
        import dask
        from dask.base import get_scheduler  # noqa: F401

        actual_get = get_scheduler(get, collection)
    except ImportError:
        return None

    try:
        from dask.distributed import Client

        if isinstance(actual_get.__self__, Client):
            return "distributed"
    except (ImportError, AttributeError):
        pass

    try:
        # As of dask=2.6, dask.multiprocessing requires cloudpickle to be installed
        # Dependency removed in https://github.com/dask/dask/pull/5511
        if actual_get is dask.multiprocessing.get:
            return "multiprocessing"
    except AttributeError:
        pass

    return "threaded"

which means we don't raise the error here, and so the test fails.

xarray/xarray/backends/api.py, lines 1181 to 1189 at d385e20:

scheduler = _get_scheduler()
have_chunks = any(v.chunks is not None for v in dataset.variables.values())

autoclose = have_chunks and scheduler in ["distributed", "multiprocessing"]
if autoclose and engine == "scipy":
    raise NotImplementedError(
        f"Writing netCDF files with the {engine} backend "
        f"is not currently supported with dask's {scheduler} scheduler"
    )
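
The branch that should return "distributed" relies on actual_get.__self__ being a Client instance; after dask/distributed#7482 that check evidently no longer matches, so _get_scheduler() falls back to "threaded", the NotImplementedError above is never raised, and the test fails. A minimal sketch of a more direct detection, assuming distributed exposes Client.current() (illustrative only, not necessarily the approach taken in #7489):

def _distributed_client_is_active() -> bool:
    # Ask distributed directly for the active client instead of inspecting
    # the callable returned by dask.base.get_scheduler().
    try:
        from distributed import Client

        Client.current()  # raises ValueError when no client is registered
    except (ImportError, ValueError):
        return False
    return True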

@dcherian added the topic-dask and needs triage labels and removed the needs triage label on Jan 29, 2023