Skip to content

Check with Xarray #904

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mrocklin opened this issue Feb 28, 2024 · 14 comments · Fixed by dask/dask#11003
Closed

Check with Xarray #904

mrocklin opened this issue Feb 28, 2024 · 14 comments · Fixed by dask/dask#11003

Comments

@mrocklin
Copy link
Member

We'd like to make sure that Xarray works with the new version of dask dataframe

I took a quick look at the codebase, and it looks like they're using dd.from_dask_array, so everything should be ok. It might be worth checking with them explicitly though. cc @dcherian

@dcherian
Copy link

Our tests are failing since the env doesn't contain dask-expr

 ../../../micromamba/envs/xarray-tests/lib/python3.12/site-packages/dask/dataframe/__init__.py:22: in _dask_expr_enabled
    import dask_expr  # noqa: F401
E   ModuleNotFoundError: No module named 'dask_expr'

During handling of the above exception, another exception occurred:
xarray/tests/test_dask.py:33: in <module>
    dd = pytest.importorskip("dask.dataframe")
../../../micromamba/envs/xarray-tests/lib/python3.12/site-packages/dask/dataframe/__init__.py:87: in <module>
    if _dask_expr_enabled():
../../../micromamba/envs/xarray-tests/lib/python3.12/site-packages/dask/dataframe/__init__.py:24: in _dask_expr_enabled
    raise ValueError("Must install dask-expr to activate query planning.")
E   ValueError: Must install dask-expr to activate query planning.

It looks like the _dask_expr_enabled() function doesn't catch an ImportError

@mrocklin
Copy link
Member Author

I wonder if, for the short term, we should make this a warning and fall back to the legacy implementation.

cc @fjetter @phofl

@dcherian
Copy link

Without that you'd have to make dask-expr a required dependency (which would be fine by us :) )

@mrocklin
Copy link
Member Author

We have actually. You need to pip install dask[dataframe] or pip install dask[complete]. My guess is that xarray is depending only on dask[array] which doesn't even strictly require pandas.

@jrbourbeau
Copy link
Member

dask-expr is a hard dependency for dask.dataframe. It looks like you're installing dask-core in CI (doesn't have dependencies like numpy / pandas / etc). Does changing that to dask instead of dask-core work for Xarray? That would pull in dask-expr automatically

@phofl
Copy link
Collaborator

phofl commented Mar 13, 2024

I wonder if, for the short term, we should make this a warning and fall back to the legacy implementation.

I don't think that this is a good idea if we want people to actually use it

@mrocklin
Copy link
Member Author

We would warn and then gracefully fall back to old behavior.

That seems better to me than failing hard.

@phofl
Copy link
Collaborator

phofl commented Mar 13, 2024

People mostly ignored the deprecation warning even though it was very noisy, I don't think that this would be any different.

@dcherian
Copy link

dcherian commented Mar 13, 2024

Ah yes, we missed adding dask-expr to one of our environments. (we are installing dask-core from conda)

@mrocklin
Copy link
Member Author

People mostly ignored the deprecation warning even though it was very noisy, I don't think that this would be any different.

Well, like in that case it was pretty disruptive and people were sad. We've learned that people don't like being disrupted. I'm inclined to give them the warning information for now but also gracefully fall back. I don't want to force people's code to break in order to get them to upgrade; at least not yet.

@dcherian
Copy link

We had another subtle issue from dask-expr setting a pandas global option mode.copy_on_write: pydata/xarray#8829 (comment)

@fjetter
Copy link
Member

fjetter commented Mar 14, 2024

We had another subtle issue from dask-expr setting a pandas global option mode.copy_on_write: pydata/xarray#8829 (comment)

This is a known issue dask/dask#10996

@dcherian
Copy link

More failures :) See https://github.com/pydata/xarray/actions/runs/8288508222/job/22683171250?pr=8790

_____________________ TestDataArray.test_to_dask_dataframe _____________________
[gw1] linux -- Python 3.9.18 /home/runner/micromamba/envs/xarray-tests/bin/python

self = (Concat(frames=[FromGraph(58f3d2e), FromGraph(84f3e3e), FromGraph(b57bdfe)], axis=1))['foo']
key = 'values'

    def __getattr__(self, key):
        try:
>           return object.__getattribute__(self, key)
E           AttributeError: 'Projection' object has no attribute 'values'

/home/runner/micromamba/envs/xarray-tests/lib/python3.9/site-packages/dask_expr/_core.py:446: AttributeError

During handling of the above exception, another exception occurred:

self = Dask Series Structure:
npartitions=1
0     int64
11      ...
Dask Name: getitem, 5 expressions
Expr=(Concat(frames=[FromGraph(58f3d2e), FromGraph(84f3e3e), FromGraph(b57bdfe)], axis=1))['foo']
key = 'values'

    def __getattr__(self, key):
        try:
            # Prioritize `FrameBase` attributes
            return object.__getattribute__(self, key)
        except AttributeError as err:
            try:
                # Fall back to `expr` API
                # (Making sure to convert to/from Expr)
>               val = getattr(self.expr, key)

/home/runner/micromamba/envs/xarray-tests/lib/python3.9/site-packages/dask_expr/_collection.py:511: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = (Concat(frames=[FromGraph(58f3d2e), FromGraph(84f3e3e), FromGraph(b57bdfe)], axis=1))['foo']
key = 'values'

    def __getattr__(self, key):
        try:
            return object.__getattribute__(self, key)
        except AttributeError as err:
            if key.startswith("_meta"):
                # Avoid a recursive loop if/when `self._meta*`
                # produces an `AttributeError`
                raise RuntimeError(
                    f"Failed to generate metadata for {self}. "
                    "This operation may not be supported by the current backend."
                )
    
            # Allow operands to be accessed as attributes
            # as long as the keys are not already reserved
            # by existing methods/properties
            _parameters = type(self)._parameters
            if key in _parameters:
                idx = _parameters.index(key)
                return self.operands[idx]
            if is_dataframe_like(self._meta) and key in self._meta.columns:
                return self[key]
    
            link = "https://github.com/dask-contrib/dask-expr/blob/main/README.md#api-coverage"
>           raise AttributeError(
                f"{err}\n\n"
                "This often means that you are attempting to use an unsupported "
                f"API function. Current API coverage is documented here: {link}."
            )
E           AttributeError: 'Projection' object has no attribute 'values'
E           
E           This often means that you are attempting to use an unsupported API function. Current API coverage is documented here: https://github.com/dask-contrib/dask-expr/blob/main/README.md#api-coverage.

/home/runner/micromamba/envs/xarray-tests/lib/python3.9/site-packages/dask_expr/_core.py:467: AttributeError

During handling of the above exception, another exception occurred:

self = <xarray.tests.test_dataarray.TestDataArray object at 0x7187713d36a0>

    @requires_dask_expr
    @requires_dask
    def test_to_dask_dataframe(self) -> None:
        arr_np = np.arange(3 * 4).reshape(3, 4)
        arr = DataArray(arr_np, [("B", [1, 2, 3]), ("A", list("cdef"))], name="foo")
        expected = arr.to_series()
        actual = arr.to_dask_dataframe()["foo"]
    
>       assert_array_equal(actual.values, expected.values)

/home/runner/work/xarray/xarray/xarray/tests/test_dataarray.py:3429: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/home/runner/micromamba/envs/xarray-tests/lib/python3.9/site-packages/dask_expr/_collection.py:517: in __getattr__
    raise err
/home/runner/micromamba/envs/xarray-tests/lib/python3.9/site-packages/dask_expr/_collection.py:506: in __getattr__
    return object.__getattribute__(self, key)
/home/runner/micromamba/envs/xarray-tests/lib/python3.9/site-packages/dask_expr/_collection.py:1295: in values
    return self.to_dask_array()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = Dask Series Structure:
npartitions=1
0     int64
11      ...
Dask Name: getitem, 5 expressions
Expr=(Concat(frames=[FromGraph(58f3d2e), FromGraph(84f3e3e), FromGraph(b57bdfe)], axis=1))['foo']
lengths = None, meta = None, optimize = True, optimize_kwargs = {}

    def to_dask_array(
        self, lengths=None, meta=None, optimize: bool = True, **optimize_kwargs
    ) -> Array:
        """Convert a dask DataFrame to a dask array.
    
        Parameters
        ----------
        lengths : bool or Sequence of ints, optional
            How to determine the chunks sizes for the output array.
            By default, the output array will have unknown chunk lengths
            along the first axis, which can cause some later operations
            to fail.
    
            * True : immediately compute the length of each partition
            * Sequence : a sequence of integers to use for the chunk sizes
              on the first axis. These values are *not* validated for
              correctness, beyond ensuring that the number of items
              matches the number of partitions.
        meta : object, optional
            An optional `meta` parameter can be passed for dask to override the
            default metadata on the underlying dask array.
        optimize : bool
            Whether to optimize the expression before converting to an Array.
    
        Returns
        -------
        A Dask Array
        """
>       return self.to_dask_dataframe(optimize, **optimize_kwargs).to_dask_array(
            lengths=lengths, meta=meta
        )
E       AttributeError: 'Scalar' object has no attribute 'to_dask_array'

@phofl
Copy link
Collaborator

phofl commented Mar 15, 2024

@dcherian put up a pr to fix this test: #981

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants