

kmuehlbauer
Contributor

@kmuehlbauer kmuehlbauer commented Aug 12, 2025


@valeriupredoi valeriupredoi left a comment


looks brilliant to me, cheers very much @kmuehlbauer 🍻

@kmuehlbauer
Contributor Author

@keewis It would be great if you could have a look at the wording here, too, before I get this in. Thanks!

@valeriupredoi

valeriupredoi commented Aug 13, 2025

Thanks very much for this @kmuehlbauer and cheers for reviewing @dcherian 🍻

I was talking to @pp-mo about this in pp-mo/ncdata#145, where we are trying to pop in a user warning if any of the Zarr-loaded Xarray variables have realized data, i.e. NumPy ndarrays. It's actually rather difficult for me to write a robust warning that neither emits false alarms nor stays quiet on true positives, so I thought I'd ask whether you'd be willing to pop such a warning into Xarray. I can most definitely open a new issue for this, but I thought I'd touch base with you first. Something à la:

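import warnings

import numpy as np


def _raise_warning(var):
    """Emit a UserWarning if the variable's data is not lazy."""
    warn_msg = (
        f"Variable {var.name}{var.dims} has fully realized "
        "data; if you need lazy data, then add "
        "chunks={} or chunks='auto' as an argument to Xarray open_dataset."
    )
    warnings.warn(warn_msg, UserWarning, stacklevel=2)


# In a function or class that inspects an opened Dataset `ds`
# (`_check_lazy` is a hypothetical helper name):
def _check_lazy(ds):
    for name in ds.variables:
        var = ds[name]  # DataArray, so var.name is set; a bare Variable has no .name
        if isinstance(var.data, np.ndarray):
            _raise_warning(var)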
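A minimal sketch of how such a check could be exercised, assuming xarray and NumPy are installed. `warn_if_realized` is a hypothetical helper, not an Xarray API; it deliberately emits one aggregated warning instead of one per variable, so a Dataset with many variables does not flood the output. It runs on a small in-memory Dataset, where every variable is NumPy-backed:

```python
import warnings

import numpy as np
import xarray as xr


def warn_if_realized(ds):
    """Emit one UserWarning listing every variable whose data is a NumPy ndarray.

    Hypothetical helper: an aggregated variant of the per-variable warning.
    """
    realized = sorted(
        str(name) for name in ds.variables if isinstance(ds[name].data, np.ndarray)
    )
    if realized:
        warnings.warn(
            f"Variables {', '.join(realized)} hold fully realized NumPy data; "
            "pass chunks={} or chunks='auto' to open_dataset for lazy loading.",
            UserWarning,
            stacklevel=2,
        )
    return realized


# Toy in-memory Dataset: data variable "t" plus index coordinate "x".
ds = xr.Dataset({"t": ("x", np.zeros(3))}, coords={"x": np.arange(3)})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    names = warn_if_realized(ds)

print(names)  # ['t', 'x']
print(len(caught))  # 1
```

The index coordinate `x` is reported as well. Index coordinates stay NumPy-backed even when the rest of a Dataset is lazy, which is exactly the kind of false alarm mentioned above.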

@dcherian
Contributor

> if any of the Zarr-loaded Xarray variables have realized data, i.e. NumPy ndarrays

This only happens when we create a pandas index, so I don't think a warning is needed. It would certainly be good to make our tests better at checking these things, though.
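Following the remark that only index-backed variables end up as NumPy arrays, a check could skip every variable listed in `ds.indexes`. A minimal sketch, assuming xarray and NumPy are installed; `realized_non_index_vars` is a hypothetical helper name, not part of Xarray:

```python
import numpy as np
import xarray as xr


def realized_non_index_vars(ds):
    """Names of variables holding NumPy data, skipping variables that back an index.

    Index coordinates are wrapped in a pandas index and therefore live in
    memory regardless of `chunks`, so flagging them would be a false alarm.
    """
    return sorted(
        str(name)
        for name in ds.variables
        if name not in ds.indexes and isinstance(ds[name].data, np.ndarray)
    )


# "x" is an index coordinate; "lat" is a non-index coordinate along "x".
ds = xr.Dataset(
    {"t": ("x", np.zeros(3))},
    coords={"x": np.arange(3), "lat": ("x", np.linspace(0.0, 1.0, 3))},
)
print(realized_non_index_vars(ds))  # ['lat', 't']
```

Here `lat` and `t` are still reported only because this toy Dataset was built from in-memory NumPy arrays; the index coordinate `x` is no longer flagged.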

@kmuehlbauer kmuehlbauer enabled auto-merge (squash) August 13, 2025 14:10
@kmuehlbauer kmuehlbauer merged commit b96d607 into pydata:main Aug 13, 2025
36 checks passed
@kmuehlbauer kmuehlbauer deleted the lazy-eager branch August 13, 2025 14:24
@valeriupredoi

> if any of the Zarr-loaded Xarray variables have realized data, i.e. NumPy ndarrays

> This only happens when we create a pandas index, so I don't think a warning is needed. It would certainly be good to make our tests better at checking these things, though.

Well, it happens with chunks=None regardless of the input data (Zarr, in my use case), and if someone misses that, like I did above, they'll end up with a whole lotta NumPys 😀 Hence my question whether you folks think a warning would be useful.
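The difference between `chunks=None` and `chunks={}` can be imitated in memory without a Zarr store. A Dataset built from NumPy arrays stands in for one opened with `chunks=None`, and `Dataset.chunk()` produces dask-backed variables, as `open_dataset(..., chunks={})` does. This sketch assumes xarray, NumPy and dask are installed; `not_dask_backed` is a hypothetical helper that reads each data variable's `chunks` attribute (None when the variable is not backed by dask) instead of inspecting `.data`:

```python
import numpy as np
import xarray as xr  # Dataset.chunk() additionally requires dask

# In-memory stand-in for a store opened with chunks=None: NumPy-backed data.
eager = xr.Dataset(
    {"t": (("x", "y"), np.zeros((4, 6)))},
    coords={"x": np.arange(4)},
)
# Dask-backed copy, comparable to what chunks={} produces when opening a store.
lazy = eager.chunk({"x": 2})


def not_dask_backed(ds):
    """Names of data variables whose `chunks` attribute is None (no dask array)."""
    return sorted(str(name) for name, var in ds.data_vars.items() if var.chunks is None)


print(not_dask_backed(eager))  # ['t']
print(not_dask_backed(lazy))   # []
print(lazy["t"].chunks)        # ((2, 2), (6,))
```

Restricting the check to `data_vars` also sidesteps the index-coordinate false alarm, because coordinates are not included.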

Hopefully with this in now, no more numpties like I was 🍺

dcherian added a commit to dhruvak001/xarray that referenced this pull request Aug 24, 2025
* main: (46 commits)
  use the new syntax of ignoring bots (pydata#10668)
  modification methods on `Coordinates` (pydata#10318)
  Silence warnings from test_tutorial.py (pydata#10661)
  test: update write_empty test for zarr 3.1.2 (pydata#10665)
  Bump actions/checkout from 4 to 5 in the actions group (pydata#10652)
  Add load_datatree function (pydata#10649)
  Support compute=False from DataTree.to_netcdf (pydata#10625)
  Fix typos (pydata#10655)
  In case of misconfiguration of dataset.encoding `unlimited_dims` warn instead of raise (pydata#10648)
  fix ``auto_complex`` for ``open_datatree`` (pydata#10632)
  Fix bug indexing with boolean scalars (pydata#10635)
  Improve DataTree typing (pydata#10644)
  Update Cartopy and Iris references (pydata#10645)
  Empty release notes (pydata#10642)
  release notes for v2025.08.0 (pydata#10641)
  Fix `ds.merge` to prevent altering original object depending on join value (pydata#10596)
  Add asynchronous load method (pydata#10327)
  Add DataTree.prune() method              … (pydata#10598)
  Avoid refining parent dimensions in NetCDF files (pydata#10623)
  clarify lazy behaviour and eager loading chunks=None in open_*-functions (pydata#10627)
  ...
Labels
io topic-backends topic-zarr Related to zarr storage library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WritableCFDataStore realizes variable data when loading with object stored Zarr store
3 participants