
kmuehlbauer
Contributor

kmuehlbauer commented Aug 13, 2025

In addition to the fix, this PR runs TestNetCDF4Data through open_datatree as a regression test for that code path.

@kmuehlbauer
Contributor Author

To test this, I added a TestNetCDF4DataTree class that subclasses TestNetCDF4Data and overrides the open function. This runs every test from TestNetCDF4Data through open_datatree.
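The subclassing approach can be sketched in plain Python. This is a minimal, hypothetical illustration: it assumes a base test class whose tests all read data through a single `open` hook. The class names, the `open` signature and the dict-based "store" are stand-ins, not xarray's actual test suite, whose hook may look different.

```python
import unittest


class RoundtripTestsBase(unittest.TestCase):
    """Hypothetical base class: every test reads data through self.open()."""

    def open(self, store):
        # Default code path (stand-in for open_dataset): read the root group only.
        return dict(store["/"])

    def test_root_variable(self):
        store = {"/": {"x": [1, 2]}, "/child": {"y": [3]}}
        self.assertEqual(self.open(store)["x"], [1, 2])


class RoundtripTestsViaTree(RoundtripTestsBase):
    """Inherits every test above but routes them through a tree-based reader."""

    def open(self, store):
        # Stand-in for open_datatree: load all groups, then return the root node.
        tree = {path: dict(group) for path, group in store.items()}
        return tree["/"]
```

Because the test runner collects both classes, each inherited test runs once per code path, so a failure that only appears in the subclass points at the tree-specific reader.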

@kmuehlbauer
Contributor Author

@TomNicholas One quick question: would this work, or do you see another way of testing this? Thanks!


@kmuehlbauer
Contributor Author

The failing test seems to reveal an alignment issue after the merge of #10623 (cc @shoyer).

A redefinition like the one below can be written, but it cannot be read back with open_datatree.

MCVE:

import xarray as xr

base = xr.Dataset(coords={"x": [1, 2]})
child = xr.Dataset(coords={"x": [1, 2, 3]})
base.to_netcdf("test.nc", mode="w")
child.to_netcdf("test.nc", group="child", mode="a")
ds = xr.open_datatree("test.nc")
Error Traceback
---------------------------------------------------------------------------
AlignmentError                            Traceback (most recent call last)
File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/core/datatree.py:168, in check_alignment(path, node_ds, parent_ds, children)
    167 try:
--> 168     align(node_ds, parent_ds, join="exact", copy=False)
    169 except ValueError as e:

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/structure/alignment.py:968, in align(join, copy, indexes, exclude, fill_value, *objects)
    960 aligner = Aligner(
    961     objects,
    962     join=join,
   (...)
    966     fill_value=fill_value,
    967 )
--> 968 aligner.align()
    969 return aligner.results

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/structure/alignment.py:661, in Aligner.align(self)
    660 self.align_indexes()
--> 661 self.assert_unindexed_dim_sizes_equal()
    663 if self.join == "override":

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/structure/alignment.py:523, in Aligner.assert_unindexed_dim_sizes_equal(self)
    522 if len(sizes) > 1:
--> 523     raise AlignmentError(
    524         f"cannot reindex or align along dimension {dim!r} "
    525         f"because of conflicting dimension sizes: {sizes!r}" + add_err_msg
    526     )

AlignmentError: cannot reindex or align along dimension 'x' because of conflicting dimension sizes: {2, 3}

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
Cell In[51], line 1
----> 1 ds = xr.open_datatree("test.nc")

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/backends/api.py:1220, in open_datatree(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, create_default_indexes, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
   1208 decoders = _resolve_decoders_kwargs(
   1209     decode_cf,
   1210     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
   1216     decode_coords=decode_coords,
   1217 )
   1218 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
-> 1220 backend_tree = backend.open_datatree(
   1221     filename_or_obj,
   1222     drop_variables=drop_variables,
   1223     **decoders,
   1224     **kwargs,
   1225 )
   1227 tree = _datatree_from_backend_datatree(
   1228     backend_tree,
   1229     filename_or_obj,
   (...)
   1240     **kwargs,
   1241 )
   1243 return tree

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:738, in NetCDF4BackendEntrypoint.open_datatree(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, format, clobber, diskless, persist, auto_complex, lock, autoclose, **kwargs)
    698 def open_datatree(
    699     self,
    700     filename_or_obj: T_PathFileOrDataStore,
   (...)
    717     **kwargs,
    718 ) -> DataTree:
    719     groups_dict = self.open_groups_as_dict(
    720         filename_or_obj,
    721         mask_and_scale=mask_and_scale,
   (...)
    735         **kwargs,
    736     )
--> 738     return datatree_from_dict_with_io_cleanup(groups_dict)

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/backends/common.py:276, in datatree_from_dict_with_io_cleanup(groups_dict)
    274 """DataTree.from_dict with file clean-up."""
    275 try:
--> 276     tree = DataTree.from_dict(groups_dict)
    277 except Exception:
    278     for ds in groups_dict.values():

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/core/datatree.py:1221, in DataTree.from_dict(cls, d, name)
   1219         else:
   1220             raise TypeError(f"invalid values: {data}")
-> 1221         obj._set_item(
   1222             path,
   1223             new_node,
   1224             allow_overwrite=False,
   1225             new_nodes_along_path=True,
   1226         )
   1228 # TODO: figure out why mypy is raising an error here, likely something
   1229 # to do with the return type of Dataset.copy()
   1230 return obj

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/core/treenode.py:650, in TreeNode._set_item(self, path, item, new_nodes_along_path, allow_overwrite)
    648         raise KeyError(f"Already a node object at path {path}")
    649 else:
--> 650     current_node._set(name, item)

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/core/datatree.py:967, in DataTree._set(self, key, val)
    965     new_node = val.copy(deep=False)
    966     new_node.name = key
--> 967     new_node._set_parent(new_parent=self, child_name=key)
    968 else:
    969     if not isinstance(val, DataArray | Variable):
    970         # accommodate other types that can be coerced into Variables

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/core/treenode.py:115, in TreeNode._set_parent(self, new_parent, child_name)
    113 self._check_loop(new_parent)
    114 self._detach(old_parent)
--> 115 self._attach(new_parent, child_name)

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/core/treenode.py:152, in TreeNode._attach(self, parent, child_name)
    147 if child_name is None:
    148     raise ValueError(
    149         "To directly set parent, child needs a name, but child is unnamed"
    150     )
--> 152 self._pre_attach(parent, child_name)
    153 parentchildren = parent._children
    154 assert not any(child is self for child in parentchildren), (
    155     "Tree is corrupt."
    156 )

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/core/datatree.py:551, in DataTree._pre_attach(self, parent, name)
    549 node_ds = self.to_dataset(inherit=False)
    550 parent_ds = parent._to_dataset_view(rebuild_dims=False, inherit=True)
--> 551 check_alignment(path, node_ds, parent_ds, self.children)
    552 _deduplicate_inherited_coordinates(self, parent)

File /home/kai/data/mambaforge/envs/xr_312_np2/lib/python3.12/site-packages/xarray/core/datatree.py:172, in check_alignment(path, node_ds, parent_ds, children)
    170         node_repr = _indented(_without_header(repr(node_ds)))
    171         parent_repr = _indented(dims_and_coords_repr(parent_ds))
--> 172         raise ValueError(
    173             f"group {path!r} is not aligned with its parents:\n"
    174             f"Group:\n{node_repr}\nFrom parents:\n{parent_repr}"
    175         ) from e
    177 if children:
    178     if parent_ds is not None:

ValueError: group '/child' is not aligned with its parents:
Group:
    Dimensions:  (x: 3)
    Coordinates:
        x        (x) int64 24B ...
    Data variables:
        *empty*
From parents:
    Dimensions:  (x: 2)
    Coordinates:
        x        (x) int64 16B ...

@kmuehlbauer
Contributor Author

Never mind, I wrongly assumed that redefinitions were possible.

I've consulted the docs:

The constraint that this puts on a DataTree is that dimensions and indices that are inherited must be aligned with any direct descendant node’s existing dimension or index. This allows descendants to use dimensions defined in ancestor nodes, without duplicating that information. But as a consequence, if a dimension-name is defined on a node and that same dimension-name exists in one of its ancestors, they must align (have the same index and size).
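The rule quoted above can be illustrated with a deliberately simplified check that only compares dimension sizes. This is a hypothetical sketch, not xarray's `check_alignment` (which, per the traceback, calls `align(..., join="exact")` and therefore also compares indexes); here each group is reduced to a plain mapping from dimension name to size, and the function names are made up for illustration.

```python
def inherited_sizes(tree, path):
    """Collect dimension sizes defined on all ancestors of ``path``."""
    sizes = {}
    parts = [p for p in path.split("/") if p]
    # Walk from the root ("/") down to the direct parent of ``path``.
    for depth in range(len(parts)):
        ancestor = "/" + "/".join(parts[:depth])
        sizes.update(tree.get(ancestor, {}))
    return sizes


def check_dim_sizes(tree):
    """Raise ValueError if a group redefines an inherited dimension with another size."""
    for path, dims in tree.items():
        inherited = inherited_sizes(tree, path)
        for dim, size in dims.items():
            if dim in inherited and inherited[dim] != size:
                raise ValueError(
                    f"group {path!r} is not aligned with its parents: "
                    f"{dim!r} has size {size}, inherited size {inherited[dim]}"
                )
```

For example, `{"/": {"x": 2}, "/child": {"x": 3}}` fails this check, matching the MCVE, while a child with `{"x": 2, "y": 3}` passes because `x` agrees with its parent and `y` is new.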

So all good here; I'll think about how to fix or skip that particular test. Suggestions welcome.
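One common way to skip a single inherited test in a subclass is to override it with a skip decorator. This sketch uses the standard-library `unittest.skip`; the class and test names are hypothetical, and this is not necessarily how the PR handled the failing test.

```python
import unittest


class BaseTests(unittest.TestCase):
    """Hypothetical base suite shared by the flat and tree code paths."""

    def open(self, value):
        return value

    def test_simple(self):
        self.assertEqual(self.open(1), 1)

    def test_redefined_dimension(self):
        # Only meaningful for the flat code path.
        self.assertEqual(self.open(2), 2)


class TreeTests(BaseTests):
    # Override the inherited test so it is reported as skipped for this subclass only.
    @unittest.skip("redefining an inherited dimension is not allowed in a DataTree")
    def test_redefined_dimension(self):
        pass
```

The base class still runs `test_redefined_dimension`; only the subclass reports it as skipped, with the reason shown in the test output.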

@kmuehlbauer
Contributor Author

This is ready for review.

Member

shoyer left a comment

Thanks Kai, looks great!

kmuehlbauer merged commit 5332a86 into pydata:main Aug 18, 2025
37 checks passed
kmuehlbauer deleted the fix-autocomplex branch August 18, 2025 05:52
@kmuehlbauer
Contributor Author

Thanks @shoyer for the review!

dcherian added a commit to dhruvak001/xarray that referenced this pull request Aug 24, 2025
* main: (46 commits)
  use the new syntax of ignoring bots (pydata#10668)
  modification methods on `Coordinates` (pydata#10318)
  Silence warnings from test_tutorial.py (pydata#10661)
  test: update write_empty test for zarr 3.1.2 (pydata#10665)
  Bump actions/checkout from 4 to 5 in the actions group (pydata#10652)
  Add load_datatree function (pydata#10649)
  Support compute=False from DataTree.to_netcdf (pydata#10625)
  Fix typos (pydata#10655)
  In case of misconfiguration of dataset.encoding `unlimited_dims` warn instead of raise (pydata#10648)
  fix ``auto_complex`` for ``open_datatree`` (pydata#10632)
  Fix bug indexing with boolean scalars (pydata#10635)
  Improve DataTree typing (pydata#10644)
  Update Cartopy and Iris references (pydata#10645)
  Empty release notes (pydata#10642)
  release notes for v2025.08.0 (pydata#10641)
  Fix `ds.merge` to prevent altering original object depending on join value (pydata#10596)
  Add asynchronous load method (pydata#10327)
  Add DataTree.prune() method … (pydata#10598)
  Avoid refining parent dimensions in NetCDF files (pydata#10623)
  clarify lazy behaviour and eager loading chunks=None in open_*-functions (pydata#10627)
  ...

Successfully merging this pull request may close these issues.

auto_complex appears to be broken for DataTrees