From ae71437b5517b94be93dc1707479e173b67c7861 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 15 Sep 2024 11:15:35 -0400 Subject: [PATCH 01/30] remove too-long underline --- doc/user-guide/hierarchical-data.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 450daf3f06d..d8c4cd63d3d 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -1,7 +1,7 @@ .. _hierarchical-data: Hierarchical data -============================== +================= .. ipython:: python :suppress: From 928767a4f78137bfe65fbf7819bcc14089f6df61 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 15 Sep 2024 13:25:13 -0400 Subject: [PATCH 02/30] draft section on data alignment --- doc/user-guide/hierarchical-data.rst | 67 ++++++++++++++++++++++++++++ 1 file changed, 67 insertions(+) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index d8c4cd63d3d..0491ed85477 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -644,3 +644,70 @@ We could use this feature to quickly calculate the electrical power in our signa power = currents * voltages power + +Alignment and Coordinate Inheritance +------------------------------------ + +Data Alignment +~~~~~~~~~~~~~~ + +The data in different datatree nodes are not totally independent. In particular dimensions (and indexes) in child nodes must be aligned (LINK HERE) "vertically", i.e. aligned with those in their parent nodes. + +.. note:: + If you were a previous user of the prototype `xarray-contrib/datatree `_ package, this is different from what you're used to! + In that package the data model was that nodes actually were completely unrelated. The data model is now slightly stricter. + This allows us to provide features like Coordinate Inheritance (LINK BELOW). See the migration guide for more details on the differences (LINK). 
+ +To demonstrate, let's first generate some example datasets which are not aligned with one another + +.. ipython:: python + + da_daily = ds.resample(time="d").mean("time") + da_weekly = ds.resample(time="w").mean("time") + da_monthly = ds.resample(time="ME").mean("time") + +These datasets have different lengths along the time dimension, and are therefore not aligned along that dimension. + +.. ipython:: python + + da.daily.sizes + da.weekly.sizes + da.monthly.sizes + +Whilst we cannot store these non-alignable variables on a single `Dataset` object (DEMONSTRATE THIS?), this could be a good use for `DataTree`! + +However, if we try to create a `DataTree` with these different-length time dimensions present in both parents and children, we will still get an alignment error. + +.. ipython:: python + :okexcept: + + DataTree.from_dict({"daily": ds_daily, "daily/weekly": ds_weekly}) + +This is because DataTree checks alignment up through the tree, all the way to the root. + +.. note:: + This is similar to netCDF's concept of inherited dimensions. + +To represent this unalignable data in a single `DataTree`, we must place all variables which are a function of these different-length dimensions into nodes that are not parents of one another, i.e. organize them as siblings. + +.. ipython:: python + + dt = DataTree.from_dict( + {"daily": ds_daily, "weekly": ds_weekly, "monthly": ds_monthly} + ) + dt + +Now we have a valid `DataTree` structure which contains the data at different time frequencies. + +We is a useful way to organise our data because we can still operate on all the groups at once. +For example we can extract all three timeseries at a specific lat-lon location + +.. ipython:: python + + dt.sel(lat=75, lon=300) + +or find how the standard deviation of these timeseries has been affected by the averaging we did + +.. 
ipython:: python + + dt.std(dim="time") From 1adb94523f70181efe3dafa75c150cd78a2c4dc9 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 15 Sep 2024 14:24:14 -0400 Subject: [PATCH 03/30] fixes --- doc/user-guide/hierarchical-data.rst | 39 +++++++++++++++------------- 1 file changed, 21 insertions(+), 18 deletions(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 0491ed85477..173324fb543 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -651,62 +651,65 @@ Alignment and Coordinate Inheritance Data Alignment ~~~~~~~~~~~~~~ -The data in different datatree nodes are not totally independent. In particular dimensions (and indexes) in child nodes must be aligned (LINK HERE) "vertically", i.e. aligned with those in their parent nodes. +The data in different datatree nodes are not totally independent. In particular dimensions (and indexes) in child nodes must be aligned (LINK HERE) with those in their parent nodes. .. note:: If you were a previous user of the prototype `xarray-contrib/datatree `_ package, this is different from what you're used to! In that package the data model was that nodes actually were completely unrelated. The data model is now slightly stricter. This allows us to provide features like Coordinate Inheritance (LINK BELOW). See the migration guide for more details on the differences (LINK). -To demonstrate, let's first generate some example datasets which are not aligned with one another +To demonstrate, let's first generate some example datasets which are not aligned with one another: .. 
ipython:: python - da_daily = ds.resample(time="d").mean("time") - da_weekly = ds.resample(time="w").mean("time") - da_monthly = ds.resample(time="ME").mean("time") + # (drop the attributes just to make the printed representation shorter) + ds = xr.tutorial.open_dataset("air_temperature").drop_attrs() -These datasets have different lengths along the time dimension, and are therefore not aligned along that dimension. + ds_daily = ds.resample(time="D").mean("time") + ds_weekly = ds.resample(time="W").mean("time") + ds_monthly = ds.resample(time="ME").mean("time") + +These datasets have different lengths along the ``time`` dimension, and are therefore not aligned along that dimension. .. ipython:: python - da.daily.sizes - da.weekly.sizes - da.monthly.sizes + ds_daily.sizes + ds_weekly.sizes + ds_monthly.sizes -Whilst we cannot store these non-alignable variables on a single `Dataset` object (DEMONSTRATE THIS?), this could be a good use for `DataTree`! +Whilst we cannot store these non-alignable variables on a single :py:class:`~xarray.Dataset` object (DEMONSTRATE THIS?), this could be a good use for :py:class:`~xarray.DataTree`! -However, if we try to create a `DataTree` with these different-length time dimensions present in both parents and children, we will still get an alignment error. +However, if we try to create a :py:class:`~xarray.DataTree` with these different-length time dimensions present in both parents and children, we will still get an alignment error. .. ipython:: python :okexcept: - DataTree.from_dict({"daily": ds_daily, "daily/weekly": ds_weekly}) + xr.DataTree.from_dict({"daily": ds_daily, "daily/weekly": ds_weekly}) This is because DataTree checks alignment up through the tree, all the way to the root. .. note:: This is similar to netCDF's concept of inherited dimensions. 
-To represent this unalignable data in a single `DataTree`, we must place all variables which are a function of these different-length dimensions into nodes that are not parents of one another, i.e. organize them as siblings. +To represent this unalignable data in a single :py:class:`~xarray.DataTree`, we must instead place all variables which are a function of these different-length dimensions into nodes that are not parents of one another, i.e. organize them as siblings. .. ipython:: python - dt = DataTree.from_dict( + dt = xr.DataTree.from_dict( {"daily": ds_daily, "weekly": ds_weekly, "monthly": ds_monthly} ) dt -Now we have a valid `DataTree` structure which contains the data at different time frequencies. +Now we have a valid :py:class:`~xarray.DataTree` structure which contains the data at different time frequencies. -We is a useful way to organise our data because we can still operate on all the groups at once. -For example we can extract all three timeseries at a specific lat-lon location +This is a useful way to organise our data because we can still operate on all the groups at once. +For example we can extract all three timeseries at a specific lat-lon location: .. ipython:: python dt.sel(lat=75, lon=300) -or find how the standard deviation of these timeseries has been affected by the averaging we did +or compute the standard deviation of each timeseries to find out how it varies with sampling frequency: .. 
ipython:: python From ae1bcfd8044bf693dc3f94b4badc612a84e8ce81 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 15 Sep 2024 14:24:33 -0400 Subject: [PATCH 04/30] draft section on coordinate inheritance --- doc/user-guide/hierarchical-data.rst | 51 ++++++++++++++++++++++++++++ 1 file changed, 51 insertions(+) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 173324fb543..d87618c667a 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -714,3 +714,54 @@ or compute the standard deviation of each timeseries to find out how it varies w .. ipython:: python dt.std(dim="time") + +Coordinate Inheritance +~~~~~~~~~~~~~~~~~~~~~~ + +Notice that in the tree we constructed above (LINK OR DISPLAY AGAIN?) there is some redundancy - the ``lat`` and ``lon`` variables appear in each sibling group, but are identical in each group. +We can use "Coordinate Inheritance" to define them only once in a parent group and remove this redundancy, whilst still being able to access those coordinate variables from the child groups. + +.. note:: + This is also a new feature relative to the prototype `xarray-contrib/datatree `_ package. + +.. ipython:: python + + dt = xr.DataTree.from_dict( + { + "/": ds.drop_dims("time"), + "daily": ds_daily.drop_vars(["lat", "lon"]), + "weekly": ds_weekly.drop_vars(["lat", "lon"]), + "monthly": ds_monthly.drop_vars(["lat", "lon"]), + } + ) + dt + +We say that the `lat` and `lon` coordinates in the child groups have been "inherited" from their common parent group. + +This is preferred to the previous representation because it makes it clear that all of these datasets share common spatial grid coordinates. +Defining the common coordinates just once also ensures that the spatial coordinates for each group cannot become out of sync with one another during operations. 
+ +We can still access the coordinates defined in the parent groups from any of the child groups as if they were actually present on the child groups: + +.. ipython:: python + + dt.daily.coords + dt["daily/lat"] + +(TODO: the repr of ``dt.coords`` should display which coordinates are inherited) + +If we print just one of the child nodes, it will still display inherited coordinates, but explicitly mark them as such: + +.. ipython:: python + + dt["/daily"] + +We can also still perform all the same operations on the whole tree: + +.. ipython:: python + + dt.sel(lat=75, lon=300) + + dt.std(dim="time") + +EXPLAIN DEDUPLICATION? From f025371d0924b5648b489153fae9e0605d112535 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 15 Sep 2024 15:10:00 -0400 Subject: [PATCH 05/30] various improvements --- doc/user-guide/hierarchical-data.rst | 52 +++++++++++++++++++++++----- 1 file changed, 44 insertions(+), 8 deletions(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index d87618c667a..1127fb008aa 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -645,9 +645,13 @@ We could use this feature to quickly calculate the electrical power in our signa power = currents * voltages power +.. _alignment and coordinate inheritance: + Alignment and Coordinate Inheritance ------------------------------------ +.. _data alignment: + Data Alignment ~~~~~~~~~~~~~~ @@ -656,7 +660,7 @@ The data in different datatree nodes are not totally independent. In particular .. note:: If you were a previous user of the prototype `xarray-contrib/datatree `_ package, this is different from what you're used to! In that package the data model was that nodes actually were completely unrelated. The data model is now slightly stricter. - This allows us to provide features like Coordinate Inheritance (LINK BELOW). See the migration guide for more details on the differences (LINK). 
+ This allows us to provide features like :ref:`coordinate inheritance`. See the migration guide for more details on the differences (LINK). To demonstrate, let's first generate some example datasets which are not aligned with one another: @@ -677,21 +681,35 @@ These datasets have different lengths along the ``time`` dimension, and are ther ds_weekly.sizes ds_monthly.sizes -Whilst we cannot store these non-alignable variables on a single :py:class:`~xarray.Dataset` object (DEMONSTRATE THIS?), this could be a good use for :py:class:`~xarray.DataTree`! +We cannot store these non-alignable variables on a single :py:class:`~xarray.Dataset` object, because they do not exactly align: + +.. ipython:: python + :okexcept: + + xr.align(ds_daily, ds_weekly, join="exact") -However, if we try to create a :py:class:`~xarray.DataTree` with these different-length time dimensions present in both parents and children, we will still get an alignment error. +But we previously said that multi-resolution data is a good use case for :py:class:`~xarray.DataTree`, so surely we should be able to store these in a single :py:class:`~xarray.DataTree`? +If we first try to create a :py:class:`~xarray.DataTree` with these different-length time dimensions present in both parents and children, we will still get an alignment error: .. ipython:: python :okexcept: xr.DataTree.from_dict({"daily": ds_daily, "daily/weekly": ds_weekly}) -This is because DataTree checks alignment up through the tree, all the way to the root. +(TODO: Looks like this error message could be improved by including information about which sizes are not equal.) + +This is because DataTree checks that data in child nodes align exactly with their parents. .. note:: - This is similar to netCDF's concept of inherited dimensions. + This requirement of aligned dimensions is similar to netCDF's concept of inherited dimensions (LINK TO NETCDF DOCUMENTATION?). 
+ +This alignment check is performed up through the tree, all the way to the root, and so is therefore equivalent to requiring that this :py:func:`xr.align` command succeeds: + +.. code:: + + xr.align(child, *child.parents, join="exact") -To represent this unalignable data in a single :py:class:`~xarray.DataTree`, we must instead place all variables which are a function of these different-length dimensions into nodes that are not parents of one another, i.e. organize them as siblings. +To represent our unalignable data in a single :py:class:`~xarray.DataTree`, we must instead place all variables which are a function of these different-length dimensions into nodes that are not parents of one another, i.e. organize them as siblings. .. ipython:: python @@ -715,6 +733,8 @@ or compute the standard deviation of each timeseries to find out how it varies w dt.std(dim="time") +.. _coordinate inheritance: + Coordinate Inheritance ~~~~~~~~~~~~~~~~~~~~~~ @@ -736,7 +756,7 @@ We can use "Coordinate Inheritance" to define them only once in a parent group a ) dt -We say that the `lat` and `lon` coordinates in the child groups have been "inherited" from their common parent group. +(TODO: They are being displayed in child groups still, see https://github.com/pydata/xarray/issues/9499) This is preferred to the previous representation because it makes it clear that all of these datasets share common spatial grid coordinates. Defining the common coordinates just once also ensures that the spatial coordinates for each group cannot become out of sync with one another during operations. @@ -750,18 +770,34 @@ We can still access the coordinates defined in the parent groups from any of the (TODO: the repr of ``dt.coords`` should display which coordinates are inherited) +As we can still access them, we say that the ``lat`` and ``lon`` coordinates in the child groups have been "inherited" from their common parent group. 
+ If we print just one of the child nodes, it will still display inherited coordinates, but explicitly mark them as such: .. ipython:: python - dt["/daily"] + print(dt["/daily"]) + +This helps to differentiate which variables are defined on the datatree node that you are currently looking at, and which were defined somewhere above it. We can also still perform all the same operations on the whole tree: .. ipython:: python + :okexcept: dt.sel(lat=75, lon=300) dt.std(dim="time") +(TODO: The second one fails due to https://github.com/pydata/xarray/issues/8949) + +Overriding Inherited Coordinates +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +You can override inherited coordinates with newly-defined ones, as long as those newly-defined coordinates also align with the parent nodes. + +EXAMPLE OF THIS? WOULD IT MAKE MORE SENSE TO USE DIFFERENT DATA TO DEMONSTRATE THIS? + +EXAMPLE OF INHERITING FROM A GRANDPARENT? + EXPLAIN DEDUPLICATION? From 7549ee917e38cad86f07cb94e1582928fa4f6f18 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 15 Sep 2024 15:47:32 -0400 Subject: [PATCH 06/30] more improvements --- doc/user-guide/hierarchical-data.rst | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 1127fb008aa..72e74e5f9bd 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -645,12 +645,12 @@ We could use this feature to quickly calculate the electrical power in our signa power = currents * voltages power -.. _alignment and coordinate inheritance: +.. _alignment-and-coordinate-inheritance: Alignment and Coordinate Inheritance ------------------------------------ -.. _data alignment: +.. _data-alignment: Data Alignment ~~~~~~~~~~~~~~ @@ -660,7 +660,7 @@ The data in different datatree nodes are not totally independent. In particular .. 
note:: If you were a previous user of the prototype `xarray-contrib/datatree `_ package, this is different from what you're used to! In that package the data model was that nodes actually were completely unrelated. The data model is now slightly stricter. - This allows us to provide features like :ref:`coordinate inheritance`. See the migration guide for more details on the differences (LINK). + This allows us to provide features like :ref:`coordinate-inheritance`. See the migration guide for more details on the differences (LINK). To demonstrate, let's first generate some example datasets which are not aligned with one another: @@ -703,7 +703,7 @@ This is because DataTree checks that data in child nodes align exactly with thei .. note:: This requirement of aligned dimensions is similar to netCDF's concept of inherited dimensions (LINK TO NETCDF DOCUMENTATION?). -This alignment check is performed up through the tree, all the way to the root, and so is therefore equivalent to requiring that this :py:func:`xr.align` command succeeds: +This alignment check is performed up through the tree, all the way to the root, and so is therefore equivalent to requiring that this :py:func:`~xarray.align` command succeeds: .. code:: @@ -733,17 +733,19 @@ or compute the standard deviation of each timeseries to find out how it varies w dt.std(dim="time") -.. _coordinate inheritance: +.. _coordinate-inheritance: Coordinate Inheritance ~~~~~~~~~~~~~~~~~~~~~~ -Notice that in the tree we constructed above (LINK OR DISPLAY AGAIN?) there is some redundancy - the ``lat`` and ``lon`` variables appear in each sibling group, but are identical in each group. +Notice that in the trees we constructed above (LINK OR DISPLAY AGAIN?) there is some redundancy - the ``lat`` and ``lon`` variables appear in each sibling group, but are identical across the groups. 
We can use "Coordinate Inheritance" to define them only once in a parent group and remove this redundancy, whilst still being able to access those coordinate variables from the child groups. .. note:: This is also a new feature relative to the prototype `xarray-contrib/datatree `_ package. +Let's instead place only the time-dependent variables in the child groups, and put the non-time-dependent ``lat`` and ``lon`` variables in the parent (root) group: + .. ipython:: python dt = xr.DataTree.from_dict( @@ -758,7 +760,7 @@ We can use "Coordinate Inheritance" to define them only once in a parent group a (TODO: They are being displayed in child groups still, see https://github.com/pydata/xarray/issues/9499) -This is preferred to the previous representation because it makes it clear that all of these datasets share common spatial grid coordinates. +This is preferred to the previous representation because it now makes it clear that all of these datasets share common spatial grid coordinates. Defining the common coordinates just once also ensures that the spatial coordinates for each group cannot become out of sync with one another during operations. We can still access the coordinates defined in the parent groups from any of the child groups as if they were actually present on the child groups: @@ -791,6 +793,8 @@ We can also still perform all the same operations on the whole tree: (TODO: The second one fails due to https://github.com/pydata/xarray/issues/8949) +.. 
_overriding-inherited-coordinates: + Overriding Inherited Coordinates ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From b6316971e136d19376878e8e5c09a1927cbb4688 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 15 Sep 2024 15:47:46 -0400 Subject: [PATCH 07/30] link from other page --- doc/user-guide/data-structures.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index b5e83789806..eb04500f22d 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -800,6 +800,7 @@ included by default unless you exclude them with the ``inherited`` flag: dt2["/weather/temperature"].to_dataset(inherited=False) +For more examples and further discussion see LINK .. _coordinates: From 02bf96b2108def56613dc84b5fb7057a519b8810 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 15 Sep 2024 15:52:05 -0400 Subject: [PATCH 08/30] align call include all 3 datasets --- doc/user-guide/hierarchical-data.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 72e74e5f9bd..ef90fe96b2c 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -686,7 +686,7 @@ We cannot store these non-alignable variables on a single :py:class:`~xarray.Dat .. ipython:: python :okexcept: - xr.align(ds_daily, ds_weekly, join="exact") + xr.align(ds_daily, ds_weekly, ds_monthly, join="exact") But we previously said that multi-resolution data is a good use case for :py:class:`~xarray.DataTree`, so surely we should be able to store these in a single :py:class:`~xarray.DataTree`? 
If we first try to create a :py:class:`~xarray.DataTree` with these different-length time dimensions present in both parents and children, we will still get an alignment error: From 152d74a8e0f7651a9419be551b98109d187f3610 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 15 Sep 2024 15:55:46 -0400 Subject: [PATCH 09/30] link back to use cases --- doc/user-guide/hierarchical-data.rst | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index ef90fe96b2c..ce96b99c8bf 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -15,6 +15,8 @@ Hierarchical data %xmode minimal +.. _why: + Why Hierarchical Data? ---------------------- @@ -688,7 +690,7 @@ We cannot store these non-alignable variables on a single :py:class:`~xarray.Dat xr.align(ds_daily, ds_weekly, ds_monthly, join="exact") -But we previously said that multi-resolution data is a good use case for :py:class:`~xarray.DataTree`, so surely we should be able to store these in a single :py:class:`~xarray.DataTree`? +But we :ref:`previously said <why>` that multi-resolution data is a good use case for :py:class:`~xarray.DataTree`, so surely we should be able to store these in a single :py:class:`~xarray.DataTree`? If we first try to create a :py:class:`~xarray.DataTree` with these different-length time dimensions present in both parents and children, we will still get an alignment error: .. 
ipython:: python From 57b7f062c7febe42a2c96abf962d1a86127bee7f Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 15 Sep 2024 15:56:26 -0400 Subject: [PATCH 10/30] clarification --- doc/user-guide/hierarchical-data.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index ce96b99c8bf..3eda99f6669 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -711,7 +711,7 @@ This alignment check is performed up through the tree, all the way to the root, xr.align(child, *child.parents, join="exact") -To represent our unalignable data in a single :py:class:`~xarray.DataTree`, we must instead place all variables which are a function of these different-length dimensions into nodes that are not parents of one another, i.e. organize them as siblings. +To represent our unalignable data in a single :py:class:`~xarray.DataTree`, we must instead place all variables which are a function of these different-length dimensions into nodes that are not direct descendents of one another, e.g. organize them as siblings. .. ipython:: python From d3ac1a7b8a30e9c269383904b7e7d838397fe79b Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 15 Sep 2024 16:07:02 -0400 Subject: [PATCH 11/30] small improvements --- doc/user-guide/hierarchical-data.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 3eda99f6669..90cb286ddca 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -720,7 +720,7 @@ To represent our unalignable data in a single :py:class:`~xarray.DataTree`, we m ) dt -Now we have a valid :py:class:`~xarray.DataTree` structure which contains the data at different time frequencies. 
+Now we have a valid :py:class:`~xarray.DataTree` structure which contains all the data at each different time frequency, stored in a separate group. This is a useful way to organise our data because we can still operate on all the groups at once. For example we can extract all three timeseries at a specific lat-lon location: @@ -741,6 +741,7 @@ Coordinate Inheritance ~~~~~~~~~~~~~~~~~~~~~~ Notice that in the trees we constructed above (LINK OR DISPLAY AGAIN?) there is some redundancy - the ``lat`` and ``lon`` variables appear in each sibling group, but are identical across the groups. + We can use "Coordinate Inheritance" to define them only once in a parent group and remove this redundancy, whilst still being able to access those coordinate variables from the child groups. .. note:: From d73dd8ab2f605618ca1d6e6c7923d9a86a6cd647 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Mon, 23 Sep 2024 10:50:09 -0400 Subject: [PATCH 12/30] remove TODO after #9532 --- doc/user-guide/hierarchical-data.rst | 2 -- 1 file changed, 2 deletions(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 90cb286ddca..d6e3c30fff6 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -761,8 +761,6 @@ Let's instead place only the time-dependent variables in the child groups, and p ) dt -(TODO: They are being displayed in child groups still, see https://github.com/pydata/xarray/issues/9499) - This is preferred to the previous representation because it now makes it clear that all of these datasets share common spatial grid coordinates. Defining the common coordinates just once also ensures that the spatial coordinates for each group cannot become out of sync with one another during operations. 
From d779e22d75d207470f727fa631e1298369960c44 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Mon, 23 Sep 2024 10:52:24 -0400 Subject: [PATCH 13/30] add todo about #9475 --- doc/user-guide/hierarchical-data.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index d6e3c30fff6..079f3381431 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -792,6 +792,8 @@ We can also still perform all the same operations on the whole tree: dt.std(dim="time") +(TODO: The first one repeats coordinates in the result due to https://github.com/pydata/xarray/issues/9475) + (TODO: The second one fails due to https://github.com/pydata/xarray/issues/8949) .. _overriding-inherited-coordinates: From 3c9ad5519877365132d5b87361395027b3c183d5 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Mon, 23 Sep 2024 11:04:12 -0400 Subject: [PATCH 14/30] correct xr.align example call --- doc/user-guide/hierarchical-data.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 079f3381431..86717fe965d 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -707,9 +707,9 @@ This is because DataTree checks that data in child nodes align exactly with thei This alignment check is performed up through the tree, all the way to the root, and so is therefore equivalent to requiring that this :py:func:`~xarray.align` command succeeds: -.. code:: +.. code:: python - xr.align(child, *child.parents, join="exact") + xr.align(child.dataset, parent.dataset for parent in child.parents, join="exact") To represent our unalignable data in a single :py:class:`~xarray.DataTree`, we must instead place all variables which are a function of these different-length dimensions into nodes that are not direct descendents of one another, e.g. organize them as siblings. 
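As a rough illustration of the alignment rule this patch documents, the sketch below uses small synthetic datasets to show when ``xr.align(..., join="exact")`` succeeds. The helper name ``aligns_exactly`` and the dataset names are made up for this example. It relies only on the assumption that a failed exact alignment raises ``ValueError`` (or a subclass of it).

```python
import numpy as np
import xarray as xr


def aligns_exactly(*datasets):
    """Return True if the datasets share identical indexes and dimension sizes."""
    try:
        xr.align(*datasets, join="exact")
    except ValueError:
        # join="exact" refuses to reindex, so any mismatch raises instead
        return False
    return True


# Hypothetical stand-ins for the daily and weekly means:
# same dimension name, different lengths
daily = xr.Dataset(
    {"air": ("time", np.arange(28.0))}, coords={"time": np.arange(28)}
)
weekly = xr.Dataset(
    {"air": ("time", np.arange(4.0))}, coords={"time": np.arange(0, 28, 7)}
)
# A dataset with no "time" dimension, like a root node holding only a spatial grid
grid = xr.Dataset(coords={"lat": [10.0, 20.0]})

# Parent/child placement would require daily and weekly to align: they do not
print(aligns_exactly(daily, weekly))

# As siblings under a grid-only root, each node is only compared with its ancestors
print(aligns_exactly(grid, daily))
print(aligns_exactly(grid, weekly))
```

The first check fails and the other two pass, which is why the sibling layout is accepted while nesting ``weekly`` under ``daily`` is not: sibling nodes are never compared with one another.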
From 4cee745be81645304215b4958ce2959ebceb51a3 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Mon, 23 Sep 2024 11:20:26 -0400 Subject: [PATCH 15/30] add links to netCDF4 documentation --- doc/user-guide/hierarchical-data.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 48d2bb1edae..a1f0f578381 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -703,7 +703,7 @@ If we first try to create a :py:class:`~xarray.DataTree` with these different-le This is because DataTree checks that data in child nodes align exactly with their parents. .. note:: - This requirement of aligned dimensions is similar to netCDF's concept of inherited dimensions (LINK TO NETCDF DOCUMENTATION?). + This requirement of aligned dimensions is similar to netCDF's concept of `inherited dimensions `_, as in netCDF-4 files dimensions are `visible to all child groups `_. This alignment check is performed up through the tree, all the way to the root, and so is therefore equivalent to requiring that this :py:func:`~xarray.align` command succeeds: From 4c030d84c84d70bfa309355d6e76d1d2c2c98a65 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Mon, 23 Sep 2024 09:22:14 -0600 Subject: [PATCH 16/30] Consistent voice Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com> --- doc/user-guide/hierarchical-data.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index a1f0f578381..f042fa00e75 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -801,7 +801,7 @@ We can also still perform all the same operations on the whole tree: Overriding Inherited Coordinates ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -You can override inherited coordinates with newly-defined ones, as long as those newly-defined coordinates also align 
with the parent nodes. +We can override inherited coordinates with newly-defined ones, as long as those newly-defined coordinates also align with the parent nodes. EXAMPLE OF THIS? WOULD IT MAKE MORE SENSE TO USE DIFFERENT DATA TO DEMONSTRATE THIS? From 6db4a0b74a54f468557d58e456842c3914d28c18 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 6 Oct 2024 12:32:37 -0400 Subject: [PATCH 17/30] keep indexes in lat lon selection to dodge #9475 --- doc/user-guide/hierarchical-data.rst | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index a1f0f578381..5f11e1e762e 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -788,12 +788,10 @@ We can also still perform all the same operations on the whole tree: .. ipython:: python :okexcept: - dt.sel(lat=75, lon=300) + dt.sel(lat=[75], lon=[300]) dt.std(dim="time") -(TODO: The first one repeats coordinates in the result due to https://github.com/pydata/xarray/issues/9475) - (TODO: The second one fails due to https://github.com/pydata/xarray/issues/8949) .. _overriding-inherited-coordinates: From e879dbb9d5522d2a7e46ca9d925bd1c3a3271096 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Sun, 6 Oct 2024 10:36:54 -0600 Subject: [PATCH 18/30] unpack generator properly Co-authored-by: Stephan Hoyer --- doc/user-guide/hierarchical-data.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 0716f3cd941..f130bc06397 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -709,7 +709,7 @@ This alignment check is performed up through the tree, all the way to the root, .. 
code:: python - xr.align(child.dataset, parent.dataset for parent in child.parents, join="exact") + xr.align(child.dataset, *(parent.dataset for parent in child.parents), join="exact") To represent our unalignable data in a single :py:class:`~xarray.DataTree`, we must instead place all variables which are a function of these different-length dimensions into nodes that are not direct descendents of one another, e.g. organize them as siblings. From 118e8029d6e0cad3e7904819be250093bc520a2a Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Wed, 9 Oct 2024 22:12:35 -0400 Subject: [PATCH 19/30] ideas for next section --- doc/user-guide/hierarchical-data.rst | 66 +++++++++++++++++++++++----- 1 file changed, 54 insertions(+), 12 deletions(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index f130bc06397..65385d4895c 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -740,7 +740,7 @@ or compute the standard deviation of each timeseries to find out how it varies w Coordinate Inheritance ~~~~~~~~~~~~~~~~~~~~~~ -Notice that in the trees we constructed above (LINK OR DISPLAY AGAIN?) there is some redundancy - the ``lat`` and ``lon`` variables appear in each sibling group, but are identical across the groups. +Notice that in the trees we constructed above there is some redundancy - the ``lat`` and ``lon`` variables appear in each sibling group, but are identical across the groups. We can use "Coordinate Inheritance" to define them only once in a parent group and remove this redundancy, whilst still being able to access those coordinate variables from the child groups. @@ -751,15 +751,15 @@ Let's instead place only the time-dependent variables in the child groups, and p .. 
ipython:: python - dt = xr.DataTree.from_dict( - { - "/": ds.drop_dims("time"), - "daily": ds_daily.drop_vars(["lat", "lon"]), - "weekly": ds_weekly.drop_vars(["lat", "lon"]), - "monthly": ds_monthly.drop_vars(["lat", "lon"]), - } - ) - dt +dt = xr.DataTree.from_dict( + { + "/": ds.drop_dims("time"), + "daily": ds_daily.drop_vars(["lat", "lon"]), + "weekly": ds_weekly.drop_vars(["lat", "lon"]), + "monthly": ds_monthly.drop_vars(["lat", "lon"]), + } +) +dt This is preferred to the previous representation because it now makes it clear that all of these datasets share common spatial grid coordinates. Defining the common coordinates just once also ensures that the spatial coordinates for each group cannot become out of sync with one another during operations. @@ -799,9 +799,51 @@ We can also still perform all the same operations on the whole tree: Overriding Inherited Coordinates ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -We can override inherited coordinates with newly-defined ones, as long as those newly-defined coordinates also align with the parent nodes. +We can override inherited coordinates with newly-defined ones on child nodes, as long as those newly-defined coordinates also align with the parent nodes. + +You may want to do this if some child dataset follows the same pattern as its siblings but contains different specific values of a coordinate. + +Let's image we had three different datasets, on two different lat-lon grids: + +.. ipython:: python + +lon1, lat1 = np.meshgrid(np.linspace(-20, 20, 5), np.linspace(0, 30, 4)) +grid1 = xr.Coordinates({'lat': lat1, 'lon': lon1}) + +# a different grid with the same sizes +lon2 = lat1 + (lat1 / 10) +lat2 = lon1 + (lon1 / 10) +grid1 = xr.Coordinates({'lat': lat2, 'lon': lon2}) + + # model_ds1 = xr.Dataset() + # model_ds2 = xr.Dataset() + # model_ds3 = xr.Dataset() + +We could arrange these in a datatree + +.. 
ipython:: python + + dt = xr.DataTree.from_dict( + { + "/": Dataset(coords=grid1), + "model1": None, + "model2": None, + "model3": Dataset(coords=grid2), + } + ) + dt + +However in this you could also have reprented this without overriding, by using an extra layer of nodes in the tree: + + +.. _overriding-inherited-coordinates: + +Non-Inherited Coordinates +~~~~~~~~~~~~~~~~~~~~~~~~~ + +Only coordinates which are backed by an index are inherited. This is because index-backed coordinates can be efficiently de-duplicated between parents and children, preventing ambiguity + -EXAMPLE OF THIS? WOULD IT MAKE MORE SENSE TO USE DIFFERENT DATA TO DEMONSTRATE THIS? EXAMPLE OF INHERITING FROM A GRANDPARENT? From b245bdd220e0bdfb75b8771b56dba5f0bdc4ce3e Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sat, 12 Oct 2024 18:03:49 -0400 Subject: [PATCH 20/30] briefly summarize what alignment means --- doc/user-guide/hierarchical-data.rst | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 65385d4895c..500ebea3ebf 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -657,7 +657,8 @@ Alignment and Coordinate Inheritance Data Alignment ~~~~~~~~~~~~~~ -The data in different datatree nodes are not totally independent. In particular dimensions (and indexes) in child nodes must be aligned (LINK HERE) with those in their parent nodes. +The data in different datatree nodes are not totally independent. In particular dimensions (and indexes) in child nodes must be exactly aligned with those in their parent nodes. +Exact aligment means that shared dimensions must be the same length, and indexes along those dimensions must be equal. .. note:: If you were a previous user of the prototype `xarray-contrib/datatree `_ package, this is different from what you're used to! 
@@ -819,7 +820,7 @@ grid1 = xr.Coordinates({'lat': lat2, 'lon': lon2}) # model_ds2 = xr.Dataset() # model_ds3 = xr.Dataset() -We could arrange these in a datatree +We could arrange these in a datatree .. ipython:: python @@ -841,7 +842,7 @@ However in this you could also have reprented this without overriding, by using Non-Inherited Coordinates ~~~~~~~~~~~~~~~~~~~~~~~~~ -Only coordinates which are backed by an index are inherited. This is because index-backed coordinates can be efficiently de-duplicated between parents and children, preventing ambiguity +Only coordinates which are backed by an index are inherited. This is because index-backed coordinates can be efficiently de-duplicated between parents and children, preventing ambiguity From 9b8fc9bfbd8a27f60cc61ebdf074e48624c19cbf Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sat, 12 Oct 2024 18:05:50 -0400 Subject: [PATCH 21/30] clarify that it's the data in each node that was previously unrelated --- doc/user-guide/hierarchical-data.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 500ebea3ebf..1e576b519d0 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -662,7 +662,7 @@ Exact aligment means that shared dimensions must be the same length, and indexes .. note:: If you were a previous user of the prototype `xarray-contrib/datatree `_ package, this is different from what you're used to! - In that package the data model was that nodes actually were completely unrelated. The data model is now slightly stricter. + In that package the data model was that the data stored in each node actually was completely unrelated. The data model is now slightly stricter. This allows us to provide features like :ref:`coordinate-inheritance`. See the migration guide for more details on the differences (LINK). 
To demonstrate, let's first generate some example datasets which are not aligned with one another: From b6385ceb3672c97f9198cf18c6eacd2f90f25b7a Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sat, 12 Oct 2024 18:07:15 -0400 Subject: [PATCH 22/30] fix incorrect indentation of code block --- doc/user-guide/hierarchical-data.rst | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 1e576b519d0..753ea2bf5dc 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -752,15 +752,15 @@ Let's instead place only the time-dependent variables in the child groups, and p .. ipython:: python -dt = xr.DataTree.from_dict( - { - "/": ds.drop_dims("time"), - "daily": ds_daily.drop_vars(["lat", "lon"]), - "weekly": ds_weekly.drop_vars(["lat", "lon"]), - "monthly": ds_monthly.drop_vars(["lat", "lon"]), - } -) -dt + dt = xr.DataTree.from_dict( + { + "/": ds.drop_dims("time"), + "daily": ds_daily.drop_vars(["lat", "lon"]), + "weekly": ds_weekly.drop_vars(["lat", "lon"]), + "monthly": ds_monthly.drop_vars(["lat", "lon"]), + } + ) + dt This is preferred to the previous representation because it now makes it clear that all of these datasets share common spatial grid coordinates. Defining the common coordinates just once also ensures that the spatial coordinates for each group cannot become out of sync with one another during operations. 
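The inheritance behaviour documented in the hunk above can be pictured with a toy lookup in plain Python. This is only a sketch under stated assumptions, not how xarray stores or resolves coordinates: ``ToyNode`` and ``inherited_coords`` are hypothetical names, and coordinates are modeled as plain lists.

```python
from typing import Optional


class ToyNode:
    """Hypothetical stand-in for a tree node; not xarray's DataTree."""

    def __init__(self, coords: dict, parent: Optional["ToyNode"] = None):
        self.coords = coords  # coordinates defined directly on this node
        self.parent = parent

    def inherited_coords(self) -> dict:
        """Collect coordinates from the root down to this node.

        In this toy, a node's own entries are applied last and so take
        precedence over entries with the same name on an ancestor.
        """
        chain = []
        node = self
        while node is not None:
            chain.append(node)
            node = node.parent
        merged = {}
        for node in reversed(chain):  # root first, this node last
            merged.update(node.coords)
        return merged


# Spatial coordinates are defined once, on the root only.
root = ToyNode({"lat": [10.0, 20.0], "lon": [100.0, 110.0]})
daily = ToyNode({"time": list(range(14))}, parent=root)
weekly = ToyNode({"time": [0, 7]}, parent=root)

daily_coords = daily.inherited_coords()
weekly_coords = weekly.inherited_coords()
```

Because ``lat`` and ``lon`` live only on the root in this toy, both children resolve them to the same objects, which is the "cannot become out of sync" property the documentation text describes.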
From d2918bbb14caae3ec435a6bf7daee2449d70df3a Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sat, 12 Oct 2024 18:08:04 -0400 Subject: [PATCH 23/30] display the tree with redundant coordinates again --- doc/user-guide/hierarchical-data.rst | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 753ea2bf5dc..f28699f1e76 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -743,6 +743,10 @@ Coordinate Inheritance Notice that in the trees we constructed above there is some redundancy - the ``lat`` and ``lon`` variables appear in each sibling group, but are identical across the groups. +.. ipython:: python + + dt + We can use "Coordinate Inheritance" to define them only once in a parent group and remove this redundancy, whilst still being able to access those coordinate variables from the child groups. .. note:: From 6cab6f86b61a6b63c8acadda89a6c6cb018d3312 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sat, 12 Oct 2024 18:10:12 -0400 Subject: [PATCH 24/30] remove content about non-inherited coords for a follow-up PR --- doc/user-guide/hierarchical-data.rst | 55 ---------------------------- 1 file changed, 55 deletions(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index f28699f1e76..81f026db818 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -798,58 +798,3 @@ We can also still perform all the same operations on the whole tree: dt.std(dim="time") (TODO: The second one fails due to https://github.com/pydata/xarray/issues/8949) - -.. _overriding-inherited-coordinates: - -Overriding Inherited Coordinates -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -We can override inherited coordinates with newly-defined ones on child nodes, as long as those newly-defined coordinates also align with the parent nodes. 
- -You may want to do this if some child dataset follows the same pattern as its siblings but contains different specific values of a coordinate. - -Let's image we had three different datasets, on two different lat-lon grids: - -.. ipython:: python - -lon1, lat1 = np.meshgrid(np.linspace(-20, 20, 5), np.linspace(0, 30, 4)) -grid1 = xr.Coordinates({'lat': lat1, 'lon': lon1}) - -# a different grid with the same sizes -lon2 = lat1 + (lat1 / 10) -lat2 = lon1 + (lon1 / 10) -grid1 = xr.Coordinates({'lat': lat2, 'lon': lon2}) - - # model_ds1 = xr.Dataset() - # model_ds2 = xr.Dataset() - # model_ds3 = xr.Dataset() - -We could arrange these in a datatree - -.. ipython:: python - - dt = xr.DataTree.from_dict( - { - "/": Dataset(coords=grid1), - "model1": None, - "model2": None, - "model3": Dataset(coords=grid2), - } - ) - dt - -However in this you could also have reprented this without overriding, by using an extra layer of nodes in the tree: - - -.. _overriding-inherited-coordinates: - -Non-Inherited Coordinates -~~~~~~~~~~~~~~~~~~~~~~~~~ - -Only coordinates which are backed by an index are inherited. This is because index-backed coordinates can be efficiently de-duplicated between parents and children, preventing ambiguity - - - -EXAMPLE OF INHERITING FROM A GRANDPARENT? - -EXPLAIN DEDUPLICATION? 
From af5c6b7ccb80837ab0dbdf5d2e081177df13b7bb Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sat, 12 Oct 2024 18:10:57 -0400 Subject: [PATCH 25/30] remove todo --- doc/user-guide/hierarchical-data.rst | 2 -- 1 file changed, 2 deletions(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 81f026db818..fcd0a1cec23 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -776,8 +776,6 @@ We can still access the coordinates defined in the parent groups from any of the dt.daily.coords dt["daily/lat"] -(TODO: the repr of ``dt.coords`` should display which coordinates are inherited) - As we can still access them, we say that the ``lat`` and ``lon`` coordinates in the child groups have been "inherited" from their common parent group. If we print just one of the child nodes, it will still display inherited coordinates, but explicitly mark them as such: From d49c2de530bc869c12592c1b592d69e1d69c9adb Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 13 Oct 2024 10:27:52 -0400 Subject: [PATCH 26/30] remove todo now that aggregations are re-implemented --- doc/user-guide/hierarchical-data.rst | 3 --- 1 file changed, 3 deletions(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index fcd0a1cec23..7f7fcb4ee59 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -789,10 +789,7 @@ This helps to differentiate which variables are defined on the datatree node tha We can also still perform all the same operations on the whole tree: .. 
ipython:: python - :okexcept: dt.sel(lat=[75], lon=[300]) dt.std(dim="time") - -(TODO: The second one fails due to https://github.com/pydata/xarray/issues/8949) From 44bcf6cb3937d758686b85619f0646637a692e32 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 13 Oct 2024 12:03:40 -0400 Subject: [PATCH 27/30] remove link to (unmerged) migration guide --- doc/user-guide/hierarchical-data.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 7f7fcb4ee59..39690270589 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -663,7 +663,7 @@ Exact aligment means that shared dimensions must be the same length, and indexes .. note:: If you were a previous user of the prototype `xarray-contrib/datatree `_ package, this is different from what you're used to! In that package the data model was that the data stored in each node actually was completely unrelated. The data model is now slightly stricter. - This allows us to provide features like :ref:`coordinate-inheritance`. See the migration guide for more details on the differences (LINK). + This allows us to provide features like :ref:`coordinate-inheritance`. 
To demonstrate, let's first generate some example datasets which are not aligned with one another: From ea994307ae93be0221cc95ca3c3d6cef8ec68104 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 13 Oct 2024 12:04:29 -0400 Subject: [PATCH 28/30] remove todo about improving error message --- doc/user-guide/hierarchical-data.rst | 2 -- 1 file changed, 2 deletions(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 39690270589..a9626142fc7 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -699,8 +699,6 @@ If we first try to create a :py:class:`~xarray.DataTree` with these different-le xr.DataTree.from_dict({"daily": ds_daily, "daily/weekly": ds_weekly}) -(TODO: Looks like this error message could be improved by including information about which sizes are not equal.) - This is because DataTree checks that data in child nodes align exactly with their parents. .. note:: From 64bb8ba2fe7dd4e725e43d0e4cb8e124d7d87325 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 13 Oct 2024 12:08:28 -0400 Subject: [PATCH 29/30] correct statement in data-structures docs --- doc/user-guide/data-structures.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index fe65a17c28f..45057c12a28 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -771,7 +771,7 @@ Here there are four different coordinate variables, which apply to variables in ``station`` is used only for ``weather`` variables ``lat`` and ``lon`` are only use for ``satellite`` images -Coordinate variables are inherited to descendent nodes, which means that +Coordinate variables are inherited to descendent nodes, which is only possible because variables at different levels of a hierarchical DataTree are always aligned. 
Placing the ``time`` variable at the root node automatically indicates that it applies to all descendent nodes. Similarly, ``station`` is in the base From 82a70a00f7f0fbdec45522abf28d189cad764b6b Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 13 Oct 2024 13:03:04 -0400 Subject: [PATCH 30/30] fix internal link --- doc/user-guide/data-structures.rst | 2 +- doc/user-guide/hierarchical-data.rst | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index 45057c12a28..e5e89b0fbbd 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -800,7 +800,7 @@ included by default unless you exclude them with the ``inherit`` flag: dt2["/weather/temperature"].to_dataset(inherit=False) -For more examples and further discussion see LINK +For more examples and further discussion see :ref:`alignment and coordinate inheritance `. .. _coordinates: diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index a9626142fc7..4b3a7260567 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -1,4 +1,4 @@ -.. _hierarchical-data: +.. _userguide.hierarchical-data: Hierarchical data ================= @@ -647,7 +647,7 @@ We could use this feature to quickly calculate the electrical power in our signa power = currents * voltages power -.. _alignment-and-coordinate-inheritance: +.. _hierarchical-data.alignment-and-coordinate-inheritance: Alignment and Coordinate Inheritance ------------------------------------