-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Description
What is your issue?
In xarray-contrib/datatree#309 @marcel-goldschen-ohm pointed out some counterintuitive behaviour of the DataTree.name
setter.
# using xarray main
In [2]: from xarray.core.datatree import DataTree
In [3]: root = DataTree.from_dict(
...: {
...: 'child': DataTree(),
...: },
...: name='root',
...: )
In [4]: root['child'].name = 'childish'
In [5]: root
Out[5]:
<xarray.DataTree 'root'>
Group: /
└── Group: /childish
this looks fine, but
In [6]: root.children
Out[6]:
Frozen({'child': <xarray.DataTree 'childish'>
Group: /childish})
which means that
In [14]: root['childish']
KeyError: 'Could not find node at childish'
Okay, so we just fix it so that setting .name
changes the key in the parent node right? But as @etienneschalk pointed out in xarray-contrib/datatree#309 (comment), that wouldn't be consistent with DataArray
and Dataset
names behave: they don't update the key in the wrapping object either. Instead they just completely ignore that you asked to update it at all:
In [15]: ds = xr.Dataset({'a': 0})
In [17]: ds['a'].name = 'b'
In [18]: ds
Out[18]:
<xarray.Dataset> Size: 8B
Dimensions: ()
Data variables:
a int64 8B 0
In [19]: ds.variables
Out[19]:
Frozen({'a': <xarray.Variable ()> Size: 8B
array(0)})
This happens because Dataset
objects do not actually store DataArray
objects, they only store Variables
(which are un-named), and reconstruct a new DataArray
upon __getitem__
calls. So if you use .name
to alter the name of that newly-constructed DataArray
you're not actually touching the wrapping Dataset
at all, and therefore have no way to update its keys.
Indeed the same thing happens (for the same reason) when altering the name of a Variable that is part of a DataTree
node:
In [8]: import xarray as xr
In [9]: root['a'] = xr.DataArray(0)
In [10]: root
Out[10]:
<xarray.DataTree 'root'>
Group: /
│ Dimensions: ()
│ Data variables:
│ a int64 8B 0
└── Group: /child
In [11]: root['a'].name
Out[11]: 'a'
In [12]: root['a'].name = 'b'
In [13]: root['a']
Out[13]:
<xarray.DataArray 'a' ()> Size: 8B
array(0)
Is this ignoring of the name setter intended, or a bug? And should DataTree
follow the behaviour of Dataset
here or not? If it doesn't follow the behaviour of Dataset
then we're going to end up with a situation where
root['child'].name = 'new_name'
actually makes a change to root
, but
root['a'].name = 'new_name'
does not, so the behaviour of .name
depends on whether or not your key corresponds to a DataTree
value or a Variable
value. 🫤
In the DataTree
case I think it is a lot easier to update the key in the wrapping object when .name
is set because there is a bi-directional linkage back to the parent node, which doesn't exist to link the DataArray
instance back to Dataset
.
cc @shoyer - I would appreciate your thoughts here.