Skip to content

Consistency of names of stored children / dataarrays with keys in wrapping object #9447

@TomNicholas

Description

@TomNicholas

What is your issue?

In xarray-contrib/datatree#309 @marcel-goldschen-ohm pointed out some counterintuitive behaviour of the DataTree.name setter.

# using xarray main
In [2]: from xarray.core.datatree import DataTree

In [3]: root = DataTree.from_dict(
   ...:     {
   ...:         'child': DataTree(),
   ...:     },
   ...:     name='root',
   ...: )

In [4]: root['child'].name = 'childish'

In [5]: root
Out[5]: 
<xarray.DataTree 'root'>
Group: /
└── Group: /childish

this looks fine, but

In [6]: root.children
Out[6]: 
Frozen({'child': <xarray.DataTree 'childish'>
Group: /childish})

which means that

In [14]: root['childish']
KeyError: 'Could not find node at childish'

Okay, so we just fix it so that setting .name changes the key in the parent node right? But as @etienneschalk pointed out in xarray-contrib/datatree#309 (comment), that wouldn't be consistent with DataArray and Dataset names behave: they don't update the key in the wrapping object either. Instead they just completely ignore that you asked to update it at all:

In [15]: ds = xr.Dataset({'a': 0})

In [17]: ds['a'].name = 'b'

In [18]: ds
Out[18]: 
<xarray.Dataset> Size: 8B
Dimensions:  ()
Data variables:
    a        int64 8B 0

In [19]: ds.variables
Out[19]: 
Frozen({'a': <xarray.Variable ()> Size: 8B
array(0)})

This happens because Dataset objects do not actually store DataArray objects, they only store Variables (which are un-named), and reconstruct a new DataArray upon __getitem__ calls. So if you use .name to alter the name of that newly-constructed DataArray you're not actually touching the wrapping Dataset at all, and therefore have no way to update its keys.

Indeed the same thing happens (for the same reason) when altering the name of a Variable that is part of a DataTree node:

In [8]: import xarray as xr

In [9]: root['a'] = xr.DataArray(0)

In [10]: root
Out[10]: 
<xarray.DataTree 'root'>
Group: /Dimensions:  ()
│   Data variables:
│       a        int64 8B 0
└── Group: /child

In [11]: root['a'].name
Out[11]: 'a'

In [12]: root['a'].name = 'b'

In [13]: root['a']
Out[13]: 
<xarray.DataArray 'a' ()> Size: 8B
array(0)

Is this ignoring of the name setter intended, or a bug? And should DataTree follow the behaviour of Dataset here or not? If it doesn't follow the behaviour of Dataset then we're going to end up with a situation where

root['child'].name = 'new_name'

actually makes a change to root, but

root['a'].name = 'new_name'

does not, so the behaviour of .name depends on whether or not your key corresponds to a DataTree value or a Variable value. 🫤

In the DataTree case I think it is a lot easier to update the key in the wrapping object when .name is set because there is a bi-directional linkage back to the parent node, which doesn't exist to link the DataArray instance back to Dataset.

cc @shoyer - I would appreciate your thoughts here.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions