This repository was archived by the owner on Oct 24, 2024. It is now read-only.

assigning Dataset objects to nodes #200

Closed
keewis opened this issue Jan 18, 2023 · 4 comments

@keewis
Contributor

keewis commented Jan 18, 2023

Intuitively, I'd assume

tree = datatree.DataTree()
tree["path"] = xr.Dataset()

to create a new node with the assigned Dataset as the data.

However, currently this raises

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [7], line 2
      1 tree = datatree.DataTree()
----> 2 tree["path"] = xr.Dataset()

File .../datatree/datatree.py:786, in DataTree.__setitem__(self, key, value)
    782 elif isinstance(key, str):
    783     # TODO should possibly deal with hashables in general?
    784     # path-like: a name of a node/variable, or path to a node/variable
    785     path = NodePath(key)
--> 786     return self._set_item(path, value, new_nodes_along_path=True)
    787 else:
    788     raise ValueError("Invalid format for key")

File .../datatree/treenode.py:479, in TreeNode._set_item(self, path, item, new_nodes_along_path, allow_overwrite)
    477         raise KeyError(f"Already a node object at path {path}")
    478 else:
--> 479     current_node._set(name, item)

File .../datatree/datatree.py:762, in DataTree._set(self, key, val)
    759 else:
    760     if not isinstance(val, (DataArray, Variable)):
    761         # accommodate other types that can be coerced into Variables
--> 762         val = DataArray(val)
    764     self.update({key: val})

File .../lib/python3.10/site-packages/xarray/core/dataarray.py:412, in DataArray.__init__(self, data, coords, dims, name, attrs, indexes, fastpath)
    409     attrs = getattr(data, "attrs", None)
    411 data = _check_data_shape(data, coords, dims)
--> 412 data = as_compatible_data(data)
    413 coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
    414 variable = Variable(dims, data, attrs, fastpath=True)

File .../lib/python3.10/site-packages/xarray/core/variable.py:243, in as_compatible_data(data, fastpath)
    240     return data
    242 # validate whether the data is valid data types.
--> 243 data = np.asarray(data)
    245 if isinstance(data, np.ndarray) and data.dtype.kind in "OMm":
    246     data = _possibly_convert_objects(data)

File .../lib/python3.10/site-packages/xarray/core/dataset.py:1374, in Dataset.__array__(self, dtype)
   1373 def __array__(self, dtype=None):
-> 1374     raise TypeError(
   1375         "cannot directly convert an xarray.Dataset into a "
   1376         "numpy array. Instead, create an xarray.DataArray "
   1377         "first, either with indexing on the Dataset or by "
   1378         "invoking the `to_array()` method."
   1379     )

TypeError: cannot directly convert an xarray.Dataset into a numpy array. Instead, create an xarray.DataArray first, either with indexing on the Dataset or by invoking the `to_array()` method.

which does not really tell us why it raises.

Would it be possible to enable that short-cut for assigning data to new nodes? I'm not sure what to do if the node already exists, though (overwrite the data? drop the existing node?)

@dcherian

I had this issue too. I think the error message should ask the user to use DataTree(Dataset(...)).

Making the shortcut work might make the data model appear confusing.
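
To illustrate the error-message suggestion above, here is a minimal sketch of a setter that fails early with an actionable hint instead of falling through to array coercion. `FakeDataset` and `FakeNode` are hypothetical stand-ins written for this example so that it runs without xarray or datatree installed; they are not the library's code, and the message wording is only one possibility.

```python
class FakeDataset:
    """Hypothetical stand-in for xarray.Dataset (illustration only)."""


class FakeNode:
    """Hypothetical stand-in for datatree.DataTree (illustration only)."""

    def __init__(self, data=None):
        self.data = data
        self.children = {}
        self.variables = {}

    def __setitem__(self, key, value):
        if isinstance(value, FakeNode):
            # Assigning a node attaches it as a child.
            self.children[key] = value
        elif isinstance(value, FakeDataset):
            # Fail before any array coercion is attempted, and say what to do.
            raise TypeError(
                f"cannot assign a Dataset to {key!r} directly; "
                "wrap it in a node first, e.g. DataTree(Dataset(...))"
            )
        else:
            # Anything else is treated as a variable on this node.
            self.variables[key] = value


tree = FakeNode()
try:
    tree["path"] = FakeDataset()
except TypeError as err:
    message = str(err)

# Explicit wrapping succeeds and creates a child node.
tree["path"] = FakeNode(FakeDataset())
```

The point of the sketch is only the ordering: the type check runs before the generic fallback, so the user sees a hint about wrapping rather than a NumPy conversion error.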

@TomNicholas
Member

It raises just because I left it for later to implement.

I do agree with Deepak though - is it confusing to be able to assign either a DataTree object or a Dataset object? We could also generalise to assigning a mapping of variables or arrays - we could accept anything that is a valid set of arguments to the Dataset constructor. I wasn't sure how far to go, so I stopped early.
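
One way to picture the generalisation discussed above is a small coercion helper that turns every accepted value into a node before attaching it. This is only a sketch under assumptions: `ToyDataset`, `ToyNode`, and `coerce_to_node` are hypothetical names invented here, a plain dict stands in for "a mapping of variables", and nothing below reflects datatree's actual behaviour.

```python
from collections.abc import Mapping


class ToyDataset:
    """Hypothetical stand-in for xarray.Dataset (illustration only)."""

    def __init__(self, data_vars=None):
        self.data_vars = dict(data_vars or {})


class ToyNode:
    """Hypothetical stand-in for datatree.DataTree (illustration only)."""

    def __init__(self, data=None):
        self.data = data if data is not None else ToyDataset()
        self.children = {}

    def __setitem__(self, key, value):
        # Every accepted value becomes a child node.
        self.children[key] = coerce_to_node(value)


def coerce_to_node(value):
    """Turn a node, a dataset, or a mapping of variables into a ToyNode."""
    if isinstance(value, ToyNode):
        return value
    if isinstance(value, ToyDataset):
        return ToyNode(value)
    if isinstance(value, Mapping):
        # Anything the dataset constructor accepts as data variables.
        return ToyNode(ToyDataset(value))
    raise TypeError(
        f"cannot create a node from {type(value).__name__}; "
        "expected a node, a dataset, or a mapping of variables"
    )


tree = ToyNode()
tree["a"] = ToyNode()
tree["b"] = ToyDataset({"x": [1, 2]})
tree["c"] = {"y": [3]}
```

Note that this toy sidesteps the harder case: in the traceback above, non-node values are coerced into variables, so a real implementation would have to decide when an assignment creates a node and when it creates a variable. That ambiguity is the source of the confusion being discussed.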

@keewis
Contributor Author

keewis commented Jan 19, 2023

I guess that's reasonable, and I can just use ds.pipe(datatree.DataTree) or, with #191, ds.to_node(), so that's not too much to type (I'd like to avoid DataTree(Dataset(...)) since I find that a bit difficult to read).

The error message definitely could use some improvement in that case, though!

@flamingbear
Copy link
Contributor

closed for pydata/xarray#9335
