xarray.DataArray.expand_dims() can only expand dimension for a point coordinate  #2710

Closed
@pletchm

Current expand_dims functionality

Apparently, expand_dims can only create a dimension for a point coordinate, i.e. it promotes a scalar coordinate to a 1D coordinate. Here is an example:

>>> coords = {"b": range(5), "c": range(3)}
>>> da = xr.DataArray(np.ones([5, 3]), coords=coords, dims=list(coords.keys()))
>>> da
<xarray.DataArray (b: 5, c: 3)>
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
Coordinates:
  * b        (b) int64 0 1 2 3 4
  * c        (c) int64 0 1 2
>>> da["a"] = 0  # create a point coordinate
>>> da
<xarray.DataArray (b: 5, c: 3)>
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
Coordinates:
  * b        (b) int64 0 1 2 3 4
  * c        (c) int64 0 1 2
    a        int64 0
>>> da.expand_dims("a")  # create a new dimension "a" for the point coordinate
<xarray.DataArray (a: 1, b: 5, c: 3)>
array([[[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]])
Coordinates:
  * b        (b) int64 0 1 2 3 4
  * c        (c) int64 0 1 2
  * a        (a) int64 0
>>>

Problem description

I want to be able to do 2 more things with expand_dims or maybe a related/similar method:

  1. broadcast the data across 1 or more new dimensions
  2. expand an existing dimension to include 1 or more new coordinates
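To make the two requests concrete, here is a minimal sketch of the intended semantics in plain NumPy terms, leaving out coordinate labels entirely. The array shapes mirror the (b: 5, c: 3) example above; placing the new axis first and using NaN as the fill value are assumptions made only for illustration.

```python
import numpy as np

data = np.ones((5, 3))  # stands in for dims (b, c)

# 1. Broadcast the data across a new dimension "a" of length 3.
#    The new axis is placed first here; its position is arbitrary.
broadcasted = np.broadcast_to(data[np.newaxis, ...], (3, 5, 3))

# 2. Extend the existing dimension "b" from 5 to 7 entries,
#    filling the two new rows with NaN.
extended = np.pad(data, ((0, 2), (0, 0)), constant_values=np.nan)

print(broadcasted.shape)  # (3, 5, 3)
print(extended.shape)     # (7, 3)
```

What the label-aware version adds on top of this is matching by coordinate value rather than by position.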

Here is the code I currently use to accomplish this:

from collections import OrderedDict

import numpy as np
import xarray as xr


def expand_dimensions(data, fill_value=np.nan, **new_coords):
    """Expand the dimensions of the data array (adding any that don't yet
    exist) so that they include the given new coordinates.

    If a dimension doesn't exist in the dataarray yet, then the result will be
    `data`, broadcasted across this dimension.

    >>> da = xr.DataArray([1, 2, 3], dims="a", coords=[[0, 1, 2]])
    >>> expand_dimensions(da, b=[1, 2, 3, 4, 5])
    <xarray.DataArray (a: 3, b: 5)>
    array([[ 1.,  1.,  1.,  1.,  1.],
           [ 2.,  2.,  2.,  2.,  2.],
           [ 3.,  3.,  3.,  3.,  3.]])
    Coordinates:
      * a        (a) int64 0 1 2
      * b        (b) int64 1 2 3 4 5

    Or, if `dim` is already a dimension in `data`, then any new coordinate
    values in `new_coords` that are not yet in `data[dim]` will be added,
    and the values corresponding to those new coordinates will be `fill_value`.

    >>> da = xr.DataArray([1, 2, 3], dims="a", coords=[[0, 1, 2]])
    >>> expand_dimensions(da, a=[1, 2, 3, 4, 5])
    <xarray.DataArray (a: 6)>
    array([ 1.,  2.,  3., nan, nan, nan])
    Coordinates:
      * a        (a) int64 0 1 2 3 4 5

    Args:
        data (xarray.DataArray):
            Data that needs dimensions expanded.
        fill_value (scalar, xarray.DataArray, optional):
            If expanding new coords this is the value of the new datum.
            Defaults to `np.nan`.
        **new_coords (list[int | str]):
            The keywords are arbitrary dimensions and the values are
            coordinates of those dimensions that the data will include after it
            has been expanded.
    Returns:
        xarray.DataArray:
            Data that had its dimensions expanded to include the new
            coordinates.
    """
    ordered_coord_dict = OrderedDict(new_coords)
    shape_da = xr.DataArray(
        np.zeros(list(map(len, ordered_coord_dict.values()))),
        coords=ordered_coord_dict,
        dims=ordered_coord_dict.keys())
    expanded_data = xr.broadcast(data, shape_da)[0].fillna(fill_value)
    return expanded_data

Here's an example of broadcasting data across a new dimension:

>>> coords = {"b": range(5), "c": range(3)}
>>> da = xr.DataArray(np.ones([5, 3]), coords=coords, dims=list(coords.keys()))
>>> expand_dimensions(da, a=[0, 1, 2])
<xarray.DataArray (b: 5, c: 3, a: 3)>
array([[[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]])
Coordinates:
  * b        (b) int64 0 1 2 3 4
  * c        (c) int64 0 1 2
  * a        (a) int64 0 1 2

Here's an example of expanding an existing dimension to include new coordinates:

>>> expand_dimensions(da, b=[5, 6])
<xarray.DataArray (b: 7, c: 3)>
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [nan, nan, nan],
       [nan, nan, nan]])
Coordinates:
  * b        (b) int64 0 1 2 3 4 5 6
  * c        (c) int64 0 1 2
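One caveat with a non-default `fill_value`: because the helper calls `fillna` on the whole broadcast result, NaNs that were already present in the input appear to be replaced as well, not only the newly created entries. The sketch below uses a condensed copy of the helper (same logic, docstring omitted) and a small hypothetical input to show this.

```python
import numpy as np
import xarray as xr


def expand_dimensions(data, fill_value=np.nan, **new_coords):
    # Condensed copy of the helper above, kept only so this example runs alone.
    shape_da = xr.DataArray(
        np.zeros([len(values) for values in new_coords.values()]),
        coords=new_coords,
        dims=list(new_coords),
    )
    return xr.broadcast(data, shape_da)[0].fillna(fill_value)


# Hypothetical input with a missing value at a=1.
da_nan = xr.DataArray([1.0, np.nan, 3.0], dims="a", coords={"a": [0, 1, 2]})

default_fill = expand_dimensions(da_nan, a=[3, 4])
zero_fill = expand_dimensions(da_nan, fill_value=0.0, a=[3, 4])

# With the default fill value the original NaN at a=1 survives;
# with fill_value=0.0 it is overwritten together with the new entries.
print(default_fill.values)
print(zero_fill.values)
```

If that matters, a built-in version would probably need to distinguish newly added positions from missing values already in the data.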

Final Note

If no one else is already working on this, and if it seems like a useful addition to xarray, then I would be more than happy to work on it. Please let me know.

Thank you,
Martin
