Allow expand_dims() method to support inserting/broadcasting dimensions with size>1 #2757

Merged
shoyer merged 20 commits into pydata:master on Mar 26, 2019

Contributor

@pletchm pletchm commented Feb 8, 2019

This pull request enhances the expand_dims method for both Dataset and DataArray objects to support inserting/broadcasting dimensions with size > 1. It addresses issue #2710.

Changes:

  1. dataset.expand_dims() now accepts a dict-like object whose values give either the length or the coordinates of each new dimension
  2. dataarray.expand_dims() now accepts a dict-like object whose values give either the length or the coordinates of each new dimension
  3. As an alternative to passing a dict to the dim argument (now an optional kwarg), each new dimension can be passed as its own kwarg
  4. Add the expand_dims enhancement from issue 2710 to whats-new.rst

Included:

  • Tests added
  • Fully documented, including whats-new.rst for all changes and api.rst for new API

What's new:

All of the old functionality is still there, so it shouldn't break anyone's existing code that uses it.

You can now pass dim as a dict, where the keys are the names of the new dimensions and the values are either integers (giving the lengths of the new dimensions) or iterables (giving their coordinates).

>>> import numpy as np
>>> import xarray as xr
>>>
>>> original = xr.Dataset(
...     {'x': ('a', np.random.randn(3)),
...      'y': (['b', 'a'], np.random.randn(4, 3))},
...     coords={'a': np.linspace(0, 1, 3),
...             'b': np.linspace(0, 1, 4),
...             'c': np.linspace(0, 1, 5)},
...     attrs={'key': 'entry'})
>>> original
<xarray.Dataset>
Dimensions:  (a: 3, b: 4, c: 5)
Coordinates:
  * a        (a) float64 0.0 0.5 1.0
  * b        (b) float64 0.0 0.3333 0.6667 1.0
  * c        (c) float64 0.0 0.25 0.5 0.75 1.0
Data variables:
    x        (a) float64 -1.556 0.2178 0.6319
    y        (b, a) float64 0.5273 0.6652 0.3418 1.858 ... -0.3519 0.8088 0.8753
Attributes:
    key:      entry
>>> original.expand_dims({"d": 4, "e": ["l", "m", "n"]})
<xarray.Dataset>
Dimensions:  (a: 3, b: 4, c: 5, d: 4, e: 3)
Coordinates:
  * e        (e) <U1 'l' 'm' 'n'
  * a        (a) float64 0.0 0.5 1.0
  * b        (b) float64 0.0 0.3333 0.6667 1.0
  * c        (c) float64 0.0 0.25 0.5 0.75 1.0
Dimensions without coordinates: d
Data variables:
    x        (d, e, a) float64 -1.556 0.2178 0.6319 ... -1.556 0.2178 0.6319
    y        (d, e, b, a) float64 0.5273 0.6652 0.3418 ... -0.3519 0.8088 0.8753
Attributes:
    key:      entry

Or, equivalently, you can pass the new dimensions as kwargs instead of a dictionary.

>>> original.expand_dims(d=4, e=["l", "m", "n"])
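
For anyone who wants to check the equivalence of the two call styles on a DataArray, here is a minimal sketch. The array `da` and its coordinate values are made up for illustration and are not taken from the tests in this PR.

```python
import numpy as np
import xarray as xr

# A small 1-D array with a single existing dimension "a" (illustrative data).
da = xr.DataArray(np.arange(3.0), dims="a", coords={"a": [0.0, 0.5, 1.0]})

# Dict form: an int gives a length, an iterable gives coordinate values.
via_dict = da.expand_dims({"d": 4, "e": ["l", "m", "n"]})

# Keyword form of the same request.
via_kwargs = da.expand_dims(d=4, e=["l", "m", "n"])

print(via_dict.dims)                    # new dimensions are placed in front of "a"
print(dict(via_dict.sizes))
print("d" in via_dict.coords)           # "d" was given only a length
print(via_dict.identical(via_kwargs))   # both call styles should agree
```

As in the Dataset output above, "e" gets a coordinate because it was given values, while "d" is a dimension without coordinates.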

@pep8speaks

pep8speaks commented Feb 8, 2019

Hello @pletchm! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-03-26 01:00:29 UTC

@pletchm pletchm force-pushed the feature/expand-dims-broadcast branch from bc7db5a to 64eba1d Compare February 9, 2019 08:15
Member

@fujiisoup fujiisoup left a comment

Thanks for sending a PR. I also think it is a nice feature.
Some minor comments

@pletchm pletchm force-pushed the feature/expand-dims-broadcast branch from f127d76 to e76c7d8 Compare February 14, 2019 05:56
pletchm added a commit to pletchm/xarray that referenced this pull request Feb 20, 2019
…ns with size>1 (pydata#2757)

 * dataset.expand_dims() method takes a dict-like object whose values represent the lengths of dimensions or the coordinates of dimensions

 * dataarray.expand_dims() method takes a dict-like object whose values represent the lengths of dimensions or the coordinates of dimensions

 * Add an alternative to passing a dict to the dim argument, which is now an optional kwarg: passing in each new dimension as its own kwarg

 * Add expand_dims enhancement from issue 2710 to whats-new.rst

 * Fix test_dataarray.TestDataArray.test_expand_dims_with_greater_dim_size tests to pass in Python 3.5 by using ordered dicts instead of regular dicts. This was needed because Python 3.5 and earlier did not maintain insertion order for dicts

 * Restrict core logic to use 'dim' as a dict; it is converted into a dict on entry if it is a str or a sequence of str

 * Don't cast dim values (coords) to a list, since IndexVariable/Variable will internally convert them into a numpy.ndarray. Just use IndexVariable((k,), v)

 * Raise TypeError for invalid input types, rather than ValueError

 * Force 'dim' to be an OrderedDict for Python 3.5
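
The input handling described in these bullets (coerce a str or a sequence of str into a dict on entry, keep insertion order, raise TypeError for anything else) can be sketched roughly as follows. This is not the code from the PR: `normalize_dim` is a hypothetical name, and the duplicate-name check is an added assumption.

```python
from collections import OrderedDict
from collections.abc import Mapping, Sequence


def normalize_dim(dim):
    """Hypothetical helper: coerce the accepted forms of `dim` into an ordered mapping.

    A str or a sequence of str yields length-1 dimensions; a mapping is
    copied into an OrderedDict so that insertion order is kept.
    """
    if isinstance(dim, str):
        return OrderedDict([(dim, 1)])
    if isinstance(dim, Mapping):
        return OrderedDict(dim)
    if isinstance(dim, Sequence):
        if not all(isinstance(name, str) for name in dim):
            raise TypeError("every dimension name in a sequence must be a str")
        if len(set(dim)) != len(dim):
            # Assumption for this sketch: repeated names are rejected.
            raise ValueError("dimension names must be unique")
        return OrderedDict((name, 1) for name in dim)
    # Invalid input types are a TypeError rather than a ValueError.
    raise TypeError("dim should be a str, a sequence of str, or a mapping")


print(normalize_dim("t"))
print(normalize_dim(["t", "u"]))
print(normalize_dim(OrderedDict([("d", 4), ("e", ["l", "m", "n"])])))
```

With every input reduced to one mapping type, the rest of the logic only has to distinguish integer values (lengths) from iterable values (coordinates).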
@pletchm
Contributor Author

pletchm commented Feb 26, 2019

Hi @shoyer and @fujiisoup, I believe I've made all of the updates that you suggested in your comments. How is the PR looking now?
Thank you!

pletchm added a commit to pletchm/xarray that referenced this pull request Mar 21, 2019
…ns with size>1 (pydata#2757)

 * Make using dim_kwargs for python 3.5 illegal -- a ValueError is thrown

jwenfai and others added 14 commits March 21, 2019 08:31
…a#2721)

* Quarter offset implemented (base is now latest pydata-master).

* Fixed issues raised in review (pydata#2721 (review))

* Updated whats-new.rst with info on quarter offset support.

* Updated whats-new.rst with info on quarter offset support.

* Update doc/whats-new.rst

Co-Authored-By: jwenfai <[email protected]>

* Added support for quarter frequencies when resampling CFTimeIndex. Less redundancy in CFTimeIndex resampling tests.

* Removed normalization code (unnecessary for cftime_range) in cftime_offsets.py. Removed redundant lines in whats-new.rst.

* Removed invalid option from _get_day_of_month docstring. Added back the tests that raise ValueError when resampling (base=24 when resampling to daily freq, e.g., '8D').

* Minor edits to docstrings/comments

* lint
* ENH: Add Dataset.drop_dims()

* Drops full dimensions and any corresponding variables in a
  Dataset
* Fixes GH1949

* DOC: Add Dataset.drop_dims() documentation
* Added tests of desired name inferring behaviour

* Infers names

* updated what's new
It was deprecated in numpy 1.16 and throws a ton of warnings because of
that.
All the function does is return .item() anyway, which is why it was
deprecated.
* Friendlier io title.

* Fix lists.

* Fix *args, **kwargs

"inline emphasis..."

* misc

* Reference xarray_extras for csv writing. Closes pydata#2289

* Add metpy accessor. Closes pydata#461

* fix transpose docstring. Closes pydata#2576

* Revert "Fix lists."

This reverts commit 39983a5.

* Revert "Fix *args, **kwargs"

This reverts commit 1b9da35.

* Add MetPy to related projects.

* Add Weather and Climate specific page.

* Add hvplot.

* Note open_dataset, mfdataset open files as read-only (closes pydata#2345).

* Update metpy 1

Co-Authored-By: dcherian <[email protected]>

* Update doc/weather-climate.rst

Co-Authored-By: dcherian <[email protected]>
0.12 will already have a big change in dropping Python 2.7 support. I'd rather
wait a bit longer to finalize these deprecations to minimize the impact on
users.
* attempt at loading remote hdf5

* added a couple tests

* rewind bytes after reading header

* addressed comments for tests and error message

* fixed pep8 formatting

* created _get_engine_from_magic_number function, new tests

* added description in whats-new

* fixed test failure on windows

* same error on windows and nix
@pletchm pletchm force-pushed the feature/expand-dims-broadcast branch from 4d4f403 to 21fa6e0 Compare March 21, 2019 16:21
@pletchm
Contributor Author

pletchm commented Mar 25, 2019

@shoyer, dim_kwargs is no longer allowed to be used for python 3.5. Does anything else need to change in this PR?
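
For context on the Python 3.5 restriction: keyword arguments only keep their call order from Python 3.6 onward (PEP 468), so on 3.5 the order of dimensions passed as kwargs would be unpredictable. A rough sketch of such a guard is below. `merge_dim_kwargs`, its `version_info` parameter (there only so the check can be exercised on any interpreter), the error messages, and the rejection of mixing both styles are all assumptions for illustration, not the code in this PR.

```python
import sys
from collections import OrderedDict


def merge_dim_kwargs(dim=None, version_info=sys.version_info, **dim_kwargs):
    """Hypothetical helper: fold keyword-style dimensions into the `dim` mapping."""
    if dim_kwargs and tuple(version_info[:2]) < (3, 6):
        # Before Python 3.6, **kwargs did not preserve call order, so the
        # order of the new dimensions could not be relied on.
        raise ValueError("dim_kwargs is not available for Python < 3.6")
    if dim is not None and dim_kwargs:
        # Assumption for this sketch: mixing both styles is rejected.
        raise ValueError("pass new dimensions via dim or as keywords, not both")
    merged = OrderedDict(dim or {})
    merged.update(dim_kwargs)
    return merged


# Keyword order is preserved on Python >= 3.6.
print(merge_dim_kwargs(d=4, e=["l", "m", "n"], version_info=(3, 7)))

# A dict (or OrderedDict) passed as `dim` is still accepted on 3.5.
print(merge_dim_kwargs(OrderedDict([("d", 4)]), version_info=(3, 5)))
```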

Member

@shoyer shoyer left a comment

One small note, otherwise looks good to me!

…ns with size>1 (pydata#2757)

 * use .size attribute to determine the size of a dimension, rather than converting to a list, which can be slow for large iterables

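
The point of the .size change can be illustrated with plain numpy standing in for IndexVariable. This is only a sketch: `dim_length` is a hypothetical name, and it assumes coordinate values arrive as a 1-D sequence or array rather than a generator.

```python
import numpy as np


def dim_length(value):
    """Hypothetical helper: length of a new dimension given an int or coordinate values."""
    if isinstance(value, int):
        # An integer is already the requested length.
        return value
    # np.asarray avoids a copy when `value` is already an ndarray, and .size
    # reads the element count without building an intermediate Python list.
    return np.asarray(value).size


print(dim_length(4))
print(dim_length(["l", "m", "n"]))
print(dim_length(np.linspace(0, 1, 5)))
```

For a large array of coordinates, `len(list(values))` would materialize every element as a Python object first, which is the slowdown the commit message refers to.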
pletchm added 2 commits March 25, 2019 17:57
…ns with size>1 (pydata#2757)

 * Move enhancement description up to 0.12.1

@shoyer shoyer merged commit 16a2c03 into pydata:master Mar 26, 2019
@shoyer
Member

shoyer commented Mar 26, 2019

Thanks @pletchm!
