
Commit 8805658

dcherian, tomvothecoder, and pre-commit-ci[bot] authored
Add SeasonGrouper, SeasonResampler (#9524)
Co-authored-by: Tom Vo <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent c0dc71b commit 8805658
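
In brief, this commit adds two new grouper objects, SeasonGrouper and SeasonResampler, along with documentation and property tests. A minimal usage sketch, based only on the documentation added below (``ds`` is assumed to be a Dataset with a ``time`` coordinate):

    from xarray.groupers import SeasonGrouper, SeasonResampler

    # Group by custom, possibly overlapping, seasons; groups keep the given order
    ds.groupby(time=SeasonGrouper(["DJFM", "MAMJ", "JJAS", "SOND"])).mean()

    # Resample to a custom seasonal frequency; seasons spanning the year boundary
    # (e.g. DJF) are handled, and incomplete seasons are dropped by default
    ds.resample(time=SeasonResampler(["DJF", "MAM", "JJA", "SON"])).mean()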

13 files changed (+934, -51 lines)


doc/api.rst

Lines changed: 2 additions & 0 deletions
@@ -1329,6 +1329,8 @@ Grouper Objects
    groupers.BinGrouper
    groupers.UniqueGrouper
    groupers.TimeResampler
+   groupers.SeasonGrouper
+   groupers.SeasonResampler


 Rolling objects

doc/conf.py

Lines changed: 2 additions & 0 deletions
@@ -182,6 +182,8 @@
     "pd.NaT": "~pandas.NaT",
 }

+autodoc_type_aliases = napoleon_type_aliases  # Keep both in sync
+
 # mermaid config
 mermaid_version = "10.9.1"

doc/user-guide/groupby.rst

Lines changed: 8 additions & 0 deletions
@@ -332,6 +332,14 @@ Different groupers can be combined to construct sophisticated GroupBy operations
     ds.groupby(x=BinGrouper(bins=[5, 15, 25]), letters=UniqueGrouper()).sum()


+Time Grouping and Resampling
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. seealso::
+
+   See :ref:`resampling`.
+
+
 Shuffling
 ~~~~~~~~~

doc/user-guide/time-series.rst

Lines changed: 101 additions & 31 deletions
@@ -1,3 +1,5 @@
+.. currentmodule:: xarray
+
 .. _time-series:

 ================
@@ -21,26 +23,19 @@ core functionality.
 Creating datetime64 data
 ------------------------

-Xarray uses the numpy dtypes ``datetime64[unit]`` and ``timedelta64[unit]``
-(where unit is one of ``"s"``, ``"ms"``, ``"us"`` and ``"ns"``) to represent datetime
+Xarray uses the numpy dtypes :py:class:`numpy.datetime64` and :py:class:`numpy.timedelta64`
+with specified units (one of ``"s"``, ``"ms"``, ``"us"`` and ``"ns"``) to represent datetime
 data, which offer vectorized operations with numpy and smooth integration with pandas.

-To convert to or create regular arrays of ``datetime64`` data, we recommend
-using :py:func:`pandas.to_datetime` and :py:func:`pandas.date_range`:
+To convert to or create regular arrays of :py:class:`numpy.datetime64` data, we recommend
+using :py:func:`pandas.to_datetime`, :py:class:`pandas.DatetimeIndex`, or :py:func:`xarray.date_range`:

 .. ipython:: python

     pd.to_datetime(["2000-01-01", "2000-02-02"])
     pd.DatetimeIndex(
         ["2000-01-01 00:00:00", "2000-02-02 00:00:00"], dtype="datetime64[s]"
     )
-    pd.date_range("2000-01-01", periods=365)
-    pd.date_range("2000-01-01", periods=365, unit="s")
-
-It is also possible to use corresponding :py:func:`xarray.date_range`:
-
-.. ipython:: python
-
     xr.date_range("2000-01-01", periods=365)
     xr.date_range("2000-01-01", periods=365, unit="s")
@@ -81,7 +76,7 @@ attribute like ``'days since 2000-01-01'``).


 You can manual decode arrays in this form by passing a dataset to
-:py:func:`~xarray.decode_cf`:
+:py:func:`decode_cf`:

 .. ipython:: python

@@ -93,8 +88,8 @@ You can manual decode arrays in this form by passing a dataset to
     coder = xr.coders.CFDatetimeCoder(time_unit="s")
     xr.decode_cf(ds, decode_times=coder)

-From xarray 2025.01.2 the resolution of the dates can be one of ``"s"``, ``"ms"``, ``"us"`` or ``"ns"``. One limitation of using ``datetime64[ns]`` is that it limits the native representation of dates to those that fall between the years 1678 and 2262, which gets increased significantly with lower resolutions. When a store contains dates outside of these bounds (or dates < `1582-10-15`_ with a Gregorian, also known as standard, calendar), dates will be returned as arrays of :py:class:`cftime.datetime` objects and a :py:class:`~xarray.CFTimeIndex` will be used for indexing.
-:py:class:`~xarray.CFTimeIndex` enables most of the indexing functionality of a :py:class:`pandas.DatetimeIndex`.
+From xarray 2025.01.2 the resolution of the dates can be one of ``"s"``, ``"ms"``, ``"us"`` or ``"ns"``. One limitation of using ``datetime64[ns]`` is that it limits the native representation of dates to those that fall between the years 1678 and 2262, which gets increased significantly with lower resolutions. When a store contains dates outside of these bounds (or dates < `1582-10-15`_ with a Gregorian, also known as standard, calendar), dates will be returned as arrays of :py:class:`cftime.datetime` objects and a :py:class:`CFTimeIndex` will be used for indexing.
+:py:class:`CFTimeIndex` enables most of the indexing functionality of a :py:class:`pandas.DatetimeIndex`.
 See :ref:`CFTimeIndex` for more information.

 Datetime indexing
@@ -205,35 +200,37 @@ You can also search for multiple months (in this case January through March), us
 Resampling and grouped operations
 ---------------------------------

-Datetime components couple particularly well with grouped operations (see
-:ref:`groupby`) for analyzing features that repeat over time. Here's how to
-calculate the mean by time of day:
+
+.. seealso::
+
+   For more generic documentation on grouping, see :ref:`groupby`.
+
+
+Datetime components couple particularly well with grouped operations for analyzing features that repeat over time.
+Here's how to calculate the mean by time of day:

 .. ipython:: python
-    :okwarning:

     ds.groupby("time.hour").mean()

 For upsampling or downsampling temporal resolutions, xarray offers a
-:py:meth:`~xarray.Dataset.resample` method building on the core functionality
+:py:meth:`Dataset.resample` method building on the core functionality
 offered by the pandas method of the same name. Resample uses essentially the
-same api as ``resample`` `in pandas`_.
+same api as :py:meth:`pandas.DataFrame.resample` `in pandas`_.

 .. _in pandas: https://pandas.pydata.org/pandas-docs/stable/timeseries.html#up-and-downsampling

 For example, we can downsample our dataset from hourly to 6-hourly:

 .. ipython:: python
-    :okwarning:

     ds.resample(time="6h")

-This will create a specialized ``Resample`` object which saves information
-necessary for resampling. All of the reduction methods which work with
-``Resample`` objects can also be used for resampling:
+This will create a specialized :py:class:`~xarray.core.resample.DatasetResample` or :py:class:`~xarray.core.resample.DataArrayResample`
+object which saves information necessary for resampling. All of the reduction methods which work with
+:py:class:`Dataset` or :py:class:`DataArray` objects can also be used for resampling:

 .. ipython:: python
-    :okwarning:

     ds.resample(time="6h").mean()
@@ -252,7 +249,7 @@ by specifying the ``dim`` keyword argument
     ds.resample(time="6h").mean(dim=["time", "latitude", "longitude"])

 For upsampling, xarray provides six methods: ``asfreq``, ``ffill``, ``bfill``, ``pad``,
-``nearest`` and ``interpolate``. ``interpolate`` extends ``scipy.interpolate.interp1d``
+``nearest`` and ``interpolate``. ``interpolate`` extends :py:func:`scipy.interpolate.interp1d`
 and supports all of its schemes. All of these resampling operations work on both
 Dataset and DataArray objects with an arbitrary number of dimensions.
@@ -266,9 +263,7 @@ Data that has indices outside of the given ``tolerance`` are set to ``NaN``.

 It is often desirable to center the time values after a resampling operation.
 That can be accomplished by updating the resampled dataset time coordinate values
-using time offset arithmetic via the `pandas.tseries.frequencies.to_offset`_ function.
-
-.. _pandas.tseries.frequencies.to_offset: https://pandas.pydata.org/docs/reference/api/pandas.tseries.frequencies.to_offset.html
+using time offset arithmetic via the :py:func:`pandas.tseries.frequencies.to_offset` function.

 .. ipython:: python

@@ -277,5 +272,80 @@ using time offset arithmetic via the `pandas.tseries.frequencies.to_offset`_ fun
     resampled_ds["time"] = resampled_ds.get_index("time") + offset
     resampled_ds

-For more examples of using grouped operations on a time dimension, see
-:doc:`../examples/weather-data`.
+
+.. seealso::
+
+   For more examples of using grouped operations on a time dimension, see :doc:`../examples/weather-data`.
+
+
+Handling Seasons
+~~~~~~~~~~~~~~~~
+
+Two extremely common time series operations are to group by seasons, and resample to a seasonal frequency.
+Xarray has historically supported some simple versions of these computations.
+For example, ``.groupby("time.season")`` (where the seasons are DJF, MAM, JJA, SON)
+and resampling to a seasonal frequency using Pandas syntax: ``.resample(time="QS-DEC")``.
+
+Quite commonly one wants more flexibility in defining seasons. For these use-cases, Xarray provides
+:py:class:`groupers.SeasonGrouper` and :py:class:`groupers.SeasonResampler`.
+
+
+.. currentmodule:: xarray.groupers
+
+.. ipython:: python
+
+    from xarray.groupers import SeasonGrouper
+
+    ds.groupby(time=SeasonGrouper(["DJF", "MAM", "JJA", "SON"])).mean()
+
+
+Note how the seasons are in the specified order, unlike ``.groupby("time.season")`` where the
+seasons are sorted alphabetically.
+
+.. ipython:: python
+
+    ds.groupby("time.season").mean()
+
+
+:py:class:`SeasonGrouper` supports overlapping seasons:
+
+.. ipython:: python
+
+    ds.groupby(time=SeasonGrouper(["DJFM", "MAMJ", "JJAS", "SOND"])).mean()
+
+
+Skipping months is allowed:
+
+.. ipython:: python
+
+    ds.groupby(time=SeasonGrouper(["JJAS"])).mean()
+
+
+Use :py:class:`SeasonResampler` to specify custom seasons.
+
+.. ipython:: python
+
+    from xarray.groupers import SeasonResampler
+
+    ds.resample(time=SeasonResampler(["DJF", "MAM", "JJA", "SON"])).mean()
+
+
+:py:class:`SeasonResampler` is smart enough to correctly handle years for seasons that
+span the end of the year (e.g. DJF). By default :py:class:`SeasonResampler` will skip any
+season that is incomplete (e.g. the first DJF season for a time series that starts in Jan).
+Pass the ``drop_incomplete=False`` kwarg to :py:class:`SeasonResampler` to disable this behaviour.
+
+.. ipython:: python
+
+    from xarray.groupers import SeasonResampler
+
+    ds.resample(
+        time=SeasonResampler(["DJF", "MAM", "JJA", "SON"], drop_incomplete=False)
+    ).mean()
+
+
+Seasons need not be of the same length:
+
+.. ipython:: python
+
+    ds.resample(time=SeasonResampler(["JF", "MAM", "JJAS", "OND"])).mean()

properties/test_properties.py

Lines changed: 47 additions & 1 deletion
@@ -1,11 +1,15 @@
+import itertools
+
 import pytest

 pytest.importorskip("hypothesis")

-from hypothesis import given
+import hypothesis.strategies as st
+from hypothesis import given, note

 import xarray as xr
 import xarray.testing.strategies as xrst
+from xarray.groupers import find_independent_seasons, season_to_month_tuple


 @given(attrs=xrst.simple_attrs)
@@ -15,3 +19,45 @@ def test_assert_identical(attrs):

     ds = xr.Dataset(attrs=attrs)
     xr.testing.assert_identical(ds, ds.copy(deep=True))
+
+
+@given(
+    roll=st.integers(min_value=0, max_value=12),
+    breaks=st.lists(
+        st.integers(min_value=0, max_value=11), min_size=1, max_size=12, unique=True
+    ),
+)
+def test_property_season_month_tuple(roll, breaks):
+    chars = list("JFMAMJJASOND")
+    months = tuple(range(1, 13))
+
+    rolled_chars = chars[roll:] + chars[:roll]
+    rolled_months = months[roll:] + months[:roll]
+    breaks = sorted(breaks)
+    if breaks[0] != 0:
+        breaks = [0] + breaks
+    if breaks[-1] != 12:
+        breaks = breaks + [12]
+    seasons = tuple(
+        "".join(rolled_chars[start:stop]) for start, stop in itertools.pairwise(breaks)
+    )
+    actual = season_to_month_tuple(seasons)
+    expected = tuple(
+        rolled_months[start:stop] for start, stop in itertools.pairwise(breaks)
+    )
+    assert expected == actual
+
+
+@given(data=st.data(), nmonths=st.integers(min_value=1, max_value=11))
+def test_property_find_independent_seasons(data, nmonths):
+    chars = "JFMAMJJASOND"
+    # if stride > nmonths, then we can't infer season order
+    stride = data.draw(st.integers(min_value=1, max_value=nmonths))
+    chars = chars + chars[:nmonths]
+    seasons = [list(chars[i : i + nmonths]) for i in range(0, 12, stride)]
+    note(seasons)
+    groups = find_independent_seasons(seasons)
+    for group in groups:
+        inds = tuple(itertools.chain(*group.inds))
+        assert len(inds) == len(set(inds))
+        assert len(group.codes) == len(set(group.codes))
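
For orientation, the first property test above exercises season_to_month_tuple, which maps season strings to tuples of month numbers. A small sketch of the behaviour the test implies (inferred from the test itself, not from separate documentation):

    from xarray.groupers import season_to_month_tuple

    # Each season string maps to its month numbers; December (12) wraps before January (1)
    season_to_month_tuple(["DJF", "MAM", "JJA", "SON"])
    # -> ((12, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11))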

pyproject.toml

Lines changed: 2 additions & 0 deletions
@@ -393,6 +393,8 @@ extend-ignore-identifiers-re = [
 [tool.typos.default.extend-words]
 # NumPy function names
 arange = "arange"
+ond = "ond"
+aso = "aso"

 # Technical terms
 nd = "nd"

xarray/compat/toolzcompat.py

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+# This file contains functions copied from the toolz library in accordance
+# with its license. The original copyright notice is duplicated below.
+
+# Copyright (c) 2013 Matthew Rocklin
+
+# All rights reserved.
+
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+
+# a. Redistributions of source code must retain the above copyright notice,
+# this list of conditions and the following disclaimer.
+# b. Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+# c. Neither the name of toolz nor the names of its contributors
+# may be used to endorse or promote products derived from this software
+# without specific prior written permission.
+
+
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+# ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR
+# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
+# DAMAGE.
+
+
+def sliding_window(n, seq):
+    """A sequence of overlapping subsequences
+
+    >>> list(sliding_window(2, [1, 2, 3, 4]))
+    [(1, 2), (2, 3), (3, 4)]
+
+    This function creates a sliding window suitable for transformations like
+    sliding means / smoothing
+
+    >>> mean = lambda seq: float(sum(seq)) / len(seq)
+    >>> list(map(mean, sliding_window(2, [1, 2, 3, 4])))
+    [1.5, 2.5, 3.5]
+    """
+    import collections
+    import itertools
+
+    return zip(
+        *(
+            collections.deque(itertools.islice(it, i), 0) or it
+            for i, it in enumerate(itertools.tee(seq, n))
+        ),
+        strict=False,
+    )
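
The vendored sliding_window helper produces overlapping tuples, as its doctests show. As an illustration only (a hypothetical use, not something this diff demonstrates), such windows could describe overlapping runs of month numbers:

    from xarray.compat.toolzcompat import sliding_window

    # Consecutive pairs, matching the docstring example
    list(sliding_window(2, [1, 2, 3, 4]))  # [(1, 2), (2, 3), (3, 4)]

    # Overlapping three-month windows over month numbers
    list(sliding_window(3, [12, 1, 2, 3, 4, 5]))
    # [(12, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]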

xarray/core/dataarray.py

Lines changed: 3 additions & 3 deletions
@@ -6860,7 +6860,7 @@ def groupby(

         >>> da.groupby("letters")
         <DataArrayGroupBy, grouped over 1 grouper(s), 2 groups in total:
-            'letters': 2/2 groups present with labels 'a', 'b'>
+            'letters': UniqueGrouper('letters'), 2/2 groups with labels 'a', 'b'>

         Execute a reduction

@@ -6876,8 +6876,8 @@

         >>> da.groupby(["letters", "x"])
         <DataArrayGroupBy, grouped over 2 grouper(s), 8 groups in total:
-            'letters': 2/2 groups present with labels 'a', 'b'
-            'x': 4/4 groups present with labels 10, 20, 30, 40>
+            'letters': UniqueGrouper('letters'), 2/2 groups with labels 'a', 'b'
+            'x': UniqueGrouper('x'), 4/4 groups with labels 10, 20, 30, 40>

         Use Grouper objects to express more complicated GroupBy operations
