center=True for xarray.DataArray.rolling() #1046

chunweiyuan · 2016-10-13T00:37:25Z

The logic behind setting center=True confuses me. Say window size = 3. The default behavior (center=False) sets the window to go from i-2 to i, so I would've expected center=True to set the window from i-1 to i+1. But that's not what I see.

For example, this is what data looks like:

>>> data = xr.DataArray(np.arange(27).reshape(3, 3, 3), coords=[('x', ['a', 'b', 'c']), ('y', [-2, 0, 2]), ('z', [0, 1, 2])])

>>> data
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])
Coordinates:
  * x        (x) |S1 'a' 'b' 'c'
  * y        (y) int64 -2 0 2
  * z        (z) int64 0 1 2

Now, if I set y-window size = 3, center = False, min # of entries = 1, I get

>>> r = data.rolling(y=3, center=False, min_periods=1)
>>> r.mean()
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[  0. ,   1. ,   2. ],
        [  1.5,   2.5,   3.5],
        [  3. ,   4. ,   5. ]],

       [[  9. ,  10. ,  11. ],
        [ 10.5,  11.5,  12.5],
        [ 12. ,  13. ,  14. ]],

       [[ 18. ,  19. ,  20. ],
        [ 19.5,  20.5,  21.5],
        [ 21. ,  22. ,  23. ]]])
Coordinates:
  * x        (x) |S1 'a' 'b' 'c'
  * y        (y) int64 -2 0 2
  * z        (z) int64 0 1 2

Which essentially gives me a "trailing window" of size 3, meaning the window goes from i-2 to i. This is not explained in the doc but can be understood empirically.

On the other hand, setting center = True gives

>>> r = data.rolling(y=3, center=True, min_periods=1)
>>> r.mean()
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[  1.5,   2.5,   3.5],
        [  3. ,   4. ,   5. ],
        [  nan,   nan,   nan]],

       [[ 10.5,  11.5,  12.5],
        [ 12. ,  13. ,  14. ],
        [  nan,   nan,   nan]],

       [[ 19.5,  20.5,  21.5],
        [ 21. ,  22. ,  23. ],
        [  nan,   nan,   nan]]])
Coordinates:
  * x        (x) |S1 'a' 'b' 'c'
  * y        (y) int64 -2 0 2
  * z        (z) int64 0 1 2

In other words, it just pushes every cell up the y-dim by 1, using nan to represent things coming off the edge of the universe. If you look at _center_result() of xarray/core/rolling.py, that's exactly what it does with .shift().
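
To make the shift behaviour concrete, here is a minimal plain-Python sketch of "compute a trailing window, then shift". It is not the xarray source: `trailing_mean` and `center_by_shift` are hypothetical helper names, and the column used is the x='a', z=0 slice of the data above.

```python
import math


def trailing_mean(values, window, min_periods=1):
    """Mean over positions i-(window-1) .. i, clipped at the start of the list."""
    out = []
    for i in range(len(values)):
        chunk = values[max(i - window + 1, 0): i + 1]
        if len(chunk) >= min_periods:
            out.append(sum(chunk) / len(chunk))
        else:
            out.append(math.nan)
    return out


def center_by_shift(trailing, window):
    """Shift a trailing-window result left by window // 2, padding the tail with NaN."""
    offset = min(window // 2, len(trailing))
    return trailing[offset:] + [math.nan] * offset


column = [0, 3, 6]  # values along y for x='a', z=0
trailing = trailing_mean(column, 3)
shifted = center_by_shift(trailing, 3)
print(trailing)  # [0.0, 1.5, 3.0]
print(shifted)   # [1.5, 3.0, nan]
```

The last position ends up NaN only because the shift has nothing to pull in from beyond the end of the array, not because the window there lacks valid entries.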

I would've expected center=True to change the window to go from i-1 to i+1, in which case, with min_periods=1, r.mean() would not contain any nan values.
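
The expectation above can be written down as a small reference implementation. This is only a sketch in plain Python: `rolling_mean` is a hypothetical helper, and the handling of even window sizes (offset of `(window - 1) // 2`) is an assumption that does not matter for the window size 3 discussed here.

```python
import math


def rolling_mean(values, window, center=False, min_periods=1):
    """Rolling mean over a list, with the window clipped at both edges.

    center=False: the window covers i-(window-1) .. i (trailing).
    center=True:  for window=3 the window covers i-1 .. i+1.
    """
    offset = (window - 1) // 2 if center else 0
    out = []
    for i in range(len(values)):
        stop = i + offset + 1   # one past the last index in the window
        start = stop - window   # first index in the window (may be negative)
        chunk = values[max(start, 0): min(stop, len(values))]
        if len(chunk) >= min_periods:
            out.append(sum(chunk) / len(chunk))
        else:
            out.append(math.nan)
    return out


column = [0, 3, 6]  # values along y for x='a', z=0
print(rolling_mean(column, 3, center=False))  # [0.0, 1.5, 3.0]
print(rolling_mean(column, 3, center=True))   # [1.5, 3.0, 4.5]
```

With min_periods=1 every window contains at least the element at position i, so no NaN appears in either result.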

Could someone explain the logical flow to me?

Much obliged,

Chun


shoyer · 2016-10-13T03:37:55Z

I think we mostly tried to make this consistent with pandas. To be honest I don't entirely understand the logic myself.

Cc @jhamman

jhamman · 2016-10-13T03:58:32Z

We do try to stay consistent with pandas except for the last position. Here's the unit test where we verify that behavior.

Using x=0 from your example in Pandas:

In [1]: import pandas as pd
In [2]: data = pd.Series([0, 3, 6])

In [3]: data.rolling(3, center=True, min_periods=1).mean()
Out[3]: 
0    1.5
1    3.0
2    4.5

If I remember correctly, and my brain is a bit like mush right now so I could be wrong, bottleneck and pandas handle this case differently so we had to make a decision. We chose to use bottleneck (for speed) but to do our best to stay consistent with pandas. Back to your example, this time just with bottleneck:

In [4]: bn.move_mean(data, 3, min_count=1)
Out[4]: array([ 0. ,  1.5,  3. ])

So, as you can see, bottleneck does something totally different that wouldn't otherwise work with center=True unless we did our little shift trick. I'm not really sure of the best way to correct for this difference in the last position except to either a) try to push a center=True option into bottleneck (may not be possible), or b) write a bunch of logic on our end to bridge the gap between these two (may be laborious). Of course, I'm open to ideas.
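
One possible shape for option b), offered only as a sketch and not as what xarray or bottleneck actually does: pad the end of the array with NaN, run the unchanged trailing-window kernel, and drop the leading results instead of shifting. The function `trailing_nanmean` below is a hypothetical plain-Python stand-in for a NaN-skipping, move_mean-style kernel, and odd window sizes are assumed.

```python
import math


def trailing_nanmean(values, window, min_count=1):
    """Trailing-window mean that skips NaN entries (stand-in for a move_mean-style kernel)."""
    out = []
    for i in range(len(values)):
        chunk = [v for v in values[max(i - window + 1, 0): i + 1] if not math.isnan(v)]
        if chunk and len(chunk) >= min_count:
            out.append(sum(chunk) / len(chunk))
        else:
            out.append(math.nan)
    return out


def centered_via_padding(values, window, min_count=1):
    """Centered result from a trailing kernel: pad the tail with NaN, then drop the head."""
    offset = window // 2  # assumes an odd window size
    padded = list(values) + [math.nan] * offset
    return trailing_nanmean(padded, window, min_count)[offset:]


print(centered_via_padding([0, 3, 6], 3))  # [1.5, 3.0, 4.5]
```

Because the padding is NaN and the kernel skips NaN, the last positions are averaged over the valid entries that remain in the window, which is the pandas result shown above.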

chunweiyuan · 2016-10-14T00:53:54Z

My opinion is that the nan has got to go. If we want to (1) maintain pandas-consistency and (2) use bottleneck without mucking it up, then I think we need to add some logic in either rolling.reduce() or rolling._center_result().

So here's my failed attempt:

def reverse_and_roll_1d(data, window_size, min_periods=1):
    """
    Implements a concept to fix the end-of-array problem with
    xarray.core.rolling._center_result(),
    by
    1.) take slice of the back-end of the array
    2.) flip it
    3.) compute centered-window arithmetic
    4.) flip it again
    5.) replace back-end of default result with (4)

    :param DataArray data: 1-D data array, with dim name 'x'.
    :param int window_size: size of window.
    :param int min_periods: minimum number of entries required per window.
    """
    # first, the default way of computing the centered window
    r = data.rolling(x=window_size, center=True, min_periods=min_periods)
    avg = r.mean()
    # now we need to fix the back end of the array
    rev_start = len(data.x)  # an index
    # another index; None means "slice all the way to the start"
    rev_end = (
        len(data.x) - window_size - 1 if len(data.x) > window_size else None
    )
    tail_slice = slice(rev_start, rev_end, -1)  # back end of array, flipped
    r2 = data[dict(x=tail_slice)].rolling(
        x=window_size, center=True, min_periods=min_periods
    )
    # replace the tail of the default result with the re-flipped result
    avg[dict(x=slice(-window_size + 1, None))] = r2.mean()[
        dict(x=slice(window_size - 2, None, -1))
    ]

    return avg

This algorithm is consistently 8 times slower than pd.DataFrame.rolling(), for various 1d array sizes.

I'm open to ideas as well :)

shoyer · 2016-10-14T21:56:42Z

@chunweiyuan I agree, this seems worth doing, and I think you have a pretty sensible approach here. For large arrays (especially with ndim > 1), this should add only minimal performance overhead. If you can fit this into the existing framework for rolling that would be awesome!

jhamman · 2016-10-17T22:00:32Z

I'm fine with this approach for now. It would be great if we could convince bottleneck to help us out with a keyword argument of some kind.

chunweiyuan · 2016-10-20T22:32:23Z

Let me exhaust a few other ideas first. I'll definitely share my thoughts here first before making any commit. Thanks.

stale · 2020-11-21T07:36:23Z

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

phil-blain · 2024-04-04T21:06:56Z

This seems to have been fixed in https://github.com/pydata/xarray/pull/1837/files#diff-66d415f4d4a5d0969b40e35b86cbf67612bc3b88c7f02957a550f12df7e0e14eR149-R154, right? I think this issue can be closed.

keewis · 2024-08-06T10:37:02Z

I agree, the example with center=True from the original post now returns

In [1]: import xarray as xr
   ...: import numpy as np
   ...: 
   ...: data = xr.DataArray(
   ...:     np.arange(27).reshape(3, 3, 3),
   ...:     coords=[("x", ["a", "b", "c"]), ("y", [-2, 0, 2]), ("z", [0, 1, 2])],
   ...: )
   ...: display(
   ...:     data.rolling(y=3, center=False, min_periods=1).mean(),
   ...:     data.rolling(y=3, center=True, min_periods=1).mean(),
   ...: )
<xarray.DataArray (x: 3, y: 3, z: 3)> Size: 216B
array([[[ 0. ,  1. ,  2. ],
        [ 1.5,  2.5,  3.5],
        [ 3. ,  4. ,  5. ]],

       [[ 9. , 10. , 11. ],
        [10.5, 11.5, 12.5],
        [12. , 13. , 14. ]],

       [[18. , 19. , 20. ],
        [19.5, 20.5, 21.5],
        [21. , 22. , 23. ]]])
Coordinates:
  * x        (x) <U1 12B 'a' 'b' 'c'
  * y        (y) int64 24B -2 0 2
  * z        (z) int64 24B 0 1 2
<xarray.DataArray (x: 3, y: 3, z: 3)> Size: 216B
array([[[ 1.5,  2.5,  3.5],
        [ 3. ,  4. ,  5. ],
        [ 4.5,  5.5,  6.5]],

       [[10.5, 11.5, 12.5],
        [12. , 13. , 14. ],
        [13.5, 14.5, 15.5]],

       [[19.5, 20.5, 21.5],
        [21. , 22. , 23. ],
        [22.5, 23.5, 24.5]]])
Coordinates:
  * x        (x) <U1 12B 'a' 'b' 'c'
  * y        (y) int64 24B -2 0 2
  * z        (z) int64 24B 0 1 2

which I think makes sense?
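
As a quick sanity check on the second array above, the following plain-Python sketch recomputes a centered mean along y with the window clipped at the edges (i-1 to i+1, min_periods=1). It uses nested lists in place of xarray, and `centered_mean_1d` is a hypothetical helper that assumes an odd window size.

```python
def centered_mean_1d(values, window):
    """Centered mean with the window clipped at the edges (min_periods=1)."""
    half = window // 2  # assumes an odd window size
    out = []
    for i in range(len(values)):
        chunk = values[max(i - half, 0): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# data[x][y][z] = 9*x + 3*y + z, matching np.arange(27).reshape(3, 3, 3)
data = [[[9 * x + 3 * y + z for z in range(3)] for y in range(3)] for x in range(3)]

# Apply the centered mean along y for every (x, z) pair.
result = [[[None] * 3 for _ in range(3)] for _ in range(3)]
for x in range(3):
    for z in range(3):
        column = [data[x][y][z] for y in range(3)]
        for y, value in enumerate(centered_mean_1d(column, 3)):
            result[x][y][z] = value

print(result[0])  # [[1.5, 2.5, 3.5], [3.0, 4.0, 5.0], [4.5, 5.5, 6.5]]
```

The last y position of each block is the mean of the final two rows, for example (3 + 6) / 2 = 4.5, which agrees with the output reported above.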

jhamman added design question topic-pandas-like labels Oct 13, 2016

jhamman mentioned this issue Nov 30, 2016

win_type for rolling() ? #1142

Closed

stale bot added the stale label Nov 21, 2020

dcherian added the topic-rolling label Feb 18, 2021

stale bot removed the stale label Feb 18, 2021

keewis closed this as completed Aug 6, 2024
