-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
center=True for xarray.DataArray.rolling() #1046
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I think we mostly tried to make this consistent with pandas. To be honest I don't entirely understand the logic myself. Cc @jhamman |
We do try to stay consistent with pandas except for the last position. Here's the unit test where we verify that behavior. Using In [1]: import pandas as pd
s
In [2]: data = pd.Series([0, 3, 6])
In [3]: data.rolling(3, center=True, min_periods=1).mean()
Out[3]:
0 1.5
1 3.0
2 4.5 If I remember correctly, and my brain is a bit like mush right now so I could be wrong, In [4]: bn.move_mean(data, 3, min_count=1)
Out[4]: array([ 0. , 1.5, 3. ]) So, as you can see, bottleneck does something totally different that wouldn't otherwise work with |
My opinion is that the nan has got to go. If we want to (1) maintain pandas-consistency and (2) use bottleneck without mucking it up, then I think we need to add some logic in either rolling.reduce() or rolling._center_result(). So here's my failed attempt:
This algorithm is consistently 8 times slower than pd.DataFrame.rolling(), for various 1d array sizes. I'm open to ideas as well :) |
@chunweiyuan I agree, this seems worth doing, and I think you have a pretty sensible approach here. For large arrays (especially with ndim > 1), this should add only minimal performance overhead. If you can fit this into the existing framework for |
I'm fine with this approach for now. It would be great if we could convince bottleneck to help us out with a keyword argument of some kind. |
Let me exhaust a few other ideas first. I'll definitely share my thoughts here first before making any commit. Thanks. |
In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity If this issue remains relevant, please comment here or remove the |
This seems to have been fixed in https://github.com/pydata/xarray/pull/1837/files#diff-66d415f4d4a5d0969b40e35b86cbf67612bc3b88c7f02957a550f12df7e0e14eR149-R154, right ? I think this issue can be closed. |
I agree, the example with In [1]: import xarray as xr
...: import numpy as np
...:
...: data = xr.DataArray(
...: np.arange(27).reshape(3, 3, 3),
...: coords=[("x", ["a", "b", "c"]), ("y", [-2, 0, 2]), ("z", [0, 1, 2])],
...: )
...: display(
...: data.rolling(y=3, center=False, min_periods=1).mean(),
...: data.rolling(y=3, center=True, min_periods=1).mean(),
...: )
<xarray.DataArray (x: 3, y: 3, z: 3)> Size: 216B
array([[[ 0. , 1. , 2. ],
[ 1.5, 2.5, 3.5],
[ 3. , 4. , 5. ]],
[[ 9. , 10. , 11. ],
[10.5, 11.5, 12.5],
[12. , 13. , 14. ]],
[[18. , 19. , 20. ],
[19.5, 20.5, 21.5],
[21. , 22. , 23. ]]])
Coordinates:
* x (x) <U1 12B 'a' 'b' 'c'
* y (y) int64 24B -2 0 2
* z (z) int64 24B 0 1 2
<xarray.DataArray (x: 3, y: 3, z: 3)> Size: 216B
array([[[ 1.5, 2.5, 3.5],
[ 3. , 4. , 5. ],
[ 4.5, 5.5, 6.5]],
[[10.5, 11.5, 12.5],
[12. , 13. , 14. ],
[13.5, 14.5, 15.5]],
[[19.5, 20.5, 21.5],
[21. , 22. , 23. ],
[22.5, 23.5, 24.5]]])
Coordinates:
* x (x) <U1 12B 'a' 'b' 'c'
* y (y) int64 24B -2 0 2
* z (z) int64 24B 0 1 2 which I think makes sense? |
The logic behind setting center=True confuses me. Say window size = 3. The default behavior (center=False) sets the window to go from i-2 to i, so I would've expected center=True to set the window from i-1 to i+1. But that's not what I see.
For example, this is what data looks like:
Now, if I set y-window size = 3, center = False, min # of entries = 1, I get
Which essentially gives me a "trailing window" of size 3, meaning the window goes from i-2 to i. This is not explained in the doc but can be understood empirically.
On the other hand, setting center = True gives
In other words, it just pushes every cell up the y-dim by 1, using nan to represent things coming off the edge of the universe. If you look at _center_result() of xarray/core/rolling.py, that's exactly what it does with .shift().
I would've expected center=True to change the window to go from i-1 to i+1. In which case, with min_periods=1, would not render any nan value in r.mean().
Could someone explain the logical flow to me?
Much obliged,
Chun
The text was updated successfully, but these errors were encountered: