ideas for daily time grid crunching #372
base: master
Changes from all commits: aa1b89d, a765cbe, b63b73c, ffd578b, 13ef033, 6157ade, b7aeda1, 60123c1, e011614, dbb355c, 8e9d302, 5382ccd
@@ -1,6 +1,7 @@
import numpy as np
import pandas as pd
import xarray as xr
import datetime

# Date Reading functions
def read_zarr_data(zarr_path):

@@ -619,6 +620,103 @@ def daily_tobegroupedby_season(
    return daily_tobegroupedby_season

def assign_season_coords(
    time_series,
    start_day,
    start_month,
    end_day,
    end_month,
    time_dim="T",
):
    """Assign season start and season end coordinates to this object,
    mapping time dimension dates that belong to a season to their season start and end.
    Data outside seasons are dropped.

    Parameters
    ----------
    time_series : DataArray, Dataset
        Time-dependent data to be grouped.
    start_day : int
        Day of the start date of the season.
    start_month : int
        Month of the start date of the season.
    end_day : int
        Day of the end date of the season.
    end_month : int
        Month of the end date of the season.
    time_dim : str, optional
        Time coordinate in `time_series` (default `time_dim="T"`).

    Returns
    -------
    time_series : DataArray, Dataset
        `time_series` where days outside the seasons of interest are dropped,
        with two additional coordinates `season_start` and `season_end`
        that map the time dimension dates that belong to a season to
        their season start and end respectively.

    See Also
    --------
    xarray.DataArray.assign_coords, xarray.DataArray.groupby

    Notes
    -----
    If the first (last) season is truncated because `time_series` starts (ends)
    within it, the first (last) element of `season_start` (`season_end`) differs
    from all others and is the first (last) element of `time_dim`.
    The additional coordinates are expected to be used by a groupby function.

    Examples
    --------
    """
    if (
        (start_day == 29)
        and (start_month == 2)
        and ((~time_series[time_dim].dt.is_leap_year).sum() > 0)
    ):
        raise Exception(
            "if there is at least one non-leap year in time_coord, can not start on 29-Feb"
        )
    # ending season on 29-Feb means ending on 1-Mar with -1 day offset
    if (end_day == 29 and end_month == 2):
        end_day = 1
        end_month = 3
        day_offset = -1
    else:
        day_offset = 0

Review discussion on this special case:

Reviewer: Maybe end_month/end_day should always represent the first day that's not included in the season? I.e. day_offset is always -1; then the DJF season would be represented with end_month = 3, end_day = 1, and JFM with end_month = 4, end_day = 1. Advantages: no special case, so less likelihood of a bug in the special case that we don't notice because we failed to test it; and it corresponds to e.g. how arrays are sliced in Python.

Reviewer: It's late in the review cycle for me to be introducing something like this. Feel free to ignore if you don't like the suggestion.

Author: I can see arguments for (or against) either way. I am not sure there won't be any special case. If the user wants a season ending on 28 Feb (included), would they put 29 Feb or 1 Mar? If they put 29 Feb, they would miss three 28 Febs out of four; if they put 1 Mar, they would include 29 Feb every 4 years. At least if using sel_day_and_month... If we envision the function to work also for the monthly case, then start/end_day should be an optional input, and having its default value at 1 would be straightforward with an open interval end. If we keep the intervals closed on both ends, we should probably set them to None, have start be 1, and have end be 1 too with an offset of -1, so that when users don't enter days, they get full months. So it's more gymnastics (but not crazy) in that case. Then, as we try to build on top of xarray (well, at least that is what I am trying to do), I feel we should try and adopt xarray conventions, and thus keep closed intervals...

Reviewer: That's not a very strong argument.

Reviewer: I don't understand. Why would they miss Feb 28? Xarray's choice to define slices as closed on both ends doesn't make sense to me. You can't partition a range of real values into bins using intervals that are closed on both ends. Indeed, ...

Author: They would miss it if we keep on using sel_day_and_month. Yes, I noticed the seemingly strange behavior of groupby_bins with respect to interval edge inclusivity. It just reproduces the pandas.cut behavior, though. And maybe it is so because this is more about categorizing rather than slicing? Maybe we should not think through the groupby prism but through the resample one?
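The slicing conventions the reviewers are contrasting can be checked in a few lines (illustrative only, using pandas; xarray's label-based `.sel` follows the same pandas convention):

```python
import pandas as pd

s = pd.Series(range(4), index=pd.date_range("2000-01-01", periods=4, freq="D"))

# Positional slicing is half-open: the stop position is excluded.
assert len(s.iloc[0:2]) == 2   # positions 0 and 1 only

# Label-based slicing (pandas .loc; xarray's .sel follows the same
# convention) is closed on both ends: the stop label is included.
assert len(s.loc["2000-01-01":"2000-01-03"]) == 3

print("ok")
```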
    # Create start_/end_edges pairing arrays
    dense_time = time_series[time_dim].resample({time_dim: "1D"}).asfreq()
    start_edges = sel_day_and_month(dense_time, start_day, start_month)
    end_edges = sel_day_and_month(dense_time, end_day, end_month, offset=day_offset)
    if start_edges[0] > end_edges[0]:
        start_edges = xr.concat([time_series[time_dim][0], start_edges], dim=time_dim)
    if end_edges[-1] < start_edges[-1]:
        end_edges = xr.concat([end_edges, time_series[time_dim][-1]], dim=time_dim)
    # dates mapped to the season start and end
    season_start = xr.DataArray(
        np.piecewise(
            time_series[time_dim],
            [((time_series[time_dim] >= start_edges[t])
              & (time_series[time_dim] <= end_edges[t])).values
             for t in range(start_edges.size)],
            [start_edges[t] for t in range(start_edges.size)]
            + [pd.to_datetime([np.nan])],
        ),
        dims=[time_dim],
        coords={time_dim: time_series[time_dim]},
    )
    season_end = xr.DataArray(
        np.piecewise(
            time_series[time_dim],
            [((time_series[time_dim] >= start_edges[t])
              & (time_series[time_dim] <= end_edges[t])).values
             for t in range(end_edges.size)],
            [end_edges[t] for t in range(end_edges.size)]
            + [pd.to_datetime([np.nan])],
        ),
        dims=[time_dim],
        coords={time_dim: time_series[time_dim]},
    )
    # Drop days out of seasons and assign coords
    return (
        time_series.where(~np.isnat(season_start), drop=True)
        .assign_coords(
            season_start=(time_dim, season_start[~np.isnat(season_start)].data)
        )
        .assign_coords(
            season_end=(time_dim, season_end[~np.isnat(season_end)].data)
        )
    )
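The edge-pairing and piecewise mapping above can be sketched in a self-contained toy example (not the PR's code: `sel_day_and_month` is stood in for by plain day/month masks, and plain pandas replaces xarray). Each day of a dense daily grid is mapped to the start of the season containing it; days outside any season stay NaT and would be dropped before a groupby:

```python
import pandas as pd

# Toy daily time grid spanning one DJF (Dec-Feb) season.
dates = pd.date_range("2000-11-01", "2001-03-31", freq="D")

# Season edges, found by day/month masks (standing in for sel_day_and_month).
start_edges = dates[(dates.month == 12) & (dates.day == 1)]   # 2000-12-01
end_edges = dates[(dates.month == 2) & (dates.day == 28)]     # 2001-02-28

# Map each date inside a season to that season's start; NaT elsewhere.
season_start = pd.Series(pd.NaT, index=dates)
for s, e in zip(start_edges, end_edges):
    season_start[(dates >= s) & (dates <= e)] = s

# Days left as NaT fall outside every season.
n_in_season = int(season_start.notna().sum())
print(n_in_season)  # 90 days: Dec (31) + Jan (31) + Feb (28)
```

Grouping the surviving days by such a `season_start` coordinate is what turns per-day data into per-season groups.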

# Seasonal Functions

def seasonal_onset_date(