add support for DatetimeIndexer #45

jreback · 2020-06-10T20:37:34Z

import numpy as np
from pandas.api.indexers import BaseIndexer

def calculate_variable_window_bounds(left, right, index):

    num_values = len(index)
    assert len(left) == len(right) == len(index)
    
    # -1 marks a bound that has not been computed yet
    start = np.full(num_values, -1, dtype='int64')
    end = np.full(num_values, -1, dtype='int64')

    # initial conditions
    if index[0] > left[0]:
        start[0] = 0
    if index[0] <= right[0]:
        end[0] = 1

    for i in range(1, num_values):

        value = index[i]
        
        # advance the start bound until we are
        # within the constraint
        start[i] = start[i - 1]
        for j in range(start[i - 1], i):
            # if we are no longer in the right bounds
            if value > right[j]:
                start[i] = j
                #break
            elif value < left[j]:
                start[i] = j
            else:
                break

        # end bound is previous end
        # or current index
        if (index[end[i - 1]] - right[i]) <= 0:
            end[i] = i + 1
        else:
            end[i] = end[i - 1]
        
    return start, end

class DatetimeIndexer(BaseIndexer):
    def get_window_bounds(self, num_values, min_periods, center, closed):
        # starts, ends, points are all DTI
        starts = np.asarray(self.starts.view('i8'))
        ends = np.asarray(self.ends.view('i8'))
        points = np.asarray(self.points.view('i8'))
        return calculate_variable_window_bounds(starts, ends, points)
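
For comparison, here is a minimal sketch of the same bound computation using `np.searchsorted` instead of the explicit loop. It is not the implementation proposed above. It assumes `index` is sorted ascending and that the window for row `i` contains every position `j` with `left[i] < index[j] <= right[i]`, which matches the initial conditions in the loop version. The function name `window_bounds_sorted` is hypothetical.

```python
import numpy as np


def window_bounds_sorted(left, right, index):
    """Return half-open positions [start, end) per row.

    Row i covers every j with left[i] < index[j] <= right[i].
    Assumes index is sorted ascending (an assumption of this sketch).
    """
    left = np.asarray(left)
    right = np.asarray(right)
    index = np.asarray(index)
    if not (len(left) == len(right) == len(index)):
        raise ValueError("left, right and index must have the same length")
    if np.any(index[1:] < index[:-1]):
        raise ValueError("index must be sorted ascending")

    # side="right" counts the values <= the probe, so start skips
    # everything <= left[i] and end includes everything <= right[i].
    start = np.searchsorted(index, left, side="right").astype(np.int64)
    end = np.searchsorted(index, right, side="right").astype(np.int64)
    return start, end


index = np.array([1, 2, 3, 5, 8], dtype="int64")
start, end = window_bounds_sorted(index - 2, index, index)
print(start)  # [0 0 1 3 4]
print(end)    # [1 2 3 4 5]
```

Because both lookups are binary searches, this runs in O(n log n) and has no dependence on the previous row's bounds, at the cost of requiring a sorted index.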

Input frame

from io import StringIO
from textwrap import dedent

import pandas as pd

tweets_str = """
             ticker,datetime,sentiment
             GOOG,2020-05-27 15:00,0.6
             GOOG,2020-05-28 11:00,0.5
             IBM,2020-05-28 12:00,-0.1
             GOOG,2020-05-28 13:00,0.2
             GOOG,2020-05-28 20:00,0.3
             GOOG,2020-05-29 07:00,-0.1
             IBM,2020-05-29 09:00,-0.3
             IBM,2020-05-29 12:00,-0.4
             GOOG,2020-05-30 07:00,-0.2
             GOOG,2020-05-30 08:00,-0.5
             GOOG,2020-05-30 10:00,0.1
             GOOG,2020-05-30 14:00,0.3
             GOOG,2020-05-31 07:00,-0.1
             GOOG,2020-06-01 08:00,0.2
             GOOG,2020-06-01 10:00,0.4
             """
tweets = pd.read_csv(StringIO(dedent(tweets_str)), parse_dates=["datetime"])

Call it like this

bd = 1 * pd.tseries.offsets.BusinessDay()
starts = tweets.datetime - 1 * bd
ends = tweets.datetime - 0 * bd
tweets.rolling(window=DatetimeIndexer(starts=starts, ends=ends, points=tweets.datetime)).sentiment.mean()
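
As a self-contained illustration of the call above, the following sketch wires explicit `starts`, `ends`, and `points` into a `BaseIndexer` subclass and rolls over a small frame. Several things here are assumptions rather than part of the proposal: the class name `ExplicitBoundsIndexer` is hypothetical, the rows are assumed sorted by timestamp, the bounds come from `np.searchsorted` rather than the loop above, and the `get_window_bounds` signature includes `step`, which recent pandas releases (1.5 and later, as far as I know) require in addition to the four arguments used in this issue.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer


class ExplicitBoundsIndexer(BaseIndexer):
    """Hypothetical indexer: row i aggregates rows whose timestamp p
    satisfies starts[i] < p <= ends[i]. Assumes points is sorted."""

    def get_window_bounds(self, num_values=0, min_periods=None,
                          center=None, closed=None, step=None):
        # starts, ends and points are passed as keyword arguments to
        # __init__; BaseIndexer stores them as attributes.
        points = np.asarray(self.points, dtype="datetime64[ns]")
        starts = np.asarray(self.starts, dtype="datetime64[ns]")
        ends = np.asarray(self.ends, dtype="datetime64[ns]")
        start = np.searchsorted(points, starts, side="right").astype(np.int64)
        end = np.searchsorted(points, ends, side="right").astype(np.int64)
        return start, end


tweets = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2020-05-28 11:00",  # Thursday
        "2020-05-28 13:00",  # Thursday
        "2020-05-29 09:00",  # Friday
        "2020-06-01 08:00",  # Monday
        "2020-06-01 10:00",  # Monday
    ]),
    "sentiment": [0.5, 0.2, -0.3, 0.2, 0.4],
})

points = pd.DatetimeIndex(tweets["datetime"])
indexer = ExplicitBoundsIndexer(
    starts=points - pd.offsets.BDay(1),  # one business day back
    ends=points,
    points=points,
)

start, end = indexer.get_window_bounds(num_values=len(tweets))
rolled = tweets["sentiment"].rolling(indexer, min_periods=1).mean()
print(rolled)
```

The Monday 08:00 row reaches back across the weekend to Friday 08:00, so it picks up the Friday 09:00 tweet, while the Monday 10:00 row does not. That is the behaviour a fixed `'1D'` window cannot express.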

jreback · 2020-06-10T21:16:29Z

https://github.com/pandas-dev/pandas/blob/master/pandas/core/window/rolling.py#L482

jreback · 2020-06-10T21:17:49Z

tweets.rolling(window=DatetimeIndexer(starts=starts, ends=ends), on='datetime').sentiment.mean()
tweets.set_index('datetime').rolling(window=DatetimeIndexer(starts=starts, ends=ends)).sentiment.mean()

mroeschke · 2020-06-10T23:39:30Z

Looks like I did something similar in #24

I can work on documenting and testing a similar BusinessDayIndexer.
When the user passes a BaseIndexer object as window and doesn't specify index, I think it's reasonable to populate it with the index of the DataFrame/Series, or with whatever column on refers to.

jreback · 2020-06-10T23:55:02Z

sounds good @mroeschke

mroeschke · 2020-06-23T06:10:13Z

Here's a demo of creating a custom indexer that can work on non-fixed offsets: pandas-dev#34947

In [1]: from pandas.core.window.indexers import BusinessOffsetIndexer

In [2]: df = pd.DataFrame(range(10), index=pd.date_range('2020', periods=10))

In [3]: offset = pd.offsets.BDay(1)

In [4]: indexer = BusinessOffsetIndexer(index=df.index, offset=offset)

In [5]: df.rolling(indexer).sum()
Out[5]:
               0
2020-01-01   0.0
2020-01-02   1.0
2020-01-03   2.0
2020-01-04   3.0
2020-01-05   7.0
2020-01-06  12.0
2020-01-07   6.0
2020-01-08   7.0
2020-01-09   8.0
2020-01-10   9.0

In [6]: df
Out[6]:
            0
2020-01-01  0
2020-01-02  1
2020-01-03  2
2020-01-04  3
2020-01-05  4
2020-01-06  5
2020-01-07  6
2020-01-08  7
2020-01-09  8
2020-01-10  9

In [7]: df.index.day_name()
Out[7]:
Index(['Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday', 'Monday',
       'Tuesday', 'Wednesday', 'Thursday', 'Friday'],
      dtype='object')

In [8]: df.rolling(indexer, closed='left').sum()
Out[8]:
              0
2020-01-01  0.0
2020-01-02  0.0
2020-01-03  1.0
2020-01-04  2.0
2020-01-05  5.0
2020-01-06  9.0
2020-01-07  5.0
2020-01-08  6.0
2020-01-09  7.0
2020-01-10  8.0
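
The numbers in Out[5] and Out[8] can be checked by hand with a brute-force sketch that does not use the indexer at all. It assumes the window for each row is `(ts - offset, ts]` for the default `closed` and `[ts - offset, ts)` for `closed='left'`; that reading is inferred from the printed output, not taken from the indexer's source. The helper name `brute_force_sum` is hypothetical.

```python
import pandas as pd

index = pd.date_range("2020", periods=10)
values = pd.Series(range(10), index=index, dtype="float64")
offset = pd.offsets.BDay(1)


def brute_force_sum(values, offset, closed="right"):
    """Sum each row's window by masking the whole index (O(n^2))."""
    idx = values.index
    out = []
    for ts in idx:
        lo = ts - offset  # e.g. Sat/Sun/Mon all step back to Friday
        if closed == "right":
            mask = (idx > lo) & (idx <= ts)
        elif closed == "left":
            mask = (idx >= lo) & (idx < ts)
        else:
            raise ValueError("this sketch only handles 'right' and 'left'")
        out.append(values[mask].sum())
    return pd.Series(out, index=idx)


right_closed = brute_force_sum(values, offset)
left_closed = brute_force_sum(values, offset, closed="left")
print(right_closed.tolist())  # [0.0, 1.0, 2.0, 3.0, 7.0, 12.0, 6.0, 7.0, 8.0, 9.0]
print(left_closed.tolist())   # [0.0, 0.0, 1.0, 2.0, 5.0, 9.0, 5.0, 6.0, 7.0, 8.0]
```

The weekend rows are the interesting ones: Saturday, Sunday, and Monday all have Friday as their lower bound, which is why the sums grow to 7.0 and 12.0 before dropping back on Tuesday.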

mroeschke · 2020-06-25T17:19:19Z

Demo'd by pandas-dev#34947

DiegoAlbertoTorres mentioned this issue Jun 17, 2020

TRACKER: milestones #44

mroeschke closed this as completed Jun 25, 2020
