Skip to content

BUG: loffset not applied when using resample with agg() (GH13218) #14213

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.20.0.txt
Original file line number Diff line number Diff line change
@@ -321,3 +321,4 @@ Bug Fixes
- Require at least 0.23 version of cython to avoid problems with character encodings (:issue:`14699`)
- Bug in converting object elements of array-like objects to unsigned 64-bit integers (:issue:`4471`)
- Bug in ``pd.pivot_table()`` where no error was raised when values argument was not in the columns (:issue:`14938`)
- Bug in ``resample``, where a non-string ```loffset`` argument would not be applied when resampling a timeseries (:issue:`13218`)
22 changes: 20 additions & 2 deletions pandas/tseries/resample.py
Original file line number Diff line number Diff line change
@@ -323,6 +323,11 @@ def aggregate(self, arg, *args, **kwargs):
*args,
**kwargs)

# if arg was a string, _aggregate called resampler's _downsample or
# _groupby_and_agg methods, which would've already applied the loffset
if not isinstance(arg, compat.string_types):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unclear why this is happening here, and not in .downsample (and .upsample) instead. checking the arg type is not very robust (and does't follow a good pattern).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not happening in there because .downsample is not called when anything other than a string is passed into agg.

Resampler has downsample & groupby_and_agg methods (e.g. count, mean) which in return call _downsample, and when a string is passed into agg, these methods are called. However, if a string isn't passed in, these methods would not have been called, and consequently _downsample would not have been called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And as for the implementation, for something as small as loffset, is it really worth refactoring a lot of code to change the way it is applied?

I've gone through the stack trying to find another place to apply it, but this is by far the clearest & simplest way I've found.

result = self._apply_loffset(result)

return result

agg = aggregate
@@ -381,7 +386,7 @@ def _gotitem(self, key, ndim, subset=None):
return grouped

def _groupby_and_aggregate(self, how, grouper=None, *args, **kwargs):
""" revaluate the obj with a groupby aggregation """
""" re-evaluate the obj with a groupby aggregation """

if grouper is None:
self._set_binner()
@@ -409,7 +414,14 @@ def _groupby_and_aggregate(self, how, grouper=None, *args, **kwargs):
return self._wrap_result(result)

def _apply_loffset(self, result):
"""if loffset if set, offset the result index"""
"""
if loffset is set, offset the result index

Parameters
----------
result : Series or DataFrame
the result of resample
"""
loffset = self.loffset
if isinstance(loffset, compat.string_types):
loffset = to_offset(self.loffset)
@@ -419,6 +431,7 @@ def _apply_loffset(self, result):
isinstance(result.index, DatetimeIndex) and
len(result.index) > 0
)

if needs_offset:
result.index = result.index + loffset

@@ -797,6 +810,11 @@ def aggregate(self, arg, *args, **kwargs):
if result is None:
result = self._downsample(arg, *args, **kwargs)

# if arg was a string, _aggregate called resamplers' _downsample or
# _groupby_and_agg methods, which would've already applied the loffset
if not isinstance(arg, compat.string_types):
result = self._apply_loffset(result)

return result

agg = aggregate
32 changes: 31 additions & 1 deletion pandas/tseries/tests/test_resample.py
Original file line number Diff line number Diff line change
@@ -24,7 +24,7 @@
from pandas.tseries.period import period_range, PeriodIndex, Period
from pandas.tseries.resample import (DatetimeIndex, TimeGrouper,
DatetimeIndexResampler)
from pandas.tseries.tdi import timedelta_range
from pandas.tseries.tdi import timedelta_range, TimedeltaIndex
from pandas.util.testing import (assert_series_equal, assert_almost_equal,
assert_frame_equal, assert_index_equal)
from pandas._period import IncompatibleFrequency
@@ -769,6 +769,36 @@ def test_resample_empty_dtypes(self):
# (ex: doing mean with dtype of np.object)
pass

def test_resample_loffset_arg_type(self):
# GH 13218, 15002
df = self.create_series().to_frame('value')
expected_means = [df.values[i:i + 2].mean()
for i in range(0, len(df.values), 2)]
expected_index = self.create_index(df.index[0],
periods=len(df.index) / 2,
freq='2D')
# loffset coreces PeriodIndex to DateTimeIndex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm that seems odd that a PI coerces when loffset is applied. is this happen in previous versions of pandas?

Copy link
Contributor Author

@wcwagner wcwagner Dec 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I'm pretty sure in v0.18 the same thing happened. Is this intended?

In [19]: df = pd.DataFrame(data=list(range(10)), index=pd.period_range("1/1/2000", freq="D", periods=10))

In [20]: df.resample("2D").mean().index
Out[20]:  DatetimeIndex(['2000-01-01', '2000-01-03', '2000-01-05', '2000-01-07',
               '2000-01-09'],
              dtype='datetime64[ns]', freq='2D')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I added this to the master issue: #12871, so that when PI is more compat (IOW it returns PI when PI is input, though not always possible) for resampling this will be tested.

if isinstance(expected_index, PeriodIndex):
expected_index = expected_index.to_timestamp()
expected_index += timedelta(hours=2)
expected = DataFrame({'value': expected_means}, index=expected_index)
for arg in ['mean', {'value': 'mean'}, ['mean']]:
result_agg = df.resample('2D', loffset='2H').agg(arg)
with tm.assert_produces_warning(FutureWarning,
check_stacklevel=False):
result_how = df.resample('2D', how=arg, loffset='2H')
if isinstance(arg, list):
expected.columns = pd.MultiIndex.from_tuples([('value',
'mean')])
# GH 13022, 7687 - TODO: fix resample w/ TimedeltaIndex
if isinstance(expected.index, TimedeltaIndex):
with tm.assertRaises(AssertionError):
assert_frame_equal(result_agg, expected)
assert_frame_equal(result_how, expected)
else:
assert_frame_equal(result_agg, expected)
assert_frame_equal(result_how, expected)


class TestDatetimeIndex(Base, tm.TestCase):
_multiprocess_can_split_ = True