BUG: loffset not applied when using resample with agg() (GH13218) #14213
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -24,7 +24,7 @@ | |
from pandas.tseries.period import period_range, PeriodIndex, Period | ||
from pandas.tseries.resample import (DatetimeIndex, TimeGrouper, | ||
DatetimeIndexResampler) | ||
from pandas.tseries.tdi import timedelta_range | ||
from pandas.tseries.tdi import timedelta_range, TimedeltaIndex | ||
from pandas.util.testing import (assert_series_equal, assert_almost_equal, | ||
assert_frame_equal, assert_index_equal) | ||
from pandas._period import IncompatibleFrequency | ||
|
@@ -769,6 +769,36 @@ def test_resample_empty_dtypes(self): | |
# (ex: doing mean with dtype of np.object) | ||
pass | ||
|
||
def test_resample_loffset_arg_type(self): | ||
# GH 13218, 15002 | ||
df = self.create_series().to_frame('value') | ||
expected_means = [df.values[i:i + 2].mean() | ||
for i in range(0, len(df.values), 2)] | ||
expected_index = self.create_index(df.index[0], | ||
periods=len(df.index) / 2, | ||
freq='2D') | ||
# loffset coerces PeriodIndex to DatetimeIndex | ||
Hmm, it seems odd that a PI is coerced when `loffset` is applied. Did this happen in previous versions of pandas?

@jreback I'm pretty sure the same thing happened in v0.18. Is this intended?

So I added this to the master issue, #12871, so that this will be tested once PI is more compatible for resampling (in other words, it returns a PI when a PI is input, though that is not always possible).
||
if isinstance(expected_index, PeriodIndex): | ||
expected_index = expected_index.to_timestamp() | ||
expected_index += timedelta(hours=2) | ||
expected = DataFrame({'value': expected_means}, index=expected_index) | ||
for arg in ['mean', {'value': 'mean'}, ['mean']]: | ||
result_agg = df.resample('2D', loffset='2H').agg(arg) | ||
with tm.assert_produces_warning(FutureWarning, | ||
check_stacklevel=False): | ||
result_how = df.resample('2D', how=arg, loffset='2H') | ||
if isinstance(arg, list): | ||
expected.columns = pd.MultiIndex.from_tuples([('value', | ||
'mean')]) | ||
# GH 13022, 7687 - TODO: fix resample w/ TimedeltaIndex | ||
if isinstance(expected.index, TimedeltaIndex): | ||
with tm.assertRaises(AssertionError): | ||
assert_frame_equal(result_agg, expected) | ||
assert_frame_equal(result_how, expected) | ||
else: | ||
assert_frame_equal(result_agg, expected) | ||
assert_frame_equal(result_how, expected) | ||
|
||
|
||
class TestDatetimeIndex(Base, tm.TestCase): | ||
_multiprocess_can_split_ = True | ||
|
It's unclear why this is happening here, and not in `.downsample` (and `.upsample`) instead. Checking the arg type is not very robust (and doesn't follow a good pattern).
It's not happening in there because `.downsample` is not called when anything other than a string is passed into `agg`. `Resampler` has downsample and groupby_and_agg methods (e.g. count, mean) which in turn call `_downsample`, and these methods are called when a string is passed into `agg`. However, if something other than a string is passed in, these methods are not called, and consequently `_downsample` is not called either.
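The control flow described in this comment can be illustrated with a toy model. This is only a sketch of the dispatch pattern as the comment describes it, not pandas' actual code: `ToyResampler`, its method bodies, the integer "label", and the `calls` log are all hypothetical.

```python
class ToyResampler:
    """Hypothetical stand-in for a resampler; not pandas' real class."""

    def __init__(self, values, loffset=0):
        self.values = list(values)
        self.loffset = loffset
        self.calls = []  # records which code paths ran

    def _downsample(self, how):
        # In this toy, the label offset is applied only here.
        self.calls.append("_downsample")
        if how == "mean":
            value = sum(self.values) / len(self.values)
        else:
            value = len(self.values)
        return {"label": self.loffset, "value": value}

    def mean(self):
        return self._downsample("mean")

    def count(self):
        return self._downsample("count")

    def agg(self, arg):
        if isinstance(arg, str):
            # A string dispatches to the named method, which calls _downsample.
            return getattr(self, arg)()
        # A dict or list takes a separate path that never reaches _downsample,
        # so the label is left unshifted.
        self.calls.append("generic_agg")
        return {"label": 0, "value": sum(self.values) / len(self.values)}


resampler = ToyResampler([1.0, 2.0, 3.0], loffset=2)
via_string = resampler.agg("mean")
via_list = resampler.agg(["mean"])
```

In this model the string call returns a shifted label and the list call does not, which mirrors the symptom in the issue title: the offset is lost whenever `agg` bypasses the path that applies it.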
And as for the implementation, for something as small as `loffset`, is it really worth refactoring a lot of code to change the way it is applied? I've gone through the stack trying to find another place to apply it, but this is by far the clearest and simplest way I've found.
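For reference, the behaviour the new test expects (every form of `agg` argument yielding labels shifted by two hours) can be sketched without the `loffset` argument by shifting the resampled index explicitly. This is a minimal sketch assuming a recent pandas version in which `resample` does not accept `loffset`; the sample data and the `results` dictionary are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: four daily observations.
index = pd.date_range("2000-01-01", periods=4, freq="D")
df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0]}, index=index)

results = {}
for name, arg in [("str", "mean"), ("dict", {"value": "mean"}), ("list", ["mean"])]:
    result = df.resample("2D").agg(arg)
    # Shift the bin labels by two hours, which is what loffset='2H' asked for.
    result.index = result.index + pd.Timedelta(hours=2)
    results[name] = result
```

As in the test, the list form produces MultiIndex columns (`('value', 'mean')`), while the string and dict forms keep the single `value` column.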