-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: loffset not applied when using resample with agg() (GH13218) #14213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Current coverage is 84.78% (diff: 100%)@@ master #14213 diff @@
==========================================
Files 145 145
Lines 51090 51094 +4
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43315 43321 +6
+ Misses 7775 7773 -2
Partials 0 0
|
@@ -323,6 +323,11 @@ def aggregate(self, arg, *args, **kwargs): | |||
*args, | |||
**kwargs) | |||
|
|||
# if arg was a string, _aggregate called resampler's _downsample or | |||
# _groupby_and_agg methods, which would've already applied the loffset | |||
if not isinstance(arg, compat.string_types): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unclear why this is happening here, and not in .downsample
(and .upsample
) instead. checking the arg type is not very robust (and does't follow a good pattern).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not happening in there because .downsample
is not called when anything other than a string is passed into agg
.
Resampler
has downsample & groupby_and_agg methods (e.g. count, mean) which in return call _downsample
, and when a string is passed into agg
, these methods are called. However, if a string isn't passed in, these methods would not have been called, and consequently _downsample
would not have been called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And as for the implementation, for something as small as loffset
, is it really worth refactoring a lot of code to change the way it is applied?
I've gone through the stack trying to find another place to apply it, but this is by far the clearest & simplest way I've found.
@wcwagner if you want to update to 0.19.2 and rebase, we can take another look. |
can you move release note to 0.20.0 |
Sorry for the delay, I will update everything tomorrow. |
pls add example referenced in #15002 as well (your changes should cover it) |
yep |
Seems like resample's loffset tests were removed from |
@@ -1188,6 +1188,29 @@ def test_resample_loffset_count(self): | |||
|
|||
assert_series_equal(result, expected) | |||
|
|||
def test_resample_loffset_arg_type(self): | |||
# GH 13218, 15002 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be better if you can put this in Base
, that way code is not duplicated. Of better to at least partially do it (e.g. if you have a specific part that can be put there, then add another test in the sub-classes to cover the rest).
expected_index = self.create_index(df.index[0], | ||
periods=len(df.index) / 2, | ||
freq='2D') | ||
# loffset coreces PeriodIndex to DateTimeIndex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm that seems odd that a PI coerces when loffset is applied. is this happen in previous versions of pandas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback I'm pretty sure in v0.18 the same thing happened. Is this intended?
In [19]: df = pd.DataFrame(data=list(range(10)), index=pd.period_range("1/1/2000", freq="D", periods=10))
In [20]: df.resample("2D").mean().index
Out[20]: DatetimeIndex(['2000-01-01', '2000-01-03', '2000-01-05', '2000-01-07',
'2000-01-09'],
dtype='datetime64[ns]', freq='2D')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I added this to the master issue: #12871, so that when PI is more compat (IOW it returns PI when PI is input, though not always possible) for resampling this will be tested.
git diff upstream/master | flake8 --diff
_apply_loffset is called in resampler's downsample and groupby_and_agg methods, defined here and here
When a
str
is passed intoaggregate
, the above methods are called here, and consequently the loffset is applied.However, if anything other than a
str
is passed intoaggregate
, then a different path will be taken and the loffset must be applied. So a simple check inaggregate
will apply the loffset correctly.note: my last PR was closed, and I can't reopen it by a force push, hence the new one