Use ea interface to calculate accumulator functions for datetimelike #50297

Merged
@mroeschke merged 15 commits into pandas-dev:main from accumulate_datetime on Jan 13, 2023

Conversation

@phofl (Member) commented on Dec 16, 2022

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

cc @jbrockmendel. Follow-up from the other PR.

As mentioned in the other PR, this changes the behavior (highlighted by the test).

@phofl added the Datetime (Datetime data dtype) and Timedelta (Timedelta data type) labels on Dec 16, 2022
    """
    try:
        fill_value = {
            np.cumprod: 1,
Member

is np.cumprod relevant?

Member Author

Hm I guess we can remove it
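
For context, the fill_value lookup in the fragment above appears to map each accumulation function to a neutral element that stands in for NaT while accumulating. A minimal sketch of that idea on plain NumPy arrays follows. The helper name `accumulate_skipna`, the constants, and the exact steps are illustrative assumptions, not the code from this PR. There is no np.cumprod entry, presumably because a product of datetimes or timedeltas is not itself a datetimelike value.

```python
import numpy as np

I8_MIN = np.iinfo(np.int64).min  # NumPy stores NaT as the minimum int64
I8_MAX = np.iinfo(np.int64).max

# Neutral element per accumulation: a value that never changes the running result.
FILL_VALUES = {
    np.maximum.accumulate: I8_MIN,
    np.minimum.accumulate: I8_MAX,
    np.cumsum: 0,
}


def accumulate_skipna(func, values: np.ndarray) -> np.ndarray:
    """Hypothetical helper: accumulate datetime64/timedelta64 values, skipping NaT."""
    try:
        fill_value = FILL_VALUES[func]
    except KeyError:
        raise ValueError(f"No accumulation defined for {func}")

    mask = np.isnat(values)
    ints = values.view("i8").copy()  # work on the underlying int64 representation
    ints[mask] = fill_value          # NaT positions no longer affect the running result
    result = func(ints)
    result[mask] = I8_MIN            # put NaT back where the input had NaT
    return result.view(values.dtype)


dates = np.array(
    ["2020-01-03", "NaT", "2020-01-01", "2020-01-02"], dtype="datetime64[ns]"
)
running_min = accumulate_skipna(np.minimum.accumulate, dates)
```

Here the NaT entry stays NaT in the output, while the entries after it still see the running minimum of the valid dates.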

    return result


def cumsum(values: np.ndarray, *, skipna: bool = True):
Member

-> np.ndarray?

Member Author

Yep good point, this one works. The others aren't that easy...


            raise TypeError(f"Accumulation {name} not supported for {type(self)}")
        return type(self)._simple_new(
            result, freq=self.freq, dtype=self.dtype  # type: ignore[call-arg]
Member

I don't think retaining self.freq is going to be correct in general.

Member Author

Can you elaborate on when we should set freq to None?

Member

Pretty much anything that isn't a no-op should not preserve freq, e.g. pd.date_range("2016-01-01", periods=3)._data.cummin()

Member Author

Thx, makes sense. Should we retain the freq in cases where we only have one element or not retain in general?

Not easy to hit btw :) But added a test now

Member

I'd just not retain it in general.

Member Author

Sounds good, that's how I implemented it right now
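
To see why freq cannot be carried over in this thread's example, here is a small sketch using the public Series API rather than the `_data.cummin()` spelling from the review comment. It only illustrates the reasoning and is not a test from this PR.

```python
import pandas as pd

dti = pd.date_range("2016-01-01", periods=3)  # evenly spaced, daily freq
running_min = pd.Series(dti).cummin()

# Every entry equals the first timestamp, so consecutive steps are zero,
# not one day: a daily freq would no longer describe the values.
steps = running_min.diff().dropna()
```

The input advances by one day per element, but the cumulative minimum repeats the first timestamp, so a result that kept the daily freq would claim a spacing its values do not have.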

            ("cummax", pd.Period("2012-1-2", freq="D")),
        ],
    )
    def test_cummin_cummax_period(self, func, exp):
Member

thanks for checking this


    Parameters
    ----------
    func : np.cumsum, np.cumprod, np.maximum.accumulate, np.minimum.accumulate
Member

Suggested change
-    func : np.cumsum, np.cumprod, np.maximum.accumulate, np.minimum.accumulate
+    func : np.cumsum, np.maximum.accumulate, np.minimum.accumulate

Member Author

thx removed

@mroeschke (Member) left a comment

LGTM, merge when ready @phofl or @jbrockmendel

)

-    elif skipna and not issubclass(values.dtype.type, (np.integer, np.bool_)):
+    if skipna and not issubclass(values.dtype.type, (np.integer, np.bool_)):
Member

maybe a comment/check that "mM" cases should not get here?

Member

(OK for follow-up, I can add this into my next "assorted" branch)

Member Author

Added an assert
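
For readers unfamiliar with the "mM" shorthand: it refers to NumPy dtype kinds, where "m" is timedelta64 and "M" is datetime64. A guard of the kind discussed here could look like the sketch below. The function name `check_not_datetimelike` is hypothetical, and the assert actually added in the PR may be worded differently.

```python
import numpy as np


def check_not_datetimelike(values: np.ndarray) -> None:
    """Hypothetical guard: datetimelike dtypes must be handled before this point."""
    # dtype.kind is "m" for timedelta64 and "M" for datetime64
    assert values.dtype.kind not in "mM", values.dtype


check_not_datetimelike(np.array([1.0, 2.0, 3.0]))  # float input passes silently
```

An array with a datetime64 or timedelta64 dtype would trip the assertion, signalling that it should have been routed through the datetimelike accumulation path instead.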

@@ -0,0 +1,46 @@
import pytest
Member

Can you call this test_cumulative.py to match tests/series/ and tests/frame/?

Member Author

Done

@mroeschke added this to the 2.0 milestone on Jan 13, 2023
@mroeschke merged commit a38a24e into pandas-dev:main on Jan 13, 2023
@mroeschke (Member)

Thanks @phofl

@phofl deleted the accumulate_datetime branch on January 14, 2023, 18:39