PERF: period factorization #14348
Current coverage is 85.26% (diff: 100%)

@@             master   #14348   diff @@
==========================================
  Files          140      140
  Lines        50634    50634
  Methods          0        0
  Messages         0        0
  Branches         0        0
==========================================
- Hits         43173    43172     -1
- Misses        7461     7462     +1
  Partials         0        0
@@ -311,10 +313,7 @@ def factorize(values, sort=False, order=None, na_sentinel=-1, size_hint=None):
        uniques, labels = safe_sort(uniques, labels, na_sentinel=na_sentinel,
                                    assume_unique=True)

    if is_datetimetz_type:
It's not obvious from the diff, but this check was redundant because the datetimetz values will get re-boxed at
https://github.com/chris-b1/pandas/blob/cf081d9f0216fe7c069369f2673ffde2db669704/pandas/core/algorithms.py#L320
    # GH 14338
    goal_time = 0.2

    def setup(self):
can you add similar benches for dti and dti w/tz
ideally inherit from a common class
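A minimal sketch of what the requested layout could look like, assuming asv-style benchmark classes and the public pandas constructors `pd.period_range` and `pd.date_range`. The class names, the `make_index` hook, and the sizes are hypothetical and are not taken from the PR:

```python
import pandas as pd


class FactorizeBase:
    """Hypothetical shared base: subclasses only supply the index to factorize."""

    goal_time = 0.2
    N = 10_000       # number of distinct values (illustrative size)
    REPEATS = 5      # each value is repeated so factorize has duplicates to fold

    def make_index(self):
        # The base class has no data of its own. asv treats a
        # NotImplementedError raised during setup as "skip this benchmark".
        raise NotImplementedError

    def setup(self):
        self.idx = self.make_index().repeat(self.REPEATS)

    def time_factorize(self):
        self.idx.factorize()


class PeriodFactorize(FactorizeBase):
    def make_index(self):
        return pd.period_range("2000-01-01", periods=self.N, freq="D")


class DatetimeFactorize(FactorizeBase):
    def make_index(self):
        return pd.date_range("2000-01-01", periods=self.N, freq="D")


class DatetimeTZFactorize(FactorizeBase):
    def make_index(self):
        return pd.date_range("2000-01-01", periods=self.N, freq="D", tz="UTC")
```

With this shape, adding another index type to the comparison only needs a new subclass with its own `make_index`.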
    if is_period_dtype(values):
        values = PeriodIndex(values)
        # period array interface goes to object so intercept
        vals = values.view(np.int64)
use values.asi8
        values = DatetimeIndex(values)
        vals = values.asi8

    if is_period_dtype(values):
make an elif
maybe add a comment here about what is going on (helpful to future readers)
actually this might be better to do something like this:
start around line 290

    if needs_i8_conversion(values):
        dtype = values.dtype
        values = values.asi8
    else:
        dtype = None
        values = np.asarray(values)

then the reverse conversion is

    if dtype is not None:
        uniques = uniques.astype(dtype)
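The round trip suggested here (convert to an integer representation, factorize the integers, convert the uniques back) can be illustrated without pandas. The following is only a pure-Python analogy using `datetime.date` ordinals; `factorize_ints` and `factorize_dates` are hypothetical helpers, not pandas functions, and the real implementation works on int64 arrays rather than lists:

```python
from datetime import date


def factorize_ints(values):
    """Return (codes, uniques) for hashable values, uniques in first-seen order."""
    seen = {}
    codes = [seen.setdefault(v, len(seen)) for v in values]
    return codes, list(seen)


def factorize_dates(values):
    """Factorize dates by way of their integer ordinals, then restore the type."""
    # Forward conversion: cheap integers instead of boxed date objects.
    ints = [d.toordinal() for d in values]
    codes, uniques = factorize_ints(ints)
    # Reverse conversion: only the (usually much smaller) uniques are re-boxed.
    return codes, [date.fromordinal(u) for u in uniques]


codes, uniques = factorize_dates(
    [date(2016, 1, 1), date(2016, 1, 2), date(2016, 1, 1)]
)
print(codes)    # [0, 1, 0]
print(uniques)  # [datetime.date(2016, 1, 1), datetime.date(2016, 1, 2)]
```

The point of the pattern is that the expensive object conversion happens once per unique value rather than once per element.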
I did something like this, but it still ends up being fairly complex because extension dtypes need special-casing. Let me know if I'm missing a clearer way to do it.
Thanks for taking care of this. It looks like this is targeted for 0.19.1; any rough idea when that will be released?
855357b to a77ccc2
Looks good to me! @jreback
@chris-b1 whoops, sorry, closed this and I don't see a reopen button.
Strange, I can't reopen either, will just make a new PR
Possibly some glitches due to the repo transfer. Older PRs are also stuck on "Checking for ability to merge automatically…" (that was the reason I wanted to close/reopen this one to retrigger it)
Merged in #14419
git diff upstream/master | flake8 --diff
asv
cc @sinhrks , @bmoscon