ENH: fix a bunch of pyarrow duration xfails #50669
@@ -653,6 +653,15 @@ def factorize(
        use_na_sentinel: bool = True,
    ) -> tuple[np.ndarray, ExtensionArray]:
        null_encoding = "mask" if use_na_sentinel else "encode"

        pa_type = self._data.type
Similar to #50688 (comment), could you see if going through cast is more performant here?
import pandas as pd
import pyarrow as pa
parr = pa.array(range(10000), type=pa.duration("s"))
arr = pd.core.arrays.ArrowExtensionArray(parr)
%timeit arr.factorize()
377 µs ± 17.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) # <- astype
335 µs ± 2.92 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) # <- cast
so ~10% more performant at a similar complexity cost
If this is a blocker, I'll change it on the affected PRs. I still like this pattern marginally more than the alternative, but mainly I want to get the slow xfails out of my workflow.
I would prefer going through cast (and generally keep ops in pyarrow-land as much as possible). If you could adjust this in #50688 that'd be good.
Thanks @jbrockmendel