PERF: DataFrame.iloc[int] for EA dtypes #54508
cls = dtype.construct_array_type()
result = cls._empty((n,), dtype=dtype)
if isinstance(dtype, ExtensionDtype):
result = np.empty(n, dtype=object)
I expect this will be bad for e.g. DatetimeTZDtype.
I just tried it for DatetimeTZDtype("ns", "UTC") and it seems to be ok, and more performant in that case as well. I think it's ok since each element is pulled out individually, which ensures wrapping in Timestamp.
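The check described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the frame, the column names, and the loop are assumptions made for the example, and it only mirrors the idea of filling an object buffer with scalars and converting once at the end.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame with a tz-aware extension dtype.
df = pd.DataFrame(
    {
        "a": pd.date_range("2023-01-01", periods=3, tz="UTC"),
        "b": pd.date_range("2023-02-01", periods=3, tz="UTC"),
    }
)
dtype = df["a"].dtype  # a DatetimeTZDtype

loc = 1  # row position to extract
n = df.shape[1]

# Object buffer, as in the new branch of the diff.
result = np.empty(n, dtype=object)
for i, col in enumerate(df.columns):
    # Scalar access on a tz-aware array returns a pd.Timestamp,
    # so the timezone travels with each element.
    result[i] = df[col].array[loc]

# Convert once to the extension array type (assumed final step).
cls = dtype.construct_array_type()
row = cls._from_sequence(result, dtype=dtype)
```

Each element stored in the buffer is already a tz-aware Timestamp, so converting the buffer back to the extension array keeps the original dtype.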
Is the issue that the relevant
Yes. In the case of pyarrow, it is really non-performant to iteratively set each element.
Thanks @lukemanley
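The two strategies being compared can be sketched side by side. This is a hedged illustration under stated assumptions: the helper names `row_via_ea_setitem` and `row_via_object_buffer` are hypothetical, and the nullable `Int64` dtype stands in for a pyarrow-backed dtype so the example runs without pyarrow. It shows that both paths produce the same array; it does not reproduce the timing gap discussed in the thread.

```python
import numpy as np
import pandas as pd


def row_via_ea_setitem(values, dtype):
    # Old approach: allocate the extension array up front, then set
    # elements one at a time through the array's __setitem__.
    cls = dtype.construct_array_type()
    result = cls._empty((len(values),), dtype=dtype)
    for i, v in enumerate(values):
        result[i] = v
    return result


def row_via_object_buffer(values, dtype):
    # New approach: fill a plain object ndarray, then convert once.
    result = np.empty(len(values), dtype=object)
    for i, v in enumerate(values):
        result[i] = v
    cls = dtype.construct_array_type()
    return cls._from_sequence(result, dtype=dtype)


dtype = pd.Int64Dtype()  # stand-in for a pyarrow-backed dtype
values = [1, pd.NA, 3]

a = row_via_ea_setitem(values, dtype)
b = row_via_object_buffer(values, dtype)
```

The second helper pays the extension-array conversion cost once per row instead of once per element, which is the trade-off the comment above refers to.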
…dtypes) (#54700) Backport PR #54508: PERF: DataFrame.iloc[int] for EA dtypes Co-authored-by: Luke Manley <[email protected]>
I'm not wild about this. Seems to be papering over a hacky
Fair enough. The
Maybe a TODO note pointing back at the relevant part of this thread?
doc/source/whatsnew/v2.1.0.rst file if fixing a bug or adding a new feature.
Perf improvement in DataFrame.iloc when input is an integer and the dataframe is EA-backed. Most visible on wide frames.
Also visible with DataFrame reductions of EA dtypes:
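One way to measure this locally is sketched below. The frame shape, the `Int64` dtype, and the repeat count are assumptions chosen for illustration and are not the benchmark from the PR; absolute timings depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical wide frame backed by a nullable extension dtype.
n_cols = 200
df = pd.DataFrame(np.arange(3 * n_cols).reshape(3, n_cols)).astype("Int64")

# Integer row access: the code path this PR targets.
row = df.iloc[1]
elapsed_iloc = timeit.timeit(lambda: df.iloc[1], number=20)

# A column-wise reduction over the same EA-backed frame.
col_sums = df.sum()
elapsed_sum = timeit.timeit(lambda: df.sum(), number=20)

print(f"iloc[int]: {elapsed_iloc:.4f}s, sum: {elapsed_sum:.4f}s for 20 runs")
```

Running the same script before and after the change gives a rough comparison; a proper measurement would use the project's asv benchmark suite.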