API: Standard signature for to_numpy #24341
This is part 1 of pandas-dev#23995. We make the signature `to_numpy(dtype: Union[str, np.dtype], copy: bool) -> ndarray`.
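A minimal sketch of how the two parameters in this signature might be used on a plain integer Series. The variable names are illustrative only, and the example assumes a pandas version in which `to_numpy` accepts `dtype` and `copy` as described here.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3])

# Default call: NumPy-backed data comes back without a dtype conversion.
as_is = ser.to_numpy()

# dtype requests a conversion of the result.
as_float = ser.to_numpy(dtype="float64")

# copy=True asks for an array that does not share memory with the Series.
fresh = ser.to_numpy(copy=True)

print(as_is.dtype, as_float.dtype)
print(np.shares_memory(fresh, as_is))
```

Note that `copy=False` is not a guarantee of zero copies: a dtype conversion may still have to allocate a new array.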
Hello @TomAugspurger! Thanks for submitting the PR.
objects, each with the correct ``tz``.

>>> ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))
>>> ser.to_numpy(dtype=object)
Note that the default behavior right now is the same as `dtype="datetime64[ns]"`, i.e. the timezone info is lost. I don't think that's what we want, but I'm waiting on #24024 to be done before making that change.
I also have a branch (https://github.com/TomAugspurger/pandas/pull/new/dt-array-3) that's deprecating the behavior for Series.array and Index.array returning datetime64[ns] for tz-aware values. That's currently blocked by #24024.
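To make the two explicit choices concrete, here is a small sketch that contrasts `dtype=object` with `dtype="datetime64[ns]"` for tz-aware data. It passes `dtype` explicitly in both calls because the default is what is under discussion here; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

ser = pd.Series(pd.date_range("2000", periods=2, tz="CET"))

# dtype=object keeps one Timestamp per element, each carrying its timezone.
as_objects = ser.to_numpy(dtype=object)

# dtype="datetime64[ns]" gives a plain NumPy datetime array; the values are
# the UTC instants and the timezone information is dropped.
as_datetimes = ser.to_numpy(dtype="datetime64[ns]")

print(as_objects[0].tzinfo)
print(as_datetimes.dtype)
```

Midnight on 2000-01-01 in CET corresponds to 23:00 UTC on the previous day, which is the instant the `datetime64[ns]` array stores.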
Codecov Report
@@ Coverage Diff @@
## master #24341 +/- ##
=========================================
Coverage ? 92.28%
=========================================
Files ? 162
Lines ? 51821
Branches ? 0
=========================================
Hits ? 47824
Misses ? 3997
Partials ? 0
Continue to review full report at Codecov.
doc/source/whatsnew/v0.24.0.rst
@@ -73,6 +73,25 @@ as ``.values``).
ser.array
ser.to_numpy()

:meth:`~Series.to_numpy` gives some control over the ``dtype`` of the resulting :class:`ndarray`,
would rather you put this in the docs themselves (basics?) and just keep this note shorter
Sure.
     """
     if (is_extension_array_dtype(self.dtype) or
             is_datetime64tz_dtype(self.dtype)):
         # TODO(DatetimeArray): remove the second clause.
-        return np.asarray(self._values)
-    return self._values
+        result = np.asarray(self._values, dtype=dtype)
technically this could cause a copy, right? so should we set copy=False if dtype is not None?
Correct, we may be double copying. I'm not sure how to detect that, or how best to handle it.
One option I considered is adding a method to the EA interface, `_to_numpy`, that returns `(ndarray, did_copy)`. Then this could be written as

    if is_extension_array_dtype(self.dtype):
        result, copied = self.array._to_numpy(dtype, copy)
        if copy and not copied:
            result = result.copy()
        return result

Do you think avoiding the double-copy is worth that complexity? Or perhaps there's a cleaner way I haven't thought of.
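The double-copy concern can be illustrated with plain NumPy. The sketch below is not the pandas implementation and does not use the proposed `_to_numpy` method; `to_numpy_sketch` is a hypothetical helper that detects whether the conversion already copied by checking shared memory, and copies only when it did not.

```python
import numpy as np


def to_numpy_sketch(values, dtype=None, copy=False):
    """Hypothetical helper: convert, then copy only if still needed."""
    # np.asarray returns the input itself when no conversion is required,
    # and a new array when the dtype has to change.
    result = np.asarray(values, dtype=dtype)
    conversion_copied = not np.shares_memory(result, values)
    if copy and not conversion_copied:
        # The conversion was a no-op, so the caller's copy is still owed.
        result = result.copy()
    return result, conversion_copied


values = np.array([1, 2, 3], dtype="int64")

same, copied_same = to_numpy_sketch(values)
converted, copied_conv = to_numpy_sketch(values, dtype="float64", copy=True)
duplicate, copied_dup = to_numpy_sketch(values, copy=True)
```

In the `converted` case the dtype change already produced a new array, so no second copy is made; in the `duplicate` case the conversion was a no-op and exactly one copy is made.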
no ok for now, though maybe add a comment. note that @jbrockmendel basically does this in various parts of the datetimelike constructors.
Yep, that was my inspiration.
On Tue, Dec 18, 2018 at 4:26 PM Jeff Reback wrote:
> no ok for now, though maybe add a comment. note that @jbrockmendel basically does this in various parts of the datetimelike constructors.
@@ -86,6 +86,27 @@ be the same as :attr:`~Series.array`. When the Series or Index is backed by
a :class:`~pandas.api.extension.ExtensionArray`, :meth:`~Series.to_numpy`
may involve copying data and coercing values.

:meth:`~Series.to_numpy` gives some control over the ``dtype`` of the
maybe in a follow-up we should add some sub-section headers here, as this is getting long
thanks!