API: Standard signature for to_numpy #24341
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -841,18 +841,22 @@ def array(self): | |
""" | ||
return self._values | ||
|
||
def to_numpy(self): | ||
def to_numpy(self, dtype=None, copy=False): | ||
""" | ||
A NumPy ndarray representing the values in this Series or Index. | ||
|
||
.. versionadded:: 0.24.0 | ||
|
||
The returned array will be the same up to equality (values equal | ||
in `self` will be equal in the returned array; likewise for values | ||
that are not equal). When `self` contains an ExtensionArray, the | ||
dtype may be different. For example, for a category-dtype Series, | ||
``to_numpy()`` will return a NumPy array and the categorical dtype | ||
will be lost. | ||
|
||
Parameters | ||
---------- | ||
dtype : str or numpy.dtype, optional | ||
The dtype to pass to :meth:`numpy.asarray` | ||
copy : bool, default False | ||
Whether to ensure that the returned value is not a view on | |
another array. Note that ``copy=False`` does not *ensure* that | |
``to_numpy()`` is no-copy. Rather, ``copy=True`` ensures that | |
a copy is made, even if not strictly necessary. | |
|
||
Returns | ||
------- | ||
|
@@ -866,10 +870,18 @@ def to_numpy(self): | |
|
||
Notes | ||
----- | ||
The returned array will be the same up to equality (values equal | ||
in `self` will be equal in the returned array; likewise for values | ||
that are not equal). When `self` contains an ExtensionArray, the | ||
dtype may be different. For example, for a category-dtype Series, | ||
``to_numpy()`` will return a NumPy array and the categorical dtype | ||
will be lost. | ||
|
||
|
||
For NumPy dtypes, this will be a reference to the actual data stored | ||
in this Series or Index. Modifying the result in place will modify | ||
the data stored in the Series or Index (not that we recommend doing | ||
that). | ||
in this Series or Index (assuming ``copy=False``). Modifying the result | ||
in place will modify the data stored in the Series or Index (not that | ||
we recommend doing that). | ||
|
||
For extension types, ``to_numpy()`` *may* require copying data and | ||
coercing the result to a NumPy type (possibly object), which may be | ||
|
@@ -894,12 +906,37 @@ def to_numpy(self): | |
>>> ser = pd.Series(pd.Categorical(['a', 'b', 'a'])) | ||
>>> ser.to_numpy() | ||
array(['a', 'b', 'a'], dtype=object) | ||
|
||
Specify the `dtype` to control how datetime-aware data is represented. | ||
Use ``dtype=object`` to return an ndarray of pandas :class:`Timestamp` | ||
objects, each with the correct ``tz``. | ||
|
||
>>> ser = pd.Series(pd.date_range('2000', periods=2, tz="CET")) | ||
>>> ser.to_numpy(dtype=object) | ||
Note that the default behavior right now is the same as
I also have a branch (https://github.com/TomAugspurger/pandas/pull/new/dt-array-3) that's deprecating the behavior for Series.array and Index.array returning datetime64[ns] for tz-aware values. That's currently blocked by #24024. |
||
array([Timestamp('2000-01-01 00:00:00+0100', tz='CET', freq='D'), | ||
Timestamp('2000-01-02 00:00:00+0100', tz='CET', freq='D')], | ||
dtype=object) | ||
|
||
Or ``dtype='datetime64[ns]'`` to return an ndarray of native | ||
datetime64 values. The values are converted to UTC and the timezone | ||
info is dropped. | ||
|
||
>>> ser.to_numpy(dtype="datetime64[ns]") | ||
... # doctest: +ELLIPSIS | ||
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00...'], | ||
dtype='datetime64[ns]') | ||
""" | ||
if (is_extension_array_dtype(self.dtype) or | ||
is_datetime64tz_dtype(self.dtype)): | ||
# TODO(DatetimeArray): remove the second clause. | ||
return np.asarray(self._values) | ||
return self._values | ||
# TODO(GH-24345): Avoid potential double copy | ||
result = np.asarray(self._values, dtype=dtype) | ||
technically this could cause a copy, right? so should we set copy=False if

Correct, we may be double copying. I'm not quite sure how to detect that. I'm not sure how to best handle this. One option I considered is adding a method to the EA interface

    if is_extension_array_dtype(self.dtype):
        result, copied = self.array._to_numpy(dtype, copy)
        if copy and not copied:
            return result

Do you think avoiding the double-copy is worth that complexity? Or perhaps there's a cleaner way I haven't thought of.

no ok for now, though maybe add a comment. note that @jbrockmendel basically does this in various parts of the datetimelike constructors. |
||
else: | ||
result = self._values | ||
|
||
if copy: | ||
result = result.copy() | ||
return result | ||
|
||
@property | ||
def _ndarray_values(self): | ||
|
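The `copy` semantics described in the docstring above can be checked with a short sketch. It assumes a recent pandas release and a plain int64 Series backed by a NumPy array; the variable names are illustrative only. It inspects memory sharing with `np.shares_memory` rather than mutating the result in place, since in-place modification is discouraged above.

```python
import numpy as np
import pandas as pd

# A Series with a plain NumPy dtype (int64).
ser = pd.Series([1, 2, 3])

# copy=False (the default): for NumPy dtypes the result is expected to be
# a reference to (or a view on) the data stored in the Series.
view = ser.to_numpy()

# copy=True: a fresh array is requested, even if not strictly necessary.
copied = ser.to_numpy(copy=True)

# Asking for a different dtype goes through a conversion, so the result
# cannot alias the original int64 buffer.
as_float = ser.to_numpy(dtype="float64")

shares_view = np.shares_memory(view, ser.to_numpy())
shares_copy = np.shares_memory(copied, ser.to_numpy())
print(shares_view, shares_copy, as_float.dtype)
```

For extension types the first check may well come out differently, because `to_numpy()` may have to copy and coerce the data regardless of `copy`.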
maybe on a followup we should add some sub-section headers here as this is getting long
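The double-copy concern from the review thread on `np.asarray(self._values, dtype=dtype)` could be sketched roughly as follows. This is not the pandas implementation: `ToyArray`, its `_to_numpy` method, and the free function `to_numpy` are hypothetical stand-ins, and the way the `copied` flag is computed here is an assumption that only holds for this toy, ndarray-backed container.

```python
import numpy as np


class ToyArray:
    """Hypothetical stand-in for an extension array backed by an ndarray."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def _to_numpy(self, dtype=None, copy=False):
        # `copy` is accepted only to mirror the signature floated in the
        # review; this toy version never copies on its own account.
        result = np.asarray(self._data, dtype=dtype)
        # Report whether the conversion already produced a new buffer.
        shares = result is self._data or np.shares_memory(result, self._data)
        return result, not shares


def to_numpy(arr, dtype=None, copy=False):
    """Copy only when the conversion did not already make a copy."""
    result, copied = arr._to_numpy(dtype, copy)
    if copy and not copied:
        result = result.copy()
    return result


arr = ToyArray([1, 2, 3])
no_copy = to_numpy(arr)                          # same buffer as arr._data
forced = to_numpy(arr, copy=True)                # one explicit copy
converted = to_numpy(arr, dtype="float64", copy=True)  # conversion already copied
```

The trade-off is the one raised in the thread: the flag avoids a redundant `result.copy()` after a converting `np.asarray`, at the cost of an extra method on the array interface.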