Skip to content

API: Standard signature for to_numpy #24341

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 19, 2018

Conversation

TomAugspurger
Copy link
Contributor

This is part 1 of #23995

We make the signature

to_numpy(dtype : Union[str, np.dtype], copy : bool) -> ndarray

This is part 1 of pandas-dev#23995

We make the signature of

`to_numpy(dtype : Union[str, np.dtype], copy : bool) -> ndarray`
@TomAugspurger TomAugspurger added the ExtensionArray Extending pandas with custom dtypes or arrays. label Dec 18, 2018
@TomAugspurger TomAugspurger added this to the 0.24.0 milestone Dec 18, 2018
@pep8speaks
Copy link

Hello @TomAugspurger! Thanks for submitting the PR.

objects, each with the correct ``tz``.

>>> ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))
>>> ser.to_numpy(dtype=object)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the default behavior right now is the same as dtype="datetime64[ns]", i.e. the timezone info is lost. I don't think that's what we want, but I'm waiting on #24024 to be done before making that change.

I also have a branch (https://github.com/TomAugspurger/pandas/pull/new/dt-array-3) that's deprecating the behavior for Series.array and Index.array returning datetime64[ns] for tz-aware values. That's currently blocked by #24024.

@codecov
Copy link

codecov bot commented Dec 18, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@f57c357). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #24341   +/-   ##
=========================================
  Coverage          ?   92.28%           
=========================================
  Files             ?      162           
  Lines             ?    51821           
  Branches          ?        0           
=========================================
  Hits              ?    47824           
  Misses            ?     3997           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.69% <100%> (?)
#single 42.98% <22.22%> (?)
Impacted Files Coverage Δ
pandas/core/base.py 97.66% <100%> (ø)
pandas/core/frame.py 96.91% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f57c357...a61b1ad. Read the comment docs.

@@ -73,6 +73,25 @@ as ``.values``).
ser.array
ser.to_numpy()

:meth:`~Series.to_numpy` gives some control over the ``dtype`` of the resulting :class:`ndarray`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would rather you put this in the docs themselves (basics?) and just keep this note shorter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

"""
if (is_extension_array_dtype(self.dtype) or
is_datetime64tz_dtype(self.dtype)):
# TODO(DatetimeArray): remove the second clause.
return np.asarray(self._values)
return self._values
result = np.asarray(self._values, dtype=dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically this could cause a copy, right? so should we set copy=False if dtype is not None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, we may be double copying. I'm not quite sure how to detect that. I'm not sure how to best handle this.

One option I considered is adding a method to the EA interface _to_numpy that returns (ndarray, did_copy). Then this could be written as

if is_extension_array_dtype(self.dtype):
    result, copied = self.array._to_numpy(dtype, copy)

if copy and not copied:
    return result

Do you think avoiding the double-copy is worth that complexity? Or perhaps there's a cleaner way I haven't thought of.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no ok for now, though maybe add a comment. note that @jbrockmendel basically does this in the in various parts of the datetimelike constructors.

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Dec 18, 2018 via email

@@ -86,6 +86,27 @@ be the same as :attr:`~Series.array`. When the Series or Index is backed by
a :class:`~pandas.api.extension.ExtensionArray`, :meth:`~Series.to_numpy`
may involve copying data and coercing values.

:meth:`~Series.to_numpy` gives some control over the ``dtype`` of the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe on a followup should had some sub-section headers here as this is getting long

@jreback jreback merged commit a89136f into pandas-dev:master Dec 19, 2018
@jreback
Copy link
Contributor

jreback commented Dec 19, 2018

thanks!

TomAugspurger added a commit to TomAugspurger/pandas that referenced this pull request Dec 20, 2018
@TomAugspurger TomAugspurger deleted the to_numpy-signature branch January 2, 2019 20:17
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants