REF: Cleanups, typing, memoryviews in tslibs #23368
Codecov Report
@@ Coverage Diff @@
## master #23368 +/- ##
=========================================
Coverage ? 92.22%
=========================================
Files ? 161
Lines ? 51191
Branches ? 0
=========================================
Hits ? 47210
Misses ? 3981
Partials ? 0
Continue to review full report at Codecov.
@jbrockmendel : For the less initiated, could you explain the rationale for your changes?
Good thinking, gotta keep me on my toes. The most common change is changing e.g. Adding Many of the remaining changes are using py3-style type annotations instead of cython type annotations (cython supports this even in py2). This moves us in the direction of being valid python, which hopefully one day we can lint.
@jbrockmendel : Given the amount of work you have done cleaning up the internals, I'm starting to think that it would be good to add something to the contribution docs about writing good What do you think?
pandas/_libs/tslibs/conversion.pyx
Outdated
@@ -867,7 +877,7 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
for i in range(n):
    v = vals[i]
    result[i] = _tz_convert_tzlocal_utc(v, tz, to_utc=True)
return result
return result.base  # `.base` to access underlying np.array
asarray
update
@@ -970,7 +980,13 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
# Pull the only index and adjust
a_idx = grp[:switch_idx]
b_idx = grp[switch_idx:]
dst_hours[grp] = np.hstack((result_a[a_idx], result_b[b_idx]))

# __setitem__ on dst_hours.base because indexing with
i would really rather not use .base
pandas/_libs/tslibs/fields.pyx
Outdated
else:
    raise ValueError("Field {field} not supported".format(field=field))

return out.base  # `.base` to access underlying np.array
same comments as other PR
pandas/_libs/tslibs/fields.pyx
Outdated
else:
    raise ValueError("Field {field} not supported".format(field=field))

return out.base.view(bool)  # `.base` to access underlying np.array
same
I agree that `np.asarray(out)` would be clearer than `out.base` plus a frequently-repeated comment. The flip side is that `np.asarray` is a Python call whereas `.base` is a C lookup. It shouldn't make a big difference, but the pattern in `_libs` is usually perf-first, so it isn't obvious which to use.
same comment as in #23882
8027e2f to 498cd64
Reverted change of
same comments on .base
@@ -54,7 +54,7 @@ weekday_to_int = {int_to_weekday[key]: key for key in int_to_weekday}

@cython.wraparound(False)
@cython.boundscheck(False)
cpdef inline int32_t get_days_in_month(int year, Py_ssize_t month) nogil:
does this not do anything?
AFAICT the "inline" doesn't get carried across modules in this context. Since the function is never called from within ccalendar, it's not accomplishing anything.
pandas/_libs/tslibs/conversion.pyx
Outdated
def ensure_datetime64ns(ndarray arr, copy=True):
@cython.boundscheck(False)
@cython.wraparound(False)
def ensure_datetime64ns(arr: ndarray, copy: bint = True):
no spaces on the args
I've been trying to figure this out too. Are you sure about this? I've been following the usage here https://www.python.org/dev/peps/pep-3107/#syntax
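For reference, the grammar examples in PEP 3107 write an annotated parameter with a default as `b: expression = 5`, and PEP 8 likewise recommends spaces around `=` when an annotation and a default are combined. A minimal pure-Python sketch of that convention follows; `ensure_copy` and its `list` types are hypothetical stand-ins, not the pandas function or Cython's `ndarray`/`bint`:

```python
def ensure_copy(arr: list, copy: bool = True) -> list:
    """Hypothetical stand-in: return a copy of arr unless copy is False."""
    return list(arr) if copy else arr


# Annotations are stored on the function object and do not affect the call.
print(ensure_copy.__annotations__)
```

An unannotated default keeps the usual `copy=True` spelling with no spaces; the spaces only appear once the annotation is present.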
pandas/_libs/tslibs/conversion.pyx
Outdated
@@ -121,7 +123,7 @@ def ensure_datetime64ns(ndarray arr, copy=True):
return result


def ensure_timedelta64ns(ndarray arr, copy=True):
def ensure_timedelta64ns(arr: ndarray, copy: bint = True):
same
pandas/_libs/tslibs/conversion.pyx
Outdated
int64_t *tdata
int64_t v, left, right, val, v_left, v_right
ndarray[int64_t] result, result_a, result_b, dst_hours
int64_t v, left, right, val, v_left, v_right, delta_idx
can you avoid duplication here? (`int64_t[:]` is declared twice)
pandas/_libs/tslibs/conversion.pyx
Outdated
@@ -867,7 +877,7 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
for i in range(n):
    v = vals[i]
    result[i] = _tz_convert_tzlocal_utc(v, tz, to_utc=True)
return result
return result.base  # `.base` to access underlying np.array
update
pandas/_libs/tslibs/conversion.pyx
Outdated
@@ -1015,7 +1031,7 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
stamp = _render_tstamp(val)
raise pytz.NonExistentTimeError(stamp)

return result
return result.base  # `.base` to access underlying np.array
same
See comment above. I previously changed all of the `foo.base` to `np.asarray(foo)`, but reverted that change after finding that it risked incorrectly returning `np.array(None)` instead of raising if `None` gets passed to one of these functions.
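The `None` behaviour described above can be reproduced in plain Python with NumPy alone. The sketch below only illustrates the `np.asarray` side of the trade-off; `to_ndarray` is a hypothetical helper, not something from the PR, and it does not model Cython memoryviews or `.base`:

```python
import numpy as np

# np.asarray does not reject None; it wraps it in a 0-d object array.
wrapped = np.asarray(None)
print(wrapped.shape, wrapped.dtype)  # () object


def to_ndarray(values):
    """Hypothetical helper: convert to ndarray, but refuse None explicitly."""
    if values is None:
        raise TypeError("expected an array-like, got None")
    return np.asarray(values)
```

With an explicit guard like this, a stray `None` fails loudly at the boundary instead of propagating as a 0-d object array.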
from the docs (https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html):
import numpy as np

def process_buffer(int[:,:] input_view not None,
                   int[:,:] output_view=None):

    if output_view is None:
        # Creating a default view, e.g.
        output_view = np.empty_like(input_view)

    # process 'input_view' into 'output_view'
    return output_view
you can use `not None` in the declaration to avoid this issue.
Yah, I was hoping you had forgotten about that. Part of the goal ATM is to move the cython code closer to being valid python (linting!), and `not None` moves that in the wrong direction. Will un-revert, and open an issue with Cython on implementing a py-friendly version of `not None`. Ditto for the other open cython PR.
well I actually find it weird that cython code accepts None here at all by default.
needs a rebase
Rebased, but please hold off pending another pass.
The worthwhile parts of this are in #23464. Closing.
* Easy bits of pandas-dev#23382 * Easy parts of pandas-dev#23368
git diff upstream/master -u -- "*.py" | flake8 --diff