Skip to content

REF: Cleanups, typing, memoryviews in tslibs #23368

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

jbrockmendel
Copy link
Member

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@codecov
Copy link

codecov bot commented Oct 27, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@4f71755). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #23368   +/-   ##
=========================================
  Coverage          ?   92.22%           
=========================================
  Files             ?      161           
  Lines             ?    51191           
  Branches          ?        0           
=========================================
  Hits              ?    47210           
  Misses            ?     3981           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.6% <ø> (?)
#single 42.26% <ø> (?)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4f71755...f57f994. Read the comment docs.

@gfyoung gfyoung added Internals Related to non-user accessible pandas implementation Clean labels Oct 27, 2018
@gfyoung
Copy link
Member

gfyoung commented Oct 27, 2018

@jbrockmendel : For the less initiated, could you explain the rationale for your changes?

@jbrockmendel
Copy link
Member Author

Good thinking, gotta keep me on my toes.

The most common change is changing e.g. ndarray[int64_t] to int64_t[:]. This is the more modern suggested usage in cython, is more general, slightly more performant, and moved towards decoupling from a numpy back-end.

Adding @cython.boundscheck(False) and @cython.wraparound(False) improves performance slightly, is safe in cases where we know all array-lookups have valid indices.

Many of the remaining changes are using py3-style type annotations instead of cython type annotations (cython supports this even in py2). This moves us in the direction of being valid python, which hopefully one day we can lint.

@gfyoung
Copy link
Member

gfyoung commented Oct 27, 2018

@jbrockmendel : Given the amount of work you have done cleaning up the internals, I'm starting to think that it would be good to add something to the contribution docs about writing good Cython code (so that I won't be asking every time why you make the changes that you do 🙂 ).

What do you think?

@@ -867,7 +877,7 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
for i in range(n):
v = vals[i]
result[i] = _tz_convert_tzlocal_utc(v, tz, to_utc=True)
return result
return result.base # `.base` to access underlying np.array
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asarray

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update

@@ -970,7 +980,13 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
# Pull the only index and adjust
a_idx = grp[:switch_idx]
b_idx = grp[switch_idx:]
dst_hours[grp] = np.hstack((result_a[a_idx], result_b[b_idx]))

# __setitem__ on dst_hours.base because indexing with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would really rather not use .base

else:
raise ValueError("Field {field} not supported".format(field=field))

return out.base # `.base` to access underlying np.array
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comments as other PR

else:
raise ValueError("Field {field} not supported".format(field=field))

return out.base.view(bool) # `.base` to access underlying np.array
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that np.assarray(out) would be clearer than out.base and frequently-repeated-comment. The flip side is that np.asarray is a python call whereas .base is a C lookup. It shouldn't make a big difference, but for the pattern in _libs is usually perf-first, so it isn't obvious which to use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as in #23882

@jreback jreback added this to the 0.24.0 milestone Oct 28, 2018
@jbrockmendel
Copy link
Member Author

Reverted change of foo.base to np.assarray(foo). It risks incorrectly returning np.array(None) instead of raising.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comments on .base

@@ -54,7 +54,7 @@ weekday_to_int = {int_to_weekday[key]: key for key in int_to_weekday}

@cython.wraparound(False)
@cython.boundscheck(False)
cpdef inline int32_t get_days_in_month(int year, Py_ssize_t month) nogil:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this not do anything?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT the "inline" doesn't get carried across modules in this context. Since the function is never called from within ccalendar, its not accomplishing anything.

def ensure_datetime64ns(ndarray arr, copy=True):
@cython.boundscheck(False)
@cython.wraparound(False)
def ensure_datetime64ns(arr: ndarray, copy: bint = True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no spaces on the args

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying to figure this out too. Are you sure about this? I've been following the usage here https://www.python.org/dev/peps/pep-3107/#syntax

@@ -121,7 +123,7 @@ def ensure_datetime64ns(ndarray arr, copy=True):
return result


def ensure_timedelta64ns(ndarray arr, copy=True):
def ensure_timedelta64ns(arr: ndarray, copy: bint = True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

int64_t *tdata
int64_t v, left, right, val, v_left, v_right
ndarray[int64_t] result, result_a, result_b, dst_hours
int64_t v, left, right, val, v_left, v_right, delta_idx
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you avoid duplication here (int64_t[:]) is twice

@@ -867,7 +877,7 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
for i in range(n):
v = vals[i]
result[i] = _tz_convert_tzlocal_utc(v, tz, to_utc=True)
return result
return result.base # `.base` to access underlying np.array
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update

@@ -1015,7 +1031,7 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
stamp = _render_tstamp(val)
raise pytz.NonExistentTimeError(stamp)

return result
return result.base # `.base` to access underlying np.array
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above. I previously changed all of the foo.base to np.asarray(foo), but reverted that change after finding that risked incorrectly returning np.array(None) instead of raising if None gets passed to one of these functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the docs

https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html

import numpy as np

def process_buffer(int[:,:] input_view not None,
                   int[:,:] output_view=None):

   if output_view is None:
       # Creating a default view, e.g.
       output_view = np.empty_like(input_view)

   # process 'input_view' into 'output_view'
   return output_view

you can use not None in the declaration to avoid this issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, I was hoping you had forgotten about that. Part of the goal ATM is to move the cython code closer to being valid python (linting!), and not None moves that in the wrong direction. Will un-revert, and open an issue with Cython on implementing a py-friendly version of not None. Ditto for the other open cython PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well I actually find it weird that cython code accepts None here at all by default.

jbrockmendel added a commit to jbrockmendel/pandas that referenced this pull request Nov 2, 2018
@jreback
Copy link
Contributor

jreback commented Nov 2, 2018

needs a rebase

@jbrockmendel
Copy link
Member Author

Rebased, but please hold off pending another pass.

jbrockmendel added a commit to jbrockmendel/pandas that referenced this pull request Nov 2, 2018
@jbrockmendel
Copy link
Member Author

The worthwhile parts of this are in #23464. Closing.

@jbrockmendel jbrockmendel deleted the libtsmore branch November 3, 2018 01:07
jreback pushed a commit that referenced this pull request Nov 3, 2018
JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants