REF: Cleanups, typing, memoryviews in tslibs #23368

jbrockmendel · 2018-10-26T20:31:01Z

closes #xxxx
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

codecov · 2018-10-27T02:26:41Z

Codecov Report

❗ No coverage uploaded for pull request base (master@4f71755).
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master   #23368   +/-   ##
=========================================
  Coverage          ?   92.22%           
=========================================
  Files             ?      161           
  Lines             ?    51191           
  Branches          ?        0           
=========================================
  Hits              ?    47210           
  Misses            ?     3981           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`90.6% <ø> (?)`
#single	`42.26% <ø> (?)`

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4f71755...f57f994.

gfyoung · 2018-10-27T06:52:41Z

@jbrockmendel : For the less initiated, could you explain the rationale for your changes?

jbrockmendel · 2018-10-27T14:34:14Z

Good thinking, gotta keep me on my toes.

The most common change is replacing e.g. ndarray[int64_t] with int64_t[:]. This is the more modern usage suggested by Cython; it is more general, slightly more performant, and moves us toward decoupling from a numpy back-end.
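A rough illustration of the memoryview idea, using Python's builtin `memoryview` as a stand-in for Cython's compiled typed memoryviews (an analogy, not the code in this PR). The view is typed by its buffer format (`'q'` is a signed 64-bit integer, the counterpart of `int64_t[:]`), it can wrap any object that exports the buffer protocol rather than only an ndarray, and writes go straight through to the owner without copying. The `.obj` attribute plays roughly the role `.base` plays on a Cython memoryview. `sum_int64` is a hypothetical name.

```python
from array import array


def sum_int64(values: memoryview) -> int:
    # values is a typed 1-D view; format 'q' is a signed 64-bit integer,
    # the analogue of Cython's int64_t[:]
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


# The owner can be any buffer-exporting object, not only a numpy ndarray
owner = array("q", [1, 2, 3, 4])
view = memoryview(owner)

# Writes through the view mutate the owner: no copy is made
view[0] = 10

# view.obj recovers the exporting object, much as .base does
# for a Cython typed memoryview
recovered = view.obj
```

After the write, `owner[0]` is 10 and `sum_int64(view)` sees the updated data.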

Adding @cython.boundscheck(False) and @cython.wraparound(False) improves performance slightly, and is safe in cases where we know all array lookups use valid indices (non-negative and in range).
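To make the "safe when indices are valid" condition concrete, here is a hypothetical pure-Python model of the per-lookup work the two directives control. It is illustrative only, not the C that Cython generates: `wraparound` maps negative indices back from the end, `boundscheck` raises on out-of-range access, and in a `for i in range(n)` loop neither branch can ever fire. `checked_index` and `sum_checked` are made-up names.

```python
def checked_index(i: int, n: int) -> int:
    # wraparound(True): a negative index counts back from the end
    if i < 0:
        i += n
    # boundscheck(True): anything still outside [0, n) raises
    if not 0 <= i < n:
        raise IndexError("index out of bounds")
    return i


def sum_checked(values: list) -> int:
    n = len(values)
    total = 0
    # i is drawn from range(n), so it is always in [0, n): neither
    # branch in checked_index can fire, which is the situation where
    # turning both directives off is safe
    for i in range(n):
        total += values[checked_index(i, n)]
    return total
```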

Many of the remaining changes use py3-style type annotations instead of Cython type declarations (Cython supports this even in py2). This moves us in the direction of being valid Python, which hopefully one day we can lint.
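A small check of the "valid Python" point, using only the stdlib `ast` module. The annotation-style signature from this PR parses as ordinary Python (the parser treats `ndarray` and `bint` as plain names and never evaluates them), while the classic Cython declaration does not. `is_valid_python` is a hypothetical helper.

```python
import ast

# Annotation style used in this PR: ordinary Python syntax
annotated = "def ensure_datetime64ns(arr: ndarray, copy: bint = True):\n    pass\n"

# Classic Cython declaration style: not valid Python syntax
cython_style = "def ensure_datetime64ns(ndarray arr, copy=True):\n    pass\n"


def is_valid_python(source: str) -> bool:
    # ast.parse only checks syntax; annotation names are never evaluated
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True
```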

pandas/_libs/tslibs/period.pyx

gfyoung · 2018-10-27T22:28:56Z

@jbrockmendel : Given the amount of work you have done cleaning up the internals, I'm starting to think that it would be good to add something to the contribution docs about writing good Cython code (so that I won't be asking every time why you make the changes that you do 🙂 ).

What do you think?

jreback · 2018-10-28T02:31:40Z

pandas/_libs/tslibs/conversion.pyx

@@ -867,7 +877,7 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
        for i in range(n):
            v = vals[i]
            result[i] = _tz_convert_tzlocal_utc(v, tz, to_utc=True)
-        return result
+        return result.base  # `.base` to access underlying np.array


jreback · 2018-10-28T02:32:01Z

pandas/_libs/tslibs/conversion.pyx

@@ -970,7 +980,13 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
                # Pull the only index and adjust
                a_idx = grp[:switch_idx]
                b_idx = grp[switch_idx:]
-                dst_hours[grp] = np.hstack((result_a[a_idx], result_b[b_idx]))
+
+                # __setitem__ on dst_hours.base because indexing with


I would really rather not use .base

jreback · 2018-10-28T02:32:14Z

pandas/_libs/tslibs/fields.pyx

+    else:
+        raise ValueError("Field {field} not supported".format(field=field))
+
+    return out.base  # `.base` to access underlying np.array


same comments as other PR

jreback · 2018-10-28T02:32:27Z

pandas/_libs/tslibs/fields.pyx

+    else:
+        raise ValueError("Field {field} not supported".format(field=field))
+
+    return out.base.view(bool)  # `.base` to access underlying np.array


I agree that np.asarray(out) would be clearer than out.base plus a frequently repeated comment. The flip side is that np.asarray is a Python call whereas .base is a C lookup. It shouldn't make a big difference, but the pattern in _libs is usually perf-first, so it isn't obvious which to use.

same comment as in #23882

jbrockmendel · 2018-10-30T16:53:34Z

Reverted the change of foo.base to np.asarray(foo). It risks incorrectly returning np.array(None) instead of raising.

jreback

same comments on .base

jreback · 2018-10-31T12:47:48Z

pandas/_libs/tslibs/ccalendar.pyx

@@ -54,7 +54,7 @@ weekday_to_int = {int_to_weekday[key]: key for key in int_to_weekday}

 @cython.wraparound(False)
 @cython.boundscheck(False)
-cpdef inline int32_t get_days_in_month(int year, Py_ssize_t month) nogil:


does this not do anything?

AFAICT the "inline" doesn't get carried across modules in this context. Since the function is never called from within ccalendar, it's not accomplishing anything.

jreback · 2018-10-31T12:48:06Z

pandas/_libs/tslibs/conversion.pyx

-def ensure_datetime64ns(ndarray arr, copy=True):
+@cython.boundscheck(False)
+@cython.wraparound(False)
+def ensure_datetime64ns(arr: ndarray, copy: bint = True):


no spaces on the args

I've been trying to figure this out too. Are you sure about this? I've been following the usage here https://www.python.org/dev/peps/pep-3107/#syntax
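For reference, the spacing convention comes from PEP 8 rather than from the annotation syntax itself: PEP 8 says to use spaces around `=` when a parameter has both an annotation and a default, and no spaces for an unannotated default (pycodestyle reports the missing-space annotated case as E252). Either way the whitespace is purely stylistic, as this stdlib-only sketch shows by comparing parse trees. `same_tree` is a hypothetical helper.

```python
import ast

# PEP 8: no spaces around "=" for an unannotated default ...
unannotated = "def f(arr, copy=True):\n    pass\n"
# ... but spaces around "=" once the parameter is annotated
pep8_annotated = "def f(arr: ndarray, copy: bint = True):\n    pass\n"
# Same annotated signature without the spaces
compact_annotated = "def f(arr: ndarray, copy: bint=True):\n    pass\n"


def same_tree(a: str, b: str) -> bool:
    # ast.dump omits line/column info by default, so whitespace-only
    # differences produce identical dumps
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))
```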

jreback · 2018-10-31T12:48:11Z

pandas/_libs/tslibs/conversion.pyx

@@ -121,7 +123,7 @@ def ensure_datetime64ns(ndarray arr, copy=True):
    return result


-def ensure_timedelta64ns(ndarray arr, copy=True):
+def ensure_timedelta64ns(arr: ndarray, copy: bint = True):


jreback · 2018-10-31T12:48:47Z

pandas/_libs/tslibs/conversion.pyx

        int64_t *tdata
-        int64_t v, left, right, val, v_left, v_right
-        ndarray[int64_t] result, result_a, result_b, dst_hours
+        int64_t v, left, right, val, v_left, v_right, delta_idx


can you avoid duplication here? (int64_t[:] appears twice)

jreback · 2018-10-31T12:48:56Z

pandas/_libs/tslibs/conversion.pyx

@@ -867,7 +877,7 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
        for i in range(n):
            v = vals[i]
            result[i] = _tz_convert_tzlocal_utc(v, tz, to_utc=True)
-        return result
+        return result.base  # `.base` to access underlying np.array


jreback · 2018-10-31T12:49:22Z

pandas/_libs/tslibs/conversion.pyx

@@ -1015,7 +1031,7 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
                stamp = _render_tstamp(val)
                raise pytz.NonExistentTimeError(stamp)

-    return result
+    return result.base  # `.base` to access underlying np.array


See comment above. I previously changed all of the foo.base to np.asarray(foo), but reverted that change after finding that it risked incorrectly returning np.array(None) instead of raising if None gets passed to one of these functions.
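A minimal sketch of the failure mode described here. It uses a numpy view (`owner[:]`, whose `.base` is the owning array) as a stand-in for a Cython typed memoryview, so it shows the Python-level behaviour rather than the compiled code: `np.asarray` quietly wraps `None` in a 0-d object array, while attribute access on `None` raises. `via_asarray` and `via_base` are hypothetical names.

```python
import numpy as np


def via_asarray(out):
    # Accepts None without complaint: returns a 0-d object array
    return np.asarray(out)


def via_base(out):
    # Attribute access on None raises AttributeError instead
    return out.base


owner = np.arange(4, dtype=np.int64)
view = owner[:]  # a numpy view; its .base is the owning array

from_base = via_base(view)        # the owning array itself
from_asarray = via_asarray(view)  # same memory as owner, no copy
```

Both spellings agree for a real array; they only diverge when `None` slips through.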

from the docs

https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html

    import numpy as np

    def process_buffer(int[:,:] input_view not None,
                       int[:,:] output_view=None):

        if output_view is None:
            # Creating a default view, e.g.
            output_view = np.empty_like(input_view)

        # process 'input_view' into 'output_view'
        return output_view

you can use not None in the declaration to avoid this issue.

Yah, I was hoping you had forgotten about that. Part of the goal ATM is to move the cython code closer to being valid python (linting!), and not None moves that in the wrong direction. Will un-revert, and open an issue with Cython on implementing a py-friendly version of not None. Ditto for the other open cython PR.
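One valid-Python way to get the `not None` guarantee without the Cython-only clause is an explicit check at the top of the function body. This is a hypothetical sketch (the name `as_int64_ndarray` is made up), not what the PR or Cython adopted; it keeps `np.asarray` for the conversion but rejects `None` before it can be wrapped.

```python
import numpy as np


def as_int64_ndarray(values) -> np.ndarray:
    # Explicit stand-in for "int64_t[:] values not None": fail fast on None,
    # because np.asarray(None) would otherwise succeed with an object array
    if values is None:
        raise TypeError("values must not be None")
    return np.asarray(values, dtype=np.int64)
```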

well I actually find it weird that cython code accepts None here at all by default.

jreback · 2018-11-02T14:14:12Z

needs a rebase

jbrockmendel · 2018-11-02T15:13:49Z

Rebased, but please hold off pending another pass.

jbrockmendel · 2018-11-03T01:07:27Z

The worthwhile parts of this are in #23464. Closing.

gfyoung added the Internals (Related to non-user accessible pandas implementation) and Clean labels Oct 27, 2018

jbrockmendel commented Oct 27, 2018

pandas/_libs/tslibs/period.pyx (outdated, resolved)

gfyoung approved these changes Oct 27, 2018

jreback requested changes Oct 28, 2018

jreback added this to the 0.24.0 milestone Oct 28, 2018

jbrockmendel added 6 commits October 30, 2018 09:52

tslibs cleanup, typing, memoryviews

37239cc

cleanup and typing

c7bd679

small optimizations

e2a2df9

optimizations

ae04c77

Whitespace fixup

c89ddff

Merge branch 'master' of https://github.com/pandas-dev/pandas into libtsmore

498cd64

jbrockmendel force-pushed the libtsmore branch from 8027e2f to 498cd64 Compare October 30, 2018 16:52

jreback requested changes Oct 31, 2018

jbrockmendel added 2 commits November 1, 2018 08:40

Merge branch 'master' of https://github.com/pandas-dev/pandas into libtsmore

77039a3

use np.asarray instead of .base

f501cd7

jbrockmendel added a commit to jbrockmendel/pandas that referenced this pull request Nov 2, 2018

easy parts of pandas-dev#23368

c34e438

jbrockmendel mentioned this pull request Nov 2, 2018

REF: cython cleanup, typing, optimizations #23456

Merged

Merge branch 'master' of https://github.com/pandas-dev/pandas into libtsmore

f57f994

jbrockmendel added a commit to jbrockmendel/pandas that referenced this pull request Nov 2, 2018

Easy parts of pandas-dev#23368

45b6f75

jbrockmendel mentioned this pull request Nov 2, 2018

REF: cython cleanup, typing, optimizations #23464

Merged

jbrockmendel closed this Nov 3, 2018

jbrockmendel deleted the libtsmore branch November 3, 2018 01:07

jreback pushed a commit that referenced this pull request Nov 3, 2018

REF: cython cleanup, typing, optimizations (#23464)

6fe83bb

* Easy bits of #23382 * Easy parts of #23368

JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018

REF: cython cleanup, typing, optimizations (pandas-dev#23464)

3faf1a9

* Easy bits of pandas-dev#23382 * Easy parts of pandas-dev#23368

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018

REF: cython cleanup, typing, optimizations (pandas-dev#23464)

a43fb86

* Easy bits of pandas-dev#23382 * Easy parts of pandas-dev#23368

TomAugspurger mentioned this pull request Feb 20, 2019

Pandas Series Construction Extremely Slow for Array of Large Series #25364

Open

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

REF: cython cleanup, typing, optimizations (pandas-dev#23464)

2477d28

* Easy bits of pandas-dev#23382 * Easy parts of pandas-dev#23368

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

REF: cython cleanup, typing, optimizations (pandas-dev#23464)

543553b

* Easy bits of pandas-dev#23382 * Easy parts of pandas-dev#23368
