Basic Cython support for DateParser #414
Current coverage is 100% (diff: 100%)
@@             master   #414   diff @@
====================================
  Files          14     13     -1
  Lines        3081   2902   -179
  Methods         0      0
  Messages        0      0
  Branches      226    182    -44
====================================
- Hits         3081   2902   -179
  Misses          0      0
  Partials        0      0
It's a real shame that this was ignored for so long. If you want to try fixing all the conflicts or submit a new PR that'd be great, but I understand if it's too late for that. The only part I'm not keen on is the line width fixes.
@nlittlepoole this seems like a great addition to Arrow. Like Chris said above, if you can resolve the conflicts and get things up-to-date, this would be a great change to merge in.
Hey, since Cython and |
Overview
So as a heavy user of arrow for data pipelines, I've found that arrow.get, while faster than the standard strptime, still isn't as fast as it could be. Others have discussed this more in depth. While it's probably not feasible to be as fast as udatetime, I think it's worth exploring.
So this PR is an attempt to increase the speed of date parsing. Compiling the parser module with Cython, plus the LRU cache that @ownaginatious added, provides some nice performance increases. I went from averaging between 100µs and 200µs per date parsed to averaging between 55µs and 75µs per date parsed (when using the cache). For long-running data pipelines this is a large increase.
What I changed
I added two functions and some Cython imports to the setuptools configuration. These just find pyx files automatically and compile them. Additionally, I added Cython to the requirements.txt.
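As a rough illustration of the kind of helpers described here, the sketch below finds pyx files and hands them to Cython. The names find_pyx and build_ext_modules are hypothetical and are not taken from this PR's setup.py; the fallback to an empty list when Cython is missing is also an assumption.

```python
import os


def find_pyx(root="."):
    """Return the sorted paths of every .pyx file below `root`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".pyx"):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def build_ext_modules(root="."):
    """Compile the discovered .pyx files; return [] if there is nothing to build."""
    sources = find_pyx(root)
    if not sources:
        return []
    try:
        # Imported lazily so the package still installs without Cython.
        from Cython.Build import cythonize
    except ImportError:
        # Assumption: silently fall back to the pure-Python modules.
        return []
    return cythonize(sources)


# In a setup.py this could be wired in roughly as:
#   setup(..., ext_modules=build_ext_modules("arrow"))
```

Keeping the Cython import inside the function means an environment without Cython can still import the build script, which may or may not match what this PR actually does.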
I fixed up the line widths in parser.py. I changed l to length for improved readability. Lastly, I switched from using the Python version of datetime to using the cpython version bundled with Cython. They are identical except that the Cython version calls directly into the C code written for CPython.
Tests
All the tests pass on 2.X.X without any changes to them. However, Chai doesn't play very nicely with Cython3. The LRU cache of DateTimeParser._generate_pattern_re changes the type from method to some subclass of functools, so Chai doesn't think it knows how to handle it. I might open a PR there to fix that issue, but in the meantime I changed the tests to use mock to make them pass.
Todos & Further Exploration
cpython.datetime with udatetime
arrow.factory.get to use cache
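For context on the caching and mocking points raised in the Tests section, here is a minimal sketch using only the standard library. DemoParser and generate_pattern_re are hypothetical stand-ins, not arrow's DateTimeParser, and the toy pattern logic is an assumption made purely for illustration.

```python
import functools
import re
import types
from unittest import mock


class DemoParser:
    """Hypothetical stand-in for a parser class; not arrow's DateTimeParser."""

    @functools.lru_cache(maxsize=32)
    def generate_pattern_re(self, fmt):
        # Assumed toy behaviour: turn "YYYY" into a named four-digit group.
        return re.compile(re.escape(fmt).replace("YYYY", r"(?P<year>\d{4})"))


parser = DemoParser()

# Repeated calls with the same arguments reuse the cached compiled pattern.
first = parser.generate_pattern_re("YYYY")
second = parser.generate_pattern_re("YYYY")
same_object = first is second

# On CPython the decorated attribute is an lru_cache wrapper object rather
# than a plain function, which is the kind of type change that can confuse
# a mocking library that inspects attribute types.
cached_attr = DemoParser.__dict__["generate_pattern_re"]
is_plain_function = isinstance(cached_attr, types.FunctionType)

# mock.patch.object swaps the attribute without caring about its type and
# restores the original when the block exits.
with mock.patch.object(DemoParser, "generate_pattern_re", return_value="stub") as fake:
    stubbed = parser.generate_pattern_re("YYYY")
    fake.assert_called_once_with("YYYY")
```

One design caveat: lru_cache on a method keys the cache on self as well, so each cached entry keeps its instance alive for as long as the entry stays in the cache.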