Precompile common regular expressions #1603

Merged
merged 3 commits into from
Jun 24, 2020

Conversation


@brettlangdon brettlangdon commented Jun 23, 2020

For very large tox.ini configurations, a lot of time is spent calling re._compile for static regular expressions. By precompiling these common/static regular expressions, I was able to reduce the user time of our tox -l run from 7.79s to 6.63s.

Example with our tox.ini (the first timing is before this change, the second is after):

$ python ./src/tox/__main__.py -c ~/example/tox.ini -l | wc -l
1199

$ time python ./src/tox/__main__.py -c ~/example/tox.ini -l > /dev/null
python ./src/tox/__main__.py -c ~/example/tox.ini -l > /dev/null  7.79s user 0.97s system 107% cpu 8.122 total

$ time python ./src/tox/__main__.py -c ~/example/tox.ini -l > /dev/null
python ./src/tox/__main__.py -c ~/example/tox.ini -l > /dev/null  6.63s user 0.95s system 109% cpu 6.924 total
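
As a minimal sketch of the kind of change described above (not the actual diff from this PR): a pattern that never changes is compiled once at import time, and the compiled object's methods are called directly. The pattern and the function names below are hypothetical and are not taken from tox.

```python
import re

# Hypothetical pattern for illustration: split on commas and whitespace.
_SPLIT_RE = re.compile(r"[,\s]+")


def split_inline(value):
    """Before: re.split looks the pattern up in re's cache on every call."""
    return [part for part in re.split(r"[,\s]+", value) if part]


def split_precompiled(value):
    """After: reuse the pattern object compiled once at import time."""
    return [part for part in _SPLIT_RE.split(value) if part]


print(split_precompiled("py37, py38 py39"))  # ['py37', 'py38', 'py39']
```

Both functions return the same result; only the lookup path to the compiled pattern differs.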

@asottile
Contributor

I highly doubt your performance numbers. For instance, re.search is defined as _compile(pattern).search(...), which goes through a cache.

@brettlangdon
Author

The results I see locally with this change are pretty consistent, dropping about 1s off execution time.

_compile does use a cache, but it also adds the indirection of an extra function call plus a cache lookup, neither of which is needed. It adds up, especially on a large file with a ton of envs. For example, for our config it is called 1.6M times.
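
One rough way to check the per-call overhead in isolation is a timeit comparison like the sketch below. The pattern and input string are hypothetical stand-ins, and the absolute numbers will vary by machine and Python version, so no expected timings are given.

```python
import re
import timeit

PATTERN = r"\s+"  # hypothetical static pattern
COMPILED = re.compile(PATTERN)
TEXT = "py37 py38 py39"


def via_module_function():
    # Goes through re's internal cache lookup on every call.
    return re.split(PATTERN, TEXT)


def via_precompiled():
    # Calls the method on the already compiled pattern object.
    return COMPILED.split(TEXT)


n = 100_000
print("re.split:      ", timeit.timeit(via_module_function, number=n))
print("compiled.split:", timeit.timeit(via_precompiled, number=n))
```

A micro-benchmark like this only measures the lookup overhead itself; the profiles below show how it plays out across a full run.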

Before

Tue Jun 23 20:14:30 2020    ./tox.profile

         25212245 function calls (24405995 primitive calls) in 13.372 seconds

   Ordered by: internal time
   List reduced from 1888 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1631094    0.679    0.000    1.065    0.000 re.py:289(_compile)
   421805    0.581    0.000    8.473    0.000 __init__.py:1610(factor_line)
   389147    0.536    0.000    4.538    0.000 __init__.py:1440(_expand_envstr)
   391925    0.510    0.000    2.319    0.000 __init__.py:1445(expand)

After

Tue Jun 23 20:16:40 2020    ./tox.profile

         20402481 function calls (19596231 primitive calls) in 11.232 seconds

   Ordered by: internal time
   List reduced from 1886 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   421805    0.511    0.000    6.337    0.000 __init__.py:1615(factor_line)
   389147    0.497    0.000    3.036    0.000 __init__.py:1445(_expand_envstr)
    23057    0.458    0.000    0.530    0.000 shlex.py:133(read_token)
   391925    0.455    0.000    1.308    0.000 __init__.py:1450(expand)
   781084    0.418    0.000    0.418    0.000 {method 'split' of 're.Pattern' objects}
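
A listing in this shape (ordered by internal time, limited to 20 rows) can be produced with the standard library's cProfile and pstats modules. The sketch below profiles a hypothetical stand-in workload, not tox itself; the exact commands used for the profiles above are not stated in this thread.

```python
import cProfile
import io
import pstats
import re


def workload():
    # Hypothetical stand-in for config parsing: many calls with a static pattern.
    return sum(len(re.split(r"\s+", "py37 py38 py39")) for _ in range(10_000))


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
# "tottime" is the "internal time" ordering; 20 limits the listing to 20 rows.
stats.sort_stats("tottime").print_stats(20)
report = buffer.getvalue()
print(report)
```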

@brettlangdon brettlangdon force-pushed the brettlangdon/re.compile branch from e073e80 to 542d76a Compare June 24, 2020 00:37
Contributor

@asottile asottile left a comment

Member

@gaborbernat gaborbernat left a comment


macOS isn't happy, but that seems unrelated to this PR.

@gaborbernat gaborbernat merged commit 9f49997 into tox-dev:master Jun 24, 2020
ssbarnea pushed a commit to ssbarnea/tox that referenced this pull request Apr 19, 2021