@barneygale barneygale commented May 6, 2023

Stop de-duplicating results in _RecursiveWildcardSelector. A new _DoubleRecursiveWildcardSelector class is introduced which performs de-duplication, but this is used only for patterns with multiple non-adjacent ** segments, such as path.glob('**/foo/**'). By avoiding the use of a set in most cases, PurePath.__hash__() is not called, and so paths do not need to be parsed and (case-) normalised.
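
To illustrate why de-duplication matters only for patterns with non-adjacent `**` segments, here is a toy model of segment-by-segment matching. It is a sketch, not pathlib's implementation: `TREE`, `recursive`, `wildcard`, and `naive_glob` are hypothetical names, and the "filesystem" is a small in-memory dict of directories.

```python
# Toy directory tree (an assumption for illustration): each directory
# maps to the list of its child directories. "" stands for the start dir.
TREE = {"": ["a"], "a": ["a/b"], "a/b": []}


def recursive(dirs):
    """Model of '**': yield each directory and all of its descendants."""
    for d in dirs:
        yield d
        yield from recursive(TREE[d])


def wildcard(dirs):
    """Model of '*': yield the direct children of each directory."""
    for d in dirs:
        yield from TREE[d]


def naive_glob(segments, start=""):
    """Apply each pattern segment in turn, without any de-duplication."""
    results = [start]
    for seg in segments:
        step = recursive if seg == "**" else wildcard
        results = list(step(results))
    return results


# A single '**' visits every directory once, so no duplicates can arise.
single = naive_glob(["**", "*"])

# A second, non-adjacent '**' re-walks subtrees that overlap, so "a/b"
# is reached both via "a" and via "a/b" itself.
double = naive_glob(["**", "*", "**"])
```

In this model, `single` contains each path once, while `double` contains `a/b` twice. That is the case where a set (and therefore hashing of each path) is still needed.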

Also merge adjacent ** segments in patterns.
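
Merging adjacent `**` segments can be sketched as a small pre-processing step on the pattern. The helper below is hypothetical (the name `merge_adjacent_double_stars` and the string-based approach are assumptions, not the code in this PR); it only shows the idea that `**/**/*` is equivalent to `**/*`.

```python
def merge_adjacent_double_stars(pattern: str) -> str:
    """Collapse runs of consecutive '**' segments into a single '**'.

    Hypothetical helper for illustration; it assumes '/' as the separator.
    """
    merged = []
    for part in pattern.split("/"):
        # Skip a '**' that immediately follows another '**'.
        if part == "**" and merged and merged[-1] == "**":
            continue
        merged.append(part)
    return "/".join(merged)


adjacent = merge_adjacent_double_stars("**/**/*")        # "**/*"
non_adjacent = merge_adjacent_double_stars("**/foo/**")  # unchanged
```

Once adjacent segments are merged, a pattern such as `**/**/*` contains only one recursive segment and no longer needs de-duplication; a pattern such as `**/foo/**` is left unchanged and still does.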

Timings:

$ ./python -m timeit -s 'from pathlib import Path; p = Path()' 'list(p.glob("**/*"))'
1 loop, best of 5: 197 msec per loop   # before
2 loops, best of 5: 146 msec per loop  # after
--> 35% faster
$ ./python -m timeit -s 'from pathlib import Path; p = Path()' 'list(p.glob("**/**/*"))'
1 loop, best of 5: 1.77 sec per loop   # before
2 loops, best of 5: 146 msec per loop  # after
--> 12x faster
$ ./python -m timeit -s 'from pathlib import Path; p = Path()' 'list(p.glob("**/*/**"))'
1 loop, best of 5: 738 msec per loop   # before
1 loop, best of 5: 731 msec per loop   # after
--> about the same

@barneygale barneygale merged commit c0ece3d into python:main May 7, 2023
jbower-fb pushed a commit to jbower-fb/cpython that referenced this pull request May 8, 2023
…nGH-104244)
carljm added a commit to carljm/cpython that referenced this pull request May 9, 2023
* main: (47 commits)
  pythongh-97696 Remove unnecessary check for eager_start kwarg (python#104188)
  pythonGH-104308: socket.getnameinfo should release the GIL (python#104307)
  pythongh-104310: Add importlib.util.allowing_all_extensions() (pythongh-104311)
  pythongh-99113: A Per-Interpreter GIL! (pythongh-104210)
  pythonGH-104284: Fix documentation gettext build (python#104296)
  pythongh-89550: Buffer GzipFile.write to reduce execution time by ~15% (python#101251)
  pythongh-104223: Fix issues with inheriting from buffer classes (python#104227)
  pythongh-99108: fix typo in Modules/Setup (python#104293)
  pythonGH-104145: Use fully-qualified cross reference types for the bisect module (python#104172)
  pythongh-103193: Improve `getattr_static` test coverage (python#104286)
  Trim trailing whitespace and test on CI (python#104275)
  pythongh-102500: Remove mention of bytes shorthand (python#104281)
  pythongh-97696: Improve and fix documentation for asyncio eager tasks (python#104256)
  pythongh-99108: Replace SHA3 implementation HACL* version (python#103597)
  pythongh-104273: Remove redundant len() calls in argparse function (python#104274)
  pythongh-64660: Don't hardcode Argument Clinic return converter result variable name (python#104200)
  pythongh-104265 Disallow instantiation of `_csv.Reader` and `_csv.Writer` (python#104266)
  pythonGH-102613: Improve performance of `pathlib.Path.rglob()` (pythonGH-104244)
  pythongh-103650: Fix perf maps address format (python#103651)
  pythonGH-89812: Churn `pathlib.Path` methods (pythonGH-104243)
  ...
Labels: performance, topic-pathlib