Skip to content

config: avoid stat storm in _getconftestmodules #9532

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 25, 2022

Conversation

bluetech
Copy link
Member

Fix #9478.

I'll also say Closes #9125 since it shouldn't be that slow anymore (also with next version of pluggy). I ran a profile on pandas collection with this PR and gethookproxy (which #9125 caches) comes out at ~1.5s out of ~30s.

I've looked at various ways to speed this up, but they were either too ugly/intrusive or not as effective. In the end, the conftest data structure _dirpath2confmods is pretty efficient for what it does, and the API with the is_file/parent is mostly unavoidable given that. So a cache it is.

Note that this cache is safe (assuming a path does not change from file <-> dir during the run, which is already assumed), unlike the previous caches which worked on the entire _getconftestmodules function.

@bluetech bluetech added the needs backport applied to PRs, indicates that it should be ported to the current bug-fix branch label Jan 21, 2022
# _getconftestmodules()'s call to _get_directory() causes a stat
# storm when it's called potentially thousands of times in a test
# session (#9478), often with the same path, so cache it.
self._get_directory = lru_cache(256)(_get_directory)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to remove the bounds here, using None instead of 256?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just remembered I forgot to answer this.

I think it's always good to have limits on lru caches to avoid them growing too large. We can increase it though if it's too low for someone.

Copy link
Member

@nicoddemus nicoddemus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

@The-Compiler
Copy link
Member

Any opinions whether this should be in 7.0.0 (due to #9478) or 7.0.1? I'd like to avoid adding more to the pile, but given that this is part of solving a regression and seems simple enough I guess we can take it in.

@RonnyPfannschmidt RonnyPfannschmidt added this to the 7.0 milestone Jan 25, 2022
@nicoddemus
Copy link
Member

I think it should go into 7.0.0, as it is simple and fixes a regression, like you commented. 👍

@bluetech bluetech merged commit 5c69ece into pytest-dev:main Jan 25, 2022
@The-Compiler
Copy link
Member

Looks like the backport failed due to conflicts: https://github.com/pytest-dev/pytest/runs/4938147553?check_suite_focus=true

bluetech added a commit to bluetech/pytest that referenced this pull request Jan 25, 2022
config: avoid stat storm in _getconftestmodules

(cherry picked from commit 5c69ece)

Conflicts:
	src/_pytest/config/__init__.py
@bluetech bluetech deleted the getdir-cache branch February 7, 2022 13:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs backport applied to PRs, indicates that it should be ported to the current bug-fix branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[prerelease] Performance regression?
4 participants