
Conversation

SlavaSkvortsov
Contributor

Overview

A change introduced in this PR was the reason our service ran out of CPU after the NR update.
[screenshot: CPU usage after the update]

If you have redis-py installed but don't have aioredis, NewRelic tries to find a version of aioredis anyway every time you call your Redis client. Instead of a quick check based on importing the library, NR now uses pkg_resources to find the library, and pkg_resources is slow :(
Because we use Redis a lot, this heavily affected our performance.
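To give a rough sense of the gap, here is a quick way to measure it yourself (illustrative only, assuming redis-py is installed; the numbers will vary by environment):

import timeit

# Compare the pkg_resources-based lookup with a plain attribute access.
pkg_resources_time = timeit.timeit(
    "pkg_resources.get_distribution('redis').version",
    setup="import pkg_resources",
    number=1000,
)
attribute_time = timeit.timeit(
    "redis.__version__",
    setup="import redis",
    number=1000,
)
print("pkg_resources lookup: %.4fs for 1000 calls" % pkg_resources_time)
print("module attribute:     %.4fs for 1000 calls" % attribute_time)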

The easiest solution is to add a global cache for the package versions - they won't change at runtime anyway.
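For illustration, the idea is roughly the following (a minimal sketch, not the actual diff in this PR; the agent's real lookup lives in newrelic.common.package_version_utils):

from functools import lru_cache

import pkg_resources


@lru_cache(maxsize=None)
def get_package_version(package_name):
    # The expensive lookup is paid only once per package per process;
    # installed package versions can't change while the process is running.
    try:
        return pkg_resources.get_distribution(package_name).version
    except pkg_resources.DistributionNotFound:
        return None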

Related Github Issue

Include a link to the related GitHub issue, if applicable

Testing

The agent includes a suite of tests which should be used to verify your changes don't break existing functionality. These tests will run with GitHub Actions when a pull request is made. More details on running the tests locally can be found here. For most contributions it is strongly recommended to add additional tests which exercise your changes.

SlavaSkvortsov requested a review from a team on May 25, 2023, 12:54
@CLAassistant

CLAassistant commented May 25, 2023

CLA assistant check
All committers have signed the CLA.

mergify bot added the tests-failing (Tests failing in CI.) label on Jun 7, 2023
Contributor

@hmstepanek left a comment


Thank you for bringing this to our attention and submitting a fix! Since we still support Python 2.7 and lru_cache is only available in Python 3, we need a solution for Python 2.7. It looks like the tests are failing for Python 3.7 as well, though that could likely be fixed with some configuration changes. It looks like there are a couple of options if we want to just pull in a third-party package for Python 2.7:

  • functools32
  • repoze.lru

Another option is implementing our own caching (a rough sketch of that idea is below). I've only briefly started looking into this and our team will likely want to decide a direction together on this. I just wanted to provide some initial feedback here so you don't feel like we're ignoring you!
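If it helps, a hand-rolled cache that works on both Python 2.7 and 3.x can be fairly small; a rough sketch (purely illustrative, not a proposal for the agent's final shape):

import functools


def memoize(func):
    # Unbounded cache keyed by positional args; relies only on stdlib
    # features available in both Python 2.7 and 3.x.
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper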

@SlavaSkvortsov
Contributor Author

Hey @hmstepanek!

I want to let you know that we have spotted more issues with this code. It performs poorly with the rq library: since every work horse there is a separate process, having lru_cache does not help, and getting the package versions is something that happens during NewRelic initialization. An average job looks like this:
[screenshot: profile of an average job]
The first 400 milliseconds are the initialization of NewRelic (you can see that the function is called cached_get_package_version because we monkeypatch the library to avoid the issue described in the original issue), and only after that does the actual job start. When you have a lot of small jobs, spending 400 milliseconds of CPU on every single one of them is significant.

We don't have a solution for this issue yet.

Here is some code to monkeypatch the lib to solve the issue described above, in case someone needs it :)

from functools import cache


def _monkeypatch_newrelic() -> None:
    try:
        from newrelic.common import package_version_utils
    except ImportError:  # pragma: nocover
        return  # pragma: nocover

    # Nothing to patch if the agent doesn't expose this helper.
    if not hasattr(package_version_utils, '_get_package_version'):
        return  # pragma: nocover

    original_get_package_version = package_version_utils._get_package_version

    # Cache per package name - installed versions don't change at runtime.
    @cache
    def cached_get_package_version(package_name: str) -> str:
        return original_get_package_version(package_name)

    package_version_utils._get_package_version = cached_get_package_version
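For this to help, the patch has to run before NewRelic starts resolving package versions, i.e. as early as possible during application startup.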

As for a solution compatible with Python 2.7 - I don't have any experience with that, so I can't recommend anything. Thanks for looking into the problem, and feel free to change the PR!

Thank you!

@TimPansino
Contributor

Closing in favor of #946, which incorporates and extends these changes.

(This branch got really out of date and I couldn't update it here.)

TimPansino closed this Oct 18, 2023