Cannot use python logging without maintaining all records in caplog #8307

Open
hilbichin opened this issue Feb 1, 2021 · 13 comments
Labels
plugin: logging related to the logging builtin plugin type: performance performance or memory problem/improvement

Comments

@hilbichin

Description

I'd like to be able to use python's built-in logging, specifically to stream logging records to CLI and file, without having all log records stored in memory in the caplog fixture.

The reason I don't want the records to be stored in caplog is that this is essentially a forced memory leak. In the use case where I discovered this, all DEBUG-level logging records were being stored persistently in caplog even though my CLI logging level was INFO and my file handler level was DEBUG. The records were being streamed to the file as expected, but I don't want them kept in memory. This was a very long-running test, and the accumulating DEBUG records eventually resulted in a MemoryError.

Maybe I just don't know how to use caplog/logging in such a way that I can handle my logging streams as desired without keeping the log records in memory. If that's the case, would it be possible to get that clarification?

Environment

Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 23 2018, 23:31:17) [MSC v.1916 32 bit (Intel)]

system='Windows', release='10', version='10.0.16299', machine='AMD64', processor='Intel64 Family 6 Model 142 Stepping 10, GenuineIntel'

Package            Version
------------------ -------
atomicwrites       1.4.0
attrs              20.3.0
colorama           0.4.4
importlib-metadata 3.4.0
iniconfig          1.1.1
packaging          20.9
pip                20.3.3
pluggy             0.13.1
py                 1.10.0
pyparsing          2.4.7
pytest             6.2.2
setuptools         51.0.0
toml               0.10.2
typing-extensions  3.7.4.3
wheel              0.36.2
zipp               3.4.0

Example

import logging
logger = logging.getLogger(__name__)
def test_logging_ex(caplog):
    assert not caplog.records
    logger.info("I'd like to see this message in the CLI, but I can't")
@Zac-HD added plugin: logging and type: performance labels Feb 2, 2021
@asottile
Member

asottile commented Feb 4, 2021

I believe the first line in the caplog documentation already addresses this: https://docs.pytest.org/en/stable/logging.html#caplog-fixture

Inside tests it is possible to change the log level for the captured log messages. This is supported by the caplog fixture:
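
For example, a minimal sketch of that documented usage (test and logger names assumed):

import logging

logger = logging.getLogger(__name__)

def test_caplog_level(caplog):
    caplog.set_level(logging.INFO)  # capture only INFO and above in this test
    logger.debug("dropped: below the capture level")
    logger.info("captured")
    assert len(caplog.records) == 1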

@hilbichin
Author

I can control what gets captured by caplog, but I cannot do this independently of the other handlers to achieve the following:

  • INFO level logs streamed to CLI.
  • DEBUG level logs streamed to file.
  • No (or minimal) logs persisting in python process memory.

This is easy outside of pytest (see the sketch at the end of this comment), but with caplog I can't do it. Here is my setup:

In pytest.ini:

log_cli = true
log_cli_level = INFO
log_level = DEBUG

Result:

  • INFO messages go to CLI (good)
  • DEBUG messages to a log file (good)
  • DEBUG messages in caplog (bad, memory leak)

If I add an autouse fixture that does:

@pytest.fixture(autouse=True)  # fixture name below is arbitrary
def _lower_caplog_levels(caplog):
    for name in logging.root.manager.loggerDict:
        caplog.set_level(logging.INFO, logger=name)

Result:

  • INFO messages go to CLI (good)
  • INFO messages to a log file (bad)
  • INFO messages in caplog (better, memory leak not so bad)
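
For reference, outside of pytest this split is a standard two-handler setup; a sketch (file name assumed):

import logging

root = logging.getLogger()
root.setLevel(logging.DEBUG)               # let DEBUG records reach the handlers

cli = logging.StreamHandler()
cli.setLevel(logging.INFO)                 # INFO and above to the terminal
root.addHandler(cli)

logfile = logging.FileHandler("test.log")  # file name assumed
logfile.setLevel(logging.DEBUG)            # full DEBUG detail to the file
root.addHandler(logfile)

# neither handler keeps records in memory, so nothing accumulates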

@hilbichin
Author

hilbichin commented Apr 13, 2021

Any update here? The ask is to be able to send DEBUG logs to a file and INFO logs to the terminal, without leaking memory by keeping every DEBUG record in caplog.

@SnufkinWasTaken

SnufkinWasTaken commented Feb 20, 2024

Similar problem here - any update regarding this issue? ^^

@lovetheguitar
Contributor

To reduce pytest's memory consumption we currently settled on the approach below.
Maybe it helps someone. Feedback/improvements welcome.

import logging
import warnings

import pytest

log = logging.getLogger(__name__)


def pytest_runtest_logreport(report: pytest.TestReport) -> None:
    """Drop captured logs of successful test phases to reduce pytest's excessive memory usage.

    .. warning :: No extended testing regarding possible side effects was done.

    This is a workaround for an intrinsic pytest memory leak, see relevant issues:
      - https://github.com/pytest-dev/pytest/issues/8307
      - https://github.com/pytest-dev/pytest/discussions/11704
      - https://github.com/pytest-dev/pytest/issues/3329
      - https://github.com/pytest-dev/pytest/issues/9215

    `prefix` taken from https://github.com/pytest-dev/pytest/blob/8a410d0ba60bca7073bf18a1503c59a3b5445780/src/_pytest/reports.py#L118-L125

    This hook gets called for every `pytest_runtest_protocol` phase, i.e. `setup`, `call`, `teardown`.

    Since we do not directly delete the private
    [`item._report_sections`](https://github.com/pytest-dev/pytest/blob/c967d508d62ad9e073d495ddfdca437188f2283e/src/_pytest/reports.py#L369-L370)
    to hopefully be "a bit safer" - the `report.sections` are appended for each consecutive hook
    call. This means that in later phases all the (log) sections of previous phases will be present
    again and we thus remove up to three sections without a warning.
    """  # noqa: E501  # allow long url
    # TODO: test if it would be sufficient to only drop the sections in the teardown phase
    if report.passed and report.caplog != "":
        prefix = "Captured log"

        sections_to_drop = []
        for section in report.sections:
            name = section[0]
            if name.startswith(prefix):
                sections_to_drop.append(section)

        if len(sections_to_drop) > 3:
            log.debug(
                f"Removing {sections_to_drop=} from captured logs of passed {report.nodeid=}."
            )
            warnings.warn(
                RuntimeWarning(
                    f"Dropping more than the maximum expected three sections "
                    f"({', '.join(section_name for section_name, _ in sections_to_drop)}) "
                    f"from the pytest report.sections object. This might be caused by "
                    f"other pytest plugins and might in turn cause problems for those."
                ),
                stacklevel=2,
            )

        for section_to_drop in sections_to_drop:
            report.sections.remove(section_to_drop)
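
A note on usage: pytest_runtest_logreport is a standard pytest hook, so the function above belongs in a conftest.py or a plugin module for pytest to pick it up.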

@ThermodynamicBeta

If I understand correctly, @lovetheguitar's solution only works if you have multiple smaller tests, where setup, call and teardown happen repeatedly. In my use case, I have one long-running test that will eventually run out of memory due to caplog. Is there any possible solution for that case?

@RonnyPfannschmidt
Member

At the sprint a plugin was started for mid-test pruning.

I didn't keep tabs.

@lovetheguitar
Contributor

@ThermodynamicBeta yes, indeed: the hack above, as well as the plugin we started at the end of the sprint, only works for a test suite consisting of multiple tests. The plugin is still local only, but if people are interested that might motivate us to clean things up and get it out faster.

@RonnyPfannschmidt We used pytest_runtest_teardown, which in this scenario I'd call "after test".

I don't think there is much to expect from pytest or a plugin in the case of a single long-running test, though.
In that case no pytest hooks are executed during the run, so any cleanup solution would have to run concurrently with the test.

@RonnyPfannschmidt
Member

I recall discussing a fixture to provide mid-test culling at chosen points.

@ThermodynamicBeta

I made a workaround with some help from community members. Adding this to a test caps logging's maximum effect on memory, with the obvious downside of only keeping the last n=100 messages for the post-test summary.

from collections import deque
import logging

class DequeStringIO:
    """Bounded stand-in for io.StringIO: keeps only the last `bound` writes."""

    def __init__(self, bound: int) -> None:
        self.buf = deque(maxlen=bound)

    def write(self, s: str) -> int:
        self.buf.append(s)
        return len(s)

    def getvalue(self) -> str:
        return "".join(self.buf)

# patch pytest's capture handler on the root logger: both its record list
# and its formatted-text stream grow without bound otherwise
for handler in logging.getLogger().handlers:
    if "LogCaptureHandler" in str(type(handler)):
        # there are two other handlers that don't appreciate having their stream overwritten
        handler.records = deque(maxlen=100)
        handler.stream = DequeStringIO(100)

@The-Compiler
Member

FWIW, instead of the hacky

    if "LogCaptureHandler" in str(type(handler)):

you might want to do a way more direct

    if isinstance(handler, _pytest.logging.LogCaptureHandler):

(yes, it's a private import - but you're already relying on private naming details to begin with!)

@ThermodynamicBeta

I couldn't figure out how to import _pytest, how do you do that?

@The-Compiler
Member

Like any other module, the _ is just a naming convention: import _pytest.logging
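
Putting the two suggestions together, the earlier workaround with the direct isinstance check would look like this (a sketch, reusing DequeStringIO and the maxlen=100 bound from the comment above):

import logging
from collections import deque

import _pytest.logging

for handler in logging.getLogger().handlers:
    if isinstance(handler, _pytest.logging.LogCaptureHandler):
        # bound both the record list and the formatted-text stream
        handler.records = deque(maxlen=100)
        handler.stream = DequeStringIO(100)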
