Skip to content

bpo-38356: Fix ThreadedChildWatcher thread leak in test_asyncio #16552

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 12, 2020

Conversation

aeros
Copy link
Contributor

@aeros aeros commented Oct 3, 2019

Motivation for this PR (comment from @vstinner in bpo issue):

Warning seen o AMD64 Ubuntu Shared 3.x buildbot:
https://buildbot.python.org/all/#/builders/141/builds/2593

test_devnull_output (test.test_a=syncio.test_subprocess.SubprocessThreadedWatcherTests) ...
Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2)

The following implementation details for the new method are TBD:

  1. Public vs private

  2. Inclusion in close()

  3. Name

  4. Coroutine vs subroutine method

  5. timeout parameter

If it's a private method, 3, 4, and 5 are significantly less important.

I started with the most minimal implementation that fixes the dangling threads without modifying the regression tests, which I think is particularly important. I typically try to avoid directly modifying existing tests as much as possible unless it's necessary to do so. However, I am open to changing any part of this.

https://bugs.python.org/issue38356

Automerge-Triggered-By: @asvetlov

Comment on lines 1272 to 1273
threads = [thread for thread in list(self._threads.values())
if thread.is_alive()]
Copy link
Contributor Author

@aeros aeros Oct 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a minor optimization, would it be worthwhile to change this to a generator expression?

        threads = (thread for thread in list(self._threads.values())
                            if thread.is_alive())

There's no need to use a list here. The same applies to the __del__() method on line 1284.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding to this, we could also change the list conversion to an iterable conversion. There's no reason to use a list since we're just iterating over the threads in the values view without adding or removing items.

         threads = (thread for thread in iter(self._threads.values())
                            if thread.is_alive())

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I think it would be acceptable to add a private methof to ThreadedChildWatcher _join_threads() and use it from our unit tests. No need to document it beyond just writing a doc string. We can add a public API for this sometime later.

Copy link
Contributor Author

@aeros aeros Oct 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add a public API for this sometime later.

@1st1 Yeah those were my thoughts as well. I think it made the most sense to include it as part of close(), since that's intended to be used as the Watcher cleanup method and fits well with the existing tests. I generally prefer to avoid modifying the regression tests as much as possible if there's not a functional benefit for doing so.

Is there anything from the current state of the PR you'd like to see changed other than writing a docstring?

Copy link
Contributor Author

@aeros aeros Oct 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@1st1

FWIW I think it would be acceptable to add a private methof to ThreadedChildWatcher _join_threads() and use it from our unit tests

If we were to call it explicitly from the unit test, how would you recommend doing so? The current tearDown() method (in SubprocessWatcherMixin) is generically structured for all of the ChildWatcher classes:

        def tearDown(self):
            super().tearDown()
            policy = asyncio.get_event_loop_policy()
            watcher = policy.get_child_watcher()
            policy.set_child_watcher(None)
            watcher.attach_loop(None)
            watcher.close()

Would it work if we overrode the tearDown() method for the ThreadedChildWatcher test class?:

Current:


    class SubprocessThreadedWatcherTests(SubprocessWatcherMixin,
                                         test_utils.TestCase):

        Watcher = unix_events.ThreadedChildWatcher

Suggested:

    class SubprocessThreadedWatcherTests(SubprocessWatcherMixin,
                                         test_utils.TestCase):

        Watcher = unix_events.ThreadedChildWatcher

        def tearDown(self):
            super().tearDown()
            Watcher._join_threads()

I'm not sure if _join_threads() should be called earlier or if it's okay to just call it after SubprocessWatcherMixin.tearDown() is finished (as the above example does).

@aeros aeros added the type-bug An unexpected behavior, bug, or error label Oct 3, 2019
threads = [thread for thread in list(self._threads.values())
if thread.is_alive()]
for thread in threads:
thread.join()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to remove the thread from self._threads as soon as a thread complete?

for pid in list(self._threads):
  thread = self._threads[pid]
  thread.join()
  self._threads.pop(pid)

Would it be possible to log a warning if a thread runs longer than timeout seconds? Or even exit silently if the timoeut is exceeded?

Maybe we should also modify add_child_handler() to raise an exception if it's called after close() has been called? For example, add a _closed attribute? It would prevent to spawn new threads while we wait for existing threads.

Copy link
Contributor Author

@aeros aeros Oct 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vstinner

Would it be possible to remove the thread from self._threads as soon as a thread complete?

Yeah sure, I would have no issue with adding something like this. I'm not 100% sure that we have to remove the threads that aren't alive from self._threads, but it would certainly be possible.

What exactly does removing the threads from self._threads accomplish that joining them doesn't already do? When a thread is successfully joined, thread._is_stopped is set to true, which is what thread.is_alive() uses to check if the thread is still active. I don't think we necessarily need to remove the threads from self._threads as part of the cleanup. Unless there's something I'm missing, I think that would just add additional overhead.

Would it be possible to log a warning if a thread runs longer than timeout seconds? Or even exit silently if the timoeut is exceeded?

This should be possible by running thread.join(timeout) and if thread.is_alive() is True afterwards, the join timed out. In what situations would you want to differentiate between logging a warning vs silently exiting?

Unless by "exit silently" you were referring to thread.join(), which effectively does that already when the timeout is reached. thread.join(timeout) will block for timeout duration or when the join is complete, whichever one occurs first.

Maybe we should also modify add_child_handler() to raise an exception if it's called after close() has been called? For example, add a _closed attribute? It would prevent to spawn new threads while we wait for existing threads.

Yeah this would be a fairly straightforward change and I agree with it. A RuntimeError would be suitable as an exception to raise for that scenario.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be put in the comfy chair!

Copy link
Contributor

@asvetlov asvetlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with the PR.

As @vstinner mentioned watcher.close() is not called normally.
Moreover, the watcher is designed to work with multiple event loops.
The lifecycle of watcher normally is equal to event loop policy lifetime, so typically the watcher lives forever.

However, the CPython test suite requires that any test has no side effects. That's why we explicitly call set_event_loop_policy(None) after every asyncio test. We should clean up a watcher after this step (again, explicitly).

Please note, asyncio watchers are part of public API.
I never love it but we cannot make wild changes. Watchers potentially can be used not only by asyncio but by third-party loops as well. That's why ThreadedChildWatcher was designed to strongly keep the backward compatibility.
Third-party watcher is also possible, we don't restrict that. The watcher may have no proposed _close_loop() method as well.

P.S.
I believe after polishing ThreadedChildWatcher we can start deprecation of watchers subsystem at all, it generates more problems than solves. Removing this part from public API can help but this is another thing.

@aeros
Copy link
Contributor Author

aeros commented Oct 3, 2019

@asvetlov

As @vstinner mentioned watcher.close() is not called normally.
Moreover, the watcher is designed to work with multiple event loops.

This was primarily based upon fixing the thread leak in the unittest SubprocessThreadedWatcherTests that uses SubprocessWatcherMixin, which does use watcher.close():

    class SubprocessWatcherMixin(SubprocessMixin):
        Watcher = None
        def setUp(self):
            super().setUp()
            policy = asyncio.get_event_loop_policy()
            self.loop = policy.new_event_loop()
            self.set_event_loop(self.loop)
            watcher = self.Watcher()
            watcher.attach_loop(self.loop)
            policy.set_child_watcher(watcher)

        def tearDown(self):
            super().tearDown()
            policy = asyncio.get_event_loop_policy()
            watcher = policy.get_child_watcher()
            policy.set_child_watcher(None)
            watcher.attach_loop(None)
            watcher.close()

I figured the easiest way to have the fix conform to the existing tests would be to have all of the threads joined in ThreadedChildWatcher.close(). Is there a different solution you would recommend that would fix the leak coming from the ThreadedChildWatcher's threads? The only other option I thought of was changing the tearDown():

        def tearDown(self):
            super().tearDown()
            policy = asyncio.get_event_loop_policy()
            watcher = policy.get_child_watcher()
            policy.set_child_watcher(None)
            watcher.attach_loop(None)
            if isinstance(watcher, ThreadedChildWatcher):
                watcher._join_threads()
            watcher.close()

But, this would work even worse in terms to compatibility with other implementations, and adds an unneeded conditional to the other watcher class tests.

We should clean up a watcher after this step (again, explicitly).

Do you have an implementation idea as to how this could be done? Would each watcher have it's own cleanup method separate from watcher.close()?

@vstinner

Can we imagine to add a new method to only wait for threads spawned by an event loop?

Would this still fix the dangling thread issue in the SubprocessThreadedWatcherTests? As far as I can tell, only the watcher itself is closed, not the event loop.

@aeros
Copy link
Contributor Author

aeros commented Oct 3, 2019

@asvetlov

I disagree with the PR.

After reading through your comment, I also disagree with the current state of it. As I mentioned in the bpo comments, I really didn't have an idea of the intended public API design for ThreadedChildWatcher (beyond what is said in the docs), this was mostly just an attempt to fix the dangling thread issue.

I'd be more than glad to modify this PR though in any recommended manner that you think would fit the intended API design.

@aeros
Copy link
Contributor Author

aeros commented Oct 3, 2019

@asvetlov

We should clean up a watcher after this step (again, explicitly).

Also, there's something else that's not clear to me about this. In the current docs, it says the watcher.close() method is intended to be used as a resource cleanup (https://docs.python.org/3.9/library/asyncio-policy.html#asyncio.AbstractChildWatcher.close):

close()

    Close the watcher.

    This method has to be called to ensure that underlying resources are cleaned-up.

So where else other than watcher.close() (which the current tests are using) should the watcher be cleaned up from?

@asvetlov
Copy link
Contributor

asvetlov commented Oct 3, 2019

Well, joining spawn threads in ThreadedChildWatcher.close() makes sense.
asyncio doesn't call it on policy finishing, thus tearDown() should call watcher.close() as @aeros167 suggested.

@aeros
Copy link
Contributor Author

aeros commented Oct 3, 2019

@asvetlov

thus tearDown() should call watcher.close() as @aeros167 suggested.

Just to make sure it's clear, tearDown() already calls watcher.close(). The code example in my prior comment was the existing implementation.

So if we're in agreement about joining the threads in watcher.close() and retaining the current tearDown() implementation where watcher.close() is already called at the end, what do you think should be changed from the current version of the PR? I'm curious to hear @1st1's thoughts on this as well.

@1st1
Copy link
Member

1st1 commented Oct 21, 2019

Looks OK to me. If @asvetlov approves this it's good to go.

@aeros
Copy link
Contributor Author

aeros commented Oct 22, 2019

Looks OK to me. If @asvetlov approves this it's good to go.

Okay, I'll just add the docstring then after I finish working on the 3.8 asyncio "What's New" changes.

@aeros aeros changed the title bpo-38356: Fix ThreadedChildWatcher thread leak bpo-38356: Fix ThreadedChildWatcher thread leak in test_asyncio Oct 23, 2019
@aeros aeros changed the title bpo-38356: Fix ThreadedChildWatcher thread leak in test_asyncio bpo-38356: Fix ThreadedChildWatcher thread leak for test_asyncio Oct 24, 2019
@aeros
Copy link
Contributor Author

aeros commented Oct 24, 2019

In the recently added commits, I fixed the formatting, added a docstring as requested by @1st1, and made a slight modification to not join daemon threads in _join_threads().

This is because daemon threads don't need to be joined as part of this cleanup process. When the program exits, they're terminated automatically. See #16552 (comment) for more details.

@aeros aeros requested a review from asvetlov October 24, 2019 07:44
@aeros aeros changed the title bpo-38356: Fix ThreadedChildWatcher thread leak for test_asyncio bpo-38356: Fix ThreadedChildWatcher thread leak in test_asyncio Oct 29, 2019
@aeros
Copy link
Contributor Author

aeros commented Jan 9, 2020

@asvetlov Are there any changes you'd like to suggest for this PR or is it good to go?

Copy link
Contributor

@asvetlov asvetlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@miss-islington
Copy link
Contributor

Thanks @aeros for the PR 🌮🎉.. I'm working now to backport this PR to: 3.8.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jan 12, 2020
…onGH-16552)

Motivation for this PR (comment from @vstinner in bpo issue):
```
Warning seen o AMD64 Ubuntu Shared 3.x buildbot:
https://buildbot.python.org/all/GH-/builders/141/builds/2593

test_devnull_output (test.test_a=syncio.test_subprocess.SubprocessThreadedWatcherTests) ...
Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2)
```
The following implementation details for the new method are TBD:

1) Public vs private

2) Inclusion in `close()`

3) Name

4) Coroutine vs subroutine method

5) *timeout* parameter

If it's a private method, 3, 4, and 5 are significantly less important.

I started with the most minimal implementation that fixes the dangling threads without modifying the regression tests, which I think is particularly important. I typically try to avoid directly modifying existing tests as much as possible unless it's necessary to do so. However, I am open to changing any part of this.

https://bugs.python.org/issue38356
(cherry picked from commit 0ca7cc7)

Co-authored-by: Kyle Stanley <[email protected]>
@bedevere-bot
Copy link

GH-17963 is a backport of this pull request to the 3.8 branch.

miss-islington added a commit that referenced this pull request Jan 12, 2020
…6552)

Motivation for this PR (comment from @vstinner in bpo issue):
```
Warning seen o AMD64 Ubuntu Shared 3.x buildbot:
https://buildbot.python.org/all/GH-/builders/141/builds/2593

test_devnull_output (test.test_a=syncio.test_subprocess.SubprocessThreadedWatcherTests) ...
Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2)
```
The following implementation details for the new method are TBD:

1) Public vs private

2) Inclusion in `close()`

3) Name

4) Coroutine vs subroutine method

5) *timeout* parameter

If it's a private method, 3, 4, and 5 are significantly less important.

I started with the most minimal implementation that fixes the dangling threads without modifying the regression tests, which I think is particularly important. I typically try to avoid directly modifying existing tests as much as possible unless it's necessary to do so. However, I am open to changing any part of this.

https://bugs.python.org/issue38356
(cherry picked from commit 0ca7cc7)

Co-authored-by: Kyle Stanley <[email protected]>
shihai1991 pushed a commit to shihai1991/cpython that referenced this pull request Jan 31, 2020
…onGH-16552)

Motivation for this PR (comment from @vstinner in bpo issue):
```
Warning seen o AMD64 Ubuntu Shared 3.x buildbot:
https://buildbot.python.org/all/#/builders/141/builds/2593

test_devnull_output (test.test_a=syncio.test_subprocess.SubprocessThreadedWatcherTests) ...
Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2)
```
The following implementation details for the new method are TBD:

1) Public vs private

2) Inclusion in `close()`

3) Name

4) Coroutine vs subroutine method

5) *timeout* parameter

If it's a private method, 3, 4, and 5 are significantly less important.

I started with the most minimal implementation that fixes the dangling threads without modifying the regression tests, which I think is particularly important. I typically try to avoid directly modifying existing tests as much as possible unless it's necessary to do so. However, I am open to changing any part of this.


https://bugs.python.org/issue38356
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
skip news topic-asyncio type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants