bpo-38356: Fix ThreadedChildWatcher thread leak in test_asyncio #16552

aeros · 2019-10-03T03:55:01Z

Motivation for this PR (comment from @vstinner in bpo issue):

Warning seen o AMD64 Ubuntu Shared 3.x buildbot:
https://buildbot.python.org/all/#/builders/141/builds/2593

test_devnull_output (test.test_a=syncio.test_subprocess.SubprocessThreadedWatcherTests) ...
Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2)

The following implementation details for the new method are TBD:

Public vs private
Inclusion in close()
Name
Coroutine vs subroutine method
timeout parameter

If it's a private method, 3, 4, and 5 are significantly less important.

I started with the most minimal implementation that fixes the dangling threads without modifying the regression tests, which I think is particularly important. I typically try to avoid directly modifying existing tests as much as possible unless it's necessary to do so. However, I am open to changing any part of this.

https://bugs.python.org/issue38356

Automerge-Triggered-By: @asvetlov

Lib/asyncio/unix_events.py

aeros · 2019-10-03T04:16:33Z

Lib/asyncio/unix_events.py

+        threads = [thread for thread in list(self._threads.values())
+                            if thread.is_alive()]


As a minor optimization, would it be worthwhile to change this to a generator expression?

threads = (thread for thread in list(self._threads.values()) if thread.is_alive())

There's no need to use a list here. The same applies to the __del__() method on line 1284.

Adding to this, we could also change the list conversion to an iterable conversion. There's no reason to use a list since we're just iterating over the threads in the values view without adding or removing items.

threads = (thread for thread in iter(self._threads.values()) if thread.is_alive())

FWIW I think it would be acceptable to add a private methof to ThreadedChildWatcher _join_threads() and use it from our unit tests. No need to document it beyond just writing a doc string. We can add a public API for this sometime later.

We can add a public API for this sometime later.

@1st1 Yeah those were my thoughts as well. I think it made the most sense to include it as part of close(), since that's intended to be used as the Watcher cleanup method and fits well with the existing tests. I generally prefer to avoid modifying the regression tests as much as possible if there's not a functional benefit for doing so.

Is there anything from the current state of the PR you'd like to see changed other than writing a docstring?

@1st1

FWIW I think it would be acceptable to add a private methof to ThreadedChildWatcher _join_threads() and use it from our unit tests

If we were to call it explicitly from the unit test, how would you recommend doing so? The current tearDown() method (in SubprocessWatcherMixin) is generically structured for all of the ChildWatcher classes:

def tearDown(self): super().tearDown() policy = asyncio.get_event_loop_policy() watcher = policy.get_child_watcher() policy.set_child_watcher(None) watcher.attach_loop(None) watcher.close()

Would it work if we overrode the tearDown() method for the ThreadedChildWatcher test class?:

Current:

class SubprocessThreadedWatcherTests(SubprocessWatcherMixin, test_utils.TestCase): Watcher = unix_events.ThreadedChildWatcher

Suggested:

class SubprocessThreadedWatcherTests(SubprocessWatcherMixin, test_utils.TestCase): Watcher = unix_events.ThreadedChildWatcher def tearDown(self): super().tearDown() Watcher._join_threads()

I'm not sure if _join_threads() should be called earlier or if it's okay to just call it after SubprocessWatcherMixin.tearDown() is finished (as the above example does).

vstinner · 2019-10-03T06:59:09Z

Lib/asyncio/unix_events.py

+        threads = [thread for thread in list(self._threads.values())
+                            if thread.is_alive()]
+        for thread in threads:
+            thread.join()


Would it be possible to remove the thread from self._threads as soon as a thread complete?

for pid in list(self._threads): thread = self._threads[pid] thread.join() self._threads.pop(pid)

Would it be possible to log a warning if a thread runs longer than timeout seconds? Or even exit silently if the timoeut is exceeded?

Maybe we should also modify add_child_handler() to raise an exception if it's called after close() has been called? For example, add a _closed attribute? It would prevent to spawn new threads while we wait for existing threads.

@vstinner

Would it be possible to remove the thread from self._threads as soon as a thread complete?

Yeah sure, I would have no issue with adding something like this. I'm not 100% sure that we have to remove the threads that aren't alive from self._threads, but it would certainly be possible.

What exactly does removing the threads from self._threads accomplish that joining them doesn't already do? When a thread is successfully joined, thread._is_stopped is set to true, which is what thread.is_alive() uses to check if the thread is still active. I don't think we necessarily need to remove the threads from self._threads as part of the cleanup. Unless there's something I'm missing, I think that would just add additional overhead.

Would it be possible to log a warning if a thread runs longer than timeout seconds? Or even exit silently if the timoeut is exceeded?

This should be possible by running thread.join(timeout) and if thread.is_alive() is True afterwards, the join timed out. In what situations would you want to differentiate between logging a warning vs silently exiting?

Unless by "exit silently" you were referring to thread.join(), which effectively does that already when the timeout is reached. thread.join(timeout) will block for timeout duration or when the join is complete, whichever one occurs first.

Maybe we should also modify add_child_handler() to raise an exception if it's called after close() has been called? For example, add a _closed attribute? It would prevent to spawn new threads while we wait for existing threads.

Yeah this would be a fairly straightforward change and I agree with it. A RuntimeError would be suitable as an exception to raise for that scenario.

Lib/asyncio/unix_events.py

bedevere-bot · 2019-10-03T07:02:51Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be put in the comfy chair!

asvetlov

I disagree with the PR.

As @vstinner mentioned watcher.close() is not called normally.
Moreover, the watcher is designed to work with multiple event loops.
The lifecycle of watcher normally is equal to event loop policy lifetime, so typically the watcher lives forever.

However, the CPython test suite requires that any test has no side effects. That's why we explicitly call set_event_loop_policy(None) after every asyncio test. We should clean up a watcher after this step (again, explicitly).

Please note, asyncio watchers are part of public API.
I never love it but we cannot make wild changes. Watchers potentially can be used not only by asyncio but by third-party loops as well. That's why ThreadedChildWatcher was designed to strongly keep the backward compatibility.
Third-party watcher is also possible, we don't restrict that. The watcher may have no proposed _close_loop() method as well.

P.S.
I believe after polishing ThreadedChildWatcher we can start deprecation of watchers subsystem at all, it generates more problems than solves. Removing this part from public API can help but this is another thing.

aeros · 2019-10-03T10:46:45Z

@asvetlov

As @vstinner mentioned watcher.close() is not called normally.
Moreover, the watcher is designed to work with multiple event loops.

This was primarily based upon fixing the thread leak in the unittest SubprocessThreadedWatcherTests that uses SubprocessWatcherMixin, which does use watcher.close():

    class SubprocessWatcherMixin(SubprocessMixin):
        Watcher = None
        def setUp(self):
            super().setUp()
            policy = asyncio.get_event_loop_policy()
            self.loop = policy.new_event_loop()
            self.set_event_loop(self.loop)
            watcher = self.Watcher()
            watcher.attach_loop(self.loop)
            policy.set_child_watcher(watcher)

        def tearDown(self):
            super().tearDown()
            policy = asyncio.get_event_loop_policy()
            watcher = policy.get_child_watcher()
            policy.set_child_watcher(None)
            watcher.attach_loop(None)
            watcher.close()

I figured the easiest way to have the fix conform to the existing tests would be to have all of the threads joined in ThreadedChildWatcher.close(). Is there a different solution you would recommend that would fix the leak coming from the ThreadedChildWatcher's threads? The only other option I thought of was changing the tearDown():

        def tearDown(self):
            super().tearDown()
            policy = asyncio.get_event_loop_policy()
            watcher = policy.get_child_watcher()
            policy.set_child_watcher(None)
            watcher.attach_loop(None)
            if isinstance(watcher, ThreadedChildWatcher):
                watcher._join_threads()
            watcher.close()

But, this would work even worse in terms to compatibility with other implementations, and adds an unneeded conditional to the other watcher class tests.

We should clean up a watcher after this step (again, explicitly).

Do you have an implementation idea as to how this could be done? Would each watcher have it's own cleanup method separate from watcher.close()?

@vstinner

Can we imagine to add a new method to only wait for threads spawned by an event loop?

Would this still fix the dangling thread issue in the SubprocessThreadedWatcherTests? As far as I can tell, only the watcher itself is closed, not the event loop.

aeros · 2019-10-03T10:49:53Z

@asvetlov

I disagree with the PR.

After reading through your comment, I also disagree with the current state of it. As I mentioned in the bpo comments, I really didn't have an idea of the intended public API design for ThreadedChildWatcher (beyond what is said in the docs), this was mostly just an attempt to fix the dangling thread issue.

I'd be more than glad to modify this PR though in any recommended manner that you think would fit the intended API design.

aeros · 2019-10-03T11:02:01Z

@asvetlov

We should clean up a watcher after this step (again, explicitly).

Also, there's something else that's not clear to me about this. In the current docs, it says the watcher.close() method is intended to be used as a resource cleanup (https://docs.python.org/3.9/library/asyncio-policy.html#asyncio.AbstractChildWatcher.close):

close()

    Close the watcher.

    This method has to be called to ensure that underlying resources are cleaned-up.

So where else other than watcher.close() (which the current tests are using) should the watcher be cleaned up from?

asvetlov · 2019-10-03T16:04:32Z

Well, joining spawn threads in ThreadedChildWatcher.close() makes sense.
asyncio doesn't call it on policy finishing, thus tearDown() should call watcher.close() as @aeros167 suggested.

aeros · 2019-10-03T23:26:04Z

@asvetlov

thus tearDown() should call watcher.close() as @aeros167 suggested.

Just to make sure it's clear, tearDown() already calls watcher.close(). The code example in my prior comment was the existing implementation.

So if we're in agreement about joining the threads in watcher.close() and retaining the current tearDown() implementation where watcher.close() is already called at the end, what do you think should be changed from the current version of the PR? I'm curious to hear @1st1's thoughts on this as well.

1st1 · 2019-10-21T22:36:14Z

Looks OK to me. If @asvetlov approves this it's good to go.

aeros · 2019-10-22T22:49:13Z

Looks OK to me. If @asvetlov approves this it's good to go.

Okay, I'll just add the docstring then after I finish working on the 3.8 asyncio "What's New" changes.

aeros · 2019-10-24T07:31:22Z

In the recently added commits, I fixed the formatting, added a docstring as requested by @1st1, and made a slight modification to not join daemon threads in _join_threads().

This is because daemon threads don't need to be joined as part of this cleanup process. When the program exits, they're terminated automatically. See #16552 (comment) for more details.

aeros · 2020-01-09T22:35:53Z

@asvetlov Are there any changes you'd like to suggest for this PR or is it good to go?

asvetlov

LGTM

miss-islington · 2020-01-12T11:02:53Z

Thanks @aeros for the PR 🌮🎉.. I'm working now to backport this PR to: 3.8.
🐍🍒⛏🤖

@vstinner

…onGH-16552) Motivation for this PR (comment from @vstinner in bpo issue): ``` Warning seen o AMD64 Ubuntu Shared 3.x buildbot: https://buildbot.python.org/all/GH-/builders/141/builds/2593 test_devnull_output (test.test_a=syncio.test_subprocess.SubprocessThreadedWatcherTests) ... Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2) ``` The following implementation details for the new method are TBD: 1) Public vs private 2) Inclusion in `close()` 3) Name 4) Coroutine vs subroutine method 5) *timeout* parameter If it's a private method, 3, 4, and 5 are significantly less important. I started with the most minimal implementation that fixes the dangling threads without modifying the regression tests, which I think is particularly important. I typically try to avoid directly modifying existing tests as much as possible unless it's necessary to do so. However, I am open to changing any part of this. https://bugs.python.org/issue38356 (cherry picked from commit 0ca7cc7) Co-authored-by: Kyle Stanley <[email protected]>

bedevere-bot · 2020-01-12T11:03:33Z

GH-17963 is a backport of this pull request to the 3.8 branch.

@vstinner

…6552) Motivation for this PR (comment from @vstinner in bpo issue): ``` Warning seen o AMD64 Ubuntu Shared 3.x buildbot: https://buildbot.python.org/all/GH-/builders/141/builds/2593 test_devnull_output (test.test_a=syncio.test_subprocess.SubprocessThreadedWatcherTests) ... Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2) ``` The following implementation details for the new method are TBD: 1) Public vs private 2) Inclusion in `close()` 3) Name 4) Coroutine vs subroutine method 5) *timeout* parameter If it's a private method, 3, 4, and 5 are significantly less important. I started with the most minimal implementation that fixes the dangling threads without modifying the regression tests, which I think is particularly important. I typically try to avoid directly modifying existing tests as much as possible unless it's necessary to do so. However, I am open to changing any part of this. https://bugs.python.org/issue38356 (cherry picked from commit 0ca7cc7) Co-authored-by: Kyle Stanley <[email protected]>

@vstinner

…onGH-16552) Motivation for this PR (comment from @vstinner in bpo issue): ``` Warning seen o AMD64 Ubuntu Shared 3.x buildbot: https://buildbot.python.org/all/#/builders/141/builds/2593 test_devnull_output (test.test_a=syncio.test_subprocess.SubprocessThreadedWatcherTests) ... Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2) ``` The following implementation details for the new method are TBD: 1) Public vs private 2) Inclusion in `close()` 3) Name 4) Coroutine vs subroutine method 5) *timeout* parameter If it's a private method, 3, 4, and 5 are significantly less important. I started with the most minimal implementation that fixes the dangling threads without modifying the regression tests, which I think is particularly important. I typically try to avoid directly modifying existing tests as much as possible unless it's necessary to do so. However, I am open to changing any part of this. https://bugs.python.org/issue38356

Fix test_asyncio.test_subprocess dangling threads

4a8395a

aeros requested review from 1st1 and asvetlov as code owners October 3, 2019 03:55

the-knights-who-say-ni added the CLA signed label Oct 3, 2019

bedevere-bot added the awaiting review label Oct 3, 2019

Remove empty line and fix formatting

bfc0d6c

aeros requested a review from vstinner October 3, 2019 04:01

aeros added topic-asyncio skip news labels Oct 3, 2019

aeros commented Oct 3, 2019

View reviewed changes

Lib/asyncio/unix_events.py Outdated Show resolved Hide resolved

aeros commented Oct 3, 2019

View reviewed changes

aeros added the type-bug An unexpected behavior, bug, or error label Oct 3, 2019

vstinner requested changes Oct 3, 2019

View reviewed changes

bedevere-bot removed the awaiting review label Oct 3, 2019

bedevere-bot added the awaiting changes label Oct 3, 2019

asvetlov requested changes Oct 3, 2019

View reviewed changes

aeros changed the title ~~bpo-38356: Fix ThreadedChildWatcher thread leak~~ bpo-38356: Fix ThreadedChildWatcher thread leak in test_asyncio Oct 23, 2019

aeros changed the title ~~bpo-38356: Fix ThreadedChildWatcher thread leak in test_asyncio~~ bpo-38356: Fix ThreadedChildWatcher thread leak for test_asyncio Oct 24, 2019

aeros added 3 commits October 24, 2019 03:18

Fix code formatting

fbf7f0d

Don't join daemon threads

6bcf009

Add docstring for ThreadedChildWatcher._join_threads()

fe2b256

aeros requested a review from asvetlov October 24, 2019 07:44

aeros changed the title ~~bpo-38356: Fix ThreadedChildWatcher thread leak for test_asyncio~~ bpo-38356: Fix ThreadedChildWatcher thread leak in test_asyncio Oct 29, 2019

asvetlov approved these changes Jan 12, 2020

View reviewed changes

bedevere-bot added awaiting merge and removed awaiting changes labels Jan 12, 2020

asvetlov added needs backport to 3.8 labels Jan 12, 2020

miss-islington merged commit 0ca7cc7 into python:master Jan 12, 2020

bedevere-bot removed the awaiting merge label Jan 12, 2020

bedevere-bot removed the needs backport to 3.8 label Jan 12, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bpo-38356: Fix ThreadedChildWatcher thread leak in test_asyncio #16552

bpo-38356: Fix ThreadedChildWatcher thread leak in test_asyncio #16552

aeros commented Oct 3, 2019 •

edited by miss-islington

Loading

aeros Oct 3, 2019 •

edited

Loading

aeros Oct 3, 2019

1st1 Oct 4, 2019

aeros Oct 4, 2019 •

edited

Loading

aeros Oct 21, 2019 •

edited

Loading

vstinner Oct 3, 2019

aeros Oct 3, 2019 •

edited

Loading

bedevere-bot commented Oct 3, 2019

asvetlov left a comment

aeros commented Oct 3, 2019 •

edited

Loading

aeros commented Oct 3, 2019 •

edited

Loading

aeros commented Oct 3, 2019 •

edited

Loading

asvetlov commented Oct 3, 2019

aeros commented Oct 3, 2019 •

edited

Loading

1st1 commented Oct 21, 2019

aeros commented Oct 22, 2019 •

edited

Loading

aeros commented Oct 24, 2019 •

edited

Loading

aeros commented Jan 9, 2020

asvetlov left a comment

miss-islington commented Jan 12, 2020

bedevere-bot commented Jan 12, 2020

		threads = [thread for thread in list(self._threads.values())
		if thread.is_alive()]

bpo-38356: Fix ThreadedChildWatcher thread leak in test_asyncio #16552

bpo-38356: Fix ThreadedChildWatcher thread leak in test_asyncio #16552

Conversation

aeros commented Oct 3, 2019 • edited by miss-islington Loading

aeros Oct 3, 2019 • edited Loading

Choose a reason for hiding this comment

aeros Oct 3, 2019

Choose a reason for hiding this comment

1st1 Oct 4, 2019

Choose a reason for hiding this comment

aeros Oct 4, 2019 • edited Loading

Choose a reason for hiding this comment

aeros Oct 21, 2019 • edited Loading

Choose a reason for hiding this comment

vstinner Oct 3, 2019

Choose a reason for hiding this comment

aeros Oct 3, 2019 • edited Loading

Choose a reason for hiding this comment

bedevere-bot commented Oct 3, 2019

asvetlov left a comment

Choose a reason for hiding this comment

aeros commented Oct 3, 2019 • edited Loading

aeros commented Oct 3, 2019 • edited Loading

aeros commented Oct 3, 2019 • edited Loading

asvetlov commented Oct 3, 2019

aeros commented Oct 3, 2019 • edited Loading

1st1 commented Oct 21, 2019

aeros commented Oct 22, 2019 • edited Loading

aeros commented Oct 24, 2019 • edited Loading

aeros commented Jan 9, 2020

asvetlov left a comment

Choose a reason for hiding this comment

miss-islington commented Jan 12, 2020

bedevere-bot commented Jan 12, 2020

aeros commented Oct 3, 2019 •

edited by miss-islington

Loading

aeros Oct 3, 2019 •

edited

Loading

aeros Oct 4, 2019 •

edited

Loading

aeros Oct 21, 2019 •

edited

Loading

aeros Oct 3, 2019 •

edited

Loading

aeros commented Oct 3, 2019 •

edited

Loading

aeros commented Oct 3, 2019 •

edited

Loading

aeros commented Oct 3, 2019 •

edited

Loading

aeros commented Oct 3, 2019 •

edited

Loading

aeros commented Oct 22, 2019 •

edited

Loading

aeros commented Oct 24, 2019 •

edited

Loading