-
-
Notifications
You must be signed in to change notification settings - Fork 32k
bpo-38356: Fix ThreadedChildWatcher thread leak in test_asyncio #16552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-38356: Fix ThreadedChildWatcher thread leak in test_asyncio #16552
Conversation
Lib/asyncio/unix_events.py
Outdated
threads = [thread for thread in list(self._threads.values()) | ||
if thread.is_alive()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a minor optimization, would it be worthwhile to change this to a generator expression?
threads = (thread for thread in list(self._threads.values())
if thread.is_alive())
There's no need to use a list here. The same applies to the __del__()
method on line 1284.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding to this, we could also change the list conversion to an iterable conversion. There's no reason to use a list since we're just iterating over the threads in the values view without adding or removing items.
threads = (thread for thread in iter(self._threads.values())
if thread.is_alive())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I think it would be acceptable to add a private methof to ThreadedChildWatcher _join_threads()
and use it from our unit tests. No need to document it beyond just writing a doc string. We can add a public API for this sometime later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add a public API for this sometime later.
@1st1 Yeah those were my thoughts as well. I think it made the most sense to include it as part of close()
, since that's intended to be used as the Watcher cleanup method and fits well with the existing tests. I generally prefer to avoid modifying the regression tests as much as possible if there's not a functional benefit for doing so.
Is there anything from the current state of the PR you'd like to see changed other than writing a docstring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I think it would be acceptable to add a private methof to ThreadedChildWatcher _join_threads() and use it from our unit tests
If we were to call it explicitly from the unit test, how would you recommend doing so? The current tearDown()
method (in SubprocessWatcherMixin
) is generically structured for all of the ChildWatcher
classes:
def tearDown(self):
super().tearDown()
policy = asyncio.get_event_loop_policy()
watcher = policy.get_child_watcher()
policy.set_child_watcher(None)
watcher.attach_loop(None)
watcher.close()
Would it work if we overrode the tearDown()
method for the ThreadedChildWatcher
test class?:
Current:
class SubprocessThreadedWatcherTests(SubprocessWatcherMixin,
test_utils.TestCase):
Watcher = unix_events.ThreadedChildWatcher
Suggested:
class SubprocessThreadedWatcherTests(SubprocessWatcherMixin,
test_utils.TestCase):
Watcher = unix_events.ThreadedChildWatcher
def tearDown(self):
super().tearDown()
Watcher._join_threads()
I'm not sure if _join_threads()
should be called earlier or if it's okay to just call it after SubprocessWatcherMixin.tearDown() is finished (as the above example does).
threads = [thread for thread in list(self._threads.values()) | ||
if thread.is_alive()] | ||
for thread in threads: | ||
thread.join() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to remove the thread from self._threads as soon as a thread complete?
for pid in list(self._threads):
thread = self._threads[pid]
thread.join()
self._threads.pop(pid)
Would it be possible to log a warning if a thread runs longer than timeout seconds? Or even exit silently if the timoeut is exceeded?
Maybe we should also modify add_child_handler() to raise an exception if it's called after close() has been called? For example, add a _closed attribute? It would prevent to spawn new threads while we wait for existing threads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to remove the thread from self._threads as soon as a thread complete?
Yeah sure, I would have no issue with adding something like this. I'm not 100% sure that we have to remove the threads that aren't alive from self._threads
, but it would certainly be possible.
What exactly does removing the threads from self._threads
accomplish that joining them doesn't already do? When a thread is successfully joined, thread._is_stopped
is set to true, which is what thread.is_alive()
uses to check if the thread is still active. I don't think we necessarily need to remove the threads from self._threads
as part of the cleanup. Unless there's something I'm missing, I think that would just add additional overhead.
Would it be possible to log a warning if a thread runs longer than timeout seconds? Or even exit silently if the timoeut is exceeded?
This should be possible by running thread.join(timeout)
and if thread.is_alive()
is True afterwards, the join timed out. In what situations would you want to differentiate between logging a warning vs silently exiting?
Unless by "exit silently" you were referring to thread.join()
, which effectively does that already when the timeout is reached. thread.join(timeout)
will block for timeout duration or when the join is complete, whichever one occurs first.
Maybe we should also modify add_child_handler() to raise an exception if it's called after close() has been called? For example, add a _closed attribute? It would prevent to spawn new threads while we wait for existing threads.
Yeah this would be a fairly straightforward change and I agree with it. A RuntimeError
would be suitable as an exception to raise for that scenario.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase And if you don't make the requested changes, you will be put in the comfy chair! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree with the PR.
As @vstinner mentioned watcher.close()
is not called normally.
Moreover, the watcher is designed to work with multiple event loops.
The lifecycle of watcher normally is equal to event loop policy lifetime, so typically the watcher lives forever.
However, the CPython test suite requires that any test has no side effects. That's why we explicitly call set_event_loop_policy(None)
after every asyncio test. We should clean up a watcher after this step (again, explicitly).
Please note, asyncio watchers are part of public API.
I never love it but we cannot make wild changes. Watchers potentially can be used not only by asyncio but by third-party loops as well. That's why ThreadedChildWatcher
was designed to strongly keep the backward compatibility.
Third-party watcher is also possible, we don't restrict that. The watcher may have no proposed _close_loop()
method as well.
P.S.
I believe after polishing ThreadedChildWatcher we can start deprecation of watchers subsystem at all, it generates more problems than solves. Removing this part from public API can help but this is another thing.
This was primarily based upon fixing the thread leak in the unittest class SubprocessWatcherMixin(SubprocessMixin):
Watcher = None
def setUp(self):
super().setUp()
policy = asyncio.get_event_loop_policy()
self.loop = policy.new_event_loop()
self.set_event_loop(self.loop)
watcher = self.Watcher()
watcher.attach_loop(self.loop)
policy.set_child_watcher(watcher)
def tearDown(self):
super().tearDown()
policy = asyncio.get_event_loop_policy()
watcher = policy.get_child_watcher()
policy.set_child_watcher(None)
watcher.attach_loop(None)
watcher.close() I figured the easiest way to have the fix conform to the existing tests would be to have all of the threads joined in def tearDown(self):
super().tearDown()
policy = asyncio.get_event_loop_policy()
watcher = policy.get_child_watcher()
policy.set_child_watcher(None)
watcher.attach_loop(None)
if isinstance(watcher, ThreadedChildWatcher):
watcher._join_threads()
watcher.close() But, this would work even worse in terms to compatibility with other implementations, and adds an unneeded conditional to the other watcher class tests.
Do you have an implementation idea as to how this could be done? Would each watcher have it's own cleanup method separate from
Would this still fix the dangling thread issue in the SubprocessThreadedWatcherTests? As far as I can tell, only the watcher itself is closed, not the event loop. |
After reading through your comment, I also disagree with the current state of it. As I mentioned in the bpo comments, I really didn't have an idea of the intended public API design for ThreadedChildWatcher (beyond what is said in the docs), this was mostly just an attempt to fix the dangling thread issue. I'd be more than glad to modify this PR though in any recommended manner that you think would fit the intended API design. |
Also, there's something else that's not clear to me about this. In the current docs, it says the
So where else other than |
Well, joining spawn threads in |
Just to make sure it's clear, So if we're in agreement about joining the threads in |
Looks OK to me. If @asvetlov approves this it's good to go. |
Okay, I'll just add the docstring then after I finish working on the 3.8 asyncio "What's New" changes. |
In the recently added commits, I fixed the formatting, added a docstring as requested by @1st1, and made a slight modification to not join daemon threads in This is because daemon threads don't need to be joined as part of this cleanup process. When the program exits, they're terminated automatically. See #16552 (comment) for more details. |
@asvetlov Are there any changes you'd like to suggest for this PR or is it good to go? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @aeros for the PR 🌮🎉.. I'm working now to backport this PR to: 3.8. |
…onGH-16552) Motivation for this PR (comment from @vstinner in bpo issue): ``` Warning seen o AMD64 Ubuntu Shared 3.x buildbot: https://buildbot.python.org/all/GH-/builders/141/builds/2593 test_devnull_output (test.test_a=syncio.test_subprocess.SubprocessThreadedWatcherTests) ... Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2) ``` The following implementation details for the new method are TBD: 1) Public vs private 2) Inclusion in `close()` 3) Name 4) Coroutine vs subroutine method 5) *timeout* parameter If it's a private method, 3, 4, and 5 are significantly less important. I started with the most minimal implementation that fixes the dangling threads without modifying the regression tests, which I think is particularly important. I typically try to avoid directly modifying existing tests as much as possible unless it's necessary to do so. However, I am open to changing any part of this. https://bugs.python.org/issue38356 (cherry picked from commit 0ca7cc7) Co-authored-by: Kyle Stanley <[email protected]>
GH-17963 is a backport of this pull request to the 3.8 branch. |
…6552) Motivation for this PR (comment from @vstinner in bpo issue): ``` Warning seen o AMD64 Ubuntu Shared 3.x buildbot: https://buildbot.python.org/all/GH-/builders/141/builds/2593 test_devnull_output (test.test_a=syncio.test_subprocess.SubprocessThreadedWatcherTests) ... Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2) ``` The following implementation details for the new method are TBD: 1) Public vs private 2) Inclusion in `close()` 3) Name 4) Coroutine vs subroutine method 5) *timeout* parameter If it's a private method, 3, 4, and 5 are significantly less important. I started with the most minimal implementation that fixes the dangling threads without modifying the regression tests, which I think is particularly important. I typically try to avoid directly modifying existing tests as much as possible unless it's necessary to do so. However, I am open to changing any part of this. https://bugs.python.org/issue38356 (cherry picked from commit 0ca7cc7) Co-authored-by: Kyle Stanley <[email protected]>
…onGH-16552) Motivation for this PR (comment from @vstinner in bpo issue): ``` Warning seen o AMD64 Ubuntu Shared 3.x buildbot: https://buildbot.python.org/all/#/builders/141/builds/2593 test_devnull_output (test.test_a=syncio.test_subprocess.SubprocessThreadedWatcherTests) ... Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2) ``` The following implementation details for the new method are TBD: 1) Public vs private 2) Inclusion in `close()` 3) Name 4) Coroutine vs subroutine method 5) *timeout* parameter If it's a private method, 3, 4, and 5 are significantly less important. I started with the most minimal implementation that fixes the dangling threads without modifying the regression tests, which I think is particularly important. I typically try to avoid directly modifying existing tests as much as possible unless it's necessary to do so. However, I am open to changing any part of this. https://bugs.python.org/issue38356
Motivation for this PR (comment from @vstinner in bpo issue):
The following implementation details for the new method are TBD:
Public vs private
Inclusion in
close()
Name
Coroutine vs subroutine method
timeout parameter
If it's a private method, 3, 4, and 5 are significantly less important.
I started with the most minimal implementation that fixes the dangling threads without modifying the regression tests, which I think is particularly important. I typically try to avoid directly modifying existing tests as much as possible unless it's necessary to do so. However, I am open to changing any part of this.
https://bugs.python.org/issue38356
Automerge-Triggered-By: @asvetlov