Skip to content

test_asyncio.test_subprocess leaked dangling threads on PPC64LE Fedora Stable 3.x: #110205

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
vstinner opened this issue Oct 2, 2023 · 4 comments
Assignees
Labels
tests Tests in the Lib/test dir topic-asyncio

Comments

@vstinner
Copy link
Member

vstinner commented Oct 2, 2023

The machine was very slow (very busy) when the test failed: load avg: 21.37. Also, I noticed that this buildbot builder is now.

PPC64LE Fedora Stable 3.x:

0:11:53 load avg: 21.37 [383/467/1] test.test_asyncio.test_subprocess failed (env changed) (40.0 sec) -- running (8): test_subprocess (1 min 6 sec), test_capi (47.1 sec), test_math (57.5 sec), test_io (46.4 sec), test.test_gdb.test_cfunction (1 min 3 sec), test_weakref (1 min 17 sec), test_unparse (2 min 11 sec), test.test_asyncio.test_tasks (38.9 sec)

(...)

test_close_kill_running (test.test_asyncio.test_subprocess.SubprocessThreadedWatcherTests.test_close_kill_running) ... ok
test_communicate (test.test_asyncio.test_subprocess.SubprocessThreadedWatcherTests.test_communicate) ... ok
test_communicate_ignore_broken_pipe (test.test_asyncio.test_subprocess.SubprocessThreadedWatcherTests.test_communicate_ignore_broken_pipe) ... ok
test_communicate_none_input (test.test_asyncio.test_subprocess.SubprocessThreadedWatcherTests.test_communicate_none_input) ... ok
test_create_subprocess_env_exec (test.test_asyncio.test_subprocess.SubprocessThreadedWatcherTests.test_create_subprocess_env_exec) ... ok
test_create_subprocess_env_shell (test.test_asyncio.test_subprocess.SubprocessThreadedWatcherTests.test_create_subprocess_env_shell) ... ok
test_create_subprocess_exec_text_mode_fails (test.test_asyncio.test_subprocess.SubprocessThreadedWatcherTests.test_create_subprocess_exec_text_mode_fails) ... ok
test_create_subprocess_exec_with_path (test.test_asyncio.test_subprocess.SubprocessThreadedWatcherTests.test_create_subprocess_exec_with_path) ... ok
test_create_subprocess_shell_text_mode_fails (test.test_asyncio.test_subprocess.SubprocessThreadedWatcherTests.test_create_subprocess_shell_text_mode_fails) ... ok
test_devnull_error (test.test_asyncio.test_subprocess.SubprocessThreadedWatcherTests.test_devnull_error) ... ok
test_devnull_input (test.test_asyncio.test_subprocess.SubprocessThreadedWatcherTests.test_devnull_input) ... Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2)
Warning -- Dangling thread: <_MainThread(MainThread, started 140735686786240)>
Warning -- Dangling thread: <Thread(waitpid-0, started daemon 140735332413728)>
ok

test_asyncio.test_subprocess passed when re-run in verbose mode.

build: https://buildbot.python.org/all/#/builders/90/builds/3855

Linked PRs

@vstinner vstinner added tests Tests in the Lib/test dir topic-asyncio labels Oct 2, 2023
@github-project-automation github-project-automation bot moved this to Todo in asyncio Oct 2, 2023
vstinner added a commit to vstinner/cpython that referenced this issue Oct 13, 2023
ThreadedChildWatcher._join_threads() now clears references to
completed threads.

test_asyncio.utils.TestCase now calls
ThreadedChildWatcher._join_threads(), uses SHORT_TIMEOUT to join a
thread, and raise an exception if there are still running threads
after joining all threads.
vstinner added a commit to vstinner/cpython that referenced this issue Oct 13, 2023
ThreadedChildWatcher._join_threads() now clears references to
completed threads.

test_asyncio.utils.TestCase now calls _join_threads() of the watcher,
uses SHORT_TIMEOUT to join a thread, and then raises an exception if
there are still running threads.
vstinner added a commit to vstinner/cpython that referenced this issue Oct 13, 2023
ThreadedChildWatcher._join_threads() now clears references to
completed threads.

Rename also ThreadedChildWatcher threads to add "asyncio-" prefix to
the name.

test_asyncio.utils.TestCase now calls _join_threads() of the watcher,
uses SHORT_TIMEOUT to join a thread, and then raises an exception if
there are still running threads.
vstinner added a commit to vstinner/cpython that referenced this issue Oct 13, 2023
ThreadedChildWatcher._join_threads() now clears references to
completed threads.

test_asyncio.utils.TestCase now calls _join_threads() of the watcher,
uses SHORT_TIMEOUT to join a thread, and then raises an exception if
there are still running threads.

Rename also ThreadedChildWatcher threads to add "asyncio-" prefix to
the name.
@github-project-automation github-project-automation bot moved this from Todo to Done in asyncio Oct 13, 2023
@gvanrossum
Copy link
Member

Sorry, this is not fixed. The PR attempting to fix it was not merged.

@gvanrossum gvanrossum reopened this Oct 13, 2023
@github-project-automation github-project-automation bot moved this from Done to In Progress in asyncio Oct 13, 2023
@vstinner
Copy link
Member Author

Sorry, this is not fixed. The PR attempting to fix it was not merged.

I'm not interested to fix this issue anymore. If you consider that it's important issue, you can fix it. I close again the issue and unsubscribe from the issue.

@github-project-automation github-project-automation bot moved this from In Progress to Done in asyncio Oct 14, 2023
@gvanrossum
Copy link
Member

Sorry to the audience for the waffling. Victor doesn't want anything to do with this issue any more. But I will fix it (starting with his PR). So reopening one more time (it's not completed yet.)

@gvanrossum
Copy link
Member

Interestingly, this test output quoted above:

Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2)
Warning -- Dangling thread: <_MainThread(MainThread, started 140735686786240)>
Warning -- Dangling thread: <Thread(waitpid-0, started daemon 140735332413728)>

(which sets the "environment changed" flag, causing CI to fail) still occurs when _join_threads() is fixed to close all threads. The messages are printed from threading_cleanup() in test/support/threading_helper.py. Looking at the code and the output it's clear that it's the waitpid-0 thread that wasn't there before. The PR renames it to asyncio-waitpid-0, and I still see that output here.

It looks like the work to delete references to the watcher's threads is gone to waste because the code in ThreadedChildWatcher._join_threads() only joins (waits for) non-daemon threads, but the child handler thread is a daemon thread. There was a discussion about this very cleanup when the join code was added -- it was proposed to use self._threads.pop(pid) but that was never implemented. The child watcher thread has been a daemon thread since it was introduced by gh-14344, so I'm not sure that the join code ever did what it was supposed to do.

In conclusion, I think there's an old bug (_join_threads() doesn't join daemon threads, but watcher threads are daemon threads), and and a new bug in this PR (close_loop() used to join all threads, but the refactored logic only waits for non-daemon threads).

We need to ask ourselves, why do we sometimes find a dangling waitpid thread? The only reason I can see would be that the process hasn't exited -- then the thread will be hanging in the os.waitpid() call in ThreadedChildWatcher_do_waitpid(). And how can we prevent this situation from happening, or recover from it? IIUC we have no way to kill a thread, so calling join() on the blocked thread won't help (unless the process is about to exit).

IMO _join_threads() is always a no-op, we might as well get rid of it. One could argue that waiting for the watcher's thread in TestCase.close_loop() is the right thing, possibly with a 30s timeout, as long as it includes daemon threads, which are the only threads created by the watcher. The remaining embellishments currently in the PR aren't needed.

gvanrossum added a commit that referenced this issue Oct 27, 2023
- `ThreadedChildWatcher.close()` is now *officially* a no-op; `_join_threads()` never did anything.
- Threads created by that class are now named `asyncio-waitpid-NNN`.
- `test.test_asyncio.utils.TestCase.close_loop()` now waits for the child watcher's threads, but not forever; if a thread hangs, it raises `RuntimeError`.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 27, 2023
…thonGH-110884)

- `ThreadedChildWatcher.close()` is now *officially* a no-op; `_join_threads()` never did anything.
- Threads created by that class are now named `asyncio-waitpid-NNN`.
- `test.test_asyncio.utils.TestCase.close_loop()` now waits for the child watcher's threads, but not forever; if a thread hangs, it raises `RuntimeError`.
(cherry picked from commit c3bb10c)

Co-authored-by: Guido van Rossum <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 27, 2023
…thonGH-110884)

- `ThreadedChildWatcher.close()` is now *officially* a no-op; `_join_threads()` never did anything.
- Threads created by that class are now named `asyncio-waitpid-NNN`.
- `test.test_asyncio.utils.TestCase.close_loop()` now waits for the child watcher's threads, but not forever; if a thread hangs, it raises `RuntimeError`.
(cherry picked from commit c3bb10c)

Co-authored-by: Guido van Rossum <[email protected]>
gvanrossum added a commit that referenced this issue Oct 27, 2023
…H-110884) (#111412)

- `ThreadedChildWatcher.close()` is now *officially* a no-op; `_join_threads()` never did anything.
- Threads created by that class are now named `asyncio-waitpid-NNN`.
- `test.test_asyncio.utils.TestCase.close_loop()` now waits for the child watcher's threads, but not forever; if a thread hangs, it raises `RuntimeError`.
(cherry picked from commit c3bb10c)

Co-authored-by: Guido van Rossum <[email protected]>
gvanrossum added a commit that referenced this issue Oct 27, 2023
…H-110884) (#111413)

- `ThreadedChildWatcher.close()` is now *officially* a no-op; `_join_threads()` never did anything.
- Threads created by that class are now named `asyncio-waitpid-NNN`.
- `test.test_asyncio.utils.TestCase.close_loop()` now waits for the child watcher's threads, but not forever; if a thread hangs, it raises `RuntimeError`.
(cherry picked from commit c3bb10c)

Co-authored-by: Guido van Rossum <[email protected]>
@github-project-automation github-project-automation bot moved this from In Progress to Done in asyncio Oct 29, 2023
iritkatriel pushed a commit to iritkatriel/cpython that referenced this issue Oct 29, 2023
…thon#110884)

- `ThreadedChildWatcher.close()` is now *officially* a no-op; `_join_threads()` never did anything.
- Threads created by that class are now named `asyncio-waitpid-NNN`.
- `test.test_asyncio.utils.TestCase.close_loop()` now waits for the child watcher's threads, but not forever; if a thread hangs, it raises `RuntimeError`.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…thon#110884)

- `ThreadedChildWatcher.close()` is now *officially* a no-op; `_join_threads()` never did anything.
- Threads created by that class are now named `asyncio-waitpid-NNN`.
- `test.test_asyncio.utils.TestCase.close_loop()` now waits for the child watcher's threads, but not forever; if a thread hangs, it raises `RuntimeError`.
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
…thon#110884)

- `ThreadedChildWatcher.close()` is now *officially* a no-op; `_join_threads()` never did anything.
- Threads created by that class are now named `asyncio-waitpid-NNN`.
- `test.test_asyncio.utils.TestCase.close_loop()` now waits for the child watcher's threads, but not forever; if a thread hangs, it raises `RuntimeError`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests Tests in the Lib/test dir topic-asyncio
Projects
Status: Done
Development

No branches or pull requests

2 participants