test_concurrent_futures.test_interpreter_pool failing #125716
Comments
I've been looking into this for the past hour or so, and I wasn't able to reproduce a segfault, nor do any debuggers detect any foul play. However, I was able to get this assertion to fail after sending a CTRL+C to unittest. That, and running |
Oh, it turns out this is a BUG. I triggered this while implementing the PR.
The biggest clue is what the USAN buildbot tells us:
The pointer in question is the mutex created in There is an unlikely-but-possible race there in
That may improve the situation. However, it would be best if we could definitively determine why the mutex is NULL when it shouldn't be.
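To make the kind of race discussed above concrete, here is a minimal Python sketch of shared, reference-counted state whose mutex is created by the first user and torn down by the last. It is an illustration only, not the `_interpqueues` code: `SharedGlobals`, `acquire`, `release`, and `stress` are hypothetical names, and `None` stands in for a NULL pointer. The point is that the create/destroy bookkeeping is itself guarded by a lock that always exists, so no user can observe a missing mutex.

```python
import threading


class SharedGlobals:
    """Hypothetical stand-in for process-wide state shared by several users."""

    def __init__(self):
        self._guard = threading.Lock()  # always exists; protects the fields below
        self._users = 0                 # number of users currently holding the state
        self.mutex = None               # created lazily; None plays the role of NULL

    def acquire(self):
        # Create the mutex on first use and count the new user, atomically.
        with self._guard:
            if self._users == 0:
                self.mutex = threading.Lock()
            self._users += 1
            return self.mutex

    def release(self):
        # Drop one user; the last one out tears the mutex down.
        with self._guard:
            self._users -= 1
            if self._users == 0:
                self.mutex = None


def stress(state, workers=8, rounds=2000):
    """Return how many times a worker saw a missing mutex."""
    missing = []

    def worker():
        for _ in range(rounds):
            mutex = state.acquire()
            if mutex is None:
                missing.append(1)
            else:
                with mutex:
                    pass
            state.release()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(missing)
```

If the check-and-create step in `acquire` or the decrement-and-destroy step in `release` ran outside `_guard`, one thread could tear the mutex down while another was about to use it, which is the general shape of a "mutex is NULL when it shouldn't be" failure.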
Something that could possibly be related: I've noticed an issue with exceptions inside subinterpreters created by |
…queues Module (gh-125802) The fix applies to the _interpchannels module as well. I've also included a drive-by typo fix for _interpqueues.
…interpqueues Module (pythongh-125802) The fix applies to the _interpchannels module as well. I've also included a drive-by typo fix for _interpqueues. (cherry picked from commit 44f841f) Co-authored-by: Eric Snow <[email protected]>
…_interpqueues Module (gh-125808) The fix applies to the _interpchannels module as well. I've also included a drive-by typo fix for _interpqueues. (cherry picked from commit 44f841f, AKA gh-125802) Co-authored-by: Eric Snow <[email protected]>
…_interpqueues Module (gh-125803) This includes a drive-by cleanup in _queues_init() and _queues_fini(). This change also applies to the _interpchannels module.
…r The _interpqueues Module (pythongh-125803) This includes a drive-by cleanup in _queues_init() and _queues_fini(). This change also applies to the _interpchannels module. (cherry picked from commit 4848b0b) Co-authored-by: Eric Snow <[email protected]>
…or the _interpqueues Module (gh-125817) This includes a drive-by cleanup in _queues_init() and _queues_fini(). This change also applies to the _interpchannels module. (cherry picked from commit 4848b0b, AKA gh-125803) Co-authored-by: Eric Snow <[email protected]>
It looks like the latest fix has helped. I'll reopen this if there are any intermittent failures.
Bug report
Bug description:
I've seen 4 kinds of failure which I'm fairly sure have the same cause:
WorkerContext.initialize() (line 137)
The failures have happened in different test methods. Different failures have happened during the retry. Sometimes the retry passes. In all cases the architecture is AMD64, but across a variety of builders and non-Windows operating systems. The failures have all been on either refleaks buildbots or the USAN buildbot.
FWIW, it looks like InterpreterPoolExecutor has only exposed an underlying problem in the _interpqueues module, which means any fix would need to target 3.13 also (and maybe 3.12).
Here are the buildbots where I've seen failures:
Here's the failure text:
segfault
hang 1
hang 2
test failed
USAN
CC @encukou
CPython versions tested on:
CPython main branch
Operating systems tested on:
No response
Linked PRs