
test_concurrent_futures.test_interpreter_pool failing #125716

Closed
ericsnowcurrently opened this issue Oct 18, 2024 · 5 comments
Assignees
Labels
3.14 bugs and security fixes topic-subinterpreters type-bug An unexpected behavior, bug, or error

Comments

@ericsnowcurrently
Member

ericsnowcurrently commented Oct 18, 2024

Bug report

Bug description:

I've seen 4 kinds of failure which I'm fairly sure have the same cause:

  • segfault during WorkerContext.initialize() (line 137)
  • hanging
  • weird test failure
  • undefined behavior on USAN buildbot

The failures have happened in different test methods. Different failures have happened during the retry. Sometimes the retry passes. In all cases the architecture is AMD64, but across a variety of builders and non-Windows operating systems. The failures have all been on either refleaks buildbots or the USAN buildbot.

FWIW, it looks like InterpreterPoolExecutor has only exposed an underlying problem in the _interpqueues module, which means any fix would need to target 3.13 also (and maybe 3.12).
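
For anyone trying to exercise the same code path locally, here is a minimal stress sketch. It assumes a Python build that provides `concurrent.futures.InterpreterPoolExecutor` (3.14+) and returns `None` otherwise. The helper name `stress` and its parameters are made up for illustration. It is not a confirmed reproducer: the failures above were intermittent and only seen on refleaks and USAN buildbots.

```python
try:
    from concurrent.futures import InterpreterPoolExecutor
except ImportError:
    # Older Python, or a build without subinterpreter support.
    InterpreterPoolExecutor = None


def stress(rounds=20, workers=4):
    """Repeatedly create and shut down interpreter pools.

    Each new worker thread initializes its own worker context, which is
    the step where the tracebacks above show the crash.
    Returns the collected results, or None if the executor is unavailable.
    """
    if InterpreterPoolExecutor is None:
        return None
    results = []
    for _ in range(rounds):
        with InterpreterPoolExecutor(max_workers=workers) as pool:
            # pow is a builtin, so it can be sent to another interpreter.
            futures = [pool.submit(pow, 2, 5) for _ in range(workers)]
            results.extend(f.result() for f in futures)
    return results
```

Calling `stress()` in a loop under a refleak-style or sanitizer build is probably the closest approximation of what the buildbots were doing.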


Here are the buildbots where I've seen failures:

  • AMD64 RHEL8 Refleaks 3.x
  • AMD64 FreeBSD Refleaks 3.x
  • AMD64 CentOS9 NoGIL Refleaks 3.x
  • AMD64 Arch Linux Usan Function 3.x

Here's the failure text:

segfault
test_submit (test.test_concurrent_futures.test_interpreter_pool.InterpreterPoolExecutorTest.test_submit) ... Fatal Python error:

Segmentation fault
Current thread 0x00007fb4fbfff700 (most recent call first):
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/concurrent/futures/interpreter.py", line 137 in initialize
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/concurrent/futures/thread.py", line 98 in _worker
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/threading.py", line 992 in run
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/threading.py", line 1041 in _bootstrap_inner
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/threading.py", line 1012 in _bootstrap

Thread 0x00007fb521d71240 (most recent call first):
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/threading.py", line 359 in wait
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/concurrent/futures/_base.py", line 443 in result
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/test/test_concurrent_futures/executor.py", line 31 in test_submit
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/unittest/case.py", line 606 in _callTestMethod
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/unittest/case.py", line 660 in run
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/unittest/case.py", line 716 in __call__
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/unittest/suite.py", line 122 in run
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/unittest/suite.py", line 122 in run
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/unittest/runner.py", line 240 in run
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/test/libregrtest/single.py", line 57 in _run_suite
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/test/libregrtest/single.py", line 37 in run_unittest
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/test/libregrtest/single.py", line 135 in test_func
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/test/libregrtest/refleak.py", line 132 in runtest_refleak
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/test/libregrtest/single.py", line 87 in regrtest_runner
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/test/libregrtest/single.py", line 138 in _load_run_test
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/test/libregrtest/single.py", line 181 in _runtest_env_changed_exc
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/test/libregrtest/single.py", line 281 in _runtest
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/test/libregrtest/single.py", line 310 in run_single_test
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/test/libregrtest/worker.py", line 83 in worker_process
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/test/libregrtest/worker.py", line 118 in main
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/test/libregrtest/worker.py", line 122 in <module>
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/runpy.py", line 88 in _run_code
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.refleak/build/Lib/runpy.py", line 198 in _run_module_as_main
hang 1
test_submit_exception_in_func (test.test_concurrent_futures.test_interpreter_pool.InterpreterPoolExecutorTest.test_submit_exception_in_func) ... Timeout (3:20:00)!

Thread 0x000000082e546e00 (most recent call first):
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/concurrent/futures/interpreter.py", line 190 in run
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/concurrent/futures/thread.py", line 85 in run
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/concurrent/futures/thread.py", line 118 in _worker
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/threading.py", line 992 in run
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/threading.py", line 1041 in _bootstrap_inner
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/threading.py", line 1012 in _bootstrap

Thread 0x0000000825a7c000 (most recent call first):
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/threading.py", line 359 in wait
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/concurrent/futures/_base.py", line 443 in result
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/test/test_concurrent_futures/test_interpreter_pool.py", line 251 in test_submit_exception_in_func
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/unittest/case.py", line 606 in _callTestMethod
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/unittest/case.py", line 660 in run
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/unittest/case.py", line 716 in __call__
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/unittest/suite.py", line 122 in run
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/unittest/suite.py", line 84 in __call__
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/unittest/suite.py", line 122 in run
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/unittest/suite.py", line 84 in __call__
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/unittest/runner.py", line 240 in run
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/test/libregrtest/single.py", line 57 in _run_suite
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/test/libregrtest/single.py", line 37 in run_unittest
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/test/libregrtest/single.py", line 135 in test_func
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/test/libregrtest/refleak.py", line 132 in runtest_refleak
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/test/libregrtest/single.py", line 87 in regrtest_runner
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/test/libregrtest/single.py", line 138 in _load_run_test
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/test/libregrtest/single.py", line 181 in _runtest_env_changed_exc
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/test/libregrtest/single.py", line 281 in _runtest
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/test/libregrtest/single.py", line 310 in run_single_test
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/test/libregrtest/worker.py", line 83 in worker_process
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/test/libregrtest/worker.py", line 118 in main
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/test/libregrtest/worker.py", line 122 in <module>
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/runpy.py", line 88 in _run_code
  File "/buildbot/buildarea/3.x.ware-freebsd.refleak/build/Lib/runpy.py", line 198 in _run_module_as_main
hang 2
test_shutdown_race_issue12456 (test.test_concurrent_futures.test_interpreter_pool.InterpreterPoolExecutorTest.test_shutdown_race_issue12456) ... Exception in initializer:
RuntimeError: Failed to import encodings module

During handling of the above exception, another exception occurred:

interpreters.InterpreterError: sub-interpreter creation failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/concurrent/futures/thread.py", line 98, in _worker
    ctx.initialize()
    ~~~~~~~~~~~~~~^^
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/concurrent/futures/interpreter.py", line 131, in initialize
    self.interpid = _interpreters.create(reqrefs=True)
                    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
interpreters.InterpreterError: interpreter creation failed
Timeout (0:45:00)!

Thread 0x00007fe3febe2640 (most recent call first):
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/concurrent/futures/thread.py", line 115 in _worker
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/threading.py", line 992 in run
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/threading.py", line 1041 in _bootstrap_inner
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/threading.py", line 1012 in _bootstrap

Thread 0x00007fe3fcbda640 (most recent call first):
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/concurrent/futures/thread.py", line 115 in _worker
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/threading.py", line 992 in run
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/threading.py", line 1041 in _bootstrap_inner
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/threading.py", line 1012 in _bootstrap

Thread 0x00007fe3e77fe640 (most recent call first):
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/concurrent/futures/thread.py", line 115 in _worker
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/threading.py", line 992 in run
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/threading.py", line 1041 in _bootstrap_inner
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/threading.py", line 1012 in _bootstrap

Thread 0x00007fe3e7fff640 (most recent call first):
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/concurrent/futures/thread.py", line 115 in _worker
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/threading.py", line 992 in run
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/threading.py", line 1041 in _bootstrap_inner
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/threading.py", line 1012 in _bootstrap

Thread 0x00007fe3ffe6f740 (most recent call first):
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/threading.py", line 1092 in join
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/concurrent/futures/thread.py", line 272 in shutdown
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/test/test_concurrent_futures/executor.py", line 79 in test_shutdown_race_issue12456
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/unittest/case.py", line 606 in _callTestMethod
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/unittest/case.py", line 660 in run
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/unittest/case.py", line 716 in __call__
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/unittest/suite.py", line 122 in run
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/unittest/suite.py", line 122 in run
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/unittest/runner.py", line 240 in run
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/single.py", line 57 in _run_suite
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/single.py", line 37 in run_unittest
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/single.py", line 135 in test_func
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/refleak.py", line 132 in runtest_refleak
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/single.py", line 87 in regrtest_runner
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/single.py", line 138 in _load_run_test
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/single.py", line 181 in _runtest_env_changed_exc
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/single.py", line 281 in _runtest
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/single.py", line 310 in run_single_test
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/worker.py", line 83 in worker_process
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/worker.py", line 118 in main
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/test/libregrtest/worker.py", line 122 in <module>
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/runpy.py", line 88 in _run_code
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/runpy.py", line 198 in _run_module_as_main
test failed
======================================================================
FAIL: test_free_reference (test.test_concurrent_futures.test_interpreter_pool.InterpreterPoolExecutorTest.test_free_reference)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.itamaro-centos-aws.refleak.nogil/build/Lib/test/test_concurrent_futures/executor.py", line 132, in test_free_reference
    self.assertIsNone(wr())
    ~~~~~~~~~~~~~~~~~^^^^^^
AssertionError: <test.test_concurrent_futures.executor.MyObject object at 0x200121a00a0> is not None
USAN
test_map_exception (test.test_concurrent_futures.test_interpreter_pool.InterpreterPoolExecutorTest.test_map_exception) ... Python/thread_pthread.h:555:42: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/semaphore.h:55:36: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior Python/thread_pthread.h:555:42 in 
Fatal Python error: Segmentation fault

Thread 0x00007f89aedfd6c0 (most recent call first):
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/concurrent/futures/interpreter.py", line 131 in initialize
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/concurrent/futures/thread.py", line 98 in _worker
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/threading.py", line 992 in run
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/threading.py", line 1041 in _bootstrap_inner
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/threading.py", line 1012 in _bootstrap

Thread 0x00007f89adcfb6c0 (most recent call first):
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/concurrent/futures/interpreter.py", line 131 in initialize
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/concurrent/futures/thread.py", line 98 in _worker
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/threading.py", line 992 in run
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/threading.py", line 1041 in _bootstrap_inner
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/threading.py", line 1012 in _bootstrap

Current thread 0x00007f89af7fe6c0 (most recent call first):
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/concurrent/futures/interpreter.py", line 137 in initialize
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/concurrent/futures/thread.py", line 98 in _worker
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/threading.py", line 992 in run
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/threading.py", line 1041 in _bootstrap_inner
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/threading.py", line 1012 in _bootstrap

Thread 0x00007f89ad4fa6c0 (most recent call first):
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/concurrent/futures/interpreter.py", line 131 in initialize
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/concurrent/futures/thread.py", line 98 in _worker
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/threading.py", line 992 in run
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/threading.py", line 1041 in _bootstrap_inner
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/threading.py", line 1012 in _bootstrap

Thread 0x00007f89b64500c0 (most recent call first):
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/threading.py", line 359 in wait
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/concurrent/futures/_base.py", line 443 in result
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/concurrent/futures/_base.py", line 309 in _result_or_cancel
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/concurrent/futures/_base.py", line 611 in result_iterator
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/test/test_concurrent_futures/executor.py", line 54 in test_map_exception
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/unittest/case.py", line 606 in _callTestMethod
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/unittest/case.py", line 660 in run
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/unittest/case.py", line 716 in __call__
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/unittest/suite.py", line 122 in run
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/unittest/suite.py", line 84 in __call__
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/unittest/suite.py", line 122 in run
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/unittest/suite.py", line 84 in __call__
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/unittest/runner.py", line 240 in run
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/test/libregrtest/single.py", line 57 in _run_suite
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/test/libregrtest/single.py", line 37 in run_unittest
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/test/libregrtest/single.py", line 135 in test_func
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/test/libregrtest/single.py", line 91 in regrtest_runner
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/test/libregrtest/single.py", line 138 in _load_run_test
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/test/libregrtest/single.py", line 181 in _runtest_env_changed_exc
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/test/libregrtest/single.py", line 281 in _runtest
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/test/libregrtest/single.py", line 310 in run_single_test
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/test/libregrtest/worker.py", line 83 in worker_process
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/test/libregrtest/worker.py", line 118 in main
  File "/buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Lib/test/libregrtest/worker.py", line 122 in <module>
  File "<frozen runpy>", line 88 in _run_code
  File "<frozen runpy>", line 198 in _run_module_as_main

Extension modules: _testinternalcapi (total: 1)
UndefinedBehaviorSanitizer:DEADLYSIGNAL
==2260212==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x03e800227cf4 (pc 0x7f89b64e1194 bp 0x7f89af7fd370 sp 0x7f89af7fd330 T2260241)
==2260212==The signal is caused by a READ memory access.
    #0 0x7f89b64e1194  (/usr/lib/libc.so.6+0x90194) (BuildId: 915eeec6439cfded1125deefc44a8d73e57873d9)
    #1 0x7f89b648dd6f in raise (/usr/lib/libc.so.6+0x3cd6f) (BuildId: 915eeec6439cfded1125deefc44a8d73e57873d9)
    #2 0x555a15c75b3d in faulthandler_fatal_error /buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/./Modules/faulthandler.c:338:5
    #3 0x7f89b648de1f  (/usr/lib/libc.so.6+0x3ce1f) (BuildId: 915eeec6439cfded1125deefc44a8d73e57873d9)
    #4 0x7f89b64e7504 in sem_wait (/usr/lib/libc.so.6+0x96504) (BuildId: 915eeec6439cfded1125deefc44a8d73e57873d9)
    #5 0x555a15c3e86b in PyThread_acquire_lock_timed /buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Python/thread_pthread.h:555:33
    #6 0x7f89b47308c6 in _queues_add /buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/./Modules/_interpqueuesmodule.c:909:5
    #7 0x7f89b47308c6 in queue_create /buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/./Modules/_interpqueuesmodule.c:1103:19
    #8 0x7f89b47308c6 in queuesmod_create /buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/./Modules/_interpqueuesmodule.c:1487:19
    #9 0x555a157a3caa in cfunction_call /buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Objects/methodobject.c:551:18
    #10 0x555a15677105 in _PyObject_MakeTpCall /buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Objects/call.c:242:18
    #11 0x555a15a556ef in _PyEval_EvalFrameDefault /buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Python/generated_cases.c.h:2759:35
    #12 0x555a15680c7c in _PyObject_VectorcallTstate /buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/./Include/internal/pycore_call.h:167:11
    #13 0x555a1567e32f in method_vectorcall /buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Objects/classobject.c:71:20
    #14 0x555a15da9af5 in thread_run /buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/./Modules/_threadmodule.c:337:21
    #15 0x555a15c3f5f3 in pythread_wrapper /buildbot/buildarea/3.x.pablogsal-arch-x86_64.clang-ubsan-function/build/Python/thread_pthread.h:242:5
    #16 0x7f89b64df1ce  (/usr/lib/libc.so.6+0x8e1ce) (BuildId: 915eeec6439cfded1125deefc44a8d73e57873d9)
    #17 0x7f89b65606eb  (/usr/lib/libc.so.6+0x10f6eb) (BuildId: 915eeec6439cfded1125deefc44a8d73e57873d9)

UndefinedBehaviorSanitizer can not provide additional info.
SUMMARY: UndefinedBehaviorSanitizer: SEGV (/usr/lib/libc.so.6+0x90194) (BuildId: 915eeec6439cfded1125deefc44a8d73e57873d9)
==2260212==ABORTING

CC @encukou

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

Linked PRs

@ZeroIntensity
Member

I've been looking into this for the past hour or so, and I wasn't able to reproduce a segfault, nor do any debuggers detect any foul play.

However, I was able to get this assertion to fail after sending a CTRL+C to unittest. That, and running test_interpreter_pool with a TSan build causes it to absolutely explode with errors, but I wouldn't be surprised if they were all false positives--TSan is far from perfect (do we even support it on GIL-ful builds?)

@rruuaanng
Contributor

Oh, it turns out this is a BUG. I triggered this while implementing the PR.

@ericsnowcurrently
Member Author

The biggest clue is what the USAN buildbot tells us:

  • a NULL pointer is being passed to sem_wait()
  • the pointer points to uninitialized/deallocated memory
  • the failure happens when the executor's worker context is initialized and calls _interpqueues.create()
Python/thread_pthread.h:555:42: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/semaphore.h:55:36: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior Python/thread_pthread.h:555:42 in 

Current thread 0x00007f89af7fe6c0 (most recent call first):
  File "Lib/concurrent/futures/interpreter.py", line 137 in initialize

==2260212==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x03e800227cf4
==2260212==The signal is caused by a READ memory access.
    #4 0x7f89b64e7504 in sem_wait (/usr/lib/libc.so.6+0x96504) (BuildId: 915eeec6439cfded1125deefc44a8d73e57873d9)
    #5 0x555a15c3e86b in PyThread_acquire_lock_timed Python/thread_pthread.h:555:33
    #6 0x7f89b47308c6 in _queues_add Modules/_interpqueuesmodule.c:909:5
    #7 0x7f89b47308c6 in queue_create Modules/_interpqueuesmodule.c:1103:19
    #8 0x7f89b47308c6 in queuesmod_create Modules/_interpqueuesmodule.c:1487:19

The pointer in question is the mutex created in _globals_init(), which is called by the module exec function the first time the module is loaded. The mutex is cleared (in _globals_fini()) when the last copy of the module is cleared.
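
To make that lifecycle concrete, here is a toy Python model of it. This is only an illustrative sketch, not the C code in Modules/_interpqueuesmodule.c: the class `Globals` and its `module_count`, `mutex`, `init()`, `fini()`, and `queues_add()` members are hypothetical stand-ins. It shows one ordering in which a late caller could observe a cleared lock. Whether that ordering is what actually happens on the buildbots has not been established.

```python
class Globals:
    """Hypothetical stand-in for state shared by every copy of the module."""

    def __init__(self):
        self.module_count = 0
        self.mutex = None  # plays the role of the NULL-able lock pointer

    def init(self):
        # Runs from the module exec function in each interpreter.
        self.module_count += 1
        if self.module_count == 1:
            self.mutex = object()  # first load creates the lock

    def fini(self):
        # Runs when one copy of the module is cleared.
        self.module_count -= 1
        if self.module_count == 0:
            self.mutex = None  # last copy clears the lock

    def queues_add(self):
        # Stand-in for the code path that acquires the lock.
        if self.mutex is None:
            raise RuntimeError("lock used while NULL")
        return "acquired"


def demo():
    g = Globals()
    g.init()                # interpreter A loads the module
    first = g.queues_add()  # works: the lock exists
    g.fini()                # A's copy is cleared, count drops to 0
    try:
        g.queues_add()      # a late caller now sees the cleared lock
    except RuntimeError as exc:
        return first, str(exc)
    return first, None
```

In C there is no `RuntimeError`. The equivalent of the second `queues_add()` call is passing NULL to `sem_wait()`, which matches the sanitizer report.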

There is an unlikely-but-possible race there in _globals_init() with the module count, which may play a part here. It would make sense to do the following:

  • make the lock a PyMutex
  • use atomic operations for the module count

That may improve the situation. However, it would be best if we could definitively determine why the mutex is NULL when it shouldn't be.
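
As a rough illustration of the two suggestions above, here is a Python analogue in which the count and the init/fini decisions are serialized by a lock that exists for the lifetime of the process. Everything in it is hypothetical (`GuardedGlobals`, `churn`, `run`). `threading.Lock` merely stands in for a PyMutex plus atomic counter, and the sketch says nothing about how the actual C fix is structured.

```python
import threading


class GuardedGlobals:
    """Toy model: module count and shared resource behind one guard."""

    def __init__(self):
        # Stand-in for a lock that never needs to be allocated or freed.
        self._guard = threading.Lock()
        self.module_count = 0
        self.resource = None

    def init(self):
        with self._guard:
            self.module_count += 1
            if self.module_count == 1:
                self.resource = object()

    def fini(self):
        with self._guard:
            self.module_count -= 1
            if self.module_count == 0:
                self.resource = None


def churn(shared, rounds, failures):
    for _ in range(rounds):
        shared.init()
        # Between our init() and fini() the count is at least 1,
        # so the resource must be present.
        if shared.resource is None:
            failures.append(1)
        shared.fini()


def run(threads=8, rounds=2000):
    shared = GuardedGlobals()
    failures = []
    workers = [
        threading.Thread(target=churn, args=(shared, rounds, failures))
        for _ in range(threads)
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return shared.module_count, shared.resource, len(failures)
```

With the guard in place, `run()` should always end with a count of zero, no resource, and no observed failures. Removing the `with self._guard:` lines turns the increment and the first-load check into separate unsynchronized steps, which is the kind of window the race concern is about.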

@ZeroIntensity
Member

Something that could possibly be related: I've noticed an issue with exceptions inside subinterpreters created by _interpreters that causes them to finalize earlier than they should--depending on what's going on, that could explain the NULL mutex. I'm investigating to see if that's an issue on my end, or something deeper.

ericsnowcurrently added a commit that referenced this issue Oct 21, 2024
…queues Module (gh-125802)

The fix applies to the _interpchannels module as well.

I've also included a drive-by typo fix for _interpqueues.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 21, 2024
…interpqueues Module (pythongh-125802)

The fix applies to the _interpchannels module as well.

I've also included a drive-by typo fix for _interpqueues.
(cherry picked from commit 44f841f)

Co-authored-by: Eric Snow <[email protected]>
ericsnowcurrently added a commit that referenced this issue Oct 21, 2024
…_interpqueues Module (gh-125808)

The fix applies to the _interpchannels module as well.

I've also included a drive-by typo fix for _interpqueues.

(cherry picked from commit 44f841f, AKA gh-125802)

Co-authored-by: Eric Snow <[email protected]>
ericsnowcurrently added a commit that referenced this issue Oct 21, 2024
…_interpqueues Module (gh-125803)

This includes a drive-by cleanup in _queues_init() and _queues_fini().

This change also applies to the _interpchannels module.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 21, 2024
…r The _interpqueues Module (pythongh-125803)

This includes a drive-by cleanup in _queues_init() and _queues_fini().

This change also applies to the _interpchannels module.
(cherry picked from commit 4848b0b)

Co-authored-by: Eric Snow <[email protected]>
ericsnowcurrently added a commit that referenced this issue Oct 21, 2024
…or the _interpqueues Module (gh-125817)

This includes a drive-by cleanup in _queues_init() and _queues_fini().

This change also applies to the _interpchannels module.

(cherry picked from commit 4848b0b, AKA gh-125803)

Co-authored-by: Eric Snow <[email protected]>
@ericsnowcurrently
Member Author

It looks like the latest fix has helped. I'll reopen this if there are any intermittent failures.

@github-project-automation github-project-automation bot moved this from Todo to Done in Subinterpreters Oct 22, 2024
ebonnal pushed a commit to ebonnal/cpython that referenced this issue Jan 12, 2025
…interpqueues Module (pythongh-125802)

The fix applies to the _interpchannels module as well.

I've also included a drive-by typo fix for _interpqueues.
ebonnal pushed a commit to ebonnal/cpython that referenced this issue Jan 12, 2025
…r The _interpqueues Module (pythongh-125803)

This includes a drive-by cleanup in _queues_init() and _queues_fini().

This change also applies to the _interpchannels module.