Fix thread safety for pybind11 loader_life_support #3237
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -31,47 +31,54 @@ PYBIND11_NAMESPACE_BEGIN(detail) | |
/// A life support system for temporary objects created by `type_caster::load()`. | ||
/// Adding a patient will keep it alive up until the enclosing function returns. | ||
class loader_life_support { | ||
private: | ||
loader_life_support* parent = nullptr; | ||
std::unordered_set<PyObject *> keep_alive; | ||
Review comment: Probably better to use […]

Review comment: Note: Previously the possibility of using PySet was discussed, but that cannot be used because it will not work correctly if the type defines a custom hash and equality function.

Review comment: Unfortunately py::object is not hashable, so that would require intrusive adaptors or similar. Since the control flow here is so limited, I think that the simplest answer is to just use PyObject* here and refcount it.

Review comment: Ah, good point.
|
||
static loader_life_support** get_stack_pp() { | ||
#if defined(WITH_THREAD) | ||
thread_local static loader_life_support* per_thread_stack = nullptr; | ||
Review comment: It occurred to me that an unfortunate effect of using C++ thread_local is that every single pybind11 extension module will add an additional TLS variable, which could make the thread-local storage rather large, as there can easily be a large number of extension modules. To avoid that, the Python TLS API could be used instead. The key that is allocated would have to be stored in the internals struct (which would make the ABI incompatible) or, if we really want to maintain ABI compatibility, could be accessed via a separate PyCapsule that is handled in the same way as the existing internals struct. Probably it would make more sense to just break the ABI, and take the opportunity to merge in the other ABI-breaking changes, though I don't have sufficient context to really offer much judgement on that decision.

Review comment: And why does that concern you? I suspect that many extensions may already use thread_local variables without this concern. The problem with using the Python TLS API is that without versioning the internal data structure we're left with the same issue.

Review comment: Some extensions may already use TLS, but I expect that is only a tiny fraction of them, and presumably it is providing useful functionality. This will add an extra separate TLS variable for every pybind11 extension module (due to the use of -fvisibility=hidden, they will not be merged). As you can easily have hundreds of extension modules loaded in a program, this is effectively adding potentially a very large number of TLS variables, and the memory usage scales with […] Yes, the Python TLS API only helps if we ensure there is a single key shared by all pybind11 extensions. As far as versioning the […]

Review comment: You're talking about potentially hundreds of instances, which is admittedly an outlier. As it stands now, I'd rather that we get this in to fix the threading issue, which I see as significant, and then we can rework it with an API version revision to use a shared Python TLS variable or a PyCapsule. I'd prefer broader consensus w.r.t. the PyCapsule encapsulation, though.

Review comment: I don't think hundreds is an outlier. For example, TensorFlow alone seems to have ~76.

Review comment: Another issue that occurred to me is that if you call […] For example, within Google's code base, this could occur if we define some utility function in a pybind_cc_library target that calls […]

Review comment: If in the implementation of such a call you call […]

Review comment: I noticed that there is already a […]

Review comment: Just my two cents - I would say that this is a critical bug, of which the impact is far worse than any of the aforementioned side effects/edge cases. The team I work with have taken a fork of pybind11 and applied this fix so that we can move forward, as we have a strong requirement to run IO-bound and computationally expensive functions in parallel. Without this fix, we cannot reliably do so. I would second @laramiel's suggestion to get this fix in, and then rework/optimise later if required.

Review comment: I don't have any objection to merging this as is, especially given that there appears to be a relatively easy path to revising it to avoid duplicating the TLS variables without introducing an ABI break.
return &per_thread_stack; | ||
#else | ||
static loader_life_support* global_stack = nullptr; | ||
return &global_stack; | ||
#endif | ||
} | ||
|
||
public: | ||
/// A new patient frame is created when a function is entered | ||
loader_life_support() { | ||
get_internals().loader_patient_stack.push_back(nullptr); | ||
loader_life_support** stack = get_stack_pp(); | ||
parent = *stack; | ||
*stack = this; | ||
} | ||
|
||
/// ... and destroyed after it returns | ||
~loader_life_support() { | ||
auto &stack = get_internals().loader_patient_stack; | ||
if (stack.empty()) | ||
loader_life_support** stack = get_stack_pp(); | ||
if (*stack != this) | ||
pybind11_fail("loader_life_support: internal error"); | ||
|
||
auto ptr = stack.back(); | ||
stack.pop_back(); | ||
Py_CLEAR(ptr); | ||
|
||
// A heuristic to reduce the stack's capacity (e.g. after long recursive calls) | ||
if (stack.capacity() > 16 && !stack.empty() && stack.capacity() / stack.size() > 2) | ||
stack.shrink_to_fit(); | ||
*stack = parent; | ||
for (auto* item : keep_alive) | ||
Py_DECREF(item); | ||
} | ||
|
||
/// This can only be used inside a pybind11-bound function, either by `argument_loader` | ||
/// at argument preparation time or by `py::cast()` at execution time. | ||
PYBIND11_NOINLINE static void add_patient(handle h) { | ||
auto &stack = get_internals().loader_patient_stack; | ||
if (stack.empty()) | ||
loader_life_support* frame = *get_stack_pp(); | ||
if (!frame) { | ||
// NOTE: It would be nice to include the stack frames here, as this indicates | ||
// use of pybind11::cast<> outside the normal call framework, finding such | ||
// a location is challenging. Developers could consider printing out | ||
// stack frame addresses here using something like __builtin_frame_address(0) | ||
throw cast_error("When called outside a bound function, py::cast() cannot " | ||
"do Python -> C++ conversions which require the creation " | ||
"of temporary values"); | ||
|
||
auto &list_ptr = stack.back(); | ||
if (list_ptr == nullptr) { | ||
list_ptr = PyList_New(1); | ||
if (!list_ptr) | ||
pybind11_fail("loader_life_support: error allocating list"); | ||
PyList_SET_ITEM(list_ptr, 0, h.inc_ref().ptr()); | ||
} else { | ||
auto result = PyList_Append(list_ptr, h.ptr()); | ||
if (result == -1) | ||
pybind11_fail("loader_life_support: error adding patient"); | ||
} | ||
|
||
if (frame->keep_alive.insert(h.ptr()).second) | ||
Py_INCREF(h.ptr()); | ||
} | ||
}; | ||
|
||
|
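The review note on `PySet` above turns on the difference between a container that compares elements by equality and one that compares them by address. A small Python sketch of that difference follows; `Money` is a hypothetical class invented for the example, and the `id()`-keyed dict only approximates what a set of `PyObject*` pointers does.

```python
class Money:
    """Hypothetical value type with custom equality and hashing."""

    def __init__(self, cents):
        self.cents = cents

    def __eq__(self, other):
        return isinstance(other, Money) and self.cents == other.cents

    def __hash__(self):
        return hash(self.cents)


a, b = Money(100), Money(100)        # two distinct objects that compare equal

by_equality = {a, b}                 # a Python set collapses them into one entry
by_identity = {id(a): a, id(b): b}   # keyed by address, like a set of PyObject*

print(len(by_equality), len(by_identity))  # prints: 1 2
```

With the equality-based set, only one of the two objects would be kept alive, which is the hazard the comment describes.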
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
/* | ||
tests/test_thread.cpp -- call pybind11 bound methods in threads | ||
|
||
Copyright (c) 2021 Laramie Leavitt (Google LLC) <[email protected]> | ||
|
||
All rights reserved. Use of this source code is governed by a | ||
BSD-style license that can be found in the LICENSE file. | ||
*/ | ||
|
||
#include <pybind11/cast.h> | ||
#include <pybind11/pybind11.h> | ||
|
||
#include <chrono> | ||
#include <thread> | ||
|
||
#include "pybind11_tests.h" | ||
|
||
namespace py = pybind11; | ||
|
||
namespace { | ||
|
||
struct IntStruct { | ||
explicit IntStruct(int v) : value(v) {}; | ||
~IntStruct() { value = -value; } | ||
IntStruct(const IntStruct&) = default; | ||
IntStruct& operator=(const IntStruct&) = default; | ||
|
||
int value; | ||
Review comment: Could add a destructor that modified […] Separately, perhaps add a note that this test should be run with ASan for greater effectiveness.

Review comment: Done.
}; | ||
|
||
} // namespace | ||
|
||
TEST_SUBMODULE(thread, m) { | ||
|
||
py::class_<IntStruct>(m, "IntStruct").def(py::init([](const int i) { return IntStruct(i); })); | ||
|
||
// implicitly_convertible uses loader_life_support when an implicit | ||
// conversion is required in order to lifetime extend the reference. | ||
// | ||
// This test should be run with ASAN for better effectiveness. | ||
py::implicitly_convertible<int, IntStruct>(); | ||
|
||
m.def("test", [](int expected, const IntStruct &in) { | ||
{ | ||
py::gil_scoped_release release; | ||
std::this_thread::sleep_for(std::chrono::milliseconds(5)); | ||
} | ||
|
||
if (in.value != expected) { | ||
throw std::runtime_error("Value changed!!"); | ||
} | ||
}); | ||
|
||
m.def( | ||
"test_no_gil", | ||
[](int expected, const IntStruct &in) { | ||
std::this_thread::sleep_for(std::chrono::milliseconds(5)); | ||
if (in.value != expected) { | ||
throw std::runtime_error("Value changed!!"); | ||
} | ||
}, | ||
py::call_guard<py::gil_scoped_release>()); | ||
|
||
// NOTE: std::string_view also uses loader_life_support to ensure that | ||
// the string contents remain alive, but that's a C++ 17 feature. | ||
} |
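To see why these tests release the GIL in the middle of a call, it may help to model the removed code, which pushed a frame onto one shared stack on entry and popped `stack.back()` on exit. The sketch below replays one possible interleaving of two calls by hand, without real threads. The function names and the event order are assumptions chosen to illustrate the hazard, not a trace taken from pybind11.

```python
shared_stack = []   # models the old process-wide loader_patient_stack


def enter_call():
    shared_stack.append([])          # push an empty frame for the new call


def add_patient(obj):
    shared_stack[-1].append(obj)     # always attaches to the top frame


def leave_call():
    return shared_stack.pop()        # pops whatever frame is on top


def replay_interleaving():
    """Call A and call B overlap; A returns first."""
    enter_call()                     # A starts
    add_patient("tmp_A")             # A converts its argument, then sleeps
    enter_call()                     # B starts while A sleeps
    add_patient("tmp_B")             # B converts its argument, then sleeps
    released_by_a = leave_call()     # A wakes first and pops the top frame
    released_by_b = leave_call()     # B returns later
    return released_by_a, released_by_b
```

In this replay the first return releases `tmp_B` while call B is still running, which corresponds to the "Value changed!!" failure the C++ test is written to detect.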
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
# -*- coding: utf-8 -*- | ||
|
||
import threading | ||
|
||
from pybind11_tests import thread as m | ||
|
||
|
||
class Thread(threading.Thread): | ||
def __init__(self, fn): | ||
super(Thread, self).__init__() | ||
self.fn = fn | ||
self.e = None | ||
|
||
def run(self): | ||
try: | ||
for i in range(10): | ||
self.fn(i, i) | ||
Review comment: Can I avoid the implicit cast (and triggering life support) by changing this line to […]
except Exception as e: | ||
self.e = e | ||
|
||
def join(self): | ||
super(Thread, self).join() | ||
if self.e: | ||
raise self.e | ||
|
||
|
||
def test_implicit_conversion(): | ||
a = Thread(m.test) | ||
b = Thread(m.test) | ||
c = Thread(m.test) | ||
for x in [a, b, c]: | ||
x.start() | ||
for x in [c, b, a]: | ||
x.join() | ||
|
||
|
||
def test_implicit_conversion_no_gil(): | ||
a = Thread(m.test_no_gil) | ||
b = Thread(m.test_no_gil) | ||
c = Thread(m.test_no_gil) | ||
for x in [a, b, c]: | ||
x.start() | ||
for x in [c, b, a]: | ||
x.join() |
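The `Thread` subclass in this test file exists because an exception raised inside `threading.Thread.run` does not propagate to the thread that calls `join()`. Below is a self-contained sketch of the same pattern. It uses a hypothetical pure-Python `check` function in place of the bound `m.test`, so it runs without the compiled test module; `CapturingThread` and `run_workers` are names introduced here for illustration.

```python
import threading


class CapturingThread(threading.Thread):
    """Runs fn(i, i) ten times and re-raises any failure from join()."""

    def __init__(self, fn):
        super().__init__()
        self.fn = fn
        self.error = None

    def run(self):
        try:
            for i in range(10):
                self.fn(i, i)
        except Exception as exc:
            self.error = exc         # remember the failure for join()

    def join(self, timeout=None):
        super().join(timeout)
        if self.error is not None:
            raise self.error


def check(expected, value):
    # Hypothetical stand-in for m.test: fails if the value was corrupted.
    if value != expected:
        raise RuntimeError("Value changed!!")


def run_workers(fn, count=3):
    """Start `count` workers, then join them in reverse order."""
    workers = [CapturingThread(fn) for _ in range(count)]
    for worker in workers:
        worker.start()
    for worker in reversed(workers):
        worker.join()
    return True
```

Calling `run_workers(check)` returns `True`, while passing a function that raises makes the first failing `join()` re-raise that exception in the calling thread.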