-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Create a static library mode for pybind11 #4001
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d0ef7aa
to
aa7d132
Compare
It seems that the CI doesn't pass because of some cmake/setup.py errors. Will look into it. |
ab34ae1
to
9ddbd1e
Compare
I'm sorry but it is very difficult for me to support this PR. I don't want to stand in the way, but from my perspective, a serious cleanup of the file organization has to come first. What we have at the moment is very far at the "uncontrolled organic growth" side of a spectrum that has "very well organized" at the other end. If we (more-or-less) double the number of source code files by splitting up the headers, we're effectively cementing the current file organization: If we don't have the ability to properly organize now, chances are we're stuck forever, after making that job even harder. From my perspective, representing Google's needs, the smart_holder functionality is a critical requirement. For large-scale deployment, systematic UB (#1138) is not a viable solution. But what is currently on the smart_holder branch has a serious drawback, the dual mode: smart_holder as add-on, or default. Fixing that will require going into the internals. For that to happen in a clean & maintainable way, the general file organization must be improved. Talking about dual mode: if we officially support header-only and precompiled, we have to double the number of CI jobs, to test everything in both modes. Combined with the current smart_holder dual mode, I'd be looking at 4 x, or compromises (usually messy). If we cleanly refactor master first, before splitting up everything, maybe I can build future smart_holder work on that. If the split comes first, I have serious doubts I'll be able to do that. The question then is, who has the ability and free bandwidth to cleanly refactor master, as a solid basis for future developments? |
Hi @rwgk, thank you for the feedback! If I understand you correctly, then a relatively large refactor is needed in order to cleanly merge the smart holder branch. Is this something you or someone else is planning on doing? Because if not, it sounds like the smart holder branch will not be merged anytime soon, making it functionally equivalent to a (closely following) fork. I can't really help with that problem, but obviously I wouldn't want to create problems for the development of the smart holders branch. This is why I haven't actually done the header/inline separation for any file except for detail/common.h: to create the least amount of conflicts as possible. Then the actual splitting of implementations could wait for the upstreaming of the smart holders. IMHO building pybind11 as a library is a pretty important feature for large-scale users and has been requested a lot in the past, as the compile times of pybind11 can get very large. It is also something that could help pybind11 get more module friendly, with c++modules slowly getting production ready. |
A bit of a tangent: the smart_holder branch could be merged as-is, it's an add-on, but it would double the number of CI jobs for full testing, which always stopped me from making a move. With respect to this PR, say we're merging smart_holder, then split the files for the static library, refactoring for future developments (fully baking in the cooperating unique_ptr/shared_ptr & disowning functionality) would be significantly harder.
That something I was wondering about, with a lot of uncertainty, because I have no experience actually working with C++ modules: does splitting the header files make it easier or harder to transition pybind11 to modules? My completely naive guess was: harder, because IIUC modules do away with the .h, .cpp split. Is that a wrong guess? |
I see. The CI issue is unfortunate. IMHO splitting the implementation shouldn't make future refactors that much harder - it doesn't duplicate the code, just splits some parts of it to other files. I don't think that makes future development much harder than the actual work - but again I'm also not very familiar with the current code structure and the needed changes. I'm not sure what's your decision making process in this project - so if you discuss this and reach a consensus about how to proceed please let me know 😄.
I don't have a lot of experience myself, but modules make explicit what declarations and definitions get exported and what doesn't. So having this made explicit in the source is nice and makes it much easier to export just the declarations in the future. |
If @henryiii & @Skylion007 want to go ahead I will try to follow with the smart_holder branch. I'm not sure I'll have the free energy to keep up, but maybe I'm too worried. We'll see. If we came together to reorg the code into more reusable units first, that would be an investment in time that I could more easily explain to my management. — I realize my situation is unusual: we (Google) have a distributed build system, the static library gain for us will be minimal, or maybe even negative when also taking human time into account, because of friction due to increased complexity. |
@Skylion007 @henryiii, I'm waiting on an answer from you whether you still support this given @rwgk's objections. There seem to be a problem with the windows tests but I put it on hold for the moment. |
It's best to look at my situation solely as "I won't be able to participate in this effort, and might not be able to follow with the smart_holder branch". I'm thinking it comes down to: if @wjakob supports this step, please go ahead. |
I think we all had decided to support this some time ago, it was just hard to do with many inflight changes also happening. Now we are are down to one (a big one, smart_holder), but that's not as bad as it used to be. I was still hoping to get smart_holder in before making a big change to avoid double work, but I've been hoping that since last year, IIRC. So maybe it's best just to move ahead. The more we can break this up in to incremental stand-alone chucks, the better. Reorganizing the code as well as making parts pre-compilable. |
Slightly random, but the last couple weeks I familiarized myself more with
Seeing
I'm thinking the static library needs to be versioned by not just the pybind11 version number, but also by those macros, and/or even everything that goes into |
03a4749
to
7ff8dcf
Compare
…s a static library and linking against it. This consists of the following changes: 1. CMake infrastructure to support exporting and installing this target, with an option to disable it. 2. Defining some new macros to support this dual mode compilation. 3. Creating one .cc file that is compiled into the new static library. The rest of the code could be refactored in subsequent PRs. 4. Creating tests that compiling against the new target works. Co-authored-by: Ciro Santilli <[email protected]>
f6d6b1e
to
56a6d99
Compare
Those are great questions. The main use case for using this right now is as a vendored subdirectory in a CMake project that builds a dynamic extension. I certainly wouldn't want system package managers or pypi to distribute binary artifacts, IMHO that's a terrible idea for C++ libraries. And between different libraries loaded into the same process I think we have hidden visibility already in the pybind11 namespace macro so I think that should work fine? |
I see, thanks for the explanation. (From past experience: I fear that's exactly what some people will try anyway, but if our stance is "That's your own problem" I'm fine with it.)
I think you're right. (Assuming my current limited understanding isn't too far off.) Something completely different: Could you please not force push, except maybe for rebase on master? Otherwise it's very difficult to see what changed between CI runs. Just keep adding commits. It's very easy to get the view of the accumulated diffs (just click on Files changed), and when PRs are merged the commits are squashed. |
…to work. * Fix typo in pyproject.toml and disable static library in Clang-Tidy CI pass.
for more information, see https://pre-commit.ci
Currently, after battling the Windows CI I gave up on the 'installation' part. Someone who knows CMake better than me could probably make this work, I guess, but maybe it's just not worth the effort given the potential problems and I hope it could unblock this PR. (And I admit that as I usually use bazel, I'm less invested in getting it to work).
Sure, sorry! |
@henryiii I gave up on the CMake installation code and fixed the remaining CI failures, so I think this is (finally) ready for review. |
Potentially stupid question: will this PR fix the really gross symbols in pybind11? I use "nm -D" on my .so file and I get all sorts of strange symbols surrounding my function calls. I want to read the .so file using FFI to integrate my torch code with jax, and these odd symbols cause problems. I have a hack for now, but I am merely curious if this PR is related to my concern. I appreciate your thoughts or redirection to a more appropriate place to ask this question. |
Could you please provide some examples of what you consider gross? My guess, without knowing what you have in mind: probably not.
At first glance it seems counter-intuitive to combine pybind11 and FFI. Again, without knowing anything about your context, I'd guess that building an .so just for your library (i.e. completely independent from pybind11) would be the way to go. |
For example, I define a function
I wanted to compile only a single ".so" file for two purposes: interfacing with Pytorch and Jax. Pytorch reads the pybind11, but Jax requires interfacing with XLA which requires a boilerplate c++ function signature. To add this boilerplate, I wanted to create a c++ function inside of Python to programmatically add this for my library [read: way less for me]. However, compiling the c++-wrapper inside of Python using something like Numba will not work with Pybind11 functions [working numba example here]. As a work around, I am reading the functions in with FFI since this will compile. However, to do this I need to use the symbol names with these strange prefix/suffixes. I am a bit self-conscious this isn't the place to ask this question. I am happy to jump off this PR thread no problem if you like 😄 |
What you're seeing there is known as name mangling: https://en.wikipedia.org/wiki/Name_mangling All C++ functions will have mangled names. You'd have to declare them with
To come back to your original question: No. This PR will not change the type of symbols (name mangled) that you're seeing.
I strongly recommend against it. It's almost certainly bad engineering because it'll create more work than you're hoping to avoid. |
Wow you rock! Thank you for the detailed answer. I totally forgot about the extern necessity. I can see now how this PR is unrelated to what I was asking -- how embarrassing for me. I appreciate your direction here. |
@henryiii @Skylion007 ping |
I had dual conferences last week (a problem with virtual conferences), and I just started teaching a class. Once I get a bit more caught up on writing material, and once we get a patch release out fixing a few important bugs in pybind11, I think we should focus on this. I'm also about to release CLI11 with contributed pre-compile support; might be interesting to compare. |
@royjacobson do you have build time comparisons for a couple big projects, with and without this PR? I forgot where I wrote this before, but splitting up the sources amounts to a permanent tax on development velocity in the future. What are we getting for that tax? From memory, on other PRs I've seen "substantial" or similar, I think that's not sufficient, concrete measurements are important. FWIW: The pybind11 unit tests are probably not a good test case for timings, with defaults ( |
The split sped up build of the gem5 project by 40% from 25 minutes to 15 minutes, reproduction details at: https://gem5.atlassian.net/browse/GEM5-572 |
I don't know if this is true. Splitting between header and source isn't necessarily that much of a tax - it's how C++ was originally developed - and we might gain some really nice simplicity for it. One thing we have to do today that is very painful is pre-declare some things that should be coming from headers, but can't because they'd be cyclical because we force implementations into headers too. If this was split up, some of those ugly pre-declarations will simply go away; we can now include what we want from the headers without getting implementations, too. And, of course, if our tests build faster, then that's a boon to development velocity too. But I'm thinking even without that, it might be at least a trade-off. |
The main thing I'd like to ensure is that we can support C++20/23 modules well, but I'd assume that's easier or similar with this change. |
Wow, nice writeup. I didn't look at the linked information, but just the main page looks convincing. What's your opinion regarding the point I made earlier?
Do you feel you could help with that? |
C++ 20 modules finally go the other way, thanks god. This 1970 punch-card era approach is just terrible, 50 years later. |
I'm afraid I stopped working on gem5 and don't have much time for it now :-( Let's check with the people who I think are still working on the as they also seemed quite interested in this: CC. @giactra and @gabemblack Do you have a more specific description of the refactoring being proposed? |
Nothing specific, I think it comes down to a large number of case-by-case judgements:
I think it'll take a couple people with a lot of experience having done similar things before. I nominate myself, but having someone else actively working on it, with mutual reviews, would be ideal. Estimated effort is "a very intense couple weeks." To not let this slip into an eternal (dying) project, I think it's important that the second person can commit ~full attention until it's Done (released) in one intense push. |
Hello again Ciro :-). I am also no longer working on gem5 seriously, and
will probably not contribute to this. I would recommend contacting Giacomo.
Gabe
…On Mon, Sep 19, 2022, 2:41 PM Ralf W. Grosse-Kunstleve < ***@***.***> wrote:
Do you have a more specific description of the refactoring being proposed?
Nothing specific, I think it comes down to a large number of case-by-case
judgements:
-
Organize into subdirs (e.g. pytypes are currently lumped into one big
file, a directory would make more sense; same for stl.h, stl_binders.h,
probably a bunch more, pybind11.h, cast.h are real fur balls). A great role
model to use as reference: the Boost.Python file organization.
-
Try not to break too many things by (a) making meaningful compromises
and (b) adding backward compatibility headers.
I think it'll take a couple people with a lot of experience having done
similar things before. I nominate myself, but having someone else actively
working on it, with mutual reviews, would be ideal. Estimated effort is "a
very intense couple weeks." To not let this slip into an eternal (dying)
project, I think it's important that the second person can commit ~full
attention until it's Done (released) in one intense push.
—
Reply to this email directly, view it on GitHub
<#4001 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADVD64IFZSOHHVRDX2AENGDV7DMYRANCNFSM5YTLWIUA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Description
Adds a new target 'pybind11::static' that allows compiling pybind11 as a static library and linking against it.
@Skylion007 @giactra
This consists of the following changes:
This PR is partially based on the work here: #2445 by @cirosantilli2. A discussion of the motivation is there as well.
Note that the CMake approach taken here is different: The static/header only modes are configurable by selecting a different target, and not by globally configuring the CMake build scripts generation. I find this better and more flexible.
One issue I'm slightly concerned about is the visibility of pybind11 symbols when using the static library. I think it should be fine given the usual
-fvisibility=hidden
override, but please validate me on this.Suggested changelog entry:
Introduce a new ``pybind11::static`` target. When using this target, a static library will be built and statically linked against instead of the default header-only mode. It is expected that using this target will improve compile times. Currently this target is only available when using pybind11 as a CMake subdirectory.