[BUG]: Segmentation Fault 11 w/ Conda + Pybind11 #3907

Closed
@coreyjadams

Description

Required prerequisites

  • Make sure you've read the documentation. Your issue may be addressed there.
  • Search the issue tracker and Discussions to verify that this hasn't already been reported. +1 or comment there if it has.
  • Consider asking first in the Gitter chat room or in a Discussion.

Problem description

I have a segmentation fault on macOS that only appears when using the conda builds of Python. I haven't been able to solve this one myself, sorry.

In short: when using the package I've built with pybind11, I cannot import the libraries from Python without a segfault. I've verified this with Python 3.6, 3.9, and 3.10, using the latest version of pybind11. I have a stand-alone repository that reproduces this bug.

Here is the stack trace when running under lldb; it appears to be related to take_gil:

>>> import larcv
Process 24818 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
    frame #0: 0x00000001041afc17 libpython3.10.dylib`take_gil + 71
libpython3.10.dylib`take_gil:
->  0x1041afc17 <+71>: movq   0x10(%rax), %r13
    0x1041afc1b <+75>: leaq   0x1b0(%r13), %r12
    0x1041afc22 <+82>: movq   %r12, %rdi
    0x1041afc25 <+85>: callq  0x1042e1212               ; symbol stub for: pthread_mutex_lock
Target 0: (python) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
  * frame #0: 0x00000001041afc17 libpython3.10.dylib`take_gil + 71
    frame #1: 0x0000000104226230 libpython3.10.dylib`PyGILState_Ensure + 48
    frame #2: 0x0000000101e495df pylarcv.cpython-310-darwin.so`___lldb_unnamed_symbol1$$pylarcv.cpython-310-darwin.so + 63
    frame #3: 0x0000000101e490a6 pylarcv.cpython-310-darwin.so`PyInit_pylarcv + 118
    frame #4: 0x00000001001fd17e python`_imp_create_dynamic + 1486
    frame #5: 0x00000001000e75a5 python`cfunction_vectorcall_FASTCALL + 85
    frame #6: 0x00000001001b2b9a python`_PyEval_EvalFrameDefault + 2986
    frame #7: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #8: 0x00000001001c16ee python`call_function + 174
    frame #9: 0x00000001001b8fec python`_PyEval_EvalFrameDefault + 28668
    frame #10: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #11: 0x00000001001c16ee python`call_function + 174
    frame #12: 0x00000001001b79b2 python`_PyEval_EvalFrameDefault + 22978
    frame #13: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #14: 0x00000001001c16ee python`call_function + 174
    frame #15: 0x00000001001b6cd3 python`_PyEval_EvalFrameDefault + 19683
    frame #16: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #17: 0x00000001001c16ee python`call_function + 174
    frame #18: 0x00000001001b6cd3 python`_PyEval_EvalFrameDefault + 19683
    frame #19: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #20: 0x00000001001c16ee python`call_function + 174
    frame #21: 0x00000001001b6cd3 python`_PyEval_EvalFrameDefault + 19683
    frame #22: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #23: 0x000000010008577b python`object_vacall + 427
    frame #24: 0x0000000100085a29 python`_PyObject_CallMethodIdObjArgs + 249
    frame #25: 0x00000001001f8a64 python`PyImport_ImportModuleLevelObject + 3076
    frame #26: 0x00000001001b8410 python`_PyEval_EvalFrameDefault + 25632
    frame #27: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #28: 0x00000001001aa979 python`builtin_exec + 345
    frame #29: 0x00000001000e75a5 python`cfunction_vectorcall_FASTCALL + 85
    frame #30: 0x00000001001b2b9a python`_PyEval_EvalFrameDefault + 2986
    frame #31: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #32: 0x00000001001c16ee python`call_function + 174
    frame #33: 0x00000001001b8fec python`_PyEval_EvalFrameDefault + 28668
    frame #34: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #35: 0x00000001001c16ee python`call_function + 174
    frame #36: 0x00000001001b79b2 python`_PyEval_EvalFrameDefault + 22978
    frame #37: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #38: 0x00000001001c16ee python`call_function + 174
    frame #39: 0x00000001001b6cd3 python`_PyEval_EvalFrameDefault + 19683
    frame #40: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #41: 0x00000001001c16ee python`call_function + 174
    frame #42: 0x00000001001b6cd3 python`_PyEval_EvalFrameDefault + 19683
    frame #43: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #44: 0x000000010008577b python`object_vacall + 427
    frame #45: 0x0000000100085a29 python`_PyObject_CallMethodIdObjArgs + 249
    frame #46: 0x00000001001f8a64 python`PyImport_ImportModuleLevelObject + 3076
    frame #47: 0x00000001001b8410 python`_PyEval_EvalFrameDefault + 25632
    frame #48: 0x00000001001b0588 python`_PyEval_Vector + 376
    frame #49: 0x00000001002277a9 python`PyRun_InteractiveOneObjectEx + 1049
    frame #50: 0x000000010022640a python`_PyRun_InteractiveLoopObject + 122
    frame #51: 0x0000000100225cbf python`_PyRun_AnyFileObject + 63
    frame #52: 0x000000010022a106 python`PyRun_AnyFileExFlags + 118
    frame #53: 0x0000000100250f2f python`pymain_run_stdin + 175
    frame #54: 0x000000010025057d python`pymain_run_python + 509
    frame #55: 0x0000000100250335 python`Py_RunMain + 37
    frame #56: 0x0000000100251910 python`pymain_main + 64
    frame #57: 0x00000001000026d8 python`main + 56
    frame #58: 0x000000010049a51e dyld`start + 462
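
One detail the backtrace makes visible is that the two innermost frames come from `libpython3.10.dylib`, while the remaining interpreter frames come from the `python` executable itself. The following is a minimal sketch that tallies frames per binary image from pasted `bt` output. The regular expression and the `frames_by_image` name are illustrative assumptions, not part of lldb, and the sample is a four-line excerpt of the trace above.

```python
import re
from collections import Counter

# Matches lldb `bt` lines such as:
#   frame #1: 0x0000000104226230 libpython3.10.dylib`PyGILState_Ensure + 48
# Group 2 captures the image name that precedes the backtick.
FRAME_RE = re.compile(r"frame #(\d+): 0x[0-9a-fA-F]+ ([^`]+)`")


def frames_by_image(backtrace: str) -> Counter:
    """Count how many frames of an lldb backtrace fall in each binary image."""
    return Counter(m.group(2) for m in FRAME_RE.finditer(backtrace))


# Excerpt of the backtrace from this issue.
sample = """\
  * frame #0: 0x00000001041afc17 libpython3.10.dylib`take_gil + 71
    frame #1: 0x0000000104226230 libpython3.10.dylib`PyGILState_Ensure + 48
    frame #3: 0x0000000101e490a6 pylarcv.cpython-310-darwin.so`PyInit_pylarcv + 118
    frame #4: 0x00000001001fd17e python`_imp_create_dynamic + 1486
"""

for image, count in frames_by_image(sample).items():
    print(f"{image}: {count}")
```

Interpreter functions being served from both `libpython3.10.dylib` and `python` in one process suggests that two copies of the Python runtime may be loaded, which would fit the linking explanation given later in the thread.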

Reproducible example code

This repository reproduces the bug. Sorry if you wanted something smaller; this is about as small as I can make it, and it is nearly stand-alone. Obviously, you need conda to run it.

[git@github.com:coreyjadams/larcv3-pybind11-example.git](git@github.com:coreyjadams/larcv3-pybind11-example.git)

To replicate the bug, you need to be on macOS (I am on Monterey, the latest) and using miniconda. I created an environment for each test I did:


```bash
conda create -n test-env-python-3.10 # Accept any questions, etc
conda activate test-env-python-3.10 # Activate the environment
conda install python=3.10 cmake scikit-build # The dependencies are just build systems.
```


Then, after cloning the repository I linked above, one can do:
```bash
git submodule update --init # pybind11 is a submodule here
python setup.py build # Trigger scikit-build to run cmake
python setup.py install
```

From a different directory (otherwise, it tries to import the larcv folder in the repo), do:

>>> import larcv

And it ought to reproduce the crash.
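
Because the crash kills the interpreter, it can be convenient to run the import in a child process and inspect how it exited. This is a minimal sketch under that assumption; `run_snippet` is a hypothetical helper name, and the signal detection relies on POSIX behavior (a negative return code means the child was killed by a signal).

```python
import signal
import subprocess
import sys


def run_snippet(code: str) -> str:
    """Run `code` in a fresh interpreter and classify how it exited."""
    proc = subprocess.run([sys.executable, "-c", code], capture_output=True)
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        return f"killed by {signal.Signals(-proc.returncode).name}"
    return f"exited with status {proc.returncode}"


if __name__ == "__main__":
    # `larcv` is the package built from the example repository.
    print(run_snippet("import larcv"))
```

Run from a directory outside the repository checkout, this should report a signal if the import segfaults and a plain non-zero exit status if the package is simply not installed.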

Activity

henryiii (Collaborator) commented on May 11, 2022

Conda doesn't support building from python, only from conda-build. You are likely mixing the system compilers and the conda compilers, causing the crash. Try `conda install compilers`; that might get it to use the conda compilers (make sure you remove any caching, like `_skbuild`).
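
One way to see which compilers a build might pick up is to compare the compiler recorded in Python's build configuration with what is currently on `PATH`. This is a rough sketch, not a conda feature: `compiler_report` is a hypothetical helper, and a mismatch in its output is only a hint, not proof that compilers are being mixed.

```python
import shutil
import sysconfig


def compiler_report() -> dict:
    """Compare the compiler Python was built with to the ones found on PATH."""
    # CC is recorded at build time on POSIX; it may be None elsewhere.
    built_with = sysconfig.get_config_var("CC") or ""
    # CC may carry flags (e.g. "gcc -pthread"); the first token is the executable.
    tokens = built_with.split()
    cc_name = tokens[0] if tokens else ""
    return {
        "python_built_with": built_with,
        "built_with_on_path": shutil.which(cc_name) if cc_name else None,
        "default_cc_on_path": shutil.which("cc"),
        "default_cxx_on_path": shutil.which("c++"),
    }


for key, value in compiler_report().items():
    print(f"{key}: {value}")
```

If `built_with_on_path` is empty while `default_cc_on_path` points at a system compiler, the build is probably not using the toolchain the interpreter was built with.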

wolfv commented on Jun 29, 2022

I do see this issue as well on macOS x64 -- but I am pretty sure I am using the conda compilers :)

I tried to add -undefined dynamic_lookup which helped in the past, and I tried to remove the CMAKE_STRIP step, but none of that helped so far. Will investigate further.

It's failing for us with rclpy, which is a dependency of ROS, the Robot Operating System. Same exact error.

wolfv commented on Jun 29, 2022

Hm, I managed to replicate the issue with your example larcv code.
The fix seems to boil down to not explicitly linking Python in the lower-level libraries (or anywhere) and trusting "-undefined dynamic_lookup" instead.

I've added

```cmake
set_target_properties(larcv3 PROPERTIES
                      LINK_FLAGS "-undefined dynamic_lookup")
```

and removed every instance of linking to ${Python_LIBRARIES}; things then seem to work. I think pybind11_add_module already sets that linker flag automatically.
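
To check whether a built extension still links libpython directly, the macOS tool `otool -L` lists a binary's dynamic library dependencies. The sketch below parses that output; `linked_libpython` and `check_extension` are hypothetical helper names, and the sample text is made up for illustration rather than taken from this issue.

```python
import subprocess


def linked_libpython(otool_output: str) -> list:
    """Return the libpython entries found in `otool -L` output."""
    libs = []
    # The first line of `otool -L` output is the inspected file's name.
    for line in otool_output.splitlines()[1:]:
        # Each entry looks like "<path> (compatibility version ..., current version ...)".
        path = line.strip().split(" (")[0]
        if "libpython" in path.rsplit("/", 1)[-1]:
            libs.append(path)
    return libs


def check_extension(path: str) -> list:
    """Run `otool -L` (macOS only) on a built extension and list libpython links."""
    out = subprocess.run(
        ["otool", "-L", path], capture_output=True, text=True, check=True
    )
    return linked_libpython(out.stdout)


# Illustrative output only; real entries depend on the build.
sample = """\
pylarcv.cpython-310-darwin.so:
\t@rpath/libpython3.10.dylib (compatibility version 3.10.0, current version 3.10.0)
\t/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1300.23.0)
"""
print(linked_libpython(sample))
```

An empty list from `check_extension` for both the extension module and the lower-level libraries would be consistent with the change described above, where Python symbols are resolved at load time from the running interpreter.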

coreyjadams (Author) commented on Jun 29, 2022

@wolfv thanks for this tip! I will test it out tomorrow and get back to you. It'd be awesome to have this resolved.

wolfv commented on Jun 29, 2022

In my case, pybind11_add_module(blabla SHARED ...) did not work; however, pybind11_add_module(blabla MODULE ...) works.
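
On macOS, CMake builds a SHARED library as a Mach-O dylib and a MODULE library as a Mach-O bundle, and the two can be told apart from the file header. The sketch below reads the `filetype` field of a thin 64-bit Mach-O header; `macho_kind` and `kind_of_file` are hypothetical helper names, and the thread does not establish why MODULE works where SHARED does not, so this only confirms what was actually built.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF  # magic number of a thin 64-bit Mach-O file
MH_DYLIB = 0x6            # filetype of a dynamic library (SHARED target)
MH_BUNDLE = 0x8           # filetype of a loadable bundle (MODULE target)


def macho_kind(header: bytes) -> str:
    """Classify a thin 64-bit Mach-O file from its first 16 bytes."""
    if len(header) < 16:
        return "not a Mach-O header"
    # mach_header_64 starts with: magic, cputype, cpusubtype, filetype.
    magic, _cputype, _cpusubtype, filetype = struct.unpack("<IiiI", header[:16])
    if magic != MH_MAGIC_64:
        # Universal (fat) binaries and 32-bit files are not handled here.
        return "not a thin 64-bit Mach-O file"
    kinds = {MH_DYLIB: "dylib (SHARED)", MH_BUNDLE: "bundle (MODULE)"}
    return kinds.get(filetype, f"other (filetype {filetype})")


def kind_of_file(path: str) -> str:
    """Read the header of the file at `path` and classify it."""
    with open(path, "rb") as f:
        return macho_kind(f.read(16))
```

Calling `kind_of_file` on the built `.so` shows which variant the build produced.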

[BUG]: Segmentation Fault 11 w/ Conda + Pybind11 · Issue #3907 · pybind/pybind11