
Weird alltoallw segfault when libcuda and btl smcuda are present #7460

Description

@leofang

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

v4.0.2

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Tested both by building from source myself and by installing from the conda-forge channel. In both cases, the build-time flag --with-cuda was set to turn on CUDA awareness.
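
For reference, the source build looked roughly like this (the CUDA path is a placeholder, not necessarily the exact one used; the prefix matches the install path seen in the trace below):

# illustrative configure invocation, not the exact command line
$ ./configure --prefix=$HOME/.openmpi-4.0.2_cuda_9.2 --with-cuda=/usr/local/cuda
$ make -j install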

Please describe the system on which you are running

  • Operating system/version:
    Linux (native) / Linux docker (with CUDA Toolkit and driver installed)
  • Computer hardware:
    N/A
  • Network type:
    (single node)

Details of the problem

This is a summary of the original bug report on mpi4py-fft's issue tracker.

I was running the test suite of mpi4py-fft and noticed an AssertionError when testing with 2 processes:

# in mpi4py-fft root
$ mpirun -n 2 python tests/test_mpifft.py

and with 4 processes nonsense output started appearing, ending in a segfault:

$ mpirun -n 4 python tests/test_mpifft.py 
[xf03id-srv2:33127] *** Process received signal ***
[xf03id-srv2:33127] Signal: Segmentation fault (11)
[xf03id-srv2:33127] Signal code:  (128)
[xf03id-srv2:33127] Failing at address: (nil)
[xf03id-srv2:33129] *** Process received signal ***
[xf03id-srv2:33129] Signal: Segmentation fault (11)
[xf03id-srv2:33129] Signal code:  (128)
[xf03id-srv2:33129] Failing at address: (nil)
[xf03id-srv2:33127] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f53379fe890]
[xf03id-srv2:33127] [ 1] [xf03id-srv2:33129] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f2829250890]
[xf03id-srv2:33129] [ 1] /lib/x86_64-linux-gnu/libpthread.so.0(+0xb4d3)[0x7f282924c4d3]
[xf03id-srv2:33129] /lib/x86_64-linux-gnu/libpthread.so.0(+0xb4d3)[0x7f53379fa4d3]
[xf03id-srv2:33127] [ 2] [ 2] /home/leofang/.openmpi-4.0.2_cuda_9.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_request_progress_rndv+0x38f)[0x7f27f00cc8ef]
[xf03id-srv2:33129] [ 3] /home/leofang/.openmpi-4.0.2_cuda_9.2/lib/openmpi/mca_pml_ob1.so(+0x10a66)[0x7f27f00c3a66]
[xf03id-srv2:33129] [ 4] /home/leofang/.openmpi-4.0.2_cuda_9.2/lib/openmpi/mca_pml_ob1.so(+0x10c2d)[0x7f27f00c3c2d]
[xf03id-srv2:33129] [ 5] /home/leofang/.openmpi-4.0.2_cuda_9.2/lib/openmpi/mca_btl_smcuda.so(mca_btl_smcuda_component_progress+0x4ea)[0x7f27f1fb5c6a]
[xf03id-srv2:33129] [ 6] /home/leofang/.openmpi-4.0.2_cuda_9.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_request_progress_rndv+0x38f)[0x7f52f28208ef]
[xf03id-srv2:33127] [ 3] /home/leofang/.openmpi-4.0.2_cuda_9.2/lib/openmpi/mca_pml_ob1.so(+0x10a66)[0x7f52f2817a66]
[xf03id-srv2:33127] [ 4] /home/leofang/.openmpi-4.0.2_cuda_9.2/lib/openmpi/mca_pml_ob1.so(+0x10c2d)[0x7f52f2817c2d]
[xf03id-srv2:33127] [ 5] /home/leofang/.openmpi-4.0.2_cuda_9.2/lib/openmpi/mca_btl_smcuda.so(mca_btl_smcuda_component_progress+0x4ea)[0x7f530079ac6a]
[xf03id-srv2:33127] [ 6] /home/leofang/.openmpi-4.0.2_cuda_9.2/lib/libopen-pal.so.40(opal_progress+0x2c)[0x7f27fc95688c]
[xf03id-srv2:33129] [ 7] /home/leofang/.openmpi-4.0.2_cuda_9.2/lib/libopen-pal.so.40(opal_progress+0x2c)[0x7f530b10488c]
[xf03id-srv2:33127] [ 7] /home/leofang/.openmpi-4.0.2_cuda_9.2/lib/libopen-pal.so.40(ompi_sync_wait_mt+0xb5)[0x7f530b10af65]
[xf03id-srv2:33127] [ 8] /home/leofang/.openmpi-4.0.2_cuda_9.2/lib/libopen-pal.so.40(ompi_sync_wait_mt+0xb5)[0x7f27fc95cf65]
[xf03id-srv2:33129] [ 8] /home/leofang/.openmpi-4.0.2_cuda_9.2/lib/libmpi.so.40(ompi_request_default_wait_all+0x3bc)[0x7f530b6e251c]
[xf03id-srv2:33127] [ 9] /home/leofang/.openmpi-4.0.2_cuda_9.2/lib/libmpi.so.40(ompi_request_default_wait_all+0x3bc)[0x7f27fcf3451c]
[xf03id-srv2:33129] [ 9] /home/leofang/.openmpi-4.0.2_cuda_9.2/lib/openmpi/mca_coll_basic.so(mca_coll_basic_alltoallw_intra+0x232)[0x7f27eafa8632]
/home/leofang/.openmpi-4.0.2_cuda_9.2/lib/openmpi/mca_coll_basic.so(mca_coll_basic_alltoallw_intra+0x232)[0x7f52f17af632]
[xf03id-srv2:33127] [10] [xf03id-srv2:33129] [10] /home/leofang/.openmpi-4.0.2_cuda_9.2/lib/libmpi.so.40(PMPI_Alltoallw+0x23d)[0x7f530b6f87fd]
[xf03id-srv2:33127] [11] /home/leofang/conda_envs/mpi4py-fft_dev2/lib/python3.6/site-packages/mpi4py/MPI.cpython-36m-x86_64-linux-gnu.so(+0x10c771)[0x7f530babd771]
[xf03id-srv2:33127] [12] python(_PyCFunction_FastCallDict+0x154)[0x55559d645b44]
[xf03id-srv2:33127] [13] /home/leofang/.openmpi-4.0.2_cuda_9.2/lib/libmpi.so.40(PMPI_Alltoallw+0x23d)[0x7f27fcf4a7fd]
[xf03id-srv2:33129] [11] /home/leofang/conda_envs/mpi4py-fft_dev2/lib/python3.6/site-packages/mpi4py/MPI.cpython-36m-x86_64-linux-gnu.so(+0x10c771)[0x7f27fd30f771]
[xf03id-srv2:33129] [12] python(_PyCFunction_FastCallDict+0x154)[0x55b6b887bb44]
[xf03id-srv2:33129] [13] python(+0x1a155c)[0x55559d6d355c]
[xf03id-srv2:33127] [14] python(+0x1a155c)[0x55b6b890955c]
[xf03id-srv2:33129] [14] python(_PyEval_EvalFrameDefault+0x30a)[0x55559d6f87aa]
[xf03id-srv2:33127] [15] python(+0x171a5b)[0x55559d6a3a5b]
[xf03id-srv2:33127] [16] python(_PyEval_EvalFrameDefault+0x30a)[0x55b6b892e7aa]
[xf03id-srv2:33129] [15] python(+0x171a5b)[0x55b6b88d9a5b]
[xf03id-srv2:33129] [16] python(+0x1a1635)[0x55559d6d3635]
[xf03id-srv2:33127] [17] python(+0x1a1635)[0x55b6b8909635]
[xf03id-srv2:33129] [17] python(_PyEval_EvalFrameDefault+0x30a)[0x55559d6f87aa]
[xf03id-srv2:33127] [18] python(_PyEval_EvalFrameDefault+0x30a)[0x55b6b892e7aa]
[xf03id-srv2:33129] [18] python(+0x170cf6)[0x55559d6a2cf6]
[xf03id-srv2:33127] [19] python(+0x170cf6)[0x55b6b88d8cf6]
[xf03id-srv2:33129] [19] python(_PyFunction_FastCallDict+0x1bc)[0x55559d6a416c]
[xf03id-srv2:33127] [20] python(_PyFunction_FastCallDict+0x1bc)[0x55b6b88da16c]
[xf03id-srv2:33129] [20] python(_PyObject_FastCallDict+0x26f)[0x55559d645f0f]
[xf03id-srv2:33127] [21] python(_PyObject_FastCallDict+0x26f)[0x55b6b887bf0f]
[xf03id-srv2:33129] [21] python(_PyObject_Call_Prepend+0x63)[0x55559d64ab33]
[xf03id-srv2:33127] [22] python(_PyObject_Call_Prepend+0x63)[0x55b6b8880b33]
[xf03id-srv2:33129] [22] python(PyObject_Call+0x3e)[0x55559d64594e]
[xf03id-srv2:33127] [23] python(PyObject_Call+0x3e)[0x55b6b887b94e]
[xf03id-srv2:33129] [23] python(+0x15cde7)[0x55559d68ede7]
[xf03id-srv2:33127] [24] python(+0x15cde7)[0x55b6b88c4de7]
[xf03id-srv2:33129] [24] python(_PyObject_FastCallDict+0x8b)[0x55559d645d2b]
[xf03id-srv2:33127] [25] python(_PyObject_FastCallDict+0x8b)[0x55b6b887bd2b]
[xf03id-srv2:33129] [25] python(+0x1a16ae)[0x55559d6d36ae]
[xf03id-srv2:33127] [26] python(+0x1a16ae)[0x55b6b89096ae]
[xf03id-srv2:33129] [26] python(_PyEval_EvalFrameDefault+0x30a)[0x55559d6f87aa]
[xf03id-srv2:33127] [27] python(_PyEval_EvalFrameDefault+0x30a)[0x55b6b892e7aa]
[xf03id-srv2:33129] [27] python(+0x171a5b)[0x55559d6a3a5b]
[xf03id-srv2:33127] [28] python(+0x171a5b)[0x55b6b88d9a5b]
[xf03id-srv2:33129] [28] python(+0x1a1635)[0x55559d6d3635]
[xf03id-srv2:33127] [29] python(+0x1a1635)[0x55b6b8909635]
[xf03id-srv2:33129] [29] python(_PyEval_EvalFrameDefault+0x30a)[0x55559d6f87aa]
[xf03id-srv2:33127] *** End of error message ***
python(_PyEval_EvalFrameDefault+0x30a)[0x55b6b892e7aa]
[xf03id-srv2:33129] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node xf03id-srv2 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

We realized this is due to the presence of the smcuda BTL, which was activated because the CUDA driver (libcuda) is installed in our test environments, even though none of the code in mpi4py-fft uses the GPU. Excluding smcuda makes everything run fine:

# tested N = 1, 2, 4 
$ mpirun -n N --mca btl ^smcuda python tests/test_mpifft.py
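
As far as I can tell, the same exclusion can also be set through an environment variable or the per-user MCA parameter file, which is easier to hide from downstream users, but it is still just a workaround:

# equivalent ways to exclude smcuda without changing the mpirun command line
$ export OMPI_MCA_btl=^smcuda
$ echo "btl = ^smcuda" >> ~/.openmpi/mca-params.conf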

My questions:

  1. Is this a known problem with Open MPI's CUDA support?
  2. Based on the segfault trace, it seems smcuda was invoked during the alltoallw() calls (likely from mpi4py-fft's Pencil code). Why does alltoallw() need smcuda even when we don't use the GPU?
  3. Is there a better fix than excluding smcuda? Could we apply a patch or set some environment variables when building Open MPI?

The 3rd question is the most urgent one: from conda-forge's maintenance viewpoint, this means we probably should not turn on CUDA awareness by default in our Open MPI package, otherwise all non-GPU users and downstream packages (like mpi4py-fft) would be affected.

ps. I should add that, oddly, mpi4py's own test suite runs just fine with any number of processes and without excluding smcuda. We were unable to reproduce the alltoallw segfault on the mpi4py side (see the sketch below for the kind of call I mean).
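
For context, a minimal contiguous Alltoallw call in mpi4py looks roughly like the sketch below (counts, displacements, and datatypes are illustrative); mpi4py-fft's Pencil transpose builds non-contiguous subarray datatypes on top of this, which may be what makes the difference:

# alltoallw_sketch.py -- illustrative only, not the mpi4py-fft code path
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()

n = 4                                    # elements exchanged with each rank (arbitrary)
sendbuf = np.arange(n * size, dtype='d')
recvbuf = np.empty(n * size, dtype='d')

counts = [n] * size
displs = [i * n * MPI.DOUBLE.Get_size() for i in range(size)]  # displacements in bytes
dtypes = [MPI.DOUBLE] * size

comm.Alltoallw([sendbuf, counts, displs, dtypes],
               [recvbuf, counts, displs, dtypes])
print(comm.rank, recvbuf)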

ps2. For why and how CUDA-awareness was turned on in conda-forge's package, see conda-forge/openmpi-feedstock#42 and conda-forge/openmpi-feedstock#54; @jsquyres kindly offered help when we did that.

cc: @dalcinl @mikaem
