Bus error with btl/sm+XPMEM in MPI_Finalize() #9868


Closed
gkatev opened this issue Jan 13, 2022 · 10 comments

gkatev (Contributor) commented Jan 13, 2022

Hi, I've been seeing crashes related to btl/sm and XPMEM during MPI_Finalize().

Environment:

Open MPI 5.0.x (from git, commit b640590)
CentOS 8, aarch64

Example execution:

$(which mpirun) --host localhost:160 --mca coll basic,libnbc --mca pml ob1 --mca btl sm,self --mca smsc xpmem osu_bcast

Backtrace:

Program terminated with signal SIGBUS, Bus error.
(gdb) bt
#0  0x0000ffffae4ed550 in mca_btl_sm_check_fboxes () at ../../../../opal/mca/btl/sm/btl_sm_fbox.h:241
#1  mca_btl_sm_component_progress () at btl_sm_component.c:578
#2  0x0000ffffae4717a8 in opal_progress () at runtime/opal_progress.c:224
#3  0x0000ffffaeaa625c in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:299
#4  0x00000000004019f0 in main (argc=<optimized out>, argv=<optimized out>) at osu_bcast.c:119

I suspect it is related to XPMEM, because the crash goes away if I set smsc=cma (and because it is traditionally XPMEM that triggers bus errors?). This is an aarch64 system, but I can also reproduce the error on an x86 one. For what it's worth, I remember a similar (or the same?) bug from before smsc's time, so it might not be directly related to smsc.

hppritcha (Member) commented

@gkatev this is indeed due to XPMEM. Unlike the other shared-memory mechanisms used by Open MPI, the regions a process exports via XPMEM (and which attaching processes subsequently map into their virtual address space) become invalid once the exporting process exits. This is different from typical shared memory based on System V, POSIX, or memory-mapped files.
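
A minimal standalone sketch of this lifetime rule (not Open MPI code; it assumes an XPMEM-enabled kernel and omits error checking): a child attaches to a page exported by its parent, and touching the attachment after the parent has exited raises SIGBUS, which is essentially what the sm fast-box polling runs into during finalize.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <xpmem.h>

int main(void)
{
    size_t len = (size_t) sysconf(_SC_PAGESIZE);
    char *buf = aligned_alloc(len, len);
    strcpy(buf, "hello");

    /* Export the page so other processes may attach (permit mode 0666). */
    xpmem_segid_t segid = xpmem_make(buf, len, XPMEM_PERMIT_MODE, (void *) 0666);

    if (fork() == 0) {
        /* Child: attach to the parent's exported page. */
        xpmem_apid_t apid = xpmem_get(segid, XPMEM_RDWR, XPMEM_PERMIT_MODE, (void *) 0666);
        struct xpmem_addr addr = { .apid = apid, .offset = 0 };
        char *remote = xpmem_attach(addr, len, NULL);

        printf("parent alive: %s\n", remote);  /* works */
        sleep(2);                              /* parent exits in the meantime */
        printf("parent gone:  %s\n", remote);  /* SIGBUS: the attachment is now invalid */
        _exit(0);
    }

    sleep(1);
    return 0;  /* parent exits; its exported segment goes away */
}
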

We could make btl/sm smarter by backing the mailboxes with memory-mapped files rather than XPMEM, even when XPMEM is available.
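
A hedged sketch of that alternative (names and sizes are illustrative, not the actual btl/sm fast-box layout): back each mailbox with a POSIX shared-memory object, whose mapping in an attaching process stays valid even after the creating process exits.

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAILBOX_SIZE 4096  /* illustrative size only */

/* Creator: make the backing object and map it. */
void *mailbox_create(const char *name)
{
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    ftruncate(fd, MAILBOX_SIZE);
    void *box = mmap(NULL, MAILBOX_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping keeps the object alive */
    return box;
}

/* Peer: attach to the same mailbox by name; this mapping survives the creator's exit. */
void *mailbox_attach(const char *name)
{
    int fd = shm_open(name, O_RDWR, 0600);
    void *box = mmap(NULL, MAILBOX_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return box;
}
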

hppritcha self-assigned this Jan 13, 2022
hppritcha (Member) commented

@gkatev how many ranks/nodes are you using when you see this problem?

jsquyres added this to the v5.0.0 milestone Jan 13, 2022
gkatev (Contributor, Author) commented Jan 13, 2022

160 ranks/cores on the arm64 system, and 64 on the x86 one. The similar (or same?) issue I remember from the past occurred on the 64-core system but not (or only rarely?) on another 32-core one, so rank count does sound like it could be a factor -- I will see if I can reproduce it with fewer cores/ranks.

hppritcha (Member) commented

No need. I have access to an aarch64/TX2 system with that kind of core count per node. It's sort of a race condition, so with fewer ranks per node you're less likely to observe this.

bosilca (Member) commented Jan 13, 2022

I think the issue is in ompi_mpi_finalize. We cannot wait for the PMIx barrier while calling opal_progress, because we become subject to exactly the kind of issue raised by the XPMEM support: a remote process releases the memory that the local process polls while we are still actively calling the BTL progress functions.

The simplest fix is to add a second PMIx barrier, one where we wait for completion without calling opal_progress (because we know the first barrier has already drained the network of all MPI-related messages).
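
In outline, and heavily simplified (a sketch of the idea, not the actual ompi_mpi_finalize code; it assumes Open MPI's internal headers and elides error handling and proper synchronization), the pattern would look something like:

#include <unistd.h>
#include <pmix.h>
#include "opal/runtime/opal_progress.h"

static volatile int fence_active;

/* Completion callback invoked when the non-blocking fence finishes. */
static void fence_cb(pmix_status_t status, void *cbdata)
{
    (void) status; (void) cbdata;
    fence_active = 0;
}

static void finalize_barriers_sketch(void)
{
    /* First barrier: keep calling opal_progress() while waiting, so the
     * BTLs drain any in-flight MPI traffic. */
    fence_active = 1;
    PMIx_Fence_nb(NULL, 0, NULL, 0, fence_cb, NULL);
    while (fence_active) {
        opal_progress();
    }

    /* Second barrier: wait *without* calling opal_progress().  The network
     * is already drained, and polling the sm fast boxes here could fault if
     * a peer's XPMEM-exported segment has already gone away. */
    fence_active = 1;
    PMIx_Fence_nb(NULL, 0, NULL, 0, fence_cb, NULL);
    while (fence_active) {
        usleep(100);  /* rely on PMIx's own progress thread */
    }
}
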

hppritcha (Member) commented

I don't like this fix, as it will not work with mpi_session_finalize.

bosilca (Member) commented Jan 13, 2022

I'm not sure what this has to do with sessions; in session_finalize you are not tearing down the BTL infrastructure.

hppritcha added a commit to hppritcha/ompi that referenced this issue Jan 14, 2022
xpmem has different behavior than other shared memory support mechanisms.
In particular, any xpmem-attached regions in a process will become invalid
once the exporting process exits.

Under certain circumstances, this behavior can result in SIGBUS errors
during mpi finalize.

Related to open-mpi#9868

Signed-off-by: Howard Pritchard <[email protected]>
hppritcha (Member) commented

@gkatev could you give #9880 a try?

gkatev (Contributor, Author) commented Jan 17, 2022

Yes, as far as I can tell that fixes the problem; I no longer see it on either of my systems.

hppritcha added a commit to hppritcha/ompi that referenced this issue Feb 1, 2022

hppritcha added a commit to hppritcha/ompi that referenced this issue Feb 2, 2022
(cherry picked from commit 8bac539)
hppritcha (Member) commented

Closed via #9954 and #9880.
