Bus error with btl/sm+XPMEM in MPI_Finalize() #9868
@gkatev this is indeed due to XPMEM. Unlike the other shared-memory mechanisms used by Open MPI, XPMEM is different in that the mappings exported by a process to other attaching processes (and subsequently mapped into the virtual address space of those attaching processes) become invalid once the exporting process exits. This is unlike typical shared memory using System V, POSIX, or memory-mapped files. We could make sm smarter by mapping the mailboxes to memory-mapped files rather than using XPMEM, even when XPMEM is available.
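To make the lifetime difference concrete, here is a minimal sketch of the XPMEM export/attach flow using the standard xpmem.h API. Both roles are shown in one function for brevity, the segment size and permit mode are illustrative, error handling is omitted, and this is not Open MPI code.

```c
#include <stdlib.h>
#include <xpmem.h>

#define SEG_SIZE 4096

int main(void)
{
    /* Exporting process: publish a region of its own address space. */
    void *buf = malloc(SEG_SIZE);
    xpmem_segid_t segid = xpmem_make(buf, SEG_SIZE,
                                     XPMEM_PERMIT_MODE, (void *)0666);

    /* Attaching process (the segid would be exchanged out of band):
     * map the exporter's region into the local address space. */
    xpmem_apid_t apid = xpmem_get(segid, XPMEM_RDWR,
                                  XPMEM_PERMIT_MODE, NULL);
    struct xpmem_addr xaddr = { .apid = apid, .offset = 0 };
    void *remote = xpmem_attach(xaddr, SEG_SIZE, NULL);

    /* 'remote' aliases the exporter's memory. Unlike a System V / POSIX /
     * file-backed mapping, it is only usable while the exporting process
     * is alive; touching it after the exporter exits raises SIGBUS. */
    ((char *)remote)[0] = 1;

    xpmem_detach(remote);
    xpmem_release(apid);
    xpmem_remove(segid);
    free(buf);
    return 0;
}
```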
@gkatev how many ranks/nodes are you using when you see this problem?
160 ranks/cores on the arm64 system, and 64 on the x86 one. Regarding the similar (or same?) issue I remember seeing in the past, that one would occur on the 64-core system but not on another 32-core one (or maybe only rarely), so it does indeed sound like that could be a factor -- I will see if I can reproduce it with fewer cores/ranks.
No need. I have access to an aarch64/tx2 system with that kind of core count per node. It's sort of a race condition, so if you use fewer ranks per node you're less likely to observe this.
I think the issue is in ompi_mpi_finalize. We cannot wait for the PMIx barrier while calling opal_progress, because we then become subject to exactly the kind of issue exposed by the XPMEM support, where a remote process releases memory that the local process polls while we are still actively calling the BTL progress functions. The simplest fix is to add a second PMIx barrier, one where we wait for completion without calling opal_progress (because we know that the first barrier already drained the network of all MPI-related messages).
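A rough sketch of that two-barrier idea, assuming PMIx_Fence_nb() and Open MPI's opal_progress(); the completion flag, callback, and usleep-based wait are illustrative and not the actual ompi_mpi_finalize code.

```c
#include <stdbool.h>
#include <unistd.h>
#include <pmix.h>

extern void opal_progress(void);          /* Open MPI's progress engine */

static volatile bool fence_done;

/* Illustrative completion callback for PMIx_Fence_nb(). */
static void fence_complete(pmix_status_t status, void *cbdata)
{
    (void)status; (void)cbdata;
    fence_done = true;
}

static void finalize_barriers(void)
{
    /* Barrier 1: drain any outstanding MPI traffic. Driving the BTLs via
     * opal_progress() is still safe here because every peer's exported
     * memory is still valid. */
    fence_done = false;
    PMIx_Fence_nb(NULL, 0, NULL, 0, fence_complete, NULL);
    while (!fence_done) {
        opal_progress();
    }

    /* Barrier 2: make sure no peer has started tearing down its exported
     * XPMEM regions while we might still touch them. Wait WITHOUT calling
     * opal_progress, since barrier 1 already drained all MPI messages. */
    fence_done = false;
    PMIx_Fence_nb(NULL, 0, NULL, 0, fence_complete, NULL);
    while (!fence_done) {
        usleep(100);
    }
}
```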
I don't like this fix as it will not work with sessions.
I'm not sure how this has anything to do with sessions, since in session_finalize you are not tearing down the BTL infrastructure.
xpmem has different behavior than other shared memory support mechanisms. In particular, any xpmem-attached regions in a process will become invalid once the exporting process exits. Under certain circumstances, this behavior can result in SIGBUS errors during MPI finalize. Related to open-mpi#9868 Signed-off-by: Howard Pritchard <[email protected]>
Yes, as far as I can tell that fixes the problem; I no longer see it on either of my systems.
xpmem has different behavior than other shared memory support mechanisms. In particular, any xpmem-attached regions in a process will become invalid once the exporting process exits. Under certain circumstances, this behavior can result in SIGBUS errors during MPI finalize. Related to open-mpi#9868 Signed-off-by: Howard Pritchard <[email protected]> (cherry picked from commit 8bac539)
Hi, I've been seeing some crashes related to btl/sm and XPMEM during MPI_Finalize().
Environment:
Example execution:
Backtrace:
I claim that it is related to XPMEM, because if I set smsc=cma it goes away (and because it is XPMEM that traditionally triggers bus errors?). This is an aarch64 system, but I can also reproduce the error on an x86 one. For what it's worth, I do remember a similar (or same?) bug even before smsc's time, so it might not be directly related to smsc.
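For anyone hitting the same crash, the workaround mentioned above amounts to pinning the single-copy component to CMA, e.g. `mpirun --mca smsc cma ./app` or `export OMPI_MCA_smsc=cma` (assuming a build that includes the smsc framework; `./app` is a placeholder for the actual binary).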