Hi, I'd like to report two low-impact issues related to the finalization of mmap shmem areas.
OpenMPI main (#4b39d07)
$ git submodule status
250004266bc046c6303c8531ababdff4e1237525 ../../../../3rd-party/openpmix (v1.1.3-3661-g25000426)
ca2bf3aeab38261ae7c88cea64bc782c949bd76e ../../../../3rd-party/prrte (psrvr-v2.0.0rc1-4517-gca2bf3aeab)
5c8de3d97b763bf8981fb49cbedd36e201b8fc0a ../../../../config/oac (5c8de3d)
Issue 1: btl/sm and opal/shmem finalization order
It looks like btl/sm is finalized after the opal/shmem framework, so sm_finalize() runs after the shmem component has already been closed, and its calls to opal_shmem_unlink and opal_shmem_segment_detach have no effect.
I'm not sure how straightforward it might be to resolve this, e.g. by delaying the finalization of opal/shmem.
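For illustration only, here is a toy standalone C program (stand-in names, not the real OPAL code) that mimics the ordering problem: once the shmem base has been closed and deselected, a later unlink call degenerates into a no-op, which is what the reproduction below shows.

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-ins for the OPAL symbols; all names here are invented for the sketch. */
static bool toy_shmem_selected = true;

static int toy_shmem_unlink(void)
{
    if (!toy_shmem_selected) {
        printf("NOT SELECTED\n");   /* the late caller gets a no-op */
        return -1;
    }
    printf("unlinking backing file\n");
    return 0;
}

static void toy_shmem_base_close(void)
{
    toy_shmem_selected = false;     /* the framework closes and deselects first */
}

static void toy_sm_finalize(void)
{
    toy_shmem_unlink();             /* ...so this call can no longer have any effect */
}

int main(void)
{
    /* Order observed in the report: the shmem base closes before btl/sm finalizes. */
    toy_shmem_base_close();
    toy_sm_finalize();
    return 0;
}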
To reproduce:
diff --git a/opal/mca/btl/sm/btl_sm_module.c b/opal/mca/btl/sm/btl_sm_module.c
index 7835742e4f..eb83ff9af4 100644
--- a/opal/mca/btl/sm/btl_sm_module.c
+++ b/opal/mca/btl/sm/btl_sm_module.c
@@ -345,6 +345,8 @@ static int sm_finalize(struct mca_btl_base_module_t *btl)
free(component->fbox_in_endpoints);
component->fbox_in_endpoints = NULL;
+ printf("sm_finalize() do unlink/detach\n");+
opal_shmem_unlink(&mca_btl_sm_component.seg_ds);
opal_shmem_segment_detach(&mca_btl_sm_component.seg_ds);
diff --git a/opal/mca/shmem/base/shmem_base_close.c b/opal/mca/shmem/base/shmem_base_close.c
index 415ea6c22e..0e69ee721c 100644
--- a/opal/mca/shmem/base/shmem_base_close.c
+++ b/opal/mca/shmem/base/shmem_base_close.c
@@ -31,6 +31,10 @@
/* ////////////////////////////////////////////////////////////////////////// */
int opal_shmem_base_close(void)
{
+ printf("opal_shmem_base_close()\n");+
/* if there is a selected shmem module, finalize it */
if (NULL != opal_shmem_base_module && NULL != opal_shmem_base_module->module_finalize) {
opal_shmem_base_module->module_finalize();
diff --git a/opal/mca/shmem/base/shmem_base_wrappers.c b/opal/mca/shmem/base/shmem_base_wrappers.c
index b1b0c02f6e..5e8827151e 100644
--- a/opal/mca/shmem/base/shmem_base_wrappers.c
+++ b/opal/mca/shmem/base/shmem_base_wrappers.c
@@ -59,6 +59,8 @@ void *opal_shmem_segment_attach(opal_shmem_ds_t *ds_buf)
int opal_shmem_segment_detach(opal_shmem_ds_t *ds_buf)
{
if (!opal_shmem_base_selected) {
+ printf("NOT SELECTED\n");+
return OPAL_ERROR;
}
@@ -69,6 +71,8 @@ int opal_shmem_segment_detach(opal_shmem_ds_t *ds_buf)
int opal_shmem_unlink(opal_shmem_ds_t *ds_buf)
{
if (!opal_shmem_base_selected) {
+ printf("NOT SELECTED\n");+
return OPAL_ERROR;
}
$ mpirun -n 2 --mca btl sm,self --output tag osu_bcast -m 4:4 2>&1 | grep 0]
[mpirun-gkpc-701889@1,0]<stdout>:
[mpirun-gkpc-701889@1,0]<stdout>: # OSU MPI_Bcast (data-varying) v7.0
[mpirun-gkpc-701889@1,0]<stdout>: # Size Avg Latency(us)
[mpirun-gkpc-701889@1,0]<stdout>: root = 0
[mpirun-gkpc-701889@1,0]<stdout>: 4 0.40
[mpirun-gkpc-701889@1,0]<stdout>: opal_shmem_base_close()
[mpirun-gkpc-701889@1,0]<stdout>: sm_finalize() do unlink/detach
[mpirun-gkpc-701889@1,0]<stdout>: NOT SELECTED
[mpirun-gkpc-701889@1,0]<stdout>: NOT SELECTED
However, there appears to be a second issue lurking:
Issue 2: error unlinking mmap backing file
If we further apply something like this to temporarily work around issue 1:
diff --git a/opal/mca/shmem/base/shmem_base_close.c b/opal/mca/shmem/base/shmem_base_close.c
index 415ea6c22e..32cae9d021 100644
--- a/opal/mca/shmem/base/shmem_base_close.c
+++ b/opal/mca/shmem/base/shmem_base_close.c
@@ -31,6 +31,9 @@
/* ////////////////////////////////////////////////////////////////////////// */
int opal_shmem_base_close(void)
{
+ printf("opal_shmem_base_close() (fake)\n");+ return;+
/* if there is a selected shmem module, finalize it */
if (NULL != opal_shmem_base_module && NULL != opal_shmem_base_module->module_finalize) {
opal_shmem_base_module->module_finalize();
$ mpirun -n 2 --mca btl sm,self --output tag osu_bcast_dv -m 4:4 2>&1 | grep 0]
[mpirun-gkpc-712148@1,0]<stdout>:
[mpirun-gkpc-712148@1,0]<stdout>: # OSU MPI_Bcast (data-varying) v7.0
[mpirun-gkpc-712148@1,0]<stdout>: # Size Avg Latency(us)
[mpirun-gkpc-712148@1,0]<stdout>: root = 0
[mpirun-gkpc-712148@1,0]<stdout>: 4 0.36
[mpirun-gkpc-712148@1,0]<stdout>: opal_shmem_base_close() (fake)
[mpirun-gkpc-712148@1,0]<stdout>: sm_finalize() do unlink/detach
[mpirun-gkpc-712148@1,0]<stderr>: --------------------------------------------------------------------------
[mpirun-gkpc-712148@1,0]<stderr>: A system call failed during shared memory initialization that should
[mpirun-gkpc-712148@1,0]<stderr>: not have. It is likely that your MPI job will now either abort or
[mpirun-gkpc-712148@1,0]<stderr>: experience performance degradation.
[mpirun-gkpc-712148@1,0]<stderr>:
[mpirun-gkpc-712148@1,0]<stderr>: Local host: gkpc
[mpirun-gkpc-712148@1,0]<stderr>: System call: unlink(2) /dev/shm/sm_segment.gkpc.1000.16b00001.0
[mpirun-gkpc-712148@1,0]<stderr>: Error: No such file or directory (errno 2)
[mpirun-gkpc-712148@1,0]<stderr>: --------------------------------------------------------------------------
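For context, the help message above is just reporting that an unlink(2) of the segment's backing file failed with ENOENT; a rough sketch of what that failing path presumably looks like (assumed shape, not the verbatim shmem/mmap code, and the seg_name field name is an assumption):

/* Sketch only: the backing file, e.g. /dev/shm/sm_segment.gkpc.1000.16b00001.0,
 * is already gone by the time we get here, so unlink(2) fails with ENOENT and
 * the "A system call failed during shared memory initialization" help message
 * above is printed. */
if (0 != unlink(ds_buf->seg_name)) {
    return OPAL_ERROR;
}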
This is not specific to btl/sm; I initially stumbled upon it in my own coll component. It looks as if the /dev/shm backing files somehow get deleted earlier than they should (this normally goes unnoticed because of issue 1). Now, putting on my mad-debugger glasses (TM), let me take a shot at finding out where something like this might happen:
If I place a while(1) {} before the call to PMIx_Finalize in ompi/runtime/ompi_rte.c (lines 973 to 977 at 53acf37) and inspect the contents of /dev/shm while it's hanging, I see sm's backing files as expected. If I move the hang-loop to after the call to PMIx_Finalize and do the same thing, the files are gone. Might pmix be removing these files somehow?
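To make the placement concrete, here is a sketch of the experiment (not the actual ompi_rte.c code; the PMIx_Finalize argument list is an assumption):

#include <pmix.h>

/* Sketch of the hang-loop experiment around the PMIx_Finalize call site
 * (ompi/runtime/ompi_rte.c, lines 973-977 at 53acf37). Only one of the two
 * placements is used per run. */
void hang_experiment(void)
{
    while (1) {}             /* placement (a): /dev/shm/sm_segment.* files still present */

    PMIx_Finalize(NULL, 0);  /* assumed arguments */

    while (1) {}             /* placement (b): the backing files are already gone */
}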
If I follow the rabbit hole a bit, it leads to this call: https://github.com/openpmix/openpmix/blob/250004266bc046c6303c8531ababdff4e1237525/src/client/pmix_client.c#L1075
Placing the hang-loop before/after that call triggers/untriggers the behavior. (It doesn't really look like this call unlinks the backing files itself, I know; perhaps it triggers the actual code that does?)
Edit: Appears to be happening here: https://github.com/openpmix/openpmix/blob/250004266bc046c6303c8531ababdff4e1237525/src/include/pmix_globals.c#L521