Closed
Description
I think there is a synchronization problem in the btl/sm component between the node master process and the other (child) processes on the same node.
The child processes can read 0 bytes from the shared_mem_btl_rndv.HOSTNAME file before the node master
process has written sizeof(opal_shmem_ds_t) bytes to it.
I think the following patch fixes the problem.
Index: ompi/mca/btl/sm/btl_sm.c
===================================================================
--- ompi/mca/btl/sm/btl_sm.c (revision 4013)
+++ ompi/mca/btl/sm/btl_sm.c (working copy)
@@ -180,6 +180,7 @@
static int
sm_segment_attach(mca_btl_sm_component_t *comp_ptr)
{
+ struct stat buf;
int rc = OMPI_SUCCESS;
int fd = -1;
ssize_t bread = 0;
@@ -195,6 +196,14 @@
rc = OMPI_ERR_IN_ERRNO;
goto out;
}
+ do {
+ if (0 != fstat(fd,&buf)) {
+ opal_output(0, "sm_segment_attach: "
+ "fstat errno=%d\n",errno);
+ rc = OMPI_ERROR;
+ goto out;
+ }
+ } while (sizeof(opal_shmem_ds_t) != buf.st_size);
if ((ssize_t)sizeof(opal_shmem_ds_t) != (bread =
read(fd, tmp_shmem_ds, sizeof(opal_shmem_ds_t)))) {
opal_output(0, "sm_segment_attach: "
This problem occurs in Open MPI 1.8.
It may not occur in Open MPI master.
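
For illustration only, here is a standalone sketch of the same idea as the patch: poll fstat() until the file has reached the expected size before reading it. It is not Open MPI code. The helper names wait_for_file_size and read_record_when_ready, the 1 ms polling interval, and the timeout parameter are all assumptions made for this example. Unlike the loop in the patch, it sleeps between polls and gives up after a timeout rather than spinning indefinitely. It also assumes the writer produces the whole record with a single write() call, so that the record is complete once the size check passes.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() */
#endif

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not part of Open MPI): poll until the file behind
 * fd holds at least `expected` bytes. Returns 0 on success, or -1 with
 * errno set (ETIMEDOUT after roughly timeout_ms milliseconds). */
static int wait_for_file_size(int fd, off_t expected, int timeout_ms)
{
    const struct timespec pause = { 0, 1000000L };  /* 1 ms between polls */
    struct stat st;
    int waited_ms = 0;

    assert(expected >= 0);
    for (;;) {
        if (0 != fstat(fd, &st)) {
            return -1;               /* errno was set by fstat() */
        }
        if (st.st_size >= expected) {
            return 0;                /* the writer has caught up */
        }
        if (waited_ms >= timeout_ms) {
            errno = ETIMEDOUT;
            return -1;
        }
        nanosleep(&pause, NULL);
        waited_ms++;
    }
}

/* Hypothetical reader: open `path`, wait for `len` bytes to be present,
 * then read them into `buf`. Returns 0 on success, -1 on failure. */
static int read_record_when_ready(const char *path, void *buf, size_t len,
                                  int timeout_ms)
{
    int rc = -1;
    int saved_errno;
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        return -1;
    }
    if (0 == wait_for_file_size(fd, (off_t)len, timeout_ms) &&
        (ssize_t)len == read(fd, buf, len)) {
        rc = 0;
    }
    saved_errno = errno;             /* keep the original error across close() */
    close(fd);
    errno = saved_errno;
    return rc;
}
```

A child process would call read_record_when_ready() with the rendezvous file path and a buffer of sizeof(opal_shmem_ds_t) bytes. A bounded wait turns a missing or stalled node master into a reportable error instead of a hang.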