@rhc54 As discussed in #2859, when I enable the PMIx dstore, an MPI process run as a singleton (launched directly, without mpiexec) fails with the following message on the v2.x branch.
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_ess_init failed
--> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "Bad parameter" (-5) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
The problem appears to be in the fork_hnp function of the singleton ESS. It checks the number of PMIx parameters, but that number changes when the dstore is enabled, probably because PMIX_DSTORE_ESH_BASE_PATH is added.
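To illustrate the failure mode described above, here is a minimal sketch of collecting PMIx parameters without assuming how many there are. This is not the actual fork_hnp code: the names pmix_env_list, collect_pmix_env, and pmix_env_list_free are hypothetical, and the newline-separated KEY=VALUE input format is an assumption made only for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container (not an Open MPI type): a growable list of
 * heap-allocated "KEY=VALUE" strings. */
typedef struct {
    char **items;   /* copies of the matching lines */
    size_t count;   /* number of entries in use */
    size_t cap;     /* allocated capacity of items */
} pmix_env_list;

/* Release every entry and reset the list to empty. */
static void pmix_env_list_free(pmix_env_list *list)
{
    for (size_t i = 0; i < list->count; i++) {
        free(list->items[i]);
    }
    free(list->items);
    list->items = NULL;
    list->count = 0;
    list->cap = 0;
}

/* Hypothetical helper: collect every "PMIX_*=value" line from a
 * newline-separated buffer (assumed format), however many there are.
 * Returns 0 on success, -1 on allocation failure. */
static int collect_pmix_env(const char *text, pmix_env_list *out)
{
    assert(text != NULL && out != NULL);
    out->items = NULL;
    out->count = 0;
    out->cap = 0;

    const char *p = text;
    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);

        /* Keep only lines that start with PMIX_ and contain '='. */
        if (len > 5 && strncmp(p, "PMIX_", 5) == 0 &&
            memchr(p, '=', len) != NULL) {
            /* Grow the array instead of relying on a fixed size. */
            if (out->count == out->cap) {
                size_t ncap = out->cap ? out->cap * 2 : 4;
                char **tmp = realloc(out->items, ncap * sizeof(*tmp));
                if (tmp == NULL) {
                    pmix_env_list_free(out);
                    return -1;
                }
                out->items = tmp;
                out->cap = ncap;
            }
            char *copy = malloc(len + 1);
            if (copy == NULL) {
                pmix_env_list_free(out);
                return -1;
            }
            memcpy(copy, p, len);
            copy[len] = '\0';
            out->items[out->count++] = copy;
        }
        p = eol ? eol + 1 : p + len;
    }
    return 0;
}
```

With this shape, an extra variable such as PMIX_DSTORE_ESH_BASE_PATH simply becomes one more entry in the list, rather than tripping a count check or overrunning a fixed-size array.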
At least 93e7384 and fb5bcc4 are also needed in addition to a1e8e58. I cannot determine whether other commits to orte/mca/ess/singleton in master are needed. @rhc54 @ggouaillardet Could you take a look?
@kawashima-fj at first glance, that looks good to me
fwiw, i just noted fb5bcc4 not only plugs a memory leak (as indicated by the commit message), but it also fixes an array overflow when more than 4 PMIX_* environment variables are set.
thanks !
The relevant code in fork_hnp: https://github.com/open-mpi/ompi/blob/v2.x/orte/mca/ess/singleton/ess_singleton_module.c#L615
The master branch seems to have the solution, probably a1e8e58. Is cherry-picking this commit sufficient?