master: pmix MPI spawn / list assertion error #2920
I can reproduce; checking ...
Register namespace even if there are no node-local processes that belong to it. We need this for the MPI_Spawn case. Addressing open-mpi#2920. The problem was introduced in be3ef77. Signed-off-by: Artem Polyakov <[email protected]>
Also seen on MTT, per #2863.
Yes, it looks very similar. I guess this should fix it.
However, this fix doesn't explain the list failure; it only removes the problem with missing data.
I think I have this fixed with 0c8609c; let's see how MTT does overnight.
@rhc54, out of curiosity: I don't see any visible changes related to this topic. What exactly fixes the problem?
You have to stop the progress thread before tearing down the infrastructure. The list problem was caused by the messaging system continuing to operate in the progress thread while the PMIx_Finalize routine was tearing down the messaging framework. See the changes in the server, client, and tool routines, where we now stop the progress thread before calling rte_finalize.
So it doesn't address the spawn problem, right?
No, I revised the fix for spawn in #2977.
This now appears to be fixed. I'm still seeing the ptl_base_send errors, but those are covered by a different issue.
From @siegmargross's post on the users mailing list (https://www.mail-archive.com/[email protected]/msg30564.html):
I have installed openmpi-master-201702010209-6cb484a on my "SUSE Linux Enterprise Server 12.2 (x86_64)" with Sun C 5.14 and gcc-6.3.0. Unfortunately, I get errors when I run my spawn programs.
I used the following commands to build and install the package. ${SYSTEM_ENV} is "Linux" and ${MACHINE_ENV} is "x86_64" for my Linux machine. The options "--enable-mpi-cxx-bindings" and "--enable-mpi-thread-multiple" are now unrecognized; presumably they are now supported automatically. "configure" also prints a warning saying that I should report it.
I get the following errors if I run "spawn_master" or "spawn_multiple_master".