Description
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
v5.0.0rc4
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Installed Open MPI from distribution tarball of v5.0.0rc5 from https://www.open-mpi.org/software/ompi/v5.0/
Please describe the system on which you are running
- Network type: TCP
Details of the problem
I am trying to replicate a simple client/server MPI application using MPI_Comm_accept and MPI_Comm_connect, together with MPI_Publish_name / MPI_Lookup_name. Before version 5.0.x, I used the ompi-server command to allow communication between separate executions. Since ORTE is no longer the runtime (it has been replaced by PRRTE), that method no longer works and, as expected, the processes have no shared registry in which to publish names or through which to connect.
A minimal example follows.
server.c
```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm client;
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Info info;

    MPI_Init(&argc, &argv);

    /* Open a port and publish it under the service name "name". */
    MPI_Open_port(MPI_INFO_NULL, port_name);
    printf("Server available at %s\n", port_name);
    fflush(stdout);  /* make the output visible before blocking */
    MPI_Info_create(&info);
    MPI_Publish_name("name", info, port_name);
    MPI_Info_free(&info);

    /* Block until a client connects to the port. */
    printf("Wait for client connection\n");
    fflush(stdout);
    MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);
    printf("Client connected\n");

    /* MPI_Comm_disconnect is collective, so it pairs with the
       client's MPI_Comm_disconnect on the same intercommunicator. */
    MPI_Unpublish_name("name", MPI_INFO_NULL, port_name);
    MPI_Comm_disconnect(&client);
    MPI_Close_port(port_name);
    MPI_Finalize();
    return 0;
}
```
client.c
```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm server;
    char port_name[MPI_MAX_PORT_NAME];

    MPI_Init(&argc, &argv);

    /* Resolve the port the server published under "name". */
    printf("Looking for server\n");
    fflush(stdout);  /* make the output visible before blocking */
    MPI_Lookup_name("name", MPI_INFO_NULL, port_name);
    printf("server found at %s\n", port_name);

    /* Connect to the server's port; pairs with its MPI_Comm_accept. */
    printf("Wait for server connection\n");
    fflush(stdout);
    MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);
    printf("Server connected\n");

    MPI_Comm_disconnect(&server);
    MPI_Finalize();
    return 0;
}
```
Moreover, even if I pass the server's port name to the client by other means
(for example, by writing it to a file), the two processes hang when each one
is launched with its own mpirun.
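For reference, the file-based exchange mentioned above can be sketched in plain C (POSIX) as follows. This only moves the port string between the two jobs; as observed above, it was not sufficient on its own here. The helpers write_port_file and read_port_file are hypothetical names, not Open MPI API, and the sketch assumes a path on a filesystem visible to both jobs.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write the port name to `path` by writing a temporary
 * file and then rename()-ing it into place, so a polling reader never sees a
 * partially written name. Returns 0 on success, -1 on failure. */
static int write_port_file(const char *path, const char *port_name)
{
    char tmp[4096];
    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp)
        return -1;
    FILE *f = fopen(tmp, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(port_name, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    if (!ok || rename(tmp, path) != 0)
        return -1;
    return 0;
}

/* Hypothetical helper: poll for `path` once per second, at most max_tries
 * times, and read the port name into buf (which holds `len` bytes, e.g.
 * MPI_MAX_PORT_NAME). Returns 0 on success, -1 if the file never appeared. */
static int read_port_file(const char *path, char *buf, size_t len, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        FILE *f = fopen(path, "r");
        if (f != NULL) {
            char *r = fgets(buf, (int)len, f);
            fclose(f);
            if (r != NULL) {
                buf[strcspn(buf, "\n")] = '\0';  /* drop a trailing newline, if any */
                return 0;
            }
        }
        sleep(1);
    }
    return -1;
}
```

Usage: the server would call write_port_file("port.txt", port_name) right after MPI_Open_port, and the client would call read_port_file("port.txt", port_name, MPI_MAX_PORT_NAME, 30) before MPI_Comm_connect ("port.txt" and the retry count are placeholders). On a local POSIX filesystem, rename() replaces the target atomically, so the reader sees either no file or the complete name; a stale file from an earlier run would still be picked up, so the server should remove it at exit.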
Possible solution
The workaround I am currently using, following the PRRTE model, is:
- Start a DVM with prte (as a system server, for simplicity).
- Start the MPI jobs with prun --system-server-only.
Is this the expected and correct approach? Is there any other way to connect separate MPI executions (other than MPI_Comm_spawn)?
I would also suggest a short migration guide for people used to the ompi-server workflow, perhaps directly in the documentation of the MPI functions that need a DVM to work (MPI_Comm_join, MPI_Publish_name/MPI_Lookup_name, and MPI_Comm_accept/MPI_Comm_connect come to mind, but I may be missing some 🤔). Thank you very much!