master/v5.0.x: missing documentation on MPI_Comm_join/Publish usage (client/server pattern) #10222

Closed
@klaa97

Description

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

v5.0.0rc4

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Installed Open MPI from distribution tarball of v5.0.0rc5 from https://www.open-mpi.org/software/ompi/v5.0/

Please describe the system on which you are running

  • Network type: TCP

Details of the problem

I am trying to replicate a simple client/server MPI application using
MPI_Comm_accept and MPI_Comm_connect, together with MPI_Publish_name / MPI_Lookup_name. Before version 5.0.x, I used the
ompi-server command to allow communication between separate executions. Since ORTE is deprecated as the runtime, that method no longer works and, as expected, the processes have no shared registry where the port information can be published or looked up.
A minimal example is below.

server.c

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv ) {
    MPI_Comm client;
    char port_name[MPI_MAX_PORT_NAME];
    int size;
    MPI_Info info;

    MPI_Init( &argc, &argv );
    MPI_Comm_size(MPI_COMM_WORLD, &size);


    MPI_Open_port(MPI_INFO_NULL, port_name);
    printf("Server available at %s\n", port_name);

    MPI_Info_create(&info);

    MPI_Publish_name("name", info, port_name);

    printf("Wait for client connection\n");
    MPI_Comm_accept( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD,  &client );
    printf("Client connected\n");

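    /* Unpublish with the same info object used for publishing, then release it. */
    MPI_Unpublish_name("name", info, port_name);
    MPI_Info_free(&info);
    /* Disconnect (collective with the client) rather than just freeing the handle. */
    MPI_Comm_disconnect( &client );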
    MPI_Close_port(port_name);
    MPI_Finalize();
    return 0;
}

client.c

#include <mpi.h>
#include <stdio.h>


int main(int argc, char **argv ) {
    MPI_Comm server;
    char port_name[MPI_MAX_PORT_NAME];

    MPI_Init( &argc, &argv );

    printf("Looking for server\n");
    MPI_Lookup_name( "name", MPI_INFO_NULL, port_name);
    printf("server found at %s\n", port_name);

    printf("Wait for server connection\n");
    MPI_Comm_connect( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD,  &server );
    printf("Server connected\n");

    MPI_Comm_disconnect( &server );
    MPI_Finalize();
    return 0;
}

Moreover, even if I pass the server port name to the client by other means
(such as writing it to a file), the two processes hang (I am using mpirun as the runtime).
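
For reference, the file-based exchange mentioned above can be sketched roughly as follows. This is only an illustration of the out-of-band handoff, not a fix for the hang: the helper names `write_port_file` and `read_port_file` and the idea of a shared file path are my own assumptions, and the helpers use plain stdio with no MPI calls.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the port name to `path` so that another
 * process can pick it up. Returns 0 on success, -1 on failure. */
int write_port_file(const char *path, const char *port_name)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%s\n", port_name) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: read the port name from `path` into `buf`
 * (capacity `len` bytes). Returns 0 on success, -1 if the file is
 * missing, unreadable, or empty. */
int read_port_file(const char *path, char *buf, int len)
{
    if (len < 2)
        return -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, len, f);
    fclose(f);
    if (line == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
    return buf[0] != '\0' ? 0 : -1;
}
```

In this sketch the server would call `write_port_file` right after MPI_Open_port, and the client would call `read_port_file` with a buffer of MPI_MAX_PORT_NAME bytes before MPI_Comm_connect, in place of the MPI_Publish_name / MPI_Lookup_name pair.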

Possible solution

The solution that I am currently employing, following the PRRTE model, is the following:

  • Start a DVM with prte (as a system server for simplicity)
  • Start the MPI jobs with prun --system-server-only

Is this the expected and correct solution? Is there any other way to connect different MPI executions (other than MPI_Comm_spawn)?

I also suggest a short migration guide for people who were used to the ompi-server workflow, perhaps directly in the documentation of the MPI routines that need a DVM to work (MPI_Comm_join, MPI_Lookup_name/MPI_Publish_name, and MPI_Comm_accept/MPI_Comm_connect come to mind, but I may be missing some 🤔). Thank you very much!
