master/v5.0.x: missing documentation on MPI_Comm_join/Publish usage (client/server pattern) #10222

Closed
klaa97 opened this issue Apr 5, 2022 · 13 comments

@klaa97

klaa97 commented Apr 5, 2022

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

v5.0.0rc4

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Installed Open MPI from distribution tarball of v5.0.0rc5 from https://www.open-mpi.org/software/ompi/v5.0/

Please describe the system on which you are running

  • Network type: TCP

Details of the problem

I am trying to replicate a simple client/server MPI application using MPI_Comm_accept and MPI_Comm_connect, together with MPI_Publish_name / MPI_Lookup_name. Before version 5.0.x, I used the ompi-server command to allow communication between different executions, but since ORTE has been replaced as the runtime, the previous method no longer works and, as expected, the processes do not have any shared registry where the information is published or where they can connect.

A minimal example is below.

server.c

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv ) {
    MPI_Comm client;
    char port_name[MPI_MAX_PORT_NAME];
    int size;
    MPI_Info info;

    MPI_Init( &argc, &argv );
    MPI_Comm_size(MPI_COMM_WORLD, &size);


    /* Obtain a system-generated port name that a client can connect to */
    MPI_Open_port(MPI_INFO_NULL, port_name);
    printf("Server available at %s\n", port_name);

    MPI_Info_create(&info);

    /* Publish the port under the service name "name" so the client can look it up */
    MPI_Publish_name("name", info, port_name);

    printf("Wait for client connection\n");
    /* Blocks until a client calls MPI_Comm_connect on this port */
    MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);
    printf("Client connected\n");

    MPI_Unpublish_name("name", MPI_INFO_NULL, port_name);
    MPI_Info_free(&info);
    MPI_Comm_free(&client);
    MPI_Close_port(port_name);
    MPI_Finalize();
    return 0;
}

client.c

#include <mpi.h>
#include <stdio.h>


int main(int argc, char **argv ) {
    MPI_Comm server;
    char port_name[MPI_MAX_PORT_NAME];

    MPI_Init( &argc, &argv );

    printf("Looking for server\n");
    MPI_Lookup_name( "name", MPI_INFO_NULL, port_name);
    printf("server found at %s\n", port_name);

    printf("Wait for server connection\n");
    MPI_Comm_connect( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD,  &server );
    printf("Server connected\n");

    MPI_Comm_disconnect( &server );
    MPI_Finalize();
    return 0;
}

Moreover, even if I communicate the server port to the client in another way (such as writing it to a file), the two processes hang (I am using mpirun as the runtime).

Possible solution

The current solution that I am employing, following the PRRTE model, is the following (a command sketch follows the list):

  • Start a DVM with prte (as a system server for simplicity)
  • Start the MPI jobs with prun --system-server-only
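
A minimal sketch of that flow, assuming prte and prun are on the PATH and no other system server is running on the node; the -n 1 process counts and the ./server / ./client names refer to the example programs above:

    # terminal 1: start a persistent DVM acting as the system server
    prte --system-server

    # terminal 2: launch the server job against that DVM
    prun --system-server-only -n 1 ./server

    # terminal 3: launch the client job against the same DVM
    prun --system-server-only -n 1 ./client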

Is this the expected and correct solution? Is there any other way to connect different MPI executions (other than MPI_Comm_spawn)?

I also suggest a sort of migration guide for people who were used to the ompi-server flow, maybe directly in the documentation of the MPI calls that need a DVM to work (MPI_Comm_join, Lookup/Publish, and Accept/Connect come to mind, but I may be missing some 🤔). Thank you very much!

@jsquyres
Member

Good point. I just added this to the docs to-do list in #10256. If you'd like to add this to the upcoming v5.x docs, please feel free to open a PR!

@klaa97
Author

klaa97 commented Apr 14, 2022

Thank you @jsquyres !

Before opening a PR for this, I would like to understand whether prun with the DVM is actually the correct way to handle these MPI_Comm_join/Publish/Connect operations.

If it is, and it is the only way, it would also be very important in my opinion to give some guidance on its usage in relation to the classic mpirun command. A simple example is the --with-ft ulfm option, which does not seem to work with prun. I think it would be wrong for the docs to point users to prun without related guidance on its usage, since it does not seem to be a drop-in replacement for mpirun. Let me know if I should open another issue for this, though!

Thank you again!

@jsquyres
Member

That is an excellent question. I'm afraid I don't know the answer.

@bosilca @abouteiller This user is volunteering to write / amend some docs. Can you help answer the questions above?

@bosilca
Member

bosilca commented Apr 18, 2022

ULFM does not join worlds; instead, we spawn processes from an existing job, so things are slightly simpler there (not that they consistently work, but at least we do not have the issue of exchanging the modex).
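
For context, a minimal hedged sketch of that spawn-based pattern (the "./worker" executable name and the process count are placeholders, not anything taken from ULFM itself):

#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Comm intercomm;
    int errcodes[2];

    MPI_Init(&argc, &argv);

    /* Spawn 2 worker processes from the running job; the parent and the
       workers are connected by the resulting inter-communicator, so no
       port exchange between separately launched jobs is needed. */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &intercomm, errcodes);

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}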

Any documentation, especially in an area as rarely used as connect/accept, will be of tremendous help.

@qkoziol
Contributor

qkoziol commented Jun 5, 2023

Split out from #10480

@jsquyres
Member

jsquyres commented Jun 6, 2023

Downgraded from "blocker" to "critical".

@edgargabriel
Member

edgargabriel commented Jun 20, 2023

I did some investigation on this issue, and here is the precise list of steps required to make connect/accept, join, and publish/unpublish work with Open MPI v5.0. The key hints were provided by @klaa97 above; I am just documenting the sequence of steps here.

  1. Start a prte server:
user@myhost:~/OpenMPI/bin$ ./prte --system-server --report-uri -
[email protected];tcp4://127.0.0.1:56669
DVM ready
  2. Create a file that contains the URI of the prte system server, e.g.:
user@myhost:~/testdir$ cat dvm_uri.txt
[email protected];tcp4://127.0.0.1:56669
  3. Launch the first application using the DVM identified by the URI file shown above:
user@myhost:~/testdir$ mpirun --dvm file:dvm_uri.txt -np 1 ./server
  4. Launch the second application, also using the DVM identified in the URI file:
user@myhost:~/testdir$ mpirun --dvm file:dvm_uri.txt -np 1 ./client

A few more things:

  • It also works with more than one process per job.
  • Instead of using MPI_Publish_name and MPI_Lookup_name, one can print the port obtained from MPI_Open_port (e.g., to the screen) and pass it directly to the client as a string; a sketch of such a client follows.
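
A sketch of that variant: a hypothetical client_argv.c (not part of the original reproducer) that takes the port string printed by the server as its first command-line argument; note that the port string may contain shell metacharacters such as semicolons, so it usually needs to be quoted on the command line.

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    MPI_Comm server;
    char port_name[MPI_MAX_PORT_NAME];

    MPI_Init(&argc, &argv);

    if (argc < 2) {
        fprintf(stderr, "usage: %s \"<port name printed by the server>\"\n", argv[0]);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Copy the port string produced by the server's MPI_Open_port call */
    strncpy(port_name, argv[1], MPI_MAX_PORT_NAME - 1);
    port_name[MPI_MAX_PORT_NAME - 1] = '\0';

    MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);
    printf("Connected to server\n");

    MPI_Comm_disconnect(&server);
    MPI_Finalize();
    return 0;
}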

@rhc54
Contributor

rhc54 commented Jun 20, 2023

Errr... that will sometimes work, but some of those steps aren't actually required, and it won't work in some situations. I'll try to provide a more generic set of steps in a bit.

@edgargabriel
Member

Errr... that will sometimes work, but some of those steps aren't actually required, and it won't work in some situations. I'll try to provide a more generic set of steps in a bit.

OK, thank you; any details and additional information would be appreciated.

@rhc54
Contributor

rhc54 commented Jun 21, 2023

There are two cases to consider:

If you own all of the nodes involved in the session (i.e., you are not sharing nodes), then you can start PRRTE with prte --system-server. The system-server flag just tells PRRTE to put a rendezvous file at the top of the session directory tree - i.e., in /tmp - so it is easier to find. You can then just use prun --system-server-only to launch each job you want to execute, and you'll always connect to that DVM, even if you have multiple mpirun executions running in parallel.

If you are sharing nodes, then you cannot use the system-server method as there can be only one system server at a time. Instead, you start PRRTE with prte --report-uri <filename> to store the URI information in the specified file. You can then use prun --dvm-uri file:<filename> to execute the applications.

You can substitute mpirun for prun if you like. In the system-server case, you need to use mpirun --dvm system. In the non-system-server case, you need mpirun --dvm file:<filename>.
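
A condensed sketch of the two cases with mpirun, reusing the server/client example programs and the dvm_uri.txt filename from earlier in this thread (process counts are illustrative):

    # Case 1: you own the nodes (system server)
    prte --system-server                            # terminal 1: start the DVM
    mpirun --dvm system -np 1 ./server              # terminal 2
    mpirun --dvm system -np 1 ./client              # terminal 3

    # Case 2: shared nodes (per-user DVM identified by a URI file)
    prte --report-uri dvm_uri.txt                   # terminal 1: start the DVM
    mpirun --dvm file:dvm_uri.txt -np 1 ./server    # terminal 2
    mpirun --dvm file:dvm_uri.txt -np 1 ./client    # terminal 3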

@naughtont3 Note that the --dvm option help entry for mpirun is incomplete as it doesn't include the system or system-first options. Also, there is a bug in prte.c when looking for those options such that system-first will always default to system.

@edgargabriel
Member

@rhc54 thank you very much, I will try to put this information into the docs. One follow-up question: would this also work for direct launch (e.g. srun), or is it bound to prrte/mpirun utilization?

@rhc54
Contributor

rhc54 commented Jun 21, 2023

would this also work for direct launch (e.g. srun), or is it bound to prrte/mpirun utilization?

I'm afraid Slurm does not include support for these operations, so it is constrained to PRRTE.

@edgargabriel edgargabriel self-assigned this Jun 23, 2023
@edgargabriel
Member

Completed in #11776
