
Memory leak with persistent MPI sends and the ob1 "get" protocol #6565

Reported by @s-kuberski

Description

Background information

A memory leak appears when using persistent communication with the vader BTL and large message sizes.

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

4.0.0 on a local computer; 2.1.2 and 3.1.3 on clusters

Describe how Open MPI was installed

4.0.0: from tarball

Please describe the system on which you are running

  • Operating system/version: Ubuntu 18.04 / CentOS 7.5 / Scientific Linux release 7.5
  • Computer hardware: Laptop / Intel Skylake / Intel Nehalem
  • Network type: vader RDMA

Details of the problem

While running a simulation program with Open MPI, a memory leak appeared and eventually caused the application to crash. The behaviour can be reproduced with the code below.

When the vader BTL is used, the application's memory consumption grows linearly over time. The leak is directly tied to the message size: with btl_vader_eager_limit set to 4096 and a message size of 4041 bytes, the leak appears; if the eager limit is raised or the message size is decreased, no problem occurs.
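For reference, a minimal invocation that triggers the leak here (assuming the reproducer below is compiled to a binary called persistent_leak, a name chosen for illustration; the two arguments are the iteration count and the message size in bytes):

mpirun -np 2 -mca btl_vader_eager_limit 4096 ./persistent_leak 4000000 4041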

Of the single-copy mechanisms, only btl_vader_single_copy_mechanism set to cma could be tested; no problem is seen if the value is set to none.
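For example, the following run (with the same hypothetical binary name as above) shows no leak:

mpirun -np 2 -mca btl_vader_single_copy_mechanism none ./persistent_leak 4000000 4041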

If buffered communication is used and the buffer is detached and re-attached manually, the problem does not appear (a sketch of this variant follows the reproducer below).

Only the shared-memory communication with vader is affected: if the processes are placed on different nodes or -mca btl ^vader is set, everything is fine.
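Accordingly, excluding vader on the command line avoids the leak as well:

mpirun -np 2 -mca btl ^vader ./persistent_leak 4000000 4041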

#include <stdlib.h>
#include <stdio.h>
#include "mpi.h"

int main(int argc, char **argv) {
  int rank, size, rplus, rminus, cnt, max = 4000000, i;
  char *sbuf, *rbuf;
  static MPI_Request req;
  int lbuf = 4041;

  /* Initialize MPI and assert even size */
  MPI_Init(&argc,&argv);
  MPI_Comm_size(MPI_COMM_WORLD,&size);
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);

  if ( size & 1 ) {
    if ( rank == 0 ) fprintf(stderr,"ERROR: Invalid number of MPI tasks: %d\n", size);
    MPI_Finalize();
    return -1;
  }

  /* Optional arguments: max and lbuf*/
  if ( argc > 1 ) {
    if(atoi(argv[1])>0) max = atoi(argv[1]);
    if ( rank == 0 ) printf("max=%d\n", max);
  }
  if ( argc > 2 ) {
    lbuf = atoi(argv[2]);
    if ( rank == 0 ) printf("lbuf=%d\n", lbuf);
  }
  /* allocate buffers */
  sbuf = malloc(sizeof(char) * lbuf);
  rbuf = malloc(sizeof(char) * lbuf);
  /* Initialize buffers */
  for(i=0; i<lbuf; i++) sbuf[i] = rbuf[i] = 0;
  sbuf[0] = rank;

  /* Initialize persistent requests: single message from each even rank to the next odd rank */
  rplus  = ( rank + 1 ) % size;
  rminus = ( rank - 1 + size ) % size;
  if ( rank & 1 )
    MPI_Recv_init(rbuf, lbuf, MPI_CHAR, rminus, 0, MPI_COMM_WORLD, &req);
  else
    MPI_Send_init(sbuf, lbuf, MPI_CHAR, rplus, 0, MPI_COMM_WORLD, &req);

  /* Repeat communications */
  MPI_Barrier(MPI_COMM_WORLD);
  for(cnt = 0; cnt<max; cnt++) {
    MPI_Status stat;
    MPI_Start(&req);
    MPI_Wait(&req,&stat);
  }
  MPI_Request_free(&req);
  free(sbuf);
  free(rbuf);
  MPI_Finalize();
  return 0;
}
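For reference, a minimal sketch of the buffered variant mentioned above (sender side only, reusing the variable names from the reproducer; the per-iteration detach/re-attach is an assumption about how that workaround was coded, since the original buffered version is not included here):

  /* Buffered-send variant of the sender branch (sketch, not the original code).
     MPI_Buffer_detach blocks until all buffered messages have been delivered,
     so detaching and re-attaching inside the loop drains the buffer each time. */
  int bsize = lbuf + MPI_BSEND_OVERHEAD;
  char *attach_buf = malloc(sizeof(char) * bsize);

  MPI_Buffer_attach(attach_buf, bsize);
  MPI_Bsend_init(sbuf, lbuf, MPI_CHAR, rplus, 0, MPI_COMM_WORLD, &req);

  for (cnt = 0; cnt < max; cnt++) {
    MPI_Status stat;
    MPI_Start(&req);
    MPI_Wait(&req, &stat);
    MPI_Buffer_detach(&attach_buf, &bsize);
    MPI_Buffer_attach(attach_buf, bsize);
  }

  MPI_Request_free(&req);
  free(attach_buf);

The receiver side stays exactly as in the reproducer above.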
