
MPI_Win_create() fails #8086


Closed
nathanweeks opened this issue Oct 11, 2020 · 4 comments

@nathanweeks
Contributor

Background information

Using the test.c from a comment on #6201, MPI_Win_create() fails with an Open MPI build from the master branch, but succeeds with a build from tag v4.1.0rc2.

What version of Open MPI are you using?

master, 0bcef04

Describe how Open MPI was installed

The following Dockerfile:

FROM ubuntu:20.04

RUN apt update && apt install -y --no-install-recommends \
  autoconf \
  automake \
  ca-certificates \
  flex \
  g++-10 \
  gcc-10 \
  git \
  libtool \
  make \
  openssh-client

WORKDIR /src

ARG COMMIT

RUN git init \
  && git remote add origin https://github.com/open-mpi/ompi.git \
  && git fetch --depth=1 origin ${COMMIT}  \
  && git checkout ${COMMIT} \
  && git submodule update --init --recursive --depth=1 \
  && ./autogen.pl

RUN sh ./configure --disable-io-romio --disable-man-pages CC=gcc-10 CXX=g++-10 \
  && make -j \
  && make install \
  && ldconfig

ENV OMPI_ALLOW_RUN_AS_ROOT=1 \
    OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1

Build it with:

docker build --build-arg COMMIT=0bcef049c9f28564303ab60b26feae24eb7eec04 -t open-mpi:v5.0.0-0bcef04 .

Please describe the system on which you are running

single host, Docker Engine Version 19.03.13, linux/amd64


Details of the problem

When the above Dockerfile is used to build v4.1.0rc2, an executable compiled from the aforementioned test.c exits with status 0:

$ docker build --build-arg COMMIT=31496e28e6276a1863306ea4046258fe163ac9b8 -t open-mpi:v4.1.0rc2 .
...
$ docker run -it --rm --entrypoint=/bin/bash open-mpi:v4.1.0rc2
root@39e756f57775:/src# cat > test.c
...
root@39e756f57775:/src# mpicc test.c
root@39e756f57775:/src# mpiexec -n 2 ./a.out
root@39e756f57775:/src# echo $?
0

However, with the latest commit from the master branch (pre-v5.0), an error occurs during MPI_Win_create():

$ docker build --build-arg COMMIT=0bcef049c9f28564303ab60b26feae24eb7eec04 -t open-mpi:v5.0.0-0bcef04 .
...
$ docker run -it --rm --entrypoint=/bin/bash open-mpi:v5.0.0-0bcef04
root@e6c749b26c3f:/src# cat > test.c
...
root@e6c749b26c3f:/src# mpicc test.c
root@e6c749b26c3f:/src# mpiexec -n 2 ./a.out
[e6c749b26c3f:00000] *** An error occurred in MPI_Win_create
[e6c749b26c3f:00000] *** reported by process [231931905,1]
[e6c749b26c3f:00000] *** on communicator MPI_COMM_WORLD
[e6c749b26c3f:00000] *** MPI_ERR_WIN: invalid window
[e6c749b26c3f:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[e6c749b26c3f:00000] ***    and MPI will try to terminate your MPI job as well)
[e6c749b26c3f:00000] *** An error occurred in MPI_Win_create
[e6c749b26c3f:00000] *** reported by process [231931905,0]
[e6c749b26c3f:00000] *** on communicator MPI_COMM_WORLD
[e6c749b26c3f:00000] *** MPI_ERR_WIN: invalid window
[e6c749b26c3f:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[e6c749b26c3f:00000] ***    and MPI will try to terminate your MPI job as well)
@devreal
Contributor

devreal commented Oct 12, 2020

I'm trying to look into this. So far, what I am seeing suggests that inside Docker, the query for usable BTLs in ompi_osc_rdma_query_btls finds the sm BTL available for communication between ranks, but not for a rank communicating with itself. I haven't yet figured out why the sm BTL is missing from the local endpoint's bml_btls. @hjelmn, any idea why that could be?

@nathanweeks
Contributor Author

I'm no longer seeing this error with v5.0.0rc2 (tested by adding python3 to the list of packages installed in the first RUN directive of the Dockerfile, and using 7fa73f1 as the value of the COMMIT build-arg).

@janjust
Contributor

janjust commented Mar 16, 2022

Can we close this?

@nathanweeks
Contributor Author

> Can we close this?

Still no error with v5.0.0rc3; I'd say "yes".
