MPI_Win_create failure under 4.0.0 when UCX enabled #6201
@xinzhao3 please take a look. |
Can you run with |
Let me know if anything else might help track this issue down:
In case pml and btl debug output is also useful:
|
Might be worth building with --enable-debug. Something is going on with vader, and it's not clear what without the extra debugging. |
Hmmm. I wonder if a fix is missing from v4.0.0. Can you try master? |
I see the same behavior in master. Enabling debug and running from master gives a bit more verbose output, although I'm not familiar enough with it to know if it's useful:
|
Ok. That helps. I will take a look in the morning. |
Here's a bit more debug info that might help:
|
Interesting. This looks like an edge case initialization failure in osc/rdma. Should have it fixed today. |
Couldn't work on this over the break. A combination of bad company policy and VMware suckage. I should get it fixed today. |
Ok, I see what is happening. Because pml/ucx is in use, btl/vader is not set up properly. Should be easy enough to fix. |
This commit fixes a bug where add_procs can incorrectly return an error when going through the dynamic add_procs path. This doesn't happen normally, only when pml/ob1 is not in use. References open-mpi#6201 Signed-off-by: Nathan Hjelm <[email protected]>
This commit fixes a bug where add_procs can incorrectly return an error when going through the dynamic add_procs path. This doesn't happen normally, only when pml/ob1 is not in use. References open-mpi#6201 Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from commit 30b8336)
Thanks for getting this fixed! We've tested #6249 against some production applications and it seems to fix the issues we were originally seeing. |
echo 0 > /proc/sys/kernel/yama/ptrace_scope |
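For context on the command above: it sets the Linux Yama `ptrace_scope` sysctl to 0, which relaxes restrictions on one process attaching to another. The comment gives no reason for the suggestion; one plausible reading is that it targets vader's CMA single-copy path, since cross-memory-attach calls are subject to the same ptrace access checks, but that is an assumption. The sketch below only reads the current value and describes it. It assumes a Linux system with the Yama LSM enabled, and `describe_ptrace_scope` is a hypothetical helper name, not part of any tool mentioned in this thread.

```shell
#!/bin/sh
# Map a Yama ptrace_scope value to a short description.
# The four levels follow the kernel's Yama documentation.
describe_ptrace_scope() {
    case "$1" in
        0) echo "classic: a process may attach to others running under the same uid" ;;
        1) echo "restricted: only a parent or a declared ptracer may attach" ;;
        2) echo "admin-only: attaching requires CAP_SYS_PTRACE" ;;
        3) echo "no attach: ptrace attach is disabled and the setting is locked" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Read-only check; this does not change the setting.
scope_file=/proc/sys/kernel/yama/ptrace_scope
if [ -r "$scope_file" ]; then
    value=$(cat "$scope_file")
    echo "ptrace_scope=$value ($(describe_ptrace_scope "$value"))"
else
    echo "Yama ptrace_scope is not available on this system"
fi
```

Writing to the file, as in the comment above, requires root and lasts only until reboot unless it is also set through sysctl configuration.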
I am running Open MPI 2.1.6 with UCX and I am getting the following:
[r2i0n6:342074] *** An error occurred in MPI_Win_create
Any solution for this? |
Please upgrade to a more recent version of Open MPI (e.g., 4.0.1). If you are still having problems, please open a new issue. Thanks. |
What version of Open MPI are you using?
v4.0.0 release
UCX is at v1.4.0
Configured with
./configure --with-ucx
Please describe the system on which you are running
Details of the problem
Calls to MPI_Win_create fail with the following:
I can get the test to run using the following variations:
Compiling OMPI without UCX also seems to work.
The tests above were all conducted with test.c, provided below.
test.c: