openmpi v4 launch error messages #13293
Comments
Sometimes the PMIx shared memory cannot be used in a VM, so set … The only UDP I see in OMPI v4 (doing a really quick grep) is in the USNIC BTL, so try adding …
Thank you for the response @rhc54
with PMIX_MCA_gds=hash:
With --mca btl self,sm,tc, mpirun failed to launch:
It's |
Thank you, I retried:
and I get the same issue. When I remove sm, the launch works.
With sm:
Without sm:
The UDP messages appear to be coming from the PSM3 library:
You mention Ethernet interfaces, but didn't mention anything more specific than that (there are several OS-bypass / HPC-quality Ethernet-based hardware platforms available; PSM3 is one of them). Meaning: if you have networking hardware that can utilize the PSM3 library, then the PSM3 stack isn't installed or configured properly, because it apparently isn't able to open the NICs successfully. You'll need to investigate your hardware / PSM3 documentation to resolve that; we can't help with that.
If you don't have PSM3-capable hardware, then you should probably remove the PSM3 library from your systems to avoid confusion (depending on what layer is using it, you may need to rebuild Open MPI).
You're also having shared memory problems in #13294. I'm going to take a guess: you should follow what the error messages are telling you in that ticket and have a TMPDIR on a non-NFS directory. Weird (i.e., bad) things can happen when trying to mount shared memory on NFS-based filesystems. Doing so may make the …
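For anyone wanting to check the TMPDIR suggestion before rerunning, here is a minimal Python sketch that reports which filesystem type a directory lives on, based on the contents of `/proc/mounts`. The helper names `fs_type_for` and `tmpdir_is_on_nfs` are hypothetical (they are not part of Open MPI or PMIx), and the sketch assumes a Linux host and mount points without spaces in their paths.

```python
import os

# Filesystem type names that /proc/mounts reports for NFS mounts.
NFS_TYPES = {"nfs", "nfs4"}


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts (fields: device,
    mount point, fs type, options, ...). The longest matching mount
    point wins; on a tie, the later entry wins because it shadows
    the earlier one.
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # Compare with trailing slashes so "/home" does not match "/homework".
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def tmpdir_is_on_nfs(tmpdir, mounts_text):
    """True if `tmpdir` sits on an NFS mount according to `mounts_text`."""
    return fs_type_for(tmpdir, mounts_text) in NFS_TYPES


if __name__ == "__main__":
    # Falls back to /tmp when TMPDIR is unset (an assumption of this sketch).
    tmpdir = os.environ.get("TMPDIR", "/tmp")
    with open("/proc/mounts") as f:
        mounts = f.read()
    print(tmpdir, fs_type_for(tmpdir, mounts))
```

If this reports `nfs` or `nfs4` on the compute nodes, pointing TMPDIR at a node-local directory before launching would be the thing to try, per the advice above.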
Hi Team,
We installed Open MPI 4.1.6 and Open MPI v5 using EasyBuild scripts on a VM-based cluster running RHEL 9.5.
https://docs.easybuild.io/version-specific/supported-software/o/OpenMPI/
Each server in the VM-based cluster has the following 3 interfaces:
When we run Open MPI v4, we are able to perform multi-node runs, but we see many error messages at the beginning and end of the expected output.
Based on the error messages in the output, I see two categories of issues:
Issue 1)
Issue 2) At the end of the run I see the following message:
I am attaching the complete stdout herewith: ompiv4_error.txt
Please let me know if any further information is required from my end.