Device vendor ID: 0x02c9, part ID: 4123, WARNING: No preset parameters were found for the device that Open MPI detected #10841
Comments
UCX is definitely the way to go; it's NVIDIA's preferred mechanism. Side note: running with … The warning message is definitely coming from the openib BTL. I think you should be able to run with:
This tells Open MPI to use the UCX PML for point-to-point MPI communication and to skip the openib BTL. Any further suggestions from the @open-mpi/ucx team?
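A command line with that effect might look like the following sketch, assuming an Open MPI 4.x build that includes the UCX PML. The function name run_ucx and the DRY_RUN switch are hypothetical conveniences for this sketch, not Open MPI options, and ./my_app stands in for your own binary.

```sh
#!/bin/sh
# Launch an MPI program over the UCX PML and exclude the deprecated openib BTL.
# The "^" prefix tells Open MPI to use every BTL except the ones listed.
# DRY_RUN=1 is a convention of this sketch (not an Open MPI option):
# it prints the command line instead of executing it.
run_ucx() {
    set -- mpirun --mca pml ucx --mca btl '^openib' "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, run_ucx -np 2 ./my_app starts the placeholder program on two ranks; prefixing it with DRY_RUN=1 only prints the mpirun line.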
Here's what it looks like running …:
And, perhaps of interest, the warning is clearer and there's no segfault when running …:
Is there an option or parameter to verify that UCX is available?
Also, when running …:
Ah, good question; that was done before my time. Is there a way to use …?
I have a dim recollection that there were some bugs with UCX/UCT in the past that caused segfaults when used in specific combinations. Do you have the most recent version of UCX?
@RobbieTheK can you please run Open MPI with "-mca pml_ucx_verbose 999" to show why UCX is not selected?
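The verbosity request above can be wrapped into a small helper. This is a sketch, assuming Open MPI 4.x with the UCX PML installed. Forcing --mca pml ucx alongside the verbosity parameter means that if UCX cannot be used, the job should stop with an error such as "PML ucx cannot be selected" instead of quietly falling back to another PML. The name diagnose_ucx, the DRY_RUN switch, and the log file name ucx-selection.log are hypothetical.

```sh
#!/bin/sh
# Force the UCX PML and ask it to report its selection decisions.
# DRY_RUN=1 is a convention of this sketch: print the command, don't run it.
diagnose_ucx() {
    set -- mpirun --mca pml ucx --mca pml_ucx_verbose 999 "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
        return 0
    fi
    # Merge stderr into stdout so the verbose lines also land in the log file.
    "$@" 2>&1 | tee ucx-selection.log
}
```

For example, diagnose_ucx -np 2 ./osu_latency runs the OSU latency test with UCX forced and keeps the output for posting.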
@RobbieTheK I don't see the error "PML ucx cannot be selected" here, so it seems UCX is being used.
We don't have UCX installed; isn't it available in Open MPI 4.1.1? From the verbose output, does this apply?
Here's what I get with OSU's benchmark:
Open MPI was probably built with a custom UCX path; you need to run ucx_info from there.
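UCX is a separate library that Open MPI is built against, so checking for it means asking both sides. The following is a hedged sketch, assuming the ompi_info and ucx_info tools from the installation in question are on PATH. An "MCA pml: ucx" line in ompi_info output indicates that the UCX PML component is installed. The configure command line that ompi_info records often reveals a custom --with-ucx prefix, whose bin directory holds the matching ucx_info. The helper name check_ucx is hypothetical.

```sh
#!/bin/sh
# Report whether this Open MPI installation has the UCX PML and which UCX
# it was configured against. check_ucx is a hypothetical helper name.
check_ucx() {
    if ! command -v ompi_info >/dev/null 2>&1; then
        echo "ompi_info not found on PATH"
        return 1
    fi
    # A component line such as "MCA pml: ucx (...)" means the UCX PML is installed.
    if ompi_info | grep -q 'MCA pml: ucx'; then
        echo "UCX PML: present"
    else
        echo "UCX PML: absent"
    fi
    # Show any --with-ucx=<prefix> option recorded at configure time.
    ompi_info | grep 'Configure command line' | tr ' ' '\n' | grep -e '--with-ucx' || true
    # If a ucx_info binary is on PATH, print the UCX version it belongs to.
    if command -v ucx_info >/dev/null 2>&1; then
        ucx_info -v
    fi
}

check_ucx || true
```

If a --with-ucx prefix shows up, running that prefix's bin/ucx_info -d lists the devices and transports UCX can see.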
OK, so my questions are why the … How would I find UCX here?
I'd advise disabling the btl/openib component, since it was deprecated and replaced by UCX: add …
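Two common ways to disable that component are sketched below, assuming Open MPI 4.x. The first is the OMPI_MCA_btl environment variable, since Open MPI reads MCA parameters from OMPI_MCA_<name> variables. The second is the per-user $HOME/.openmpi/mca-params.conf file. The helper persist_no_openib and its MCA_PARAMS_FILE override are hypothetical names for this sketch.

```sh
#!/bin/sh
# Exclude the openib BTL for every Open MPI job launched from this shell.
# Open MPI reads MCA parameters from OMPI_MCA_<name> environment variables.
export OMPI_MCA_btl='^openib'

# Make the same setting persistent in the per-user MCA parameter file.
# MCA_PARAMS_FILE is a hypothetical override (e.g. for testing).
persist_no_openib() {
    f="${MCA_PARAMS_FILE:-$HOME/.openmpi/mca-params.conf}"
    mkdir -p "$(dirname "$f")"
    # Leave an existing "btl = ..." line alone rather than adding a conflicting one.
    if ! grep -q '^btl[[:space:]]*=' "$f" 2>/dev/null; then
        echo 'btl = ^openib' >> "$f"
    fi
}
```

Calling persist_no_openib once writes btl = ^openib to the file. Later calls leave the file unchanged as long as a btl line is already there.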
Thanks, I was aware of that; I was just suggesting a more useful message or log entry rather than a segfault. Does anyone know of a better way to use these benchmarks to find out whether there is a performance difference, e.g., whether one is faster than the other? I get very similar results using STREAM and OSU's micro-benchmarks. Also:
But is there a way to run UCX standalone when it's compiled into Open MPI?
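UCX ships its own benchmark, ucx_perftest, which exercises UCX directly without MPI. This is a sketch, assuming the UCX installation's bin directory is on PATH on both nodes. The server waits for a client, and the client names the server host. The options used are standard ucx_perftest options: -t tag_lat selects the tag-matching latency test and -c pins the process to a CPU core. The helper names and the DRY_RUN switch are hypothetical.

```sh
#!/bin/sh
# Measure UCX tag-matching latency between two nodes without MPI.
# Run perftest_server on node A, then "perftest_client nodeA" on node B.
# DRY_RUN=1 is a convention of this sketch: print the command, don't run it.
perftest() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "ucx_perftest $*"
    else
        ucx_perftest "$@"
    fi
}
perftest_server() { perftest -c 0; }                 # pin to core 0, wait for a client
perftest_client() { perftest "$1" -t tag_lat -c 1; } # $1 = server host name
```

Setting the UCX_TLS or UCX_NET_DEVICES environment variables on both sides restricts the transports or device UCX may use. This makes it possible to compare, for example, an InfiniBand device against TCP. The exact device names depend on the system.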
This error happens when I use the OSU Multi-threaded Latency Test, and it's likely that Open MPI was not compiled with UCX support. Can I suggest improving the warning to mention that?
Another one: WARNING: No preset parameters were found for the device that Open MPI detected:
Open MPI 4.1.1 on RHEL 8, device 5e:00.0 Infiniband controller [0207]: Mellanox Technologies MT28908 Family [ConnectX-6] [15b3:101b]
Using this STREAM benchmark, here are some verbose logs:
I found a reference to this in the comments of mca-btl-openib-device-params.ini.
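The IDs in the issue title can be matched against the lspci line above. The part ID 4123 that Open MPI reports is the decimal form of the PCI device ID 0x101b. 0x02c9 is the vendor ID that Mellanox HCAs report through verbs (for example in ibv_devinfo), while lspci shows the PCI vendor 15b3. The sketch below converts the ID and checks whether a device-params file lists it. The install path is an assumption: Open MPI typically places the file under <prefix>/share/openmpi/. The names has_part_id and OMPI_PREFIX are hypothetical.

```sh
#!/bin/sh
# The part ID in the warning is the PCI device ID printed in decimal:
# lspci shows [15b3:101b] for the ConnectX-6, and 0x101b = 4123.
part_id=$(printf '%d' 0x101b)
echo "vendor_part_id = $part_id"

# Hypothetical helper: succeed if a vendor_part_id line in the given
# device-params file ($1) lists the decimal part ID ($2).
has_part_id() {
    grep -Eq "^vendor_part_id[[:space:]]*=([^#]*[^0-9])?$2([^0-9]|\$)" "$1"
}

# Assumed install location; OMPI_PREFIX is a hypothetical variable.
ini="${OMPI_PREFIX:-/usr}/share/openmpi/mca-btl-openib-device-params.ini"
if [ -r "$ini" ]; then
    if has_part_id "$ini" "$part_id"; then
        echo "part ID $part_id is listed in $ini"
    else
        echo "part ID $part_id is not listed in $ini"
    fi
fi
```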
Forcing UCX still generates the warning:
With version 4.1.1, if I use --mca btl 'openib' I get segfaults, which I believe is expected since it's deprecated. I've tried --mca btl '^openib', --mca btl 'tcp' (or --mca btl 'tcp,self' with the OSU benchmarks), and the benchmark results are very similar even when I use multiple CPUs, threads, and/or nodes. These runs also produce no warning messages. If I don't pass any --mca option, I get the WARNING: message. Does anyone know of a tried-and-true way to run these benchmarks so I can tell whether these MCA parameters make a difference, or am I just not understanding how to use them? Could running these benchmarks on a very active cluster with shared CPUs/nodes be affecting the results?
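One way to make such comparisons more trustworthy is to repeat each configuration several times and compare the spread, rather than relying on single runs. This sketch pins the PML explicitly. When the UCX PML is selected, point-to-point traffic does not go through the BTLs, so changing only --mca btl may leave the measured path unchanged. It assumes osu_latency is in the current directory and that the two ranks are placed as intended; add a hostfile for the inter-node case. BENCH, RUNS, DRY_RUN, bench_with, and compare_pmls are hypothetical names.

```sh
#!/bin/sh
# Repeat one OSU benchmark several times under each MCA configuration and
# save every run, so run-to-run noise can be weighed against the difference
# between configurations.
BENCH="${BENCH:-./osu_latency}"   # placeholder benchmark path
RUNS="${RUNS:-5}"                 # repetitions per configuration

# bench_with LABEL MCA-OPTIONS...: run BENCH RUNS times with the given options.
bench_with() {
    label=$1
    shift
    i=1
    while [ "$i" -le "$RUNS" ]; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "mpirun -np 2 $* $BENCH"
        else
            mpirun -np 2 "$@" "$BENCH" > "${label}_run${i}.txt" 2>&1
        fi
        i=$((i + 1))
    done
}

compare_pmls() {
    bench_with ucx --mca pml ucx
    bench_with tcp --mca pml ob1 --mca btl tcp,self
}
```

The saved runs can then be compared by checking, per message size, whether the gap between the ucx_* and tcp_* results exceeds the variation within each set. On a busy shared cluster, that variation may well dominate.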