Commit 0f3dc20

committed
TO BE SQUASHED
Suggestions above Charles' original commit. Signed-off-by: Jeff Squyres <[email protected]>
1 parent 8c978cc commit 0f3dc20

1 file changed

+48
-22
lines changed

faq/openfabrics.inc

Lines changed: 48 additions & 22 deletions
Original file line number | Diff line number | Diff line change
@@ -2343,50 +2343,76 @@ shell$ mpirun --mca pml ucx --mca osc ucx --mca scoll ucx --mca atomic ucx ...
23432343

23442344
/////////////////////////////////////////////////////////////////////////
23452345

2346-
$q[] = "I'm getting errors about \"initializing an OpenFabrics device\" when running
2347-
4.0.0 with UCX support enabled. What should I do?";
2346+
$q[] = "I'm getting errors about \"initializing an OpenFabrics device\" when running v4.0.0 with UCX support enabled. What should I do?";
23482347
$anchor[] = "ofa-device-error";
2349-
$a[] = "Disable ib verbs support (--without-verbs).
23502348

2351-
The messages below were observed by at least one site where OpenMPI 4.0.0
2352-
was built with support for InfiniBand verbs (--with-verbs), OFA UCX (--with-ucx),
2353-
and CUDA (--with-cuda) with applications running on gpu-enabled hosts.
2349+
$a[] = "The short answer is that you should probably just disable
2350+
verbs support in Open MPI.
23542351
2355-
Since the openib BTL (configured via --with-verbs) is deprecated in favor
2356-
of UCX and the UCX PML includes support for the OpenFabrics devices in question
2357-
(Mellanox HCAs), the openib BTL is not needed.
2352+
The messages below were observed by at least one site where Open MPI v4.0.0
2353+
was built with support for InfiniBand verbs ([--with-verbs]), OFA UCX ([--with-ucx]),
2354+
and CUDA ([--with-cuda]) with applications running on GPU-enabled hosts.
23582355
2359-
<pre>
2356+
<geshi>
2357+
-----------------------------------------------------------------------------
23602358
WARNING: There was an error initializing an OpenFabrics device.
23612359
2362-
Local host: c36a-s39
2363-
Local device: mlx4_0
2364-
</pre>
2360+
Local host: c36a-s39
2361+
Local device: mlx4_0
2362+
-----------------------------------------------------------------------------
2363+
</geshi>
23652364
23662365
and
23672366
2368-
<pre>
2367+
<geshi>
23692368
-----------------------------------------------------------------------------
23702369
By default, for Open MPI 4.0 and later, infiniband ports on a device
23712370
are not used by default. The intent is to use UCX for these devices.
23722371
You can override this policy by setting the btl_openib_allow_ib MCA parameter
23732372
to true.
23742373
2375-
Local host: c36a-s39
2376-
Local adapter: mlx4_0
2377-
Local port: 1
2374+
Local host: c36a-s39
2375+
Local adapter: mlx4_0
2376+
Local port: 1
23782377
-----------------------------------------------------------------------------
2379-
</pre>";
2378+
</geshi>
2379+
2380+
As noted in the messages above, the openib BTL (enabled when Open MPI
2381+
is configured with [--with-verbs]) is deprecated in favor of the UCX
2382+
PML, which includes support for OpenFabrics devices.
2383+
The [openib] BTL is therefore not needed.
2384+
2385+
You can disable the [openib] BTL in a few different ways:
2386+
2387+
<ul>
2388+
<li> Configure Open MPI [--without-verbs]. This will prevent building
2389+
the [openib] BTL in the first place.</li>
2390+
<li> Disable the [openib] BTL via the [btl] MCA param (see <a
2391+
href=\"?category=tuning#setting-mca-params\">this FAQ item</a> for
2392+
information on how to set MCA params). For example,
2393+
<geshi bash>
2394+
shell$ mpirun --mca btl '^openib' ...
2395+
</geshi></li>
2396+
<li> Force the use of the UCX PML via the [pml] MCA param. For example:
2397+
<geshi bash>
2398+
shell$ mpirun --mca pml ucx ...
2399+
</geshi></li>
2400+
</ul>";
2401+
23802402
/////////////////////////////////////////////////////////////////////////
23812403

23822404
$q[] = "How can I find out what devices and transports are supported by UCX on my system?";
23832405
$anchor[] = "ucx-supported-devices";
2384-
$a[] = "The <code>ucx_info</code> command can be used.
23852406

2386-
For example,
2387-
<pre>
2407+
$a[] = "Check out the <a
2408+
href=\"http://www.openucx.org/documentation/\">UCX documentation</a>
2409+
for more information, but you can use the [ucx_info] command. For
2410+
example:
2411+
2412+
<geshi bash>
23882413
shell$ ucx_info -d
2389-
</pre>";
2414+
</geshi>";
2415+
23902416
/////////////////////////////////////////////////////////////////////////
23912417

23922418
$q[] = "What is <code>cpu-set</code>?";
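
Note on the FAQ answer added in this diff: the two `mpirun --mca ...` examples set MCA parameters for a single run only. A minimal sketch of the same two settings made persistent for a shell session follows, assuming Open MPI's standard `OMPI_MCA_<param>` environment-variable form of MCA parameters. The values simply mirror the `mpirun` examples in the diff; whether you want one or both depends on your site.

```shell
# Sketch only: export MCA parameters so every later mpirun in this shell
# session picks them up without extra command-line flags.

# Same effect as: mpirun --mca btl '^openib' ...
# (exclude the openib BTL from the list of usable BTLs)
export OMPI_MCA_btl='^openib'

# Same effect as: mpirun --mca pml ucx ...
# (force selection of the UCX PML)
export OMPI_MCA_pml='ucx'
```

The single quotes around `^openib` keep shells that treat `^` specially from altering the value, matching the quoting used in the FAQ's own `mpirun` example.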
