This repository was archived by the owner on Sep 30, 2022. It is now read-only.

Commit 8d6f2f0

usnic: fix resource check
The math for checking the number of QPs and CQs per usNIC/VF was incorrect. It allowed MPI processes to run even when the usNICs (i.e., VIC VFs) had fewer QPs and CQs than were necessary. This led to a confusing error later, when fi_enable(3) failed (because we create QPs lazily). Fixing the math here ensures that we print a helpful error message telling the user specifically what is wrong.

Signed-off-by: Jeff Squyres <[email protected]>
(cherry picked from commit open-mpi/ompi@dc18c32)
1 parent: 6d0f0d2

2 files changed (+11 lines, -13 lines)

opal/mca/btl/usnic/btl_usnic_component.c

Lines changed: 10 additions & 12 deletions
@@ -337,11 +337,11 @@ static int check_usnic_config(opal_btl_usnic_module_t *module,
        1. num_vfs (i.e., "usNICs") >= num_local_procs (to ensure that
           each MPI process will be able to have its own protection
           domain), and
-       2. num_vfs * num_qps_per_vf >= num_local_procs * NUM_CHANNELS
+       2. num_qps_per_vf >= NUM_CHANNELS
           (to ensure that each MPI process will be able to get the
           number of QPs it needs -- we know that every VF will have
           the same number of QPs), and
-       3. num_vfs * num_cqs_per_vf >= num_local_procs * NUM_CHANNELS
+       3. num_cqs_per_vf >= NUM_CHANNELS
           (to ensure that each MPI process will be able to get the
           number of CQs that it needs) */
     if (uip->ui.v1.ui_num_vf < unlp) {
@@ -350,19 +350,17 @@ static int check_usnic_config(opal_btl_usnic_module_t *module,
         goto error;
     }
 
-    if (uip->ui.v1.ui_num_vf * uip->ui.v1.ui_qp_per_vf <
-        unlp * USNIC_NUM_CHANNELS) {
-        snprintf(str, sizeof(str), "Not enough WQ/RQ (found %d, need %d)",
-                 uip->ui.v1.ui_num_vf * uip->ui.v1.ui_qp_per_vf,
-                 unlp * USNIC_NUM_CHANNELS);
+    if (uip->ui.v1.ui_qp_per_vf < USNIC_NUM_CHANNELS) {
+        snprintf(str, sizeof(str), "Not enough transmit/receive queues per usNIC (found %d, need %d)",
+                 uip->ui.v1.ui_qp_per_vf,
+                 USNIC_NUM_CHANNELS);
         goto error;
     }
-    if (uip->ui.v1.ui_num_vf * uip->ui.v1.ui_cq_per_vf <
-        unlp * USNIC_NUM_CHANNELS) {
+    if (uip->ui.v1.ui_cq_per_vf < USNIC_NUM_CHANNELS) {
         snprintf(str, sizeof(str),
-                 "Not enough CQ per usNIC (found %d, need %d)",
-                 uip->ui.v1.ui_num_vf * uip->ui.v1.ui_cq_per_vf,
-                 unlp * USNIC_NUM_CHANNELS);
+                 "Not enough completion queues per usNIC (found %d, need %d)",
+                 uip->ui.v1.ui_cq_per_vf,
+                 USNIC_NUM_CHANNELS);
         goto error;
     }
 

opal/mca/btl/usnic/help-mpi-btl-usnic.txt

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ This means that you have either not provisioned enough usNICs on this
 VIC, or there are not enough total receive, transmit, or completion
 queues on the provisioned usNICs. On each VIC in a given server, you
 need to provision at least as many usNICs as MPI processes on that
-server. In each usNIC, you need to provision at least two each of the
+server. In each usNIC, you need to provision enough of each of the
 following: send queues, receive queues, and completion queues.
 
 Open MPI will skip this usNIC interface in the usnic BTL, which may
