Error launching under slurm (Out of resource) #11371

Closed · openpmix/prrte#1669

Description

@gkatev

Hi, I've been unable to start MPI jobs under Slurm reservations with the latest main.
I'm under salloc -N 1 -n 48, and the message is:

$ mpirun -n 1 hostname
--------------------------------------------------------------------------
Your job failed to map. Either no mapper was available, or none
of the available mappers was able to perform the requested
mapping operation.

  Mapper result:       Out of resource
  Application:         hostname
  #procs to be mapped: 1
  Mapping policy:      BYSLOT
  Binding policy:      CORE

--------------------------------------------------------------------------

This happens with 1 node as well as with 2 nodes in the reservation. It also occurs on 5.0.x, but with 5.0.0rc8 all is well, and it doesn't happen when not under Slurm.

I tried to chase it down a bit:

$ mpirun -n 1 --prtemca rmaps_base_verbose 10 --display alloc --output tag hostname
[deepv:02593] mca: base: component_find: searching NULL for rmaps components
[deepv:02593] mca: base: find_dyn_components: checking NULL for rmaps components
[deepv:02593] pmix:mca: base: components_register: registering framework rmaps components
[deepv:02593] pmix:mca: base: components_register: found loaded component ppr
[deepv:02593] pmix:mca: base: components_register: component ppr register function successful
[deepv:02593] pmix:mca: base: components_register: found loaded component rank_file
[deepv:02593] pmix:mca: base: components_register: component rank_file has no register or open function
[deepv:02593] pmix:mca: base: components_register: found loaded component round_robin
[deepv:02593] pmix:mca: base: components_register: component round_robin register function successful
[deepv:02593] pmix:mca: base: components_register: found loaded component seq
[deepv:02593] pmix:mca: base: components_register: component seq register function successful
[deepv:02593] [prterun-deepv-2593@0,0] rmaps:base set policy with slot
[deepv:02593] mca: base: components_open: opening rmaps components
[deepv:02593] mca: base: components_open: found loaded component ppr
[deepv:02593] mca: base: components_open: component ppr open function successful
[deepv:02593] mca: base: components_open: found loaded component rank_file
[deepv:02593] mca: base: components_open: found loaded component round_robin
[deepv:02593] mca: base: components_open: component round_robin open function successful
[deepv:02593] mca: base: components_open: found loaded component seq
[deepv:02593] mca: base: components_open: component seq open function successful
[deepv:02593] mca:rmaps:select: checking available component ppr
[deepv:02593] mca:rmaps:select: Querying component [ppr]
[deepv:02593] mca:rmaps:select: checking available component rank_file
[deepv:02593] mca:rmaps:select: Querying component [rank_file]
[deepv:02593] mca:rmaps:select: checking available component round_robin
[deepv:02593] mca:rmaps:select: Querying component [round_robin]
[deepv:02593] mca:rmaps:select: checking available component seq
[deepv:02593] mca:rmaps:select: Querying component [seq]
[deepv:02593] [prterun-deepv-2593@0,0]: Final mapper priorities
[deepv:02593] 	Mapper: rank_file Priority: 100
[deepv:02593] 	Mapper: ppr Priority: 90
[deepv:02593] 	Mapper: seq Priority: 60
[deepv:02593] 	Mapper: round_robin Priority: 10

======================   ALLOCATED NODES   ======================
    dp-dam01: slots=48 max_slots=0 slots_inuse=0 state=UP
	Flags: DAEMON_LAUNCHED:SLOTS_GIVEN
	aliases: 10.2.10.41,10.2.17.81
=================================================================
[deepv:02593] mca:rmaps: mapping job prterun-deepv-2593@1
[deepv:02593] mca:rmaps: setting mapping policies for job prterun-deepv-2593@1 inherit TRUE hwtcpus FALSE
[deepv:02593] mca:rmaps mapping given by MCA param
[deepv:02593] mca:rmaps[540] default binding policy given
[deepv:02593] mca:rmaps:rf: job prterun-deepv-2593@1 not using rankfile policy
[deepv:02593] mca:rmaps:ppr: job prterun-deepv-2593@1 not using ppr mapper PPR NULL policy PPR NOTSET
[deepv:02593] [prterun-deepv-2593@0,0] rmaps:seq called on job prterun-deepv-2593@1
[deepv:02593] mca:rmaps:seq: job prterun-deepv-2593@1 not using seq mapper
[deepv:02593] mca:rmaps:rr: mapping job prterun-deepv-2593@1
[deepv:02593] [prterun-deepv-2593@0,0] Starting with 1 nodes in list
[deepv:02593] [prterun-deepv-2593@0,0] Filtering thru apps
[deepv:02593] [prterun-deepv-2593@0,0] Retained 1 nodes in list
[deepv:02593] [prterun-deepv-2593@0,0] node dp-dam01 has 48 slots available
[deepv:02593] AVAILABLE NODES FOR MAPPING:
[deepv:02593]     node: dp-dam01 daemon: 1 slots_available: 48
[deepv:02593] mca:rmaps:rr: mapping by slot for job prterun-deepv-2593@1 slots 48 num_procs 1
[deepv:02593] mca:rmaps:rr:slot working node dp-dam01
[deepv:02593] [prterun-deepv-2593@0,0] get_avail_ncpus: node dp-dam01 has 0 procs on it
[deepv:02593] mca:rmaps:rr:slot job prterun-deepv-2593@1 is oversubscribed - performing second pass
[deepv:02593] mca:rmaps:rr:slot working node dp-dam01
[deepv:02593] [prterun-deepv-2593@0,0] get_avail_ncpus: node dp-dam01 has 0 procs on it
--------------------------------------------------------------------------
Your job failed to map. Either no mapper was available, or none
of the available mappers was able to perform the requested
mapping operation.

  Mapper result:       Out of resource
  Application:         hostname
  #procs to be mapped: 1
  Mapping policy:      BYSLOT
  Binding policy:      CORE

--------------------------------------------------------------------------

It looks to me like the failure occurs because prte_rmaps_base_get_ncpus() returns 0. These debug prints:

diff --git a/src/mca/rmaps/base/rmaps_base_support_fns.c b/src/mca/rmaps/base/rmaps_base_support_fns.c
index 8a2974a90f..c345c2e727 100644
--- a/src/mca/rmaps/base/rmaps_base_support_fns.c
+++ b/src/mca/rmaps/base/rmaps_base_support_fns.c
@@ -668,6 +668,7 @@ int prte_rmaps_base_get_ncpus(prte_node_t *node,
     int ncpus;
 
 #if HWLOC_API_VERSION < 0x20000
+    printf("HWLOC_API_VERSION < 0x20000\n");
     hwloc_obj_t root;
     root = hwloc_get_root_obj(node->topology->topo);
     if (NULL == options->job_cpuset) {
@@ -679,6 +680,7 @@ int prte_rmaps_base_get_ncpus(prte_node_t *node,
         hwloc_bitmap_and(prte_rmaps_base.available, prte_rmaps_base.available, obj->allowed_cpuset);
     }
 #else
+    printf("HWLOC_API_VERSION >= 0x20000\n");
     if (NULL == options->job_cpuset) {
         hwloc_bitmap_copy(prte_rmaps_base.available, hwloc_topology_get_allowed_cpuset(node->topology->topo));
     } else {
diff --git a/src/mca/rmaps/round_robin/rmaps_rr_mappers.c b/src/mca/rmaps/round_robin/rmaps_rr_mappers.c
index 484449ce7a..b3e631fea6 100644
--- a/src/mca/rmaps/round_robin/rmaps_rr_mappers.c
+++ b/src/mca/rmaps/round_robin/rmaps_rr_mappers.c
@@ -123,6 +123,7 @@ pass:
          * the user didn't specify a required binding, then we set
          * the binding policy to do-not-bind for this node */
         ncpus = prte_rmaps_base_get_ncpus(node, NULL, options);
+        printf("prte_rmaps_base_get_ncpus() = %d\n", ncpus);
         if (options->nprocs > ncpus &&
             options->nprocs <= node->slots_available &&
             !PRTE_BINDING_POLICY_IS_SET(jdata->map->binding)) {

Produce:

prte_rmaps_base_get_ncpus() = 0
HWLOC_API_VERSION >= 0x20000
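
To rule out (or confirm) an empty allowed cpuset coming from the node's topology itself, a standalone hwloc check might help. The sketch below is just my own debugging aid, not part of PRRTE: it assumes hwloc >= 2.1 (for HWLOC_TOPOLOGY_FLAG_INCLUDE_DISALLOWED), and whether the PRTE daemon loads its topology with that flag is an assumption on my part. It prints the full topology cpuset and the "allowed" cpuset, which is what the HWLOC >= 2.0 path in prte_rmaps_base_get_ncpus() copies from when no job cpuset is given.

/* hwloc_allowed.c: debugging sketch (not part of PRRTE).
 * Prints the full topology cpuset and the allowed cpuset.
 * Assumes hwloc >= 2.1 for HWLOC_TOPOLOGY_FLAG_INCLUDE_DISALLOWED. */
#include <stdio.h>
#include <stdlib.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    char *str = NULL;

    hwloc_topology_init(&topo);
    /* Keep disallowed PUs in the topology so the allowed cpuset can be a
     * strict subset of the full cpuset (whether the PRTE daemon does the
     * same is an assumption here). */
    hwloc_topology_set_flags(topo, HWLOC_TOPOLOGY_FLAG_INCLUDE_DISALLOWED);
    hwloc_topology_load(topo);

    hwloc_const_cpuset_t full = hwloc_topology_get_topology_cpuset(topo);
    hwloc_const_cpuset_t allowed = hwloc_topology_get_allowed_cpuset(topo);

    hwloc_bitmap_asprintf(&str, full);
    printf("topology cpuset: %s (%d PUs)\n", str, hwloc_bitmap_weight(full));
    free(str);

    hwloc_bitmap_asprintf(&str, allowed);
    printf("allowed cpuset:  %s (%d PUs)\n", str, hwloc_bitmap_weight(allowed));
    free(str);

    hwloc_topology_destroy(topo);
    return 0;
}

Compiled with gcc hwloc_allowed.c -o hwloc_allowed $(pkg-config --cflags --libs hwloc) and run on the compute node inside the salloc allocation, an empty allowed set would point at the topology/cgroup side, while a fully populated one would point back at the mapper's cpuset handling.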
