-
Notifications
You must be signed in to change notification settings - Fork 900
Update to hwloc 2.0.0a #3951
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update to hwloc 2.0.0a #3951
Conversation
@bosilca There are a couple of places in treematch that need to be updated, but they reside upstream. @ggouaillardet set flags to generate compile-time warnings due to the deprecated hwloc functions. |
@jsquyres please check distance calculations in usnic |
@ggouaillardet @hjelmn Can someone look at the openib btl distance calculation? It obviously didn't translate correctly to the new API. |
2 things to fix in the distance code:
* On input, \p nr points to the number of distances that may be stored in \p distances.
* On output, \p nr points to the number of distances that were actually found, even if some of them couldn't be stored in \p distances.
* Distances that couldn't be stored are ignored, but the function still returns
* success (\c 0). The caller may find out by comparing the value pointed by \p nr before and after the function call.
|
It doesn't look like this PR is binding processes properly. When I build thusly:
And when I run inside a SLURM
you can see that the Hmm. After more testing, this is even happening on master (without this PR). Even if I specify # Do not specify a binding
$ mpirun --report-bindings -np 4 --npernode 2 --mca btl vader,self,usnic --mca btl_base_verbose 100 hostname
[mpi006:18492] MCW rank 2 bound to :
[mpi006:18492] MCW rank 3 bound to :
[mpi005:32281] MCW rank 0 bound to :
[mpi005:32281] MCW rank 1 bound to :
mpi005
mpi005
mpi006
mpi006
# Bind to core
$ mpirun --bind-to core --report-bindings -np 4 --npernode 2 --mca btl vader,self,usnic --mca btl_base_verbose 100 hostname
[mpi006:18441] MCW rank 2 bound to :
[mpi006:18441] MCW rank 3 bound to :
[mpi005:32213] MCW rank 0 bound to :
[mpi005:32213] MCW rank 1 bound to :
mpi006
mpi006
mpi005
mpi005
# Bind to socket
$ mpirun --bind-to socket --report-bindings -np 4 --npernode 2 --mca btl vader,self,usnic --mca btl_base_verbose 100 hostname
[mpi005:32246] MCW rank 0 bound to :
[mpi005:32246] MCW rank 1 bound to :
[mpi006:18458] MCW rank 2 bound to :
[mpi006:18458] MCW rank 3 bound to :
mpi005
mpi005
mpi006
mpi006 The usnic BTL only uses hwloc NUMA distances if the processes are bound, so I can't really test the new hwloc stuff until this binding stuff is fixed. ...am I doing something stupid / wrong? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks OK for my first comment but you still need to address my second bullet:
"Objects are not in logical order in the matrix. You should iterate over the obj array to find the index i and j of your objects, and then use those indexes to get the distance from matrix[i+j*nbobjs]. I'll clarify that in the doc."
@bgoglin Perhaps I am missing something, but we do iterate over the object array to get the indices right before computing the location in the matrix. Is there something else required? |
@jsquyres I'll check the binding on master - was working last I checked |
I can't find any occurence of "->objs" in changes listed at https://github.com/open-mpi/ompi/pull/3951/files. If you were iterating, we would see things like "if (myobj == distances->objs[x])". Am I looking at the right place? In opal/mca/btl/openib/btl_openib_component.c:2480 2486 2516 and 2523, we use logical_index, there's no iteration over the objs array. Same in usnic. |
@bgoglin Okay, I think I'm beginning to understand - your earlier comments confused me as we are looking up the index of the objects we are trying to reference. What you're saying is that they are in a different place in the distances matrix, and so we have to look for them there as well. Sigh - I can see that is going to be really hard to explain clearly in the docs |
@ggouaillardet It looks like your earlier modifications to handle both hwloc2 and earlier versions isn't quite correct. For example, the per-object mapping (ppr mapper) isn't working correctly any more, and neither is the code that prints out the binding report (opal_hwloc_base_cset2mapstr). I'm heading out on vacation for a few days - can you please track it down and fix? Your choice of where/how to do it. You can fix it on master if you prefer, or can create a PR against this one. |
@bgoglin Look okay now? |
@rhc54 Yeah thanks. Do you want a helper function hwloc_distances_get_values_between_objs(distances, obj1, obj2, &value1, &value2) ? |
Might be an idea to make things easier for folks - I don't think there is anything wrong with the current code. I just think it may be hard to communicate to folks. Helper function might eliminate that issue. |
@ggouaillardet I fixed master here: #3963 |
Update to support passing of HWLOC shmem topology to client procs Update use of distance API per @bgoglin Have the openib component lookup its object in the distance matrix Bring usnic up-to-date Restore binding for hwloc2 Signed-off-by: Ralph Castain <[email protected]>
@jsquyres @ggouaillardet I have brought this completely up-to-date and binding works again for both hwloc 1.11 and 2.0. Please check distance calculations - Brice has looked them over for correctness, but I'd like to ensure we still get the right answers. |
thanks Ralph, i was working on a very similar patch. is yours enough to make |
Signed-off-by: Ralph Castain <[email protected]>
@ggouaillardet yeah, i realized just a little while ago that we still needed to filter the cpus against the user-provided cpu list.
There is still something wrong in the rmaps_base_frame.c logic, though it isn't due to these changes. When you specify a cpu-list, we should allow you to still specify a binding policy inside that envelope. In other words, if you don't specify a binding policy, then we bind each proc to the entire envelope. Otherwise, we are supposed to bind each proc within the cpu-list envelope according to the binding policy. Right now, rmaps_base_frame.c will error out if you try to specify both --cpu-list and a binding policy. Would you have a chance to poke at that (I need to get some sleep)? |
@rhc54 ok, i will give it a try |
@jsquyres don't forget to re-try this 😄 |
@jsquyres @hppritcha I would really like to merge this sooner rather than later. So far as we can tell, everything is now working. I won't be on the call this week, but could someone please check to see if anyone has concerns about commit it? |
Now that binding has been fixed, this passes all the sanity tests that I manually checked. |
@rhc54 No one had any objections to this PR on the Webex today. Question, though: does the |
The external component handles both hwloc v1.x and v2.x release branches. Reason: the abstraction glue written by @ggouaillardet and myself is based on macros and direct code all conditioned on the HWLOC version number. So there is no abstraction glue in the components themselves. |
👍 |
Silence some osc/rdma warnings - @hjelmn please comment
Signed-off-by: Ralph Castain [email protected]