Remove hwloc framework. #3029

Closed
wants to merge 3 commits into from

rhc54
Contributor

@rhc54 rhc54 commented Feb 25, 2017

Shift hwloc configure logic to opal_check_hwloc.m4 and dedicate it to finding external hwloc installation. Update all files that access hwloc to the new location

Signed-off-by: Ralph Castain [email protected]

@rhc54
Contributor Author

rhc54 commented Feb 25, 2017

Closes #2955 #2954

AC_DEFINE_UNQUOTED(hwloc_external_openfabrics_header,
["$opal_hwloc_openfabrics_include"],
[Location of external hwloc OpenFabrics header])
$1],
Contributor

i do not think there are $1 nor $2 any more.
since hwloc is mandatory, should we simply abort when it is not found ?

last but not least, travis config should be updated to install hwloc packages (both linux and osx) and ibm, lanl and mellanox should be requested to install hwloc packages.

@rhc54
Contributor Author

rhc54 commented Feb 25, 2017

@artpol84 @jjhursey @hppritcha Can you please add an hwloc rpm to your Jenkins tests?

@artpol84
Contributor

@rhc54 how does it work now? Would downloading and installing a recent hwloc and passing "--with-hwloc" to the OMPI configure be sufficient?
Is it ok to use external hwloc with internal libevent and PMIx?

@rhc54
Contributor Author

rhc54 commented Feb 26, 2017

We find hwloc by default so long as it is in a standard system location - if you install from rpm, that should be fine. There is no issue with external hwloc and the other components as we aren't setting any paths to the system directories.

Ralph Castain and others added 3 commits February 27, 2017 05:27
…oc.m4 and dedicate it to finding external hwloc installation. Update all files that access hwloc to the new location

Have hwloc configure fail if support is not found or mistakenly directed to be omitted.
Return an error if someone attempts to dss.copy an hwloc topology tree so we can still support OS versions that are at v1.5 by default
Silence common symbol warning

Fix typos

Signed-off-by: Ralph Castain <[email protected]>
Signed-off-by: Gilles Gouaillardet <[email protected]>
HWLOC_OBJ_OSDEV_COPROC is not defined in hwloc 1.5, so do not
try to detect coprocessors unless this symbol/macro is defined.

Signed-off-by: Gilles Gouaillardet <[email protected]>
@jsquyres
Member

I went to try to install my OS distro's package for the "devel" version of hwloc today (i.e., that includes the hwloc headers). I ended up doing a bit of a survey of hwloc availability in several Linux distros:

  1. RHEL/CentOS: 👎 The hwloc package in the distro does not include the hwloc headers. I cannot find a package in RHEL (i.e., a hwloc-devel-like package) that contains the hwloc headers.
  2. SLES: 👍 I see that SLES 12sp2 contains a package for hwloc and another hwloc-devel package which includes the headers.
  3. Ubuntu: 👎 In Ubuntu server 16.04.2, I do not see an hwloc package at all.

This makes me a bit concerned about unbundling hwloc from Open MPI. I.e., it feels like hwloc has not yet taken over the world / it is not a safe assumption that a user can easily install hwloc from their OS / distro.

@rhc54
Contributor Author

rhc54 commented Feb 27, 2017

It's easy enough to download and install the tarball, so the lack of a posted rpm doesn't seem like a blocker. We can/should contact the ones lacking it and push them to please add it - I doubt we'd get much resistance.

@jsquyres
Member

I agree that we can download / install hwloc easily. But I'm concerned that we're raising the bar for the average user to install Open MPI properly for the distros that are out in the wild that do not have it. I agree that we can probably push hwloc on the distros, but that will take time.

@rhc54
Contributor Author

rhc54 commented Feb 27, 2017

Well, come up with an alternative solution that resolves all the problems of external/internal packages and that doesn't distort the entire code base. I can't, and I haven't seen one yet. 😄

@jsquyres
Member

All I'm saying is: a big part of the reason for bundling hwloc was because it hadn't taken over the world yet. It seems like that is still true.

@rhc54
Contributor Author

rhc54 commented Feb 27, 2017

I hear you - and all I'm saying is that bundling creates its own problems that are catching up to us. We haven't yet found a way out of those problems, other than to unbundle - which is what the distros have been begging us to do for years. So maybe it's time to bite the bullet and do it, pushing others to package the bundles we need.

@ggouaillardet
Contributor

@jsquyres
hwloc-devel is available on both CentOS 6 and CentOS 7
iirc, the devel repo is not configured by default on genuine RHEL, and that could explain why you cannot find it.
fwiw, here is the link to rhn that references hwloc-devel

in ubuntu 14.04.3LTS, the package is called libhwloc-dev
(i added it to the travis config, and it works just fine)
i am now testing 16.04.2 server

@ggouaillardet
Contributor

@jsquyres the libhwloc-dev package is also available in ubuntu 16.04.2 server

@rhc54
Contributor Author

rhc54 commented Feb 28, 2017

@jsquyres I did come up with a possible alternative approach, but I very much doubt anyone (especially the distros) will like it. We could change the build system so we copy (or symlink) any external hwloc headers and libs to the prefix location, and then build/link against them from there. This would remove the problem of pulling in unintended versions for libevent and pmix (note: we could do the same for them).

However, I'm not sure how sys admins would feel about us doing this. 🤷‍♂️

@ggouaillardet
Contributor

one of the issues was that there were two hwloc.h files (the one from hwloc, and the one from the ompi hwloc framework), so here is an idea

  • include a known-to-be-working hwloc tarball in the ompi tarball
  • configure tries to use the external hwloc first
  • if it fails, then untar/configure/make/make install hwloc (ompi and hwloc use the same prefix)
  • try again with the newly built hwloc (that should always work)
  • business as usual

that might not be the most elegant approach, but

  • we avoid the issue caused by two headers called hwloc.h
  • we transparently fall back to a working version on systems with no hwloc, or with one that is antique or too recent (e.g. > 2)
  • i do not have a better suggestion

any thoughts ?
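
For readers following the thread, the fallback steps listed above could be sketched roughly as follows. This is only an illustration, not Open MPI's configure logic: the function names (`have_hwloc`, `find_hwloc`, `build_bundled_hwloc`, `select_hwloc`) are hypothetical, and it assumes the bundled tarball unpacks into a directory named `hwloc-<version>`.

```shell
#!/bin/sh
# Sketch only: prefer an external hwloc, fall back to a bundled tarball.
# Every function name here is hypothetical, not part of Open MPI.

# Succeeds if the given prefix looks like an hwloc install (header present).
have_hwloc() {
    [ -f "$1/include/hwloc.h" ]
}

# Print the first candidate prefix that contains hwloc.h; fail if none does.
find_hwloc() {
    for fh_candidate in "$@"; do
        if have_hwloc "$fh_candidate"; then
            printf '%s\n' "$fh_candidate"
            return 0
        fi
    done
    return 1
}

# Unpack, configure, build and install the bundled tarball into a prefix.
# Assumes the tarball unpacks into a directory matching hwloc-*.
build_bundled_hwloc() {
    bb_tarball=$1
    bb_prefix=$2
    bb_workdir=$(mktemp -d) || return 1
    tar -xf "$bb_tarball" -C "$bb_workdir" || return 1
    (cd "$bb_workdir"/hwloc-* &&
        ./configure --prefix="$bb_prefix" && make && make install)
}

# Usage: select_hwloc TARBALL PREFIX CANDIDATE...
# Try the external candidates first; otherwise build the bundled copy into
# PREFIX and look again. Prints the chosen prefix on stdout.
select_hwloc() {
    sel_tarball=$1
    sel_prefix=$2
    shift 2
    find_hwloc "$@" && return 0
    build_bundled_hwloc "$sel_tarball" "$sel_prefix" >&2 || return 1
    find_hwloc "$sel_prefix"
}
```

A call such as `select_hwloc ./hwloc.tar.gz "$HOME/ompi" /usr /usr/local` (paths chosen purely for illustration) would print `/usr` when a system hwloc header is present there and only build the bundled copy otherwise. Note that this checks for the header alone; the version check implied by "antique or too recent" is left out of the sketch.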

@ggouaillardet
Contributor

:bot:mellanox:retest
:bot:lanl:retest

@rhc54
Contributor Author

rhc54 commented Feb 28, 2017

My preference would be to remove embedded hwloc and always use an external version. If that is unacceptable, I suspect the simplest solution is to leave things as they are, but change the configure logic. If the internal version is selected, then do not set the CPPFLAGS or LDFLAGS to point to the internal version's location.

Instead, we would always create an opal/hwloc location in the build directory tree. We make that directory look just like an installed area for hwloc - i.e., it has include and lib subdirectories. We then set -I{$top_builddir}/opal/hwloc/include. The rest of our code base simply includes hwloc.h, and the build system adds {$top_builddir}/opal/hwloc/lib/libhwloc to libopen-pal.

If our internal version is selected, then we aim the build product at {$top_builddir}/opal/hwloc. If the external version is selected, then we create the symlinks for {$top_builddir}/opal/hwloc/include and {$top_builddir}/opal/hwloc/lib to point to the correct external places.

We then do the same thing for PMIx, and for libevent if we continue to use it. This eliminates the confusion caused by wanting an external version of one and an internal version of another. It also shouldn't be all that much disruption to the existing code base.
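
To make the layout in this comment concrete, here is a minimal sketch of the staging step. It is not taken from the Open MPI build system: `stage_hwloc` and its arguments are hypothetical, it assumes the external libraries live in `lib/` (not `lib64/`), and it prints a `-L` flag as a simplification of linking `libhwloc` into `libopen-pal`.

```shell
#!/bin/sh
# Sketch only: make <builddir>/opal/hwloc look like an installed hwloc,
# so the rest of the build uses one include path and one lib path.
# stage_hwloc is a hypothetical name, not part of Open MPI.

# Usage: stage_hwloc BUILDDIR [EXTERNAL_PREFIX]
# An empty EXTERNAL_PREFIX means the internal (embedded) copy is selected.
stage_hwloc() {
    sh_builddir=$1
    sh_external=$2
    sh_stage="$sh_builddir/opal/hwloc"

    mkdir -p "$sh_stage" || return 1
    if [ -n "$sh_external" ]; then
        # External: symlink include/ and lib/ to the installed locations.
        # Assumes the libraries are under lib/, which is not true everywhere.
        ln -sfn "$sh_external/include" "$sh_stage/include" || return 1
        ln -sfn "$sh_external/lib" "$sh_stage/lib" || return 1
    else
        # Internal: real directories that the embedded build would fill.
        mkdir -p "$sh_stage/include" "$sh_stage/lib" || return 1
    fi
    # The same flags apply whichever version was selected.
    printf '%s\n' "-I$sh_stage/include -L$sh_stage/lib"
}
```

With this shape, `stage_hwloc "$PWD" /usr` and `stage_hwloc "$PWD" ""` print identical flags, which is the point of the proposal: code that includes `hwloc.h` never needs to know which copy was chosen.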

@jsquyres
Member

@ggouaillardet Where are these packages available? I could not find them on the distribution DVD / ISOs. Can you supply a link to them?

@rhc54 I'm not sure I grok your proposal about symlinks and whatnot. Wouldn't we still have to set CPPFLAGS / LDFLAGS differently based on whether we use the embedded vs. system hwloc? I ask because the current problem raised by Orion is not internal-vs-external, but the framework header file. Are we still entertaining the simpler fix proposed in #2955?

(this might be easier to discuss on the call in ~30 mins...?).

@rhc54
Contributor Author

rhc54 commented Feb 28, 2017

I'm going to surrender and withdraw this suggestion. It's clear there is going to be concern over removing the embedded code. I think we are going to have to do so eventually, but there is no driving force to make it happen now rather than later.

@rhc54 rhc54 closed this Feb 28, 2017
@rhc54 rhc54 deleted the topic/hwloc branch February 28, 2017 17:57
@ggouaillardet
Contributor

@jsquyres these packages might not be on the DVD

  • on ubuntu
    sudo apt-get install libhwloc-dev
    does the trick out of the box
  • on CentOS
    sudo yum install hwloc-devel
    also works out of the box
  • on RHEL
    you might have to
    subscription-manager repos --enable rhel-7-server-optional-rpms
    before
    yum install hwloc-devel

@ggouaillardet
Contributor

fwiw, i will push a new PR tomorrow, that allows hwloc v1.5
most bits will be inspired from this PR

@jsquyres
Member

jsquyres commented Mar 1, 2017

Sweet -- thanks @ggouaillardet. That was on my to-do list yesterday, but I didn't get to it -- much appreciated if you could.

@hppritcha
Member

well, just an FYI: I spent the last couple of hours getting hwloc installed on all the LANL jenkins slaves, and plan to use --with-hwloc on all of them going forward. hwloc 1.8 will be used.

@ggouaillardet
Contributor

let me know if i should update travis config to use the external hwloc too

@jsquyres
Member

jsquyres commented Mar 1, 2017

Certainly not a bad idea to have some more testing with --with-hwloc[=external] and/or --with-hwloc=/path/to/hwloc-install. I have MTT with it, but if we have Jenkins cycles for it, that would be great, too.

@hppritcha hppritcha mentioned this pull request Mar 6, 2017
6 tasks