Skip to content

openmpi.spec: updates for v5.0.0 #10224

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 13, 2022
Merged

openmpi.spec: updates for v5.0.0 #10224

merged 1 commit into from
Apr 13, 2022

Conversation

jsquyres
Copy link
Member

@jsquyres jsquyres commented Apr 6, 2022

  • Default all 3rd-party packages to be "external". It is strognly
    recomended that the RPM only contain Open MPI, not the embedded
    3rd-party packages. This behavior can be overridden via the
    configure_options rpmbuild macros.
  • Copy all the Sphinx-generated HTML files to pkgdocdir/html. These
    are not normally installed, but it's probably worth it for the RPM.
  • Remove no-longer-existing INSTALL file. This file was removed as
    part of the conversion to RST-based documentation.
  • Tighten up files listings for the individual RPMs.
  • Convert -docs sub-package to exclusively use mandir and pkgdocdir
    (instead of docs.files). We already excluded mandir because
    rpmbuild may have renamed all the man pages after compressing them
    (so we didn't necessarily know the correct filenames). We also
    explicitly list pkgdocdir so that it picks up the entire tree of
    files and also removes the entire tree (including directories) when
    the RPM is removed.
  • Remove some stale F77 and VT references.
  • Fix some minor typos.
  • Fix a few "file listed twice" warnings (but leave at least one set
    of these because in some cases it will appear, but in other cases it
    will not -- depending on the prefix).

Signed-off-by: Jeff Squyres [email protected]

@jsquyres jsquyres added this to the v5.0.0 milestone Apr 6, 2022
@jsquyres jsquyres requested review from acgoldma and dancejic April 6, 2022 13:35
@jsquyres
Copy link
Member Author

jsquyres commented Apr 6, 2022

@acgoldma @dancejic You both indicated that your orgs use the specfile. Please review this PR.

Note a very important change in default behavior in this version of the specfile: it defaults to --with-hwloc=external --with-libevent=external --with-pmix=external --with-prrte=external.

If you run into any problems with this update, patches / suggested fixes would be greatly appreciated. 😄

@wzamazon
Copy link
Contributor

wzamazon commented Apr 6, 2022

Thank you! We will test the PR.

--with-hwloc=external --with-libevent=external --with-pmix=external --with-prrte=external.

Is the recommendation to install pmix and prrte before install openmpi rpm?

@bwbarrett
Copy link
Member

Thank you! We will test the PR.

--with-hwloc=external --with-libevent=external --with-pmix=external --with-prrte=external.

Is the recommendation to install pmix and prrte before install openmpi rpm?

That is the strong advice, yes. See https://docs.open-mpi.org/en/v5.0.x/installing-open-mpi/required-support-libraries.html#strong-advice-for-packagers.

@shijin-aws
Copy link
Contributor

That is the strong advice, yes. See https://docs.open-mpi.org/en/v5.0.x/installing-open-mpi/required-support-libraries.html#strong-advice-for-packagers.

Does that mean it is recommended to install pmix and prrte via package manager (yum, apt) from the OS repo?

@bwbarrett
Copy link
Member

That is the strong advice, yes. See https://docs.open-mpi.org/en/v5.0.x/installing-open-mpi/required-support-libraries.html#strong-advice-for-packagers.

Does that mean it is recommended to install pmix and prrte via package manager (yum, apt) from the OS repo?

You definitely should be using the hwloc and libevent from OS repos. There are unlikely to be PMIx and PRRTE packages for currently released OSes (and if they are, they are unlikely to be new enough). So for someone like AWS providing a repo (or distribution bundle) of RPMs, you'll likely have two more RPMs to ship.

@shijin-aws
Copy link
Contributor

@bwbarrett I see. Thank you for your explanation!

@acgoldma
Copy link
Contributor

acgoldma commented Apr 6, 2022

I am building the rpms for the prrte and openpmix, but looks like there is an issue with prrte (prte).

# contrib/dist/make_dist_tarball --no-greek --highok --no-git-update
...
=============================================
prrte-3.0.0 archives ready for distribution:
prrte-3.0.0.tar.gz
prrte-3.0.0.tar.bz2
=============================================

#  ./contrib/dist/linux/buildrpm.sh -s ../prrte-3.0.0.tar.bz2
...
--> Created destination specfile: /root/RPMBUILD/SPECS/prte-3.0.0.spec
--> Building the PRTE SRPM
--> rpmbuild --define 'mflags -j4' --define '_source_filedigest_algorithm md5'  --define '_binary_filedigest_algorithm md5' --define '_topdir /root/RPMBUILD' --define 'dist %{nil}' -bs /root/RPMBUILD/SPECS/prte-3.0.0.spec
error: Bad source: /root/RPMBUILD/SOURCES/prte-3.0.0.tar.bz2: No such file or directory
*** FAILURE BUILDING SRPM!

Looks to be a simple name issue?

@acgoldma
Copy link
Contributor

acgoldma commented Apr 6, 2022

SRPM is building now with this patch to spec file:

rpmbuild --define "_name prrte" -bs /root/rpmbuild/SPECS/prte-3.0.0.spec
@@ -355,7 +355,7 @@
 # there that are not meant to be packaged.
 rm -rf $RPM_BUILD_ROOT

-%setup -q -n prte-%{version}
+%setup -q -n %{name}-%{version}

 #############################################################################
 #

RPM is failing with unpackaged files.

Checking for unpackaged file(s): /usr/lib/rpm/check-files /root/rpmbuild/BUILDROOT/prrte-3.0.0-1.el8.x86_64
error: Installed (but unpackaged) file(s) found:
   /usr/include/prte.h
   /usr/include/prte_version.h


RPM build errors:
    Installed (but unpackaged) file(s) found:
   /usr/include/prte.h
   /usr/include/prte_version.h

@shijin-aws
Copy link
Contributor

shijin-aws commented Apr 7, 2022

When testing this PR, I hit this error when generating a ompi tarball via make dist:

$  ./autogen.pl
...
$  ./configure --prefix=/opt/openmpi --with-libevent=external --with-hwloc=external --with-prrte=external --with-pmix=external
...
$  make dist

make[3]: Entering directory `/home/ec2-user/ompi/test/spc'
make[3]: Leaving directory `/home/ec2-user/ompi/test/spc'
make[2]: Leaving directory `/home/ec2-user/ompi/test'
 (cd docs && make  top_distdir=../openmpi-gitclone distdir=../openmpi-gitclone/docs \
     am__remove_distdir=: am__skip_length_check=: am__skip_mode_fix=: distdir)
make[2]: Entering directory `/home/ec2-user/ompi/docs'
make[2]: *** No rule to make target `_build/man/ompi-wrapper-compiler.1', needed by `distdir'.  Stop.
make[2]: Leaving directory `/home/ec2-user/ompi/docs'
make[1]: *** [distdir] Error 1
make[1]: Leaving directory `/home/ec2-user/ompi'
make: *** [dist] Error 2

@jsquyres do you have any ideas where the issue is?

@jsquyres
Copy link
Member Author

jsquyres commented Apr 7, 2022

@acgoldma If the PMIx and PRRTE packages have problems with their specfiles, you'll need to bring that up in their repos.

FWIW: I think that PRRTE is really, really weird (and will be very confusing for users) for using prrte in some places and prte in other places. I've mentioned this before, but have not won this argument.

@jsquyres
Copy link
Member Author

jsquyres commented Apr 7, 2022

jsquyres added a commit that referenced this pull request Apr 7, 2022
@acgoldma
Copy link
Contributor

acgoldma commented Apr 7, 2022

Looks like romio341 detects Level-Zero on my system and causes issues.

  CC       src/gavl/mpl_gavl.lo
../../../../../3rd-party/romio341/mpl/src/gpu/mpl_gpu_ze.c: In function 'MPL_gpu_ipc_handle_create':
../../../../../3rd-party/romio341/mpl/src/gpu/mpl_gpu_ze.c:123:11: warning: implicit declaration of function 'zeDriverGetMemIpcHandle'; did you mean 'zeMemGetIpcHandle'
? [-Wimplicit-function-declaration]
     ret = zeDriverGetMemIpcHandle(global_ze_driver_handle, ptr, ipc_handle);
           ^~~~~~~~~~~~~~~~~~~~~~~
           zeMemGetIpcHandle
../../../../../3rd-party/romio341/mpl/src/gpu/mpl_gpu_ze.c: In function 'MPL_gpu_ipc_handle_map':
../../../../../3rd-party/romio341/mpl/src/gpu/mpl_gpu_ze.c:139:9: warning: implicit declaration of function 'zeDriverOpenMemIpcHandle'; did you mean 'zeMemOpenIpcHandle
'? [-Wimplicit-function-declaration]
         zeDriverOpenMemIpcHandle(global_ze_driver_handle,
         ^~~~~~~~~~~~~~~~~~~~~~~~
         zeMemOpenIpcHandle
../../../../../3rd-party/romio341/mpl/src/gpu/mpl_gpu_ze.c:140:69: error: 'MPL_gpu_ipc_mem_handle_t' {aka 'struct _ze_ipc_mem_handle_t'} has no member named 'global_dev
_id'
                                  global_ze_devices_handle[ipc_handle.global_dev_id],
                                                                     ^
../../../../../3rd-party/romio341/mpl/src/gpu/mpl_gpu_ze.c:141:44: error: 'MPL_gpu_ipc_mem_handle_t' {aka 'struct _ze_ipc_mem_handle_t'} has no member named 'handle'
                                  ipc_handle.handle, ZE_IPC_MEMORY_FLAG_NONE, ptr);
                                            ^
../../../../../3rd-party/romio341/mpl/src/gpu/mpl_gpu_ze.c:141:53: error: 'ZE_IPC_MEMORY_FLAG_NONE' undeclared (first use in this function); did you mean 'ZE_IPC_MEMORY
_FLAG_TBD'?
                                  ipc_handle.handle, ZE_IPC_MEMORY_FLAG_NONE, ptr);
                                                     ^~~~~~~~~~~~~~~~~~~~~~~
                                                     ZE_IPC_MEMORY_FLAG_TBD

Looks like it is not using the correct API?

I will remove level-zero from my system temporarily, but this should be disabled somehow.

@jsquyres
Copy link
Member Author

jsquyres commented Apr 7, 2022

@acgoldma Sounds like that's a general bug in the ROMIO in OMPI master/v5.0.x (i.e., nothing to do with the RPM packaging). Can you file a separate issue about it and label/milestone it for master+v5.0.x?

@shijin-aws
Copy link
Contributor

shijin-aws commented Apr 7, 2022

@jsquyres Thanks. That works for me. I could build the ompi rpm and install it successfully. But I had an issue when running mpirun

$ ./buildrpm.sh -b -R $PWD/ompi_build -r "--define '_prefix /opt/amazon/openmpi'" openmpi-5.1.0a1.tar.bz2
...
$ sudo yum install -y /home/ec2-user/ompi-main/contrib/dist/linux/ompi_build/RPMS/x86_64/openmpi-5.1.0a1-1.amzn2.x86_64.rpm
...
$ /opt/amazon/openmpi/bin/mpirun
The mpirun ("/opt/amazon/openmpi/bin/prterun") cmd failed to exec its actual executable - your application will NOT execute. Error: No such file or directory

$ which prterun
/usr/local/bin/prterun

So I installed prrte earlier in /usr . Must I install prrte in the same location with ompi?

@acgoldma
Copy link
Contributor

acgoldma commented Apr 7, 2022

So I installed prrte earlier in /usr . Must I install prrte in the same location with ompi?

Same issue, I just added a symlink as a workaround and it seems to work so far, need a bit more testing.

@bwbarrett
Copy link
Member

$ /opt/amazon/openmpi/bin/mpirun
The mpirun ("/opt/amazon/openmpi/bin/prterun") cmd failed to exec its actual executable - your application will NOT execute. Error: No such file or directory

$ which prterun
/usr/local/bin/prterun


So I installed prrte earlier in `/usr` . Must I install prrte in the same location with ompi?

Is PRRTE installed in /usr or /usr/local? For various reasons, Open MPI requires PRRTE not move after install. it's weird that it found /usr/local/bin/prterun during RPM build if it was in /usr. But hard to tell what happened without seeing the config.log of the build.

@acgoldma
Copy link
Contributor

acgoldma commented Apr 7, 2022

I was able to get prrte rpm to install to default location:

# which prterun
/usr/bin/prterun

Also has same issue as above.

@shijin-aws
Copy link
Contributor

@bwbarrett I did not install prrte via rpm, I just ./autogen.pl; ./configure; sudo make install. The config.log shows it has a default prefix as /usr

Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,objc,obj-c++,fortran,
ada,go,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http:/
/bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release
--enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --ena
ble-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash
-style=gnu --enable-plugin --enable-initfini-array --with-isl --enable-libmpx --enable-libsanit
izer --enable-gnu-indirect-function --enable-libcilkrts --enable-libatomic --enable-libquadmath
 --enable-libitm --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux

And I did not move prrte's location after installation.

@acgoldma
Copy link
Contributor

acgoldma commented Apr 7, 2022

I think I found the bug in the configure:

config.out:./configure: line 77762: tets: command not found
configure:                  if tets -z "$PRTE_PATH"; then :
config/ompi_setup_prrte.m4:                  AS_IF([tets -z "$PRTE_PATH"],

this appears to be a bug here:

AS_IF([tets -z "$PRTE_PATH"],

-                  AS_IF([tets -z "$PRTE_PATH"],
+                 AS_IF([test -z "$PRTE_PATH"],

@awlauria
Copy link
Contributor

awlauria commented Apr 7, 2022

@acgoldma nice catch - can you please PR?

@bwbarrett
Copy link
Member

@acgoldma nice catch - can you please PR?

I'll post a PR to fix it. There's something else going on beyond just that issue; we should not have gotten into a situation where we're calling the wrong prterun just because of that one typo.

@jsquyres
Copy link
Member Author

jsquyres commented Apr 7, 2022

I found another unexpected breakage (i.e., assumptions of OMPI and PRTE and/or PMIx in the same install tree): the Java MPI bindings. I'll file a separate issue about this.

@jsquyres jsquyres mentioned this pull request Apr 7, 2022
@shijin-aws
Copy link
Contributor

we should not have gotten into a situation where we're calling the wrong prterun just because of that one typo.

Yeah I tried that patch and re-created the ompi tarball and rpm, but the prte location issue does not get fix.

@jsquyres jsquyres mentioned this pull request Apr 8, 2022
@jsquyres jsquyres force-pushed the pr/rpm-updates-for-v5.0.0 branch 2 times, most recently from 6fd6528 to 6a9cf5f Compare April 9, 2022 15:31
@jsquyres
Copy link
Member Author

jsquyres commented Apr 9, 2022

I have rebased on main HEAD (to get the installation of HTML docs), and I believe I have addressed all reviewer comments and also fixed up the contents of the runtime, devel, and docs RPMs.

I think we still have PMIx/PRTE integration issues (e.g., if installation prefixes are different, etc. -- see #10252), but if @acgoldma @shijin-aws @wzamazon @bwbarrett could verify that the all-in-one and/or sub-project RPMs are building like you expect them to / contain what you expect them to, that would be great. Thanks!

@jjhursey
Copy link
Member

bot:ibm:retest

@open-mpi open-mpi deleted a comment from ibm-ompi Apr 11, 2022
@open-mpi open-mpi deleted a comment from ibm-ompi Apr 11, 2022
@open-mpi open-mpi deleted a comment from ibm-ompi Apr 11, 2022
@shijin-aws
Copy link
Contributor

shijin-aws commented Apr 11, 2022

@jsquyres I did not hit RPM build and install issue for the Ompi spec file within this PR's range. I hit the RPM build and install issue as @acgoldma reported in openpmix/prrte#1332 for PRRTE.

Since Brian opened #10252 to track the prterun location issue separately. So I do not have extra issues for this PR only then.

@jsquyres
Copy link
Member Author

@shijin-aws @acgoldma Just curious: do you build/use/distribute the all-in-one RPM or the sub-packages RPMs?

@shijin-aws
Copy link
Contributor

@jsquyres I build/use the all-in-one RPMs (build_rpms.sh -b)

@acgoldma
Copy link
Contributor

@jsquyres I build/use the all-in-one RPMs (build_rpms.sh -b)

Same with me, though we build in 2 steps: first SRPM (no -b) then use rpmbuild to build single rpm form SRPM.

@jsquyres jsquyres force-pushed the pr/rpm-updates-for-v5.0.0 branch from 6a9cf5f to 390e362 Compare April 11, 2022 23:10
@jsquyres
Copy link
Member Author

jsquyres commented Apr 11, 2022

An extra commit somehow got in this PR ("Remove AMCA param sets") that should not have been here. Just pushed to remove that one commit and leave just the one RPM spec file update commit.

Copy link
Member

@bwbarrett bwbarrett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your commit message looks like it has the remnants of a checkpoint commit squash.

- Default all 3rd-party packages to be "external".  It is strognly
  recomended that the RPM only contain Open MPI, not the embedded
  3rd-party packages.  This behavior can be overridden via the
  configure_options rpmbuild macros.
- Copy all the Sphinx-generated HTML files to pkgdocdir/html.  These
  are not normally installed, but it's probably worth it for the RPM.
- Remove no-longer-existing INSTALL file.  This file was removed as
  part of the conversion to RST-based documentation.
- Tighten up files listings for the individual RPMs.
- Convert -docs sub-package to exclusively use mandir and pkgdocdir
  (instead of docs.files).  We already excluded mandir because
  rpmbuild may have renamed all the man pages after compressing them
  (so we didn't necessarily know the correct filenames).  We also
  explicitly list pkgdocdir so that it picks up the entire tree of
  files and also removes the entire tree (including directories) when
  the RPM is removed.
- Remove some stale F77 and VT references.
- Fix some minor typos.
- Fix a few "file listed twice" warnings and clean up lists of files
  included in the runtime and devel sub-project RPMs.

Signed-off-by: Jeff Squyres <[email protected]>
@jsquyres
Copy link
Member Author

jsquyres commented Apr 12, 2022

Your commit message looks like it has the remnants of a checkpoint commit squash.

🤬 Thanks for finding that. Fixed.

@jsquyres jsquyres force-pushed the pr/rpm-updates-for-v5.0.0 branch from 390e362 to 64a0337 Compare April 12, 2022 11:47
@jsquyres jsquyres merged commit 8d20620 into main Apr 13, 2022
@jsquyres jsquyres deleted the pr/rpm-updates-for-v5.0.0 branch April 13, 2022 12:26
jsquyres added a commit to jsquyres/ompi that referenced this pull request Apr 13, 2022
Adding "-j12" to the mflags was intended as a debuggin aid; it
accidentally slipped through and was merged as part of
open-mpi#10224.

Signed-off-by: Jeff Squyres <[email protected]>
jsquyres added a commit to jsquyres/ompi that referenced this pull request Apr 13, 2022
Adding "-j12" to the mflags was intended as a debuggin aid; it
accidentally slipped through and was merged as part of
open-mpi#10224.

Signed-off-by: Jeff Squyres <[email protected]>
(cherry picked from commit d034ed6)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants