Patch for linking libfabric #2519
@amckinstry thanks for the patch @rhc54 @jsquyres that is an interesting one ... we link indeed,
int mca_common_libfabric_register_mca_variables(void)
{
    return OPAL_SUCCESS;
}
a possible fix is the suggested patch.
int mca_common_libfabric_register_mca_variables(void)
{
    if (fi_version() >= FI_VERSION(1,0)) {
        return OPAL_SUCCESS;
    } else {
        return OPAL_ERROR;
    }
}
and another one is to simply remove any thoughts ? |
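The version test in the suggested patch works because libfabric packs the major and minor numbers into a single integer, so a plain `>=` orders versions correctly. Below is a minimal sketch of that comparison, assuming the encoding used by `FI_VERSION` in `rdma/fabric.h` (major in the upper 16 bits, minor in the lower 16). The `SKETCH_*` macros and `sketch_version_ok()` are hypothetical stand-ins so the example builds without libfabric installed; they are not part of Open MPI or libfabric.

```c
#include <stdint.h>

/* Hypothetical stand-ins mirroring libfabric's FI_VERSION / FI_MAJOR /
 * FI_MINOR macros; real code would include <rdma/fabric.h> instead. */
#define SKETCH_VERSION(major, minor) \
    ((((uint32_t)(major)) << 16) | ((uint32_t)(minor)))
#define SKETCH_MAJOR(version) ((uint32_t)(version) >> 16)
#define SKETCH_MINOR(version) ((uint32_t)(version) & 0xFFFFu)

/* Returns 1 when the runtime version is at least major.minor, else 0.
 * In the patch, `runtime` would be the value returned by fi_version(). */
int sketch_version_ok(uint32_t runtime, unsigned major, unsigned minor)
{
    return runtime >= SKETCH_VERSION(major, minor);
}
```

Beyond the check itself, calling `fi_version()` gives the common library a real reference to a libfabric symbol, which is what introduces the dependency on libfabric.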
There was some logic behind that component, but I honestly don't recall. I'd just use your patch for now. |
@ggouaillardet Good call -- yes, calling |
I am able to reproduce this issue using OMPI v4.0.x on Ubuntu 16.04. I can see that libmpi.so does not list libfabric among its dynamic dependencies.
This causes basic MPI tests like
Command used to configure OMPI build:
Any suggestions on fixing this? |
Sounds like the fix never landed in the repository! Will do tomorrow. |
Specifically, if you I suspect that the issue you're seeing here is that you are linking against an older version of What version of libfabric are you linking against? You should open a new issue for this -- this existing issue was a different problem that was already resolved / closed. |
@jsquyres I checked output for ldd mca_mtl_ofi.so and can see it is missing libfabric linking
I have confirmed in the past that my libfabric build does have a fi_dupinfo() call. See below:
Also, I am using OFI v1.7.x. I saw that this issue was still open and hence wanted to confirm whether the fix was ever merged. I can open a new issue for this. |
That is pretty weird to me -- I have no idea how you would have an
Can you open a new issue and include the stdout from your configure, your config.log, and the stdout from make? (you might need to paste those large files into a gist or something) |
@jsquyres per my previous analysis, |
use $(opal_common_ofi_*) variables since these are the only one defined (by opal/mca/common/ofi/configure.m4) Refs. open-mpi#2519 Thanks Alastair McKinstry for the report and initial fix. Thanks Rashika Kheria for the reminder. Signed-off-by: Gilles Gouaillardet <[email protected]>
Hi @rashikakheria, would you please share the details of your setup? I have Ubuntu 16.04.5 LTS (in KVM) but cannot reproduce the issue you are seeing. I'm building from master and I have libfabric v1.7.0.
thanks,
macabral@ubuntu16:/tmp/ompi$ git log -1 |head -1
macabral@ubuntu16:/tmp/ompi$ head config.log |grep -e "./configure"
macabral@ubuntu16:/tmp$ ldd /tmp/ompi-master-git/lib/openmpi/mca_mtl_ofi.so
macabral@ubuntu16:/tmp/mpi_helloworld$ mpirun -np 2 -mca pml cm -mca mtl ofi -mca mtl_ofi_provider_include sockets ./mpi_helloworld |
@matcabral I tried the exact version you mentioned and still see the same issue
I am also using Ubuntu 16.04 LTS. Here is my kernel version:
Let's continue the discussion on the new issue, as requested by @jsquyres |
This is quite odd. I see the following: ompi/ompi/mca/mtl/ofi/Makefile.am Lines 78 to 80 in 0601b3e
which shows that ompi/opal/mca/common/ofi/Makefile.am Line 74 in 0601b3e
which shows that In my build:
And:
$ ldd mca_mtl_ofi.so
...
libmca_common_ofi.so.0 => /home/jsquyres/bogus/lib/libmca_common_ofi.so.0 (0x00002aaaab958000)
libfabric.so.1 => /home/jsquyres/libfabric-1.6.1/install/lib/libfabric.so.1 (0x00002aaaabb59000)
...
Showing that
$ ldd libmca_common_ofi.so.0
...
libfabric.so.1 => /home/jsquyres/libfabric-1.6.1/install/lib/libfabric.so.1 (0x00002aaaaacaf000)
...
Showing that the OPAL common OFI library is linked against libfabric. |
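The check done by eye on the `ldd` output above can be sketched as a small helper. This is only an illustration: `ldd_line_names()` is a hypothetical function, not part of Open MPI or libfabric, and it assumes the usual `name => path (address)` layout of each `ldd` line.

```c
#include <string.h>

/* Returns 1 if one line of `ldd` output names a library whose file name
 * starts with `prefix` (e.g. "libfabric.so"), else 0.
 * Hypothetical helper for illustration; it only compares the start of
 * the entry, i.e. the name that appears before "=>". */
int ldd_line_names(const char *line, const char *prefix)
{
    /* Skip the leading whitespace that ldd puts before each entry. */
    while (*line == ' ' || *line == '\t') {
        line++;
    }
    return strncmp(line, prefix, strlen(prefix)) == 0;
}
```

Feeding each line of `ldd mca_mtl_ofi.so` to this helper with the prefix `"libfabric.so"` would report whether the component lists libfabric as a direct dependency; on the build reported as broken in this thread, no line would be expected to match.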
Blarg. I just deleted my last comment because it was wrong. In both cases, Libtool inserted |
Note that there was a second issue opened for a while and some discussion happened over there -- be sure to see #6360 for some additional content. We closed that issue and will continue the discussion here, just to keep it all together. |
@rashikakheria Can you do this in your build tree:
$ cd opal/mca/common/ofi
$ rm libmca_common_ofi.la
$ make V=1
and send the output? |
Here is the output:
|
the patch fixes the issue for me (up-to-date ubuntu xenial)
stock master does not depend on
patched master depends on
here is the up-to-date patch
diff --git a/ompi/mca/mtl/ofi/Makefile.am b/ompi/mca/mtl/ofi/Makefile.am
index 2499f85..58c7ce2 100644
--- a/ompi/mca/mtl/ofi/Makefile.am
+++ b/ompi/mca/mtl/ofi/Makefile.am
@@ -5,6 +5,8 @@
# Copyright (c) 2017 Los Alamos National Security, LLC. All rights
# reserved.
# Copyright (c) 2017 IBM Corporation. All rights reserved.
+# Copyright (c) 2019 Research Organization for Information Science
+# and Technology (RIST). All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
@@ -18,7 +20,7 @@ EXTRA_DIST = post_configure.sh \
MAINTAINERCLEANFILES = \
$(generated_sources)
-AM_CPPFLAGS = $(ompi_mtl_ofi_CPPFLAGS) $(opal_common_ofi_CPPFLAGS)
+AM_CPPFLAGS = $(opal_common_ofi_CPPFLAGS)
dist_ompidata_DATA = help-mtl-ofi.txt
@@ -55,7 +57,7 @@ mtl_ofi_sources = \
# files should be added to generated_source_modules, as well as adding
# their .c variants to generated_sources.
%.c : %.pm;
- $(PERL) generate-opt-funcs.pl $@
+ $(PERL) -I$(top_srcdir)/ompi/mca/mtl/ofi $(top_srcdir)/ompi/mca/mtl/ofi/generate-opt-funcs.pl $@
# Make the output library in this directory, and name it either
# mca_<type>_<name>.la (for DSO builds) or libmca_<type>_<name>.la
@@ -73,15 +75,15 @@ mcacomponentdir = $(ompilibdir)
mcacomponent_LTLIBRARIES = $(component_install)
mca_mtl_ofi_la_SOURCES = $(mtl_ofi_sources)
mca_mtl_ofi_la_LDFLAGS = \
- $(ompi_mtl_ofi_LDFLAGS) \
+ $(opal_common_ofi_LDFLAGS) \
-module -avoid-version
mca_mtl_ofi_la_LIBADD = $(top_builddir)/ompi/lib@[email protected] \
- $(ompi_mtl_ofi_LIBS) \
+ $(opal_common_ofi_LIBS) \
$(OPAL_TOP_BUILDDIR)/opal/mca/common/ofi/lib@OPAL_LIB_PREFIX@mca_common_ofi.la
noinst_LTLIBRARIES = $(component_noinst)
libmca_mtl_ofi_la_SOURCES = $(mtl_ofi_sources)
libmca_mtl_ofi_la_LDFLAGS = \
- $(ompi_mtl_ofi_LDFLAGS) \
+ $(opal_common_ofi_LDFLAGS) \
-module -avoid-version
-libmca_mtl_ofi_la_LIBADD = $(ompi_mtl_ofi_LIBS)
+libmca_mtl_ofi_la_LIBADD = $(opal_common_ofi_LIBS)
|
@ggouaillardet I think you just hit the nail on the head: common/ofi doesn't actually utilize any libfabric symbols. This brings up (again) the idea that we should just delete common/ofi, because it never fulfilled its original purpose and just causes indirect problems like this. I think we talked about this in the last week or two on the weekly webex, but I don't think an issue was created for it. EDIT: Correction -- we talked about this on the webex and I put a comment on #6313. |
@ggouaillardet I've got a solution in the works. Should have a PR shortly. |
It never lived up to its purpose (and has caused amorphous indirect errors such as open-mpi#2519), so delete it. Signed-off-by: Jeff Squyres <[email protected]>
Please see PR #6363, which is a rollup of all known outstanding OFI configure/linking issues. |
It never lived up to its purpose (and has caused amorphous indirect errors such as open-mpi#2519), so delete it. Signed-off-by: Jeff Squyres <[email protected]> (cherry picked from commit dd20174)
As discussed in open-mpi#2519 the common component does not depend on libfabric yet. This commit introduces this dependency by just calling fi_version(). Signed-off-by: guserav <[email protected]>
As discussed in open-mpi#2519 the common component does not depend on libfabric yet. This commit introduces this dependency by just calling fi_version(). Signed-off-by: guserav <[email protected]> (cherry picked from commit 8a67a95) Signed-off-by: Brian Barrett <[email protected]>
….1.1 Aboorva Devarajan (3): pml/ucx: fix zero sized datatype transfers pml/ob1: fix build issue in CUDA path ompi/group: fix proc pointer comparison in groups Alex Anenkov (1): coll/libnbc: add recursive doubling algorithm for MPI_Iallreduce Aravind Gopalakrishnan (7): MTL OFI: Ask for FI_THREAD_DOMAIN support when not using MPI_THREAD_MULTIPLE MTL/OFI: Add OFI Scalable Endpoint support Fix for SEP when num local procs is greater than available contexts mtl/ofi: Add MCA variables to enable SEP and to request number of OFI contexts mtl/ofi: Fix reference to help text object btl/ofi: Fix valgrind complaints on uninitialized pointer use mtl/ofi: Fix segfault when not using Thread-Grouping feature Artem Polyakov (3): schizo/slurm: Disable binding in case of Slurm direct launch pmix/pmix3x: Fix internal PMIx discovery logic. pmix: Fix detection of Externally-built PMIx Aurelien Bouteiller (1): Always return a valid error code from collective operations Austen Lauria (7): Make a managed allocation filter a hostfile/hostlist. Fix bug where orte under a managed allocation does not honor -host. Make sure MPIR_Breakpoint() is compiled without CFLAGS. osc/rdma: Tighten up concurrent memory region access. Fix case where debuggers cannot read the MPIR proctable. Powerpc atomics: Force usage of powerpc assembly. Fix "variadic macros" warning. 
Bert Wesarg (2): oshmem/mca/sshmem: Fix build with `--enable-mem-debug` fs/lustre: Remove unneeded includes Brelle Emmanuel (1): Bull update of coll/han : added barrier, a 'simple' scatter, some Doxygen and some fixes Brian Barrett (20): dist: Start v4.1.x release series Revert "Remove the OFI/BTL component" mtl/ofi: Fix crash if no providers found mtl/ofi: Print descriptive error message on modex failure mtl/ofi: Provide av count hint during initialization ofi: Call add_procs through PML dist: Add OFI backports to NEWS coll libnbc: Remove dead code dist: Add Collectives backports to NEWS dist: Move version to 4.1.0rc1 dist: Update NEWS file for 4.1.0rc1 dist: Update version to 4.1.0rc2 dist: Update NEWS for 4.1.0 dist: Update NEWS file from branches dist: Add NEWS items for recent commits in v4.1.x series dist: Bump version after releasing 4.1.0rc2 opal: Remove outdated MacOS workaround opal: Disable memory patcher component on MacOS dist: Prep for 4.1.1rc3 dist: Update VERSION and README for v4.1.1rc4 Charles Shereda (1): Fixed uninitialzed memory access bug in base64 encoding. Christoph Niethammer (3): Accept UCX 1.8 in configure of btl/uct Fix memory leak in configure, which prevents leak sanitizer usage Fix error with stricter quoting requirements of autoconf-2.70 Devendar Bureddy (1): UCX: initialize cuda from ucx pml component Dipti Kothari (1): mca/pml: PML check for direct modex Edgar Gabriel (6): common/ompio: use avg. file view size in the aggregator selection logic ompio: resync v4.1 branch to master fbtl/posix: ensure progressing aio requests common_ompio_file_set_view: fix handling of MPI_DISPLACEMENT_CURRENT fbtl_posix_progress: aio_return can indicate partial completion common_ompio_file_set_view: recognize negative disp in access Geoffrey Paulsen (1): Adding SLURM binding policy change to README George Bosilca (19): Remove few warnings in libnbc identified by clang-1000.11.45.2 Use the unaligned SSE memory access primitive. 
Check unaligned ops for correctness. Fix the cacheline usage in the CUDA BTL. A complete overhaul of the HAN code. Fix partial packing of non data elements. Fix HAN issues reported by Coverity. A started generalized request should be marked as pending. Major update to the AVX* detection and support AVX code generation improvements A better test for MPI_OP performance. Always specify the target architecture for AVX Early selection of the best PML. Prevent the establishment of new BTL connections during matching A new binomial scatter using packed data on intermediary processes. Always include the stddef.h header. Reenable the heterogeneous support. Fixing the partial pack unpack issue. Fix the Makefile to include the correct test. Gilles Gouaillardet (14): mtl/ofi: fix configury when VPATH is used coll/libnbc: fix NBC_Unpack() coll/cuda: remove unnecessary references to ORTE mpi/c: fix param checks in [I]Neighbor_alltoall{v,w} fortran.m4: reword error message when sizeof(int) != sizeof(INTEGER) configury: make build Reproducible op/avx: check for _mm512_mullo_epi64() AVX512 intrinsic coll/base: do not drop const qualifier configury: fix OPAL_GET_VERSION configury: fix typos autogen.pl: patch libtool.m4 for OSX Big Sur gcc_builtin: fix performance regression on x86_64 ofi: fix typo in macro name atomic/gcc_builtin: only apply the workaround when required. 
Goldman, Adam (2): mtl/ofi: Add mising cq_data_size in hints for ofi mtl mtl/ofi: Disable CUDA convertor for specified ofi providers Harumi Kuno (6): Fix mca_btl_ofi_finalize clean-up logic Add comments about order of close ops set ep to NULL to avoid double close mtl_btl_ofi_rcache_init() before creating domain Fix language text for example Fix .so filenames Howard Pritchard (7): RAS:ALPS add support for ANL Cobalt add a common ofi whitelist/blacklist ofi mtl: fix problem with mrecv suppress icc long double message OFI: patch OFI MTL for GNI provider add blurb about issue 7968 to the README OSC/RDMA: fix typo in btl selection logic Jeff Squyres (43): mpi.h.in: fixups for static assert messages mpi.h.in: Remove //-style comments tests/asm/run_tests: fix basename usage .mailmap: Add entry for Harumi Kuno mtl/ofi/Makefile.am: down with tabs! btl/ofi/Makefile.am: down with tabs! mtl/ofi: add a .gitignore mtl/ofi: check for FI_LOCAL_COMM+FI_REMOTE_COMM ofi: revamp OPAL_CHECK_OFI configury libnbc: remove some stale/dead code common_ofi: fix preprocessor macro typo pmix3x: Remove --enable-install-libpmix option fortran.m4: disallow when sizeof(int) != sizeof(INTEGER) opal_get_version.m4: properly quote dir args configure: abort if dirs with spaces are used opal_functions.m4: remove redundant code configure.ac: Add workaround on MacOS for "readlink -f" getdate.sh: make the date(1) usage more portable coll/adapt and coll/han: fix trivial compiler warnings keyval_parse.c: ensure to init values keyval_parse.c: update whitespace/comments NEWS: More updates for v4.1.0 config/Makefile.am: ensure getdate.sh is in dist tarball opal_functions.m4: add comment orterun.1in: fix minor mistake in :PE=2 example orterun.1in: define "slot" and "processor element" orterun.1in: add some markup Fix many compiler warnings VERSION: 4.1.0rc4 coll/base: fix compiler warnings NEWS: OMPIO is now the default everywhere v4.1.0: README, VERSION, and LICENSE final updates VERSION: Onward to v4.1.1 
MPI_Init_thread(3): update refs about MPI_THREAD_MULTIPLE MPI_Init_thread(3): fix statement about C++ binding config: Stash known-good copies of config.guess|sub autogen: use newer config.sub|guess if available op_avx: use MCA enum flags instead of integer values op_avx: Fix MCA enum flags First cut at Git commit checks as Github Actions git-commit-check: fix typo git-commit-checker: require cherry picks on this branch git-commit-checks: use a better name Joseph Schuchart (19): osc rdma: check for outstanding fragments before completing a request OSC UCX: make sure no-op fetch in rget/rput is properly aligned osc rdma: check for outstanding fragments before completing a request in ompi_osc_rdma_put_complete_flush as well osc/rdma: fail query_btls if no endpoint for non-local peer is found OPAL: fix string buffer allocation for large env variables coll/tuned: add hint about dynamic rules to mca parameters coll/tuned: Mark global static algorithm as const coll/tuned: don't select algorithms knowing when it's clear they would fall back to linear coll/tuned: fix minor errors in comments COLL TUNED: remove stray selection of linear algs for alreduce and allgather COLL TUNED: Use per-rank data size instead of total size for decision coll/base: Fix collective module selection preference treatment coll/[sm|han|adapt]: don't disqualify on priority 0 coll/han: remove references to experimental solo and shared collective components coll/han: reduce default segment size for reduce/allreduce to 64k OSC RDMA: put memory of each process into separate pages OSC RDMA: only touch pages before memory registration, don't fill them coll/han: fix coll preference selection in mca_coll_han_comm_create_new Fix man page for MPI_Win_attach Josh Hursey (12): Add detection for JSM direct launch v4.1.x: schizo/jsm: Disable binding when direct launched Fix cpu-list for non-uniform nodes Update Internal PMIx to OpenPMIx v3.2.1rc1 Disable man pages for internal OpenPMIx v4.1.x: Update Internal 
PMIx to OpenPMIx v3.2.1 Fix external PMIx v4.x check Fix --debug-daemons CLI option Remove the orte_static_ports rollup path Check for librt when building LSF support LSF Config: Cleanup logic Fix/Cleanup the return value documentation for mpirun Leonid Genkin (1): Replace usage of the deprecated NB API of UCX with NBX Mark Allen (3): noinline to avoid compiler reading TOC before PATCHER_BEGIN symbol pollution make Type_create_resized set FLAG_USER_UB Matias A Cabral (2): MTL OFI: Add support for mem_tag_format MTL_OFI: Changed Recv cancel to be non-blocking Michael Heinz (2): Add check for PSM2 reference counting to PSM2 MTL #7721 Add minimum library version needed to use PSM2 in OMPI #7779 Mikhail Brinskii (2): COLL/TUNED: Add linear scatter using isend for mlnx platform SHMEM/SCOLL: Fix inplace reductions Mikhail Kurnosov (11): coll/base/allgatherv: fix MPI_IN_PLACE processing coll/libnbc: add recursive doubling algorithm for MPI_Iscan coll/libnbc: add recursive doubling algorithm for MPI_Iexscan coll/libnbc: add Rabenseifner's algorithm for MPI_Ireduce coll/libnbc: add knomial tree algorithm for MPI_Ibcast coll/libnbc: add recursive doubling algorithm for MPI_Iallgather coll/libnbc: add Rabenseifner's algorithm for MPI_Iallreduce coll/libnbc/ireduce: silence Coverity warning CID 1440360 coll/libnbc: remove debug output Fix a typo in parsing locality string: L0 changed to L1 coll/base: reduce memory consumption in Scatter NARIBAYASHI Akira (1): opal/util: Fix typo Nathan Hjelm (7): osc/rdma: fix bug in attach for non-debug builds opal: disable the __atomic built-in atomics by default on AArch64 osc/rdma: ensure bml add_procs has been called for all local procs osc/rdma: fix errors in derived datatype handling for accumulate osc/rdma: rearrange accumulate code osc/rdma: remove extra retain on fop osc/rdma: fix amo-based accumulate Nikola Dancejic (5): common/ofi: Added multi-NIC support to provider selection common/ofi: Fixing compilation issue with ofi versions 
that do not support fi_info.nic v4.1.x: common/ofi: added address format check to fix provider selection Adding ofi include to CPPFLAGS so that configure is able to check fabric.h v4.1.x: Using package_rank to select between NIC of equal distance from the process. Pak Lui (1): oshmem/tools/oshmem_info: fix an issue with fortran keyword when compiling param.c Raghu Raja (8): mtl/ofi: Do not fail if error CQ is empty mtl/ofi: Fix erroneous FI_PEEK/FI_CLAIM usage mtl/ofi: Check cq_data_size without querying providers again VERSION: 4.1.0rc5 common/ofi: Use opal_show_help() to call out lack of locality info NEWS and VERSION updates for 4.1.1rc1 NEWS updates for v4.1.1rc2 VERSION updates for v4.1.1rc2 Ralph Castain (12): Increment the vpid after assignment Correct computation of relative locality Correctly skip the "mpirun" node when launching orted on it Remove PMIx man page setup Fix the verbose output in ess base Update PMIx to v3.2.2 Update Slurm launch support Adjust copyrights Let Slurm know that our daemons are not MPI tasks Update PMIx to v3.2.3 Add the userid to the vader backing file path Retrieve cpuset when configured with pmix rte Robert Wespetal (1): mtl/ofi: Add workaround for EFA local/remote capabilities bug Sami Ilvonen (1): Add fence_nb to flux pmix Sergey Oblomov (4): COMMON/UCX: improved missing events test PML/UCX: improved error processing in MPI_Recv SPML/UCX: removed direct dependency to SPML UCX OSHMEM/SEGMENT-REGISTRATION: added segment filtering Spruit, Neil R (1): MTL_OFI: Generation of specialized functions at build time Thananon Patinyasakdikul (2): btl/ofi: Added 2 side communication support. btl/ofi: fixed compiler warning on OSX. 
Tim Wickberg (1): Revert "v4.1.x: Update Slurm launch support" Todd Kordenbrock (2): Use the active PML to call add_procs() mtl-portals4: replace abort() with ompi_rte_abort() Tomislav Janjusic (1): Coll/hcoll: adding scatterv interface Valentin Petrov (4): coll/hcoll: reduce_scatter(block) interface coll/hcoll: compile warning fix coll/hcoll: scatterv inplace fix PML/UCX: don't do pml_check_selected call Wei Zhang (4): oob/tcp: fix a race condition on stop_thread pipe [v4.1.x] ompi : add memory barrier in PMIx registration callback [v4.1.x] btl/ofi: fix memory leaks in error handling path [4.1.x] orte/orted: enable OPAL's mutli-thread support William Zhang (8): coll/tuned: Fix typos coll/tuned: Add NULL check to prevent segfault coll/tuned: Change the default collective algorithm selection btl/ofi: Use common provider include/exclude list btl/ofi: Disable EFA provider in versions earlier than libfabric 1.12.0 btl/ofi: Disable ofi_rxm provider coll/tuned: Revert RSB and RS default algorithms coll/tuned: Fix dynamic message size for gather and scatter Xi Luo (2): Bring ADAPT collective to 4.1 Initial import of the HAN collective module Yossi Itigin (3): ucx: disable version 1.8 ucx: check supported transports and devices for setting priority pml/ucx: ignore request leak by default, override by mca param bsergentm (1): Coll/han Bull dongzhong (1): Add supports for MPI_OP using AVX512, AVX2 and MMX guserav (4): Revert "Remove opal/mca/common/ofi." common/ofi: Fix check for OFI in build files common/ofi: Fix open-mpi/ompi#2519 common/ofi: Set HPE as owner of component raafatfeki (2): fs/gpfs: Support of GPFS file system fs/ime & fbtl/ime: Support of IME file system tomhers (1): BTL/OFI: Fix missing include file. 4.1.1 -- April, 2021 -------------------- - Fix a number of datatype issues, including an issue with improper handling of partial datatypes that could lead to an unexpected application failure. 
- Change UCX PML to not warn about MPI_Request leaks during MPI_FINALIZE by default. The old behavior can be restored with the mca_pml_ucx_request_leak_check MCA parameter. - Reverted temporary solution that worked around launch issues in SLURM v20.11.{0,1,2}. SchedMD encourages users to avoid these versions and to upgrade to v20.11.3 or newer. - Updated PMIx to v3.2.2. - Fixed configuration issue on Apple Silicon observed with Homebrew. Thanks to François-Xavier Coudert for reporting the issue. (NEWS truncated at 15 lines)
Linking libfabric breaks on Debian/Ubuntu systems (at least) without the following patch: