v5.0.x accelerator/cuda: Add delayed initialization logic #11296
Closed
Signed-off-by: William Zhang <[email protected]> (cherry picked from commit f0580fd)
Instead of dlopening cuda, add a direct dependency on libcuda. This also means we can remove the dlopen dependency. Signed-off-by: William Zhang <[email protected]> (cherry picked from commit 26e244c)
Signed-off-by: William Zhang <[email protected]> (cherry picked from commit 8ed9056)
Signed-off-by: William Zhang <[email protected]> (cherry picked from commit d5ba0a3)
Signed-off-by: George Bosilca <[email protected]> (cherry picked from commit d8c1471) Signed-off-by: William Zhang <[email protected]>
Many compilers (tested with gcc and clang) treat memcpy and memmove as intrinsic functions. They also lack proper syntactic matching, which prevents the use of any intrinsic names as members of structures. Use a different name for the 2 members of the accelerator framework that handle memory copies and moves. Fixes open-mpi#10869. Signed-off-by: George Bosilca <[email protected]> (cherry picked from commit 8be95d7) Signed-off-by: William Zhang <[email protected]>
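The naming clash described in this commit can be illustrated with a small sketch. Everything here is hypothetical: `demo_accel_module_t`, `mem_copy`, and `mem_move` are names chosen for illustration and are not confirmed to be the ones used in the accelerator framework. One way the problem can surface is when a header defines `memcpy` as a function-like macro, so a call such as `module.memcpy(a, b, n)` is rewritten before the compiler ever sees the member access. Giving the members names that differ from the libc functions sidesteps this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical function table. The members deliberately avoid the names
 * "memcpy" and "memmove" so that no macro or intrinsic can capture them. */
typedef struct {
    int (*mem_copy)(void *dest, const void *src, size_t size);
    int (*mem_move)(void *dest, const void *src, size_t size);
} demo_accel_module_t;

/* Host-only stand-ins for the real copy and move routines. */
static int demo_host_mem_copy(void *dest, const void *src, size_t size)
{
    assert(dest != NULL && src != NULL);
    memcpy(dest, src, size);   /* regions must not overlap */
    return 0;
}

static int demo_host_mem_move(void *dest, const void *src, size_t size)
{
    assert(dest != NULL && src != NULL);
    memmove(dest, src, size);  /* regions may overlap */
    return 0;
}

static const demo_accel_module_t demo_host_module = {
    .mem_copy = demo_host_mem_copy,
    .mem_move = demo_host_mem_move,
};
```

A caller would then write `demo_host_module.mem_copy(dst, src, n)`, which cannot be mistaken for a call to the libc function.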
This PR removes the function table from the rocm component (and hence the dlopen functionality), as well as the lock used during initialization and shutdown. Some minor changes to the configure and Makefile logic are also required. Signed-off-by: Edgar Gabriel <[email protected]> (cherry picked from commit 86bc10a)
Previously did not use the right offset and caused data validation issues. Signed-off-by: William Zhang <[email protected]> (cherry picked from commit 36a35fb)
The selected component was not properly skipped because the wrong pointer was used for the skip parameter. Also changed to using mca_base_framework_components_close. Signed-off-by: William Zhang <[email protected]> (cherry picked from commit 0bbe734)
Previously did not use the right offset, same as 36a35fb Signed-off-by: William Zhang <[email protected]> (cherry picked from commit 13bcfab)
Added updated documentation for the dso type cuda support and the updated ofi mtl support. Signed-off-by: William Zhang <[email protected]> (cherry picked from commit f914632)
Implement MPI_COMM_TYPE_HW_UNGUIDED for MPI_Comm_split_type for V5.0.x
Make opal/ompi symbols used in only one file 'static' Fix review comments Signed-off-by: David Wootton <[email protected]> (cherry picked from commit 8ebf0d2)
Signed-off-by: Joseph Schuchart <[email protected]> (cherry picked from commit 77e502b)
…d-v5.0.x [v5.0.x] Fix compilation of x86-64-asm based atomic backend
Signed-off-by: Boris Karasev <[email protected]> Co-authored-by: Sergey Oblomov <[email protected]> (cherry picked from commit 8362a2d)
George Katevenis had two names in git logs. Add a mailmap entry to force use of his full name. Signed-off-by: Brian Barrett <[email protected]> (cherry picked from commit ce5d507)
I had a slightly different name for commits from my personal email account, resulting in two AUTHORS entries. Update the name so that they are merged into one entry. Signed-off-by: Brian Barrett <[email protected]> (cherry picked from commit d427ab0)
Signed-off-by: Edgar Gabriel <[email protected]> (cherry picked from commit 9bdaf3e)
Add some formulaic text to the MPIX man pages:
* Indicated that these functions are only present if the corresponding extension was built
* Described the available preprocessor macros
* Added a link to the Open MPI Extensions section
* Fixed string errors in the example code
* Used proper #if conditionals in the example
* Added a See Also section
Signed-off-by: Jeff Squyres <[email protected]> (cherry picked from commit cc976e7)
v5.0.x: Fix mailmap file
Signed-off-by: George Katevenis <[email protected]> (cherry picked from commit 9e13c2a)
…_warn v5.0.x: ucx/pml: show warning if already unsupported UCX version is used
…ge-updates v5.0.x: docs: Minor updates to MPIX man pages
v5.0.x: Initialize opal/smsc outside of btl/sm, to enable its use without it
Signed-off-by: David Wootton <[email protected]> (cherry picked from commit 3fcad0e)
Fix 1 byte overlay in comm_method_string: Coverity CID 1515829
Under normal circumstances epoll and poll produce similar performance on Linux. When busy polling is enabled they do not. Testing with a TCP-based system shows a significant performance degradation when using poll with busy waiting enabled. This performance regression is not seen when using epoll. This PR changes the default value of opal_event_include to epoll, on Linux only, to fix the regression. Fixes open-mpi#10929 Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from commit 279f6b6)
On architectures that store long doubles as 80 bit extended precision or as 64 bit "float64"s, we need conversions to 128 bit quad precision to satisfy MPI_Pack_external/Unpack_external. I added a couple more arguments to pFunction so it knows which architecture the 'to' and 'from' buffers use. Previously we had architecture info 'local' and 'remote' but I don't know how to correlate local/remote with to/from without adding more arguments as I did. With the increased information about the context, the conversion function can now convert the long double as needed. I'm using code Lisandro Dalcin contributed for the floating point conversions in f80_to_f128, f64_to_f128, f128_to_f80, and f128_to_f64. These conversion functions require the data to be in local endianness, but one of the sides in pack/unpack is always local so operations can be done in an order that allows the long double conversion to see the data in local endianness. I also added a path to use __float128 for the conversion for #ifdef HAVE___FLOAT128 as that ought to be more reliable than rolling our own bitwise conversions. The reason for all the arch.h changes is that the former code was inconsistent as to how bits were labeled within a byte, and had masks like LONGISxx that didn't match the bits they were supposed to contain. Signed-off-by: Mark Allen <[email protected]> (cherry picked from commit 308a94e)
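To make the idea of a bitwise widening conversion concrete, here is a minimal sketch of how a 64 bit IEEE 754 double can be re-encoded as a 128 bit quad. It is not the contributed f64_to_f128 code from this commit. The names `demo_f128_bits_t` and `demo_f64_to_f128` are hypothetical, the result is held as two logical 64 bit halves rather than in any particular memory layout, and the input is assumed to be a host `double` in local endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical holder for binary128 bits: hi carries the sign, the 15 bit
 * exponent and the top 48 fraction bits; lo carries the low 64 fraction bits. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} demo_f128_bits_t;

static demo_f128_bits_t demo_f64_to_f128(double value)
{
    demo_f128_bits_t out;
    uint64_t bits, sign, exp64, frac, exp128;

    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&bits, &value, sizeof bits);

    sign  = bits >> 63;
    exp64 = (bits >> 52) & 0x7FFu;           /* 11 bit biased exponent */
    frac  = bits & 0x000FFFFFFFFFFFFFULL;    /* 52 bit fraction */

    if (exp64 == 0x7FFu) {
        exp128 = 0x7FFFu;                    /* infinity or NaN */
    } else if (exp64 != 0) {
        exp128 = exp64 - 1023u + 16383u;     /* normal number: rebias */
    } else if (frac == 0) {
        exp128 = 0;                          /* signed zero */
    } else {
        /* Subnormal double: it is a normal number in binary128, so shift
         * the leading 1 up to the implicit bit position and drop it. */
        int shift = 0;
        while ((frac & (1ULL << 52)) == 0) {
            frac <<= 1;
            shift++;
        }
        frac &= 0x000FFFFFFFFFFFFFULL;
        exp128 = (uint64_t)(16383 - 1022 - shift);
    }

    /* The 52 fraction bits become the top of the 112 bit fraction. */
    out.hi = (sign << 63) | (exp128 << 48) | (frac >> 4);
    out.lo = frac << 60;
    return out;
}
```

Because widening never loses precision, no rounding is needed in this direction. The reverse conversions (128 bit down to 80 or 64 bit) would need rounding and overflow handling, which this sketch does not attempt.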
OpenPMIx commits since last update:
b4a55542 - Update NEWS
2b92a6af - Handle session-info in the gds/hash component
4dd99584 - Handle app-info in the gds/hash component
59c8b8c3 - Update NEWS
7b0bb406 - Stop-in-init applies to all procs in a job
8c4cdd37 - Cleanup some store/retrieve issues
b1a65392 - Update EXCEPTIONS
2fb902e5 - Provide a little more useful error output
3147fba1 - Add some debug macros for tracking key values
8ebc45fc - PMIX_OBJ_STATIC_INIT: fixed initialization
ca350205 - Roll to rc2
31362d74 - Enhance the performance of the var_scope_push/pop script
4685b607 - pnet/nvd: Fix macro escaping issue
92fbde60 - llvm/oneapi: fixes to bring pmix up to iso c99
f1171cf5 - Fix some memory leaks and cleanup macro defns
fed0ad14 - Plug a memory leak

PRRTe commits since last update:
a3e81f2efb - Update NEWS
34735ca44a - Pickup missing changes
c49aa76728 - Reduce debugger confusion
3fde1e53cc - misc unused var cleanups
69b0570e8a - remove unused vars and fix rc/ret typo
0041d2278c - more unused vars
f9049d514a - unused var
df47bc6dea - squash warnings
135452cd11 - remove some unused vars
428a51cc6a - Cleanup grpcomm cruft
cc402aa402 - Support query of pset membership
08c03741ba - plm/tm: Fix build breakage
0387c18b7b - Fix memory leaks in RML and at job termination.
a001245cfa - Update Open MPI mpirun help text
1a25f6602f - Change --stop-in-* to take optional arguments.
799d7fd769 - schizo/ompi: Fix --report-pid/sid.
59b4d6bb81 - alps fixes for mca move
ecb4d2d125 - Allow prterun to act as prun
d5d47c8ea6 - Catch some more component updates
b2302a09dd - ras/lsf: Fix build breakage
4349e72a6f - Fix a typo and expand debugger example range to cover MPI
1d2bfabb81 - Fix mapping by pe-list when oversubscribed
6944a64068 - Push launch-agent CLI into the env
b79d6b0a03 - Fix print statement
1b850dd64f - Actually support the output-proctable option
96096dc428 - Fix some memory leaks during resource mapping.
989a73cc9b - Fix some memory leaks during resource mapping.
c7691c7e82 - Update NEWS
7aa528613a - Fix --preload-binary.
111d2baddd - schizo/ompi: Fix --use-hwthread-cpus option.
87fb5c670a - Plug a memory leak
220b7e80a1 - BuildRequires: gcc
5f591bf93b - Complete help text on notifications
Signed-off-by: Austen Lauria <[email protected]>
Coverity CID 1498717 Signed-off-by: David Wootton <[email protected]> (cherry picked from commit 64a8d74)
v5: Fix uninitialized pointer in mca_smpl_ucx_register
v5: Fix memory leak in dpm_convert (dpm.c)
v5: Fixing missing lock release in mca_pml_ob1_record_htod_event
v5: Fix missing lock release in oshmem_proc_group_create
v5: Fix memory leak in mca_coll_han_init_dynamic_rules: Coverity CID 1516452
v5: Fix invalid access after free in do_recv: Coverity CID 1517308
You cannot daemonize the "prte" executable when spawning it to support a singleton, as that will cause things to hang. Also fix IO forwarding through the singleton for the spawned child procs by correcting a mistake that caused the IOF request attributes to be overlooked when constructing the job info for the PMIx_Spawn call. Includes an update to the PMIx and PRRTE submodule pointers to pick up a couple of relevant corrections there. See: openpmix/prrte#1621 openpmix/openpmix#2881 This brings the submodule pointers to the HEAD of their respective release branches, which are basically at an rc1 level (but not tagged yet). Signed-off-by: Ralph Castain <[email protected]> (cherry picked from commit 8ca4d7c)
v5.0: Fix singleton operations
Coverity CID 1458001 Signed-off-by: David Wootton <[email protected]> (cherry picked from commit 8a798fd)
Fix a segfault when operating with a login node that has a different topology than the compute nodes. Signed-off-by: Ralph Castain <[email protected]>
…nagement Layer. I have corrected it. Signed-off-by: zhuodong <[email protected]> (cherry picked from commit 19b515f)
The net provider is an enhanced version of the tcp provider and should therefore also be excluded. Signed-off-by: Wei Zhang <[email protected]> (cherry picked from commit d7ef0d4)
v5: Fix memory leak in mca_btl_tcp_proc_handle_modex_addresses
v5.0.x: Increment the PMIx/PRRTE submodule pointers
Signed-off-by: Mamzi Bayatpour <[email protected]> (cherry picked from commit a12aa2f)
which sets the LD_LIBRARY_PATH to point to a system pmix which is too old for the prte used by main and v5.0.x. Signed-off-by: Howard Pritchard <[email protected]> (cherry picked from commit fdaa901)
…-read-write-v5.0 common/ompio: implement pipelined read and write operation
…ap-v5.0 fs/lustre: fix assignment of info objects to lustre args
…r-fix-v5.0 accelerator/rocm: fix check_addr function
update bml.h
…ovider [v5.0.x] opal/common/ofi: add net to provider exclude list
…_module_v50x LANL/CI: workaround for aocc module
application Signed-off-by: Mamzi Bayatpour <[email protected]> Co-authored-by: Tomislav Janjusic <[email protected]> (cherry picked from commit 076fca7)
…evel-v5 v5.0.x OSC/UCX: avoid creating ucp context if the application does not have MPI-RMA
v5.0.x: pml/ucx: move pmix finalize to the end of ompi_rte_finalize()
The current implementation requires the application to do cudaInit before calling MPI_Init. Added delayed initialization logic to wait as long as possible before creating resources requiring a cuContext. Signed-off-by: William Zhang <[email protected]> (cherry picked from commit b751060)
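The delayed initialization pattern described in this commit can be sketched as follows. This is an illustration only, not the code from the PR: all `demo_*` names are hypothetical, the device setup is replaced by a counter so the sketch runs without CUDA, and a plain pthread mutex stands in for whatever locking the component actually uses. The point is that opening the component does no device work, and the first call that needs device resources triggers the one-time setup.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t demo_init_lock = PTHREAD_MUTEX_INITIALIZER;
static bool demo_init_done = false;
static int demo_init_calls = 0;   /* counts how often the setup really ran */

/* Stand-in for the setup that would need an active device context. */
static int demo_create_device_resources(void)
{
    demo_init_calls++;
    return 0;
}

/* Run the expensive setup at most once, on first use. A failed attempt
 * leaves the flag unset so that a later call can retry. */
static int demo_delayed_init(void)
{
    int rc = 0;

    pthread_mutex_lock(&demo_init_lock);
    if (!demo_init_done) {
        rc = demo_create_device_resources();
        if (rc == 0) {
            demo_init_done = true;
        }
    }
    pthread_mutex_unlock(&demo_init_lock);
    return rc;
}

/* Opening the component touches no device state at all. */
static int demo_component_open(void)
{
    return 0;
}

/* An entry point that needs device resources initializes them lazily.
 * Returns 0 to mean "host memory" in this sketch, or -1 on setup failure. */
static int demo_check_addr(const void *addr)
{
    assert(addr != NULL);
    if (demo_delayed_init() != 0) {
        return -1;
    }
    return 0;
}
```

With this shape, an application can call MPI_Init first and set up its device context afterwards, as long as the context exists by the time the first device-dependent entry point is reached.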
Signed-off-by: William Zhang <[email protected]> (cherry picked from commit 48ae44b)
Backport of #11253 PR