v5.0.x accelerator/cuda: Add delayed initialization logic #11296

Closed
wants to merge 1,690 commits into from
Commits
1690 commits
477dd39
rcache/grdma: Replace cuda functions with accelerator functions
wckzhang Sep 26, 2022
0320a11
opal/accelerator: Remove function table and dlopen logic
wckzhang Sep 21, 2022
99333e6
btl/smcuda,rcache/rgpusm,rcache/gpusm Add direct cuda dependency
wckzhang Sep 27, 2022
76058d3
opal/cuda: Remove opal/cuda
wckzhang Sep 27, 2022
214eee7
Use the return value from the get_buffer_id
bosilca Sep 29, 2022
9df64b2
Avoid name collisions with intrinsics functions
bosilca Sep 29, 2022
f6fa79c
accelerator/rocm: updates to the component
edgargabriel Sep 27, 2022
6d5244b
pml/cm: Fix call convertor_prepare_for_send to use the right pointer
wckzhang Sep 30, 2022
5d84c71
opal/accelerator: Fix base selection logic
wckzhang Oct 3, 2022
a8b5d7d
pml/cm: Fix call convertor_prepare_for_send to use the right pointer
wckzhang Oct 7, 2022
d788c53
docs: Update cuda installation and support docs
wckzhang Oct 10, 2022
2de5d6b
Merge pull request #10910 from drwootton/comm_split_unguided_v5
awlauria Oct 12, 2022
3b81d99
Part 1 of resolving global symbol name pollution issue #10708
Sep 20, 2022
6e11d04
Fix compilation of x86-64-asm based atomic backend
devreal Sep 19, 2022
503bbd2
Merge pull request #10913 from devreal/fix-atomic-sync-backend-v5.0.x
awlauria Oct 12, 2022
aae298c
ucx/pml: show warning if already unsupported UCX version is used
karasevb Oct 6, 2022
68ec412
AUTHORS: Fix duplicate author
bwbarrett Oct 13, 2022
0befaeb
AUTHORS: Unify my entry
bwbarrett Oct 13, 2022
eb2efc7
docs: add documentation on ROCm devices
edgargabriel Oct 12, 2022
e25c43a
docs: Minor updates to MPIX man pages
jsquyres Oct 12, 2022
1d70da1
Merge pull request #10921 from jsquyres/pr/v5.0.x/AUTHORS-fix
awlauria Oct 14, 2022
fce9b47
Initialize opal/smsc outside of btl/sm, to enable its use without it
gkatev Oct 6, 2022
3ee4949
Merge pull request #10919 from karasevb/topic/v5.0.x/ucx_vers_warn
awlauria Oct 14, 2022
7f8b119
Merge pull request #10925 from jsquyres/pr/v5.0.x/mpix-man-page-updates
awlauria Oct 14, 2022
5be17e5
Merge pull request #10912 from gkatev/smsc_fix_v5.0.x
awlauria Oct 14, 2022
fa86603
Fix 1 byte overlay in com_method_string: Coverity CID 1515829
Oct 12, 2022
dc069bc
Merge pull request #10931 from drwootton/coverity_bug_1515829_v5
awlauria Oct 17, 2022
9520dc9
opal/event: use epoll by default on Linux
hjelmn Oct 14, 2022
88d6e94
pack external32 long double conversion (extended 80 / quad 128)
markalle May 10, 2021
541a4c0
Update OpenPMIx and PRRTe submodule pointers.
awlauria Oct 17, 2022
a5e85e1
Merge pull request #10934 from awlauria/v5.0.x_prte_pmix_update
awlauria Oct 17, 2022
8f6980b
Merge pull request #10933 from awlauria/v5.0.x_external32
awlauria Oct 17, 2022
b059875
Merge pull request #10911 from wckzhang/v5.0.x
janjust Oct 17, 2022
168f584
Merge pull request #10924 from edgargabriel/pr/rocm-docs-v5.0
janjust Oct 17, 2022
cc283b2
Merge pull request #10932 from hjelmn/use_epoll_in_the_5_0_0_release
awlauria Oct 17, 2022
0d6d028
sessions: fix comm cleanup during mpi_finalize
hppritcha Oct 17, 2022
1f80b6d
Fix resource leak reported by coverity scan report CID 1515816
cniethammer Oct 7, 2022
7d9f2b8
Fix resource leak reported by coverity scan report CID 1515830
cniethammer Oct 4, 2022
dece294
Merge pull request #10945 from cniethammer/pr/v5.0.x/fix-resource-leak
janjust Oct 18, 2022
ed81406
Merge pull request #10944 from cniethammer/fix-resource-leak
awlauria Oct 18, 2022
b12e4ed
Merge pull request #10939 from hppritcha/session_finalize_after_final…
awlauria Oct 18, 2022
3aff737
Update MCA mutexes to use the qthreads ULT backend
janciesko Sep 20, 2022
699828d
Replace the use of opal_output by opal_show_help in supported threadi…
janciesko Sep 22, 2022
20745d8
Merge pull request #10946 from janciesko/qthreads_atomics_5.0.x
awlauria Oct 18, 2022
0519596
opal_check_cuda.m4: Add AC_ARG_WITH for libdir option
wckzhang Oct 18, 2022
efd1276
Merge pull request #10951 from wckzhang/v5.0.x
awlauria Oct 19, 2022
b67b983
config: add accelerator summary to configure output
edgargabriel Oct 18, 2022
a97d810
Merge pull request #10936 from drwootton/external_symbol_pollution
gpaulsen Oct 20, 2022
1ad09b1
Merge pull request #10957 from edgargabriel/pr/accelerator-summary-v5…
gpaulsen Oct 20, 2022
cbfacba
Reduce the overhead of UCX add_procs with intercommunicators
jjhursey Oct 19, 2022
b248eff
Update OpenPMIx and PRRTe submodule pointers.
awlauria Oct 21, 2022
d4efccd
Merge pull request #10966 from awlauria/prte_pmix_update_now
awlauria Oct 21, 2022
4c3dca0
Update news in preparation for v5.0.0rc9.
awlauria Oct 21, 2022
3b748fd
Update VERSION to v5.0.0rc9.
awlauria Oct 21, 2022
faca831
Merge pull request #10968 from awlauria/news_v5.0.x_rc9_
awlauria Oct 21, 2022
47b7197
Merge pull request #10960 from jjhursey/v5-ucx-add-procs-fix
awlauria Oct 24, 2022
e4aad52
Part 2 of resolving global symbol name pollution issue #10708
Sep 27, 2022
de9efb9
oshmem: Fix unsigned type comparisons
gleon99 Oct 13, 2022
436cb73
Merge pull request #10956 from drwootton/symbol_pollution_part2_v5
awlauria Oct 24, 2022
b2545fb
Fix memory leak in ompi_report_comm_methods by removing redundant
Oct 17, 2022
caeab82
Merge pull request #10979 from drwootton/coverity_bug_1515837_v5
awlauria Oct 25, 2022
48266a5
Part 3 of resolving global symbol name pollution #10708 - treematch
Sep 28, 2022
b607a95
Fix potential oob read in mca_coll_han_query_module_from_mca
devreal Oct 21, 2022
5a4ee72
Merge pull request #10982 from devreal/fix-han-mod_id-overflow-5.0.x
gpaulsen Oct 26, 2022
2b638cf
Merge pull request #10981 from open-mpi/symbol_pollution_treematch_v5
gpaulsen Oct 26, 2022
98bfc4b
sharedfp/sm: convert output to use verbosity levels
edgargabriel Oct 24, 2022
409467e
coll/basic: Fix clang warnings
jjhursey Oct 24, 2022
ec3592c
Update OpenPMIx and PRRTe submodule pointers.
awlauria Oct 28, 2022
3284555
Merge pull request #10998 from awlauria/v5.0.x-prte-pmix-up-10-28
awlauria Oct 28, 2022
ea4f102
Merge pull request #10995 from jjhursey/v5-fix-coll-basic-leak
awlauria Oct 31, 2022
ecdb3ba
Merge pull request #10989 from edgargabriel/pr/silence-sharedfp-sm-v5.0
awlauria Oct 31, 2022
58b0ced
lanl-ci: sync v5.0.x yml file with main
hppritcha Mar 25, 2021
b33da77
btl/ofi: fix an uninitialized variable
hppritcha Oct 27, 2022
8f3de5c
doc/mpirun: Fixup the rankfile documentation
jjhursey Oct 28, 2022
2e764e2
Fix incorrect URL in the configure output
jjhursey Oct 28, 2022
f4278b5
Merge pull request #11003 from jjhursey/v5-doc-rankfile
awlauria Nov 1, 2022
0ada41c
Merge pull request #11002 from hppritcha/fix_for_issue_10986_v50x
awlauria Nov 1, 2022
d0de208
Fix potential memory leak in opal/mca/base/mca_base_var_enum.c
Oct 26, 2022
baa3bcd
Fix multiple potential memory leaks in mca_topo_base_cart_allocate
Oct 26, 2022
24b7407
Fix possible memory leak in ompi_dpm_dyn_finalize
Oct 26, 2022
47d9f2c
ofi/common: release opal memory base framework
hppritcha Oct 24, 2022
d4f380a
mpi4: fix init check for a few more functions
hppritcha Oct 27, 2022
2212157
Merge pull request #11000 from hppritcha/lanl_ci_v5.0.x_update
awlauria Nov 1, 2022
57c2d60
docs: trivial typo fix
jsquyres Oct 31, 2022
20a53a4
Merge pull request #11014 from jsquyres/pr/v5.0.x/trivial-docs-fix
awlauria Nov 1, 2022
ce5ddb1
opal_setup_sphinx.m4: make version check better
jsquyres Nov 1, 2022
a4a5d3d
docs: conditionally install wrapper compiler man pages
jsquyres Oct 31, 2022
bf4b6f1
autogen.pl: remove most --no-FOO options
jsquyres Nov 1, 2022
c43f7eb
Merge pull request #11011 from hppritcha/more_funcs_before_init_v5.0.x
awlauria Nov 2, 2022
2b339e7
Merge pull request #11009 from drwootton/dpm_dyn_leak_v5
awlauria Nov 2, 2022
b06b1e4
Merge pull request #11007 from drwootton/mca_base_var_enum_leak_v5
awlauria Nov 2, 2022
6f31889
Merge pull request #11008 from drwootton/topo_base_cart_leak_v5
awlauria Nov 2, 2022
14ca488
Merge pull request #11019 from jsquyres/pr/v5.0.x/make-sphinx-version…
awlauria Nov 2, 2022
920770f
Merge pull request #11021 from jsquyres/pr/v5.0.x/conditionally-insta…
awlauria Nov 2, 2022
3c4b007
Merge pull request #10973 from gleon99/gleon99/shmem-comparisons-5.0.x
awlauria Nov 2, 2022
f0a6867
Merge pull request #11010 from hppritcha/ofi_release_memory_base_fram…
awlauria Nov 2, 2022
8d620a7
OSC/UCX: Adding the following optimzations: 1) Reuse the same worker/…
Aug 4, 2022
a33ec38
Merge pull request #11025 from MamziB/mamzi/single-thread-enhancement…
awlauria Nov 2, 2022
117eab0
Fix resource leak reported by coverity scan report CID 1515756
cniethammer Oct 18, 2022
21b5994
Merge pull request #11027 from cniethammer/pr/v5.0.x/fix-resource-leak
awlauria Nov 3, 2022
8b2a82b
mtl/ofi: Fix FI_HMEM bit check
wckzhang Nov 1, 2022
3e69984
Merge pull request #11039 from wckzhang/v5.0.x
awlauria Nov 7, 2022
9e89875
Fix potential memory leak in opal_init_gethostname
Nov 2, 2022
27ca7c9
doc: Update mpirun map/rank/bind to reflect PRRTE
jjhursey Nov 2, 2022
0466c02
Merge pull request #11041 from drwootton/opal_hostname_leak_v5
awlauria Nov 8, 2022
7991a1d
smsc/xpmem: Fix bound alignment
devreal Mar 16, 2022
cf89e67
smsc/xpmem: retry with page upper bound if aligned range cannot be ma…
devreal Mar 16, 2022
1eda8a0
docs: fix typo
jsquyres Nov 10, 2022
92fd75d
OSC/UCX: Fix coverity issues and missing extend var in nb acc
Nov 3, 2022
4d2e59a
ofi/common: close memory base in an error path
hppritcha Nov 5, 2022
c7d831f
Merge pull request #11051 from jsquyres/pr/v5.0.x/sphinx-docs-typo
awlauria Nov 11, 2022
c2c0c7b
Merge pull request #11050 from devreal/smsc-xpmem-fixes-v5.0.x
awlauria Nov 11, 2022
36309c4
Merge pull request #11044 from jjhursey/v5-doc-mpirun
awlauria Nov 11, 2022
fff0944
Merge pull request #11054 from MamziB/mamzi/osc-coverity-fixes-v5
janjust Nov 14, 2022
4019623
coll/basic CID 1516465: remove bad free
jsquyres Nov 11, 2022
3587835
mca/base CID 1516779: fix possible NULL usage
jsquyres Nov 11, 2022
524e559
mpool/hugepage CID 1516781: fix strtok usage
jsquyres Nov 11, 2022
2226c33
opal/util/ethtool.c: fix interface name len
jsquyres Nov 11, 2022
5bda71e
ompi/instant CID 1516780: remove double unlock
jsquyres Nov 11, 2022
567d31e
coll/base CID 1516784+1516786: fix bad shift
jsquyres Nov 11, 2022
ba72243
Remove function opal_list_insert()
jywangx Nov 12, 2022
fc40853
Coll han: fix allreduce dynamic calling internal han algo on sub_comm
FlorentGermain-Bull Jun 8, 2022
e2d58a0
coll/HAN: Don't DQ HAN dynamic @ intra-node subcomm + typo fixes
gkatev Jun 8, 2022
8b0e609
Coll/han Improvements:
FlorentGermain-Bull Sep 21, 2022
dcf6307
Properly free variables in coll_han_dynamic_file
devreal Oct 26, 2022
425b3eb
Fix access after free in han dynamic rules parser (CID 1516459)
devreal Oct 27, 2022
98c117f
MCA/PML/UCX: Enabled multi_send_nb option by default.
rakhmets Nov 8, 2022
a1192c4
Update man pages to describe error handling in MPI-4
Nov 9, 2022
61e158c
Change top level heading in man pages to upper case for consistency
Nov 15, 2022
8b901c2
mtl: Fix datatype offsetting
wckzhang Nov 15, 2022
358ad03
singleton: add field to opal_process_info
hppritcha Nov 11, 2022
c617784
test/datatype/partial.c: fix compiler warnings
jsquyres Nov 18, 2022
6770e2c
Update OpenPMIx and PRRTE submodule pointers
jsquyres Nov 21, 2022
3c5f2ef
Merge pull request #11096 from jsquyres/pr/v5.0.x/pmix-prte-submodule…
awlauria Nov 22, 2022
a17a7d4
Merge pull request #11043 from FlorentGermain-Bull/coll_han_update_fi…
awlauria Nov 22, 2022
5b39624
Merge pull request #11057 from hppritcha/close_memory_base_in_ofi_err…
awlauria Nov 22, 2022
0a8912a
Merge pull request #11069 from jsquyres/pr/5.0.x/cid-fixes
awlauria Nov 22, 2022
5f65ff6
Merge pull request #11072 from jywangx/pr/v5.0.x/opal_list_bug_fix
awlauria Nov 22, 2022
3cf739a
Merge pull request #11078 from wckzhang/v5.0.x
awlauria Nov 22, 2022
3b2dc8c
Merge pull request #11074 from rakhmets/topic/pml_ucx_multi_send_nb_v5.0
janjust Nov 22, 2022
7717606
Merge pull request #11079 from drwootton/man_error_updates
awlauria Nov 22, 2022
7705221
Merge pull request #11085 from hppritcha/move_singleton_to_opal_v50x
awlauria Nov 22, 2022
11b872c
Merge pull request #11090 from jsquyres/pr/v5.0.x/test-compiler-warni…
awlauria Nov 22, 2022
a5af105
io/romio341: fix support for GCC 4.8 compilers
ggouaillardet Nov 18, 2022
23d5cca
docs: update "Getting help"
jsquyres Nov 19, 2022
6ec559d
oshmem: Fix segment reset
brminich Nov 19, 2022
883a448
singleton: better fix for quieting ofi common
hppritcha Nov 22, 2022
46c845c
singleton: reduce chattiness under slurm
hppritcha Nov 4, 2022
c6b7752
MPI_Comm_create_from_group: fix help message
hppritcha Nov 1, 2022
f676109
Fix undefined file offset in mca_io_ompio_file_seek.
Nov 23, 2022
7addc83
Fix divide by zero error in calculate_num_nodes_up_to_level
Nov 22, 2022
479e4d3
OSC/UCX: Properly releasing the resources and adding
Nov 10, 2022
9793aac
OSC/UCX: Allow nonblocking get_accumulate to be called with results_addr
Nov 24, 2022
7860ba0
OSC/UCX: Fix for issues/11114 (non-debug build failure)
Nov 28, 2022
ffcb755
OSC/UCX: Adding the unpacked rkey counter for dynamic windows
Nov 28, 2022
c82396e
Merge pull request #11121 from MamziB/mamzi/osc-finalize-2-v5
janjust Nov 28, 2022
6448d57
Correctly return error codes in get_dynamic_win_info()
s417-lama Nov 21, 2022
a9ff6bc
Add missing dynamic_lock member in ompi_osc_ucx_state struct
s417-lama Nov 21, 2022
1a25196
Remove unnecessary (and incorrect) dynamic window checking for rget/rput
s417-lama Nov 21, 2022
93fe7ad
Merge pull request #11104 from jsquyres/pr/v5.0.x/docs-getting-help
awlauria Nov 29, 2022
476c6c7
Merge pull request #11105 from brminich/v5.0_ds_reset
awlauria Nov 29, 2022
1eb82f3
Merge pull request #11113 from hppritcha/singleton_noise_reduction_v5…
awlauria Nov 29, 2022
f0cc528
Merge pull request #11117 from hppritcha/fix_comm_create_from_group_h…
awlauria Nov 29, 2022
0ddf6b0
Merge pull request #11119 from drwootton/undefined_eof_offset_v5
awlauria Nov 29, 2022
55503bb
Merge pull request #11120 from drwootton/mpi_bcast_zerodivide_v5
awlauria Nov 29, 2022
9c2418e
Merge pull request #11115 from s417-lama/fix_dynamic_window_v5
janjust Nov 29, 2022
f6157bf
Move debugging section of OpenMPI faq to new section in user document…
Nov 16, 2022
d3a8e0e
Merge pull request #11124 from MamziB/mamzi/nonblocking-acc-dt-v5
janjust Nov 30, 2022
2309743
lanl/ci: fix a typpo
hppritcha Nov 30, 2022
e847e44
coll/ucc: add support for gather(v), scatterv, reduce_scatter
Sergei-Lebedev Nov 18, 2022
e176552
coll/ucc: use size_t to avoid overflow
Sergei-Lebedev Nov 29, 2022
e9e0fce
coll/ucc: add scatter
Sergei-Lebedev Dec 1, 2022
ea3d47b
Merge pull request #11141 from Sergei-Lebedev/coll_ucc_gather_scatter…
janjust Dec 2, 2022
b3f34fe
common/ucx: call opal_progress when waiting for pmix_fence
bureddy Dec 2, 2022
366357c
Merge pull request #11146 from bureddy/v5.0.x
janjust Dec 6, 2022
7da71e3
Merge pull request #11128 from drwootton/v5-faq-debug
janjust Dec 6, 2022
bb1ff91
Merge pull request #11101 from jsquyres/pr/v5.0.x/romio-fix
janjust Dec 6, 2022
db4ea67
Merge pull request #11131 from hppritcha/lanl_ci_minor_v50x_typo_fix
janjust Dec 6, 2022
f8613da
Explicitly test num_nodes and tree_order to avoid divide by zero since
Nov 29, 2022
791a849
Fix memory leak in mca_base_alias_register
Nov 30, 2022
ce96907
Fix memory use after free in dpm_convert
Dec 5, 2022
415abb6
Fix memory leaks due to missing free() on error returns from block where
Nov 30, 2022
a4a1de9
Fix possible invalid memory access in add_string_to_conversion_struct
Dec 5, 2022
a0ebbaf
Fix memory leak in mca_spml_ucx_register
Dec 5, 2022
c6fa679
Properly set 'line' prior to error branch for each failing function c…
Dec 5, 2022
bc2fa81
Fix memory leak in do_recv (memheap_base_mkey.c)
Dec 5, 2022
5dc54bc
Fix possible memory leaks in component_select (osc_sm_component.c)
Dec 5, 2022
a62f50f
Merge pull request #11154 from drwootton/netpatterns_zerodivide_v5
awlauria Dec 6, 2022
6c077b3
Merge pull request #11155 from drwootton/mca_alias_leak_v5
awlauria Dec 6, 2022
0630bde
Merge pull request #11156 from drwootton/dpm_use_after_free_v5
awlauria Dec 6, 2022
67a3cae
Merge pull request #11157 from drwootton/sm_component_leak_v5
awlauria Dec 6, 2022
1c5a198
Merge pull request #11158 from drwootton/hook_memory_overlay_v5
janjust Dec 7, 2022
e84e7ee
Merge pull request #11159 from drwootton/spml_ucx_leak_v5
janjust Dec 7, 2022
6cbd1df
Fix additional memory leaks in component_select
Dec 6, 2022
fd1019c
Merge pull request #11112 from hppritcha/better_ofi_common_quiet_v50x
awlauria Dec 7, 2022
cd10c9c
Merge pull request #11160 from drwootton/alltoallv_invalid_line_v5
awlauria Dec 7, 2022
130019a
Merge pull request #11161 from drwootton/memheap_mkey_leak_v5
awlauria Dec 7, 2022
af13188
Fix double free in opal_common_ucx_wpool_init
Dec 7, 2022
d9cf081
Merge pull request #11162 from drwootton/component_selected_leak_v5
awlauria Dec 8, 2022
cdff221
common/ompio: implement pipelined read and write operation
edgargabriel Dec 5, 2022
34569a9
OSC/UCX: preserve the accumulate ordering for overlapping buffers during
Dec 7, 2022
b4b279f
OSC/UCX: Change the declaration of wpool ctx mutex to recursive. Note
Dec 8, 2022
7a26418
Merge pull request #11186 from MamziB/mamzi/outstanding-nb-acc-v5
janjust Dec 9, 2022
336727f
Merge pull request #11180 from drwootton/wpool_double_free_v5
janjust Dec 9, 2022
41de2c8
Fix memory leak in start_dvm
Dec 7, 2022
897475f
Fix memory leak in mca_base_alias_register
Dec 7, 2022
97582fb
Fix memory leak in mca_coll_han_scatter_intra_simple
Dec 6, 2022
fd509df
fs/lustre: fix assignment of info objects to lustre args
edgargabriel Dec 4, 2022
f4a34a7
Fix missing lock release in ompi_coll_adapt_ibcast_generic
Dec 12, 2022
08a6a8f
accelerator/rocm: fix check_addr function
edgargabriel Dec 13, 2022
907f84f
fixed typo in documentation
Dec 15, 2022
0a51198
Merge pull request #11202 from drwootton/coll_adapt_missing_lock_v5
janjust Dec 15, 2022
8582ecc
Merge pull request #11192 from drwootton/dpm_dvm_leak_v5
janjust Dec 15, 2022
e0d968a
Merge pull request #11194 from drwootton/base_alias_leak_v5
janjust Dec 15, 2022
39362e9
Merge pull request #11195 from drwootton/han_scatter_leak_v5
janjust Dec 15, 2022
ac12c36
Merge pull request #11211 from classicsman/patchv5.0
janjust Dec 15, 2022
b02fc71
Fix invalid access after free in do_recv: Coverity CID 1517308
Dec 7, 2022
8795454
Merge pull request #11172 from drwootton/osc_comp_select_leak_v5
janjust Dec 19, 2022
af2958e
Fix memory leak in dpm_convert (dpm.c)
Dec 12, 2022
ab3a104
Fixing missing lock release in mca_pml_ob1_record_htod_event
Dec 12, 2022
3d9abd2
Fix missing lock release in oshmem_proc_group_create
Dec 12, 2022
fd26b15
Fix memory leak in mca_coll_han_init_dynamic_rules: Coverity CID 1516452
Dec 13, 2022
4b97bd0
Fix uninitialized pointer in mca_smpl_ucx_register
Dec 13, 2022
0956585
Merge pull request #11223 from jjhursey/v5-cp-11208
janjust Dec 20, 2022
189101e
Merge pull request #11219 from jjhursey/v5-cp-11196
janjust Dec 20, 2022
c726177
Merge pull request #11220 from jjhursey/v5-cp-11198
janjust Dec 20, 2022
efb1beb
Merge pull request #11221 from jjhursey/v5-cp-11199
janjust Dec 20, 2022
b186193
Merge pull request #11222 from jjhursey/v5-cp-11203
janjust Dec 20, 2022
39e379c
Merge pull request #11218 from jjhursey/v5-cp-11170
janjust Dec 20, 2022
641e809
Fix singleton operations
rhc54 Dec 18, 2022
61fc6bd
Merge pull request #11227 from rhc54/cmr50/sing
janjust Dec 20, 2022
ee0a0d4
Fix memory leak in mca_btl_tcp_proc_handle_modex_addresses
Dec 7, 2022
0706610
Increment the PMIx/PRRTE submodule pointers
rhc54 Dec 21, 2022
b37b8db
update bml.h: BML is short for BTL Management Layer instead of BML Ma…
jo-pillar Dec 23, 2022
48b1197
opal/common/ofi: add net to provider exclude list
wzamazon Dec 22, 2022
9e4c9cd
Merge pull request #11230 from jjhursey/v5-pr-11179
janjust Jan 3, 2023
8d5b97f
Merge pull request #11232 from rhc54/cmr50/ptr
awlauria Jan 4, 2023
d80c428
pml/ucx: move pmix finalize to the end of ompi_rte_finalize()
Dec 20, 2022
bf24f0f
LANL/CI: workaround for aocc module
hppritcha Jan 3, 2023
9522acd
Merge pull request #11185 from edgargabriel/pr/ompio-pipeline-read-wr…
gpaulsen Jan 6, 2023
c1b995a
Merge pull request #11197 from edgargabriel/pr/lustre-info-swap-v5.0
gpaulsen Jan 6, 2023
26246f3
Merge pull request #11209 from edgargabriel/pr/rocm-check-addr-fix-v5.0
gpaulsen Jan 6, 2023
8dbfd1a
Merge pull request #11249 from jo-pillar/v5.0.x
gpaulsen Jan 6, 2023
a6b1327
Merge pull request #11252 from wzamazon/v5.0.x_exclude_net_provider
gpaulsen Jan 6, 2023
1b10b9b
Merge pull request #11268 from hppritcha/lanl_ci_fix_for_aocc_module_…
gpaulsen Jan 6, 2023
dcba3d8
OSC/UCX: avoid creating ucp context if osc init is not called by
Dec 21, 2022
625d9fe
Merge pull request #11281 from MamziB/mamzi/osc-ucx-support-level-v5
janjust Jan 10, 2023
9643281
Merge pull request #11263 from MamziB/mamzi/pmix-finalize-v5
janjust Jan 10, 2023
f858cf2
accelerator/cuda: Add delayed initialization logic
wckzhang Dec 28, 2022
d7ed103
accelerator/cuda: Return OPAL error codes instead of CUresult
wckzhang Jan 10, 2023
The diff you're trying to view is too large. We only load the first 3000 changed files.
18 changes: 3 additions & 15 deletions .ci/README.md
@@ -1,16 +1,4 @@
# Open MPI Continuous Integration (CI) Services
## Mellanox Open MPI CI
### Scope
[Mellanox](https://www.mellanox.com/) Open MPI CI is intended to verify Open MPI with recent Mellanox SW components ([Mellanox OFED](https://www.mellanox.com/page/products_dyn?product_family=26), [UCX](https://www.mellanox.com/page/products_dyn?product_family=281&mtag=ucx) and other [HPC-X](https://www.mellanox.com/page/products_dyn?product_family=189&mtag=hpc-x) components) in the Mellanox lab environment.
Top-level directory for CI tests.

CI is managed by [Azure Pipelines](https://docs.microsoft.com/en-us/azure/devops/pipelines/?view=azure-devops) service.

Mellanox Open MPI CI includes:
* Open MPI building with internal stable engineering versions of UCX and HCOLL. The building is run in Docker-based environment.
* Sanity functional testing.
### How to Run CI
Mellanox Open MPI CI is triggered upon the following events:
* Create a pull request (PR). CI status is visible in the PR status. CI is restarted automatically upon each new commit within the PR. CI status and log files are also available on the Azure DevOps server.
* Trigger CI with special PR comments (for example, `/azp run`). Comment triggers are available only if the comment author has write permission to the PR target repo. Detailed information about comment triggers is available in the official Azure DevOps [documentation](https://docs.microsoft.com/en-us/azure/devops/pipelines/repos/github?view=azure-devops&tabs=yaml#comment-triggers).
### Support
In case of any issues, questions or suggestions please contact to [Mellanox Open MPI CI support team](mailto:[email protected]).
Feel free to make your own subdirectory (e.g., for your organization)
and put CI tests and supporting infrastructure here.
301 changes: 301 additions & 0 deletions .ci/lanl/gitlab-darwin-ci.yml
@@ -0,0 +1,301 @@
variables:
  SCHEDULER_PARAMETERS: "-pgeneral -t 4:00:00 -N 1 --ntasks-per-node=16"
  GIT_STRATEGY: clone
  NPROCS: 4

stages:
  - build
  - test

build:intel:
  stage: build
  tags: [darwin-slurm-shared]
  script:
    - module load intel/2022.0.1
    - rm .gitmodules
    - cp $GITSUBMODULEPATCH .gitmodules
    - git submodule update --init --recursive
    - ./autogen.pl
    - ./configure CC=icx FC=ifx CXX=icpx --prefix=$PWD/install_test --with-libevent=internal
    - make -j 8 install
    - make check
    - export PATH=$PWD/install_test/bin:$PATH
    - cd examples
    - make
  artifacts:
    name: "$CI_JOB_NAME-$CI_COMMIT_REF_NAME"
    untracked: true
    paths:
      - examples
      - install_test
    expire_in: 1 week

build:ibm:
  stage: build
  tags: [darwin-slurm-shared]
  variables:
    SCHEDULER_PARAMETERS: "-ppower9 -t 4:00:00 -N 1 --ntasks-per-node=16"
  script:
    - module load ibm
    - rm .gitmodules
    - cp $GITSUBMODULEPATCH .gitmodules
    - git submodule update --init --recursive
    - ./autogen.pl
    - ./configure CC=xlc FC=xlf CXX=xlc++ --prefix=$PWD/install_test --with-libevent=internal
    - make -j 8 install
    - make check
    - export PATH=$PWD/install_test/bin:$PATH
    - cd examples
    - make
  artifacts:
    name: "$CI_JOB_NAME-$CI_COMMIT_REF_NAME"
    untracked: true
    paths:
      - examples
      - install_test
    expire_in: 1 week

build:amd:
  stage: build
  tags: [darwin-slurm-shared]
  variables:
    SCHEDULER_PARAMETERS: "-pamd-rome -t 4:00:00 -N 1 --ntasks-per-node=16"
  script:
    - module load aocc/3.0.0
    - rm .gitmodules
    - cp $GITSUBMODULEPATCH .gitmodules
    - git submodule update --init --recursive
    - ./autogen.pl
    - ./configure CC=clang FC=flang CXX=clang++ --prefix=$PWD/install_test --with-libevent=internal LIBS="-lucm -lucs"
    - make -j 8 install
    - make check
    - export PATH=$PWD/install_test/bin:$PATH
    - cd examples
    - make
  artifacts:
    name: "$CI_JOB_NAME-$CI_COMMIT_REF_NAME"
    untracked: true
    paths:
      - examples
      - install_test
    expire_in: 1 week

build:gnu:
  stage: build
  tags: [darwin-slurm-shared]
  script:
    - module load gcc
    - rm .gitmodules
    - cp $GITSUBMODULEPATCH .gitmodules
    - git submodule update --init --recursive
    - ./autogen.pl
    - ./configure --prefix=$PWD/install_test --with-libevent=internal
    - make -j 8 install
    - make check
    - export PATH=$PWD/install_test/bin:$PATH
    - cd examples
    - make
  artifacts:
    name: "$CI_JOB_NAME-$CI_COMMIT_REF_NAME"
    untracked: true
    paths:
      - examples
      - install_test
    expire_in: 1 week

build:cce:
  stage: build
  tags: [darwin-slurm-shared]
  variables:
    SCHEDULER_PARAMETERS: "-pcrossroads-dev -t 4:00:00 -N 1 --ntasks-per-node=16"
  script:
    - hostname
    - module use --append /opt/cray/pe/modulefiles
    - module avail cce
    - module load cce
    - module unload libsci
    - module unload cray-mvapich2_nogpu
    - rm .gitmodules
    - cp $GITSUBMODULEPATCH .gitmodules
    - git submodule update --init --recursive
    - ./autogen.pl
    - ./configure CC=cc FTN=ftn CXX=CC --with-ucx=no --prefix=$PWD/install_test --with-libevent=internal
    - make -j 8 install
    - make check
    - export PATH=$PWD/install_test/bin:$PATH
    - cd examples
    - make
  artifacts:
    name: "$CI_JOB_NAME-$CI_COMMIT_REF_NAME"
    untracked: true
    paths:
      - examples
      - install_test
    expire_in: 1 week


test:intel:
  stage: test
  tags: [darwin-slurm-shared]
  dependencies:
    - build:intel
  needs: ["build:intel"]
  script:
    - pwd
    - ls
    - module load intel/2022.0.1
    - export PATH=$PWD/install_test/bin:$PATH
    - which mpirun
    - cd examples
    - mpirun -np 4 hostname
    - mpirun -np 4 ./hello_c
    - mpirun -np 4 ./ring_c
    - mpirun -np 4 ./hello_mpifh
    - mpirun -np 4 ./ring_mpifh
    - mpirun -np 4 ./hello_usempi
    - mpirun -np 4 ./ring_usempi
    - mpirun -np 4 ./hello_usempif08
    - mpirun -np 4 ./ring_usempif08
    - mpirun -np 4 ./connectivity_c
  artifacts:
    name: "$CI_JOB_NAME-$CI_COMMIT_REF_NAME"
    expire_in: 1 week

test:ibm:
  stage: test
  tags: [darwin-slurm-shared]
  variables:
    SCHEDULER_PARAMETERS: "-ppower9 -t 2:00:00 -N 1 --ntasks-per-node=16"
  dependencies:
    - build:ibm
  needs: ["build:ibm"]
  script:
    - pwd
    - ls
    - module load ibm
    - export PATH=$PWD/install_test/bin:$PATH
    - which mpirun
    - pushd examples
    - mpirun -np 4 hostname
    - mpirun -np 4 ./hello_c
    - mpirun -np 4 ./ring_c
    - mpirun -np 4 ./hello_mpifh
    - mpirun -np 4 ./ring_mpifh
    - mpirun -np 4 ./hello_usempi
    - mpirun -np 4 ./ring_usempi
    - mpirun -np 4 ./hello_usempif08
    - mpirun -np 4 ./ring_usempif08
    - mpirun -np 4 ./connectivity_c
    - popd
    - mkdir osu-tests
    - pushd osu-tests
    - cp -p -r $OSU_TESTS_FOLDER/* .
    - ./configure CC=mpicc FC=mpifort F77=mpifort CXX=mpiCC && make -j 8 clean && make -j 8
    - pushd mpi/pt2pt
    - mpirun -np 2 ./osu_latency
    - mpirun -np 2 ./osu_latency D H
    - mpirun -np 2 ./osu_latency H D
    - mpirun -np 2 ./osu_latency H H
    - mpirun -np 2 ./osu_bw
    - mpirun -np 2 ./osu_bw D H
    - mpirun -np 2 ./osu_bw H D
    - mpirun -np 2 ./osu_bw H H
    - mpirun -np 2 ./osu_bibw
    - mpirun -np 2 ./osu_bibw D H
    - mpirun -np 2 ./osu_bibw H D
    - mpirun -np 2 ./osu_bibw H H
  artifacts:
    name: "$CI_JOB_NAME-$CI_COMMIT_REF_NAME"
    expire_in: 1 week

test:amd:
  stage: test
  tags: [darwin-slurm-shared]
  variables:
    SCHEDULER_PARAMETERS: "-pamd-rome -t 2:00:00 -N 1 --ntasks-per-node=16"
  dependencies:
    - build:amd
  needs: ["build:amd"]
  script:
    - pwd
    - ls
    - module load aocc/3.0.0
    - export PATH=$PWD/install_test/bin:$PATH
    - export LD_LIBRARY_PATH=$PWD/install_test/lib:$LD_LIBRARY_PATH
    - which mpirun
    - cd examples
    - mpirun -np 4 hostname
    - mpirun -np 4 ./hello_c
    - mpirun -np 4 ./ring_c
    - mpirun -np 4 ./hello_mpifh
    - mpirun -np 4 ./ring_mpifh
    - mpirun -np 4 ./hello_usempi
    - mpirun -np 4 ./ring_usempi
    - mpirun -np 4 ./hello_usempif08
    - mpirun -np 4 ./ring_usempif08
    - mpirun -np 4 ./connectivity_c
  artifacts:
    name: "$CI_JOB_NAME-$CI_COMMIT_REF_NAME"
    expire_in: 1 week

test:gnu:
  stage: test
  tags: [darwin-slurm-shared]
  dependencies:
    - build:gnu
  needs: ["build:gnu"]
  script:
    - pwd
    - ls
    - module load gcc
    - export PATH=$PWD/install_test/bin:$PATH
    - which mpirun
    - cd examples
    - mpirun -np 4 hostname
    - mpirun -np 4 ./hello_c
    - mpirun -np 4 ./ring_c
    - mpirun -np 4 ./hello_mpifh
    - mpirun -np 4 ./ring_mpifh
    - mpirun -np 4 ./hello_usempi
    - mpirun -np 4 ./ring_usempi
    - mpirun -np 4 ./hello_usempif08
    - mpirun -np 4 ./ring_usempif08
    - mpirun -np 4 ./connectivity_c
  artifacts:
    name: "$CI_JOB_NAME-$CI_COMMIT_REF_NAME"
    expire_in: 1 week

test:cce:
  stage: test
  tags: [darwin-slurm-shared]
  variables:
    SCHEDULER_PARAMETERS: "-pcrossroads-dev -t 4:00:00 -N 1 --ntasks-per-node=16"
  dependencies:
    - build:cce
  needs: ["build:cce"]
  script:
    - pwd
    - ls
    - hostname
    - module use --append /opt/cray/pe/modulefiles
    - module avail cce
    - module load cce
    - module unload libsci
    - module unload cray-mvapich2_nogpu
    - export PATH=$PWD/install_test/bin:$PATH
    - which mpirun
    - cd examples
    - mpirun -np 4 hostname
    - mpirun -np 4 ./hello_c
    - mpirun -np 4 ./ring_c
    - mpirun -np 4 ./hello_mpifh
    - mpirun -np 4 ./ring_mpifh
    - mpirun -np 4 ./hello_usempi
    - mpirun -np 4 ./ring_usempi
    - mpirun -np 4 ./hello_usempif08
    - mpirun -np 4 ./ring_usempif08
    - mpirun -np 4 ./connectivity_c
  artifacts:
    name: "$CI_JOB_NAME-$CI_COMMIT_REF_NAME"
    expire_in: 1 week
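
The test jobs in this file repeat the same `mpirun -np 4` line for every example binary, although the file also defines `NPROCS: 4` under `variables:`. A minimal sketch of how the list could be driven from a loop, assuming `NPROCS` is exported into the job environment; `run_examples` and `LAUNCHER` are hypothetical names introduced for illustration and are not part of the CI file:

```shell
#!/bin/sh
# Sketch only: run each example under the launcher, stopping at the first failure.
# run_examples and LAUNCHER are hypothetical names, not taken from the CI file.
run_examples() {
    nprocs="$1"
    shift
    for example in "$@"; do
        # LAUNCHER defaults to mpirun; set LAUNCHER=echo for a dry run.
        ${LAUNCHER:-mpirun} -np "$nprocs" "./$example" || return 1
    done
}
```

In a job script this might be called after `cd examples`, for instance as `run_examples "$NPROCS" hello_c ring_c connectivity_c`. Returning at the first failure matches how a CI script step fails the job when one command exits non-zero.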

16 changes: 16 additions & 0 deletions .ci/mellanox/README.md
@@ -0,0 +1,16 @@
# Open MPI Continuous Integration (CI) Services
## Mellanox Open MPI CI
### Scope
[Mellanox](https://www.mellanox.com/) Open MPI CI is intended to verify Open MPI with recent Mellanox SW components ([Mellanox OFED](https://www.mellanox.com/page/products_dyn?product_family=26), [UCX](https://www.mellanox.com/page/products_dyn?product_family=281&mtag=ucx) and other [HPC-X](https://www.mellanox.com/page/products_dyn?product_family=189&mtag=hpc-x) components) in the Mellanox lab environment.

CI is managed by [Azure Pipelines](https://docs.microsoft.com/en-us/azure/devops/pipelines/?view=azure-devops) service.

Mellanox Open MPI CI includes:
* Open MPI building with internal stable engineering versions of UCX and HCOLL. The building is run in Docker-based environment.
* Sanity functional testing.
### How to Run CI
Mellanox Open MPI CI is triggered upon the following events:
* Create a pull request (PR). CI status is visible in the PR status. CI is restarted automatically upon each new commit within the PR. CI status and log files are also available on the Azure DevOps server.
* Trigger CI with special PR comments (for example, `/azp run`). Comment triggers are available only if the comment author has write permission to the PR target repo. Detailed information about comment triggers is available in the official Azure DevOps [documentation](https://docs.microsoft.com/en-us/azure/devops/pipelines/repos/github?view=azure-devops&tabs=yaml#comment-triggers).
### Support
In case of any issues, questions or suggestions please contact to [Mellanox Open MPI CI support team](mailto:[email protected]).
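
The comment trigger described under "How to Run CI" can also be posted from a terminal. A minimal sketch, assuming the GitHub CLI (`gh`) is installed and authenticated as a user with write permission to the target repo; `trigger_azp` and the `GH` override are hypothetical names used only for this illustration:

```shell
#!/bin/sh
# Sketch only: post the "/azp run" comment trigger on a pull request.
# trigger_azp and GH are hypothetical names; GH defaults to the GitHub CLI.
trigger_azp() {
    pr_number="$1"
    ${GH:-gh} pr comment "$pr_number" --body "/azp run"
}
```

Called as `trigger_azp <PR number>` from inside a clone of the repository, this posts the same comment a maintainer would type in the PR conversation.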
2 changes: 1 addition & 1 deletion .ci/mellanox/azure-pipelines.yml
@@ -1,6 +1,6 @@
 trigger: none
 pr:
-  - master
+  - main
   - v*.*.x

 pool: