Commit d55b666

Topic/monitoring (#3109)
Add a monitoring PML, OSC and IO. They track all data exchanges between processes, with the capability to include or exclude collective traffic. The monitoring infrastructure is driven using MPI_T, and can be turned off and on at any time on any communicators/files/windows. Documentation and examples have been added, as well as a shared library that can be used with LD_PRELOAD and that allows the monitoring of any application.

Signed-off-by: George Bosilca <[email protected]>
Signed-off-by: Clement Foyer <[email protected]>

* Add the ability to query PML monitoring results with the MPI Tools interface using the performance variables "pml_monitoring_messages_count" and "pml_monitoring_messages_size".
  Signed-off-by: George Bosilca <[email protected]>
* Fix a conversion problem and add a comment about the lack of component retain in the new component infrastructure.
  Signed-off-by: George Bosilca <[email protected]>
* Allow the pvar to be written by invoking the associated callback.
  Signed-off-by: George Bosilca <[email protected]>
* Various fixes for the monitoring.
  Allocate all counting arrays in a single allocation.
  Don't delay the initialization (do it at the first add_proc, as we know the number of processes in MPI_COMM_WORLD).
  Add a choice: with or without MPI_T (default).
  Signed-off-by: George Bosilca <[email protected]>
* Cleanup for the monitoring module.
  Fix a few bugs, and reshape the operations to prepare for global or communicator-based monitoring.
  Start integrating support for MPI_T as well as MCA monitoring.
  Signed-off-by: George Bosilca <[email protected]>
* Add documentation about how to use the pml_monitoring component.
  The document presents its use with and without MPI_T.
  It may not reflect exactly how it works right now, but should reflect how it should work in the end.
  Signed-off-by: Clement Foyer <[email protected]>
* Turn the rank in MPI_COMM_WORLD and size(MPI_COMM_WORLD) into global variables in pml_monitoring.c.
  Change the mca_pml_monitoring_flush() signature so we don't need the size and rank parameters.
  Signed-off-by: George Bosilca <[email protected]>
* Improve monitoring support (including integration with MPI_T).
  Use mca_pml_monitoring_enable to check status state.
  Set mca_pml_monitoring_current_filename if and only if the parameter is set.
  Allow 3 modes for pml_monitoring_enable_output: 1 (stdout), 2 (stderr), 3 (filename).
  Fix test: 1 for differentiated messages, >1 for not differentiated.
  Fix output.
  Add documentation for the pml_monitoring_enable_output parameter.
  Remove useless parameter in example.
  Set filename only if using MPI tools.
  Add missing parameters for fprintf in monitoring_flush (for output in the std cases).
  Fix expected output/results for example header.
  Fix example when using MPI_Tools: a null pointer can't be passed directly. It needs to be a pointer to a null pointer.
  Base whether to output or not on message count, in order to print something if only empty messages are exchanged.
  Add a new example on how to access performance variables from within the code.
  Allocate arrays regarding value returned by binding.
  Signed-off-by: Clement Foyer <[email protected]>
* Add overhead benchmark, with a script to use the data and create graphs out of the results.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix segfault error at end when not loading pml.
  Signed-off-by: Clement Foyer <[email protected]>
* Start creating a common monitoring module. Factorise version numbering.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix microbenchmarks script.
  Signed-off-by: Clement Foyer <[email protected]>
* Improve readability of code.
  NULL can't be passed as a PVAR parameter value. It must be a pointer to NULL or an empty string.
  Signed-off-by: Clement Foyer <[email protected]>
* Add osc monitoring component.
  Signed-off-by: Clement Foyer <[email protected]>
* Add error checking if running out of memory in osc_monitoring.
  Signed-off-by: Clement Foyer <[email protected]>
* Resolve brutal segfault when double freeing filename.
  Signed-off-by: Clement Foyer <[email protected]>
* Move the proper parts of the monitoring system to ompi/mca/common.
  Use common functions instead of pml-specific ones. Remove the pml ones.
  Signed-off-by: Clement Foyer <[email protected]>
* Add calls to record monitored data from osc. Use common function to translate ranks.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix test_overhead benchmark script distribution.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix linking library with mca/common.
  Signed-off-by: Clement Foyer <[email protected]>
* Add passive operations in monitoring_test.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix from rank calculation. Add more detailed error messages.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix alignments. Fix common_monitoring_get_world_rank function. Remove useless trailing new lines.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix osc_monitoring mget_message_count function call.
  Signed-off-by: Clement Foyer <[email protected]>
* Change common_monitoring function names to respect the naming convention.
  Move the common parts of finalization to common_finalize.
  Add some comments.
  Signed-off-by: Clement Foyer <[email protected]>
* Add monitoring common output system.
  Signed-off-by: Clement Foyer <[email protected]>
* Add error message when trying to flush to a file and open fails.
  Remove erroneous info message when flushing whereas the monitoring is already disabled.
  Signed-off-by: Clement Foyer <[email protected]>
* Consistent output file name (with and without MPI_T).
  Signed-off-by: Clement Foyer <[email protected]>
* Always output to a file when flushing at pvar_stop(flush).
  Signed-off-by: Clement Foyer <[email protected]>
* Update the monitoring documentation.
  Complete information from HowTo. Fix a few mistakes and typos.
  Signed-off-by: Clement Foyer <[email protected]>
* Use the world_rank for printf's.
  Fix name generation for output files when using MPI_T.
  Minor changes in benchmarks starting script.
  Signed-off-by: Clement Foyer <[email protected]>
* Clean potential previous runs, but keep the results at the end in order to potentially reprocess the data. Add comments.
  Signed-off-by: Clement Foyer <[email protected]>
* Add security check for unique initialization for osc monitoring.
  Signed-off-by: Clement Foyer <[email protected]>
* Clean the amount of symbols available outside mca/common/monitoring.
  Signed-off-by: Clement Foyer <[email protected]>
* Remove use of __sync_* built-ins. Use opal_atomic_* instead.
  Signed-off-by: Clement Foyer <[email protected]>
* Allocate the hashtable on common/monitoring component initialization.
  Define symbols to set the values for error/warning/info verbose output.
  Use opal_atomic instead of built-in function in osc/monitoring template initialization.
  Signed-off-by: Clement Foyer <[email protected]>
* Delete now useless file: moved to common/monitoring.
  Signed-off-by: Clement Foyer <[email protected]>
* Add histogram distribution of message sizes.
  Signed-off-by: Clement Foyer <[email protected]>
* Add histogram array of 2-based log of message sizes.
  Use simple call to reset/allocate arrays in common_monitoring.c.
  Signed-off-by: Clement Foyer <[email protected]>
* Add information in dumping file. Separate monitored data per category (pt2pt/osc/coll (to come)).
  Signed-off-by: Clement Foyer <[email protected]>
* Add coll component for collective communications monitoring.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix warning messages: use c_name as the magic id is not always defined. Moreover, there was a % missing.
  Add call to release underlying modules.
  Add debug info messages.
  Add warning which may lead to further analysis.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix log10_2 constant initialization. Fix index calculation for histogram array.
  Signed-off-by: Clement Foyer <[email protected]>
* Add debug info messages to follow initialization steps more easily.
  Signed-off-by: Clement Foyer <[email protected]>
* Group all the var/pvar definitions in common_monitoring.
  Separate the initial filename from the current one, to ease its lifetime management.
  Add verifications to ensure common is initialized only once.
  Move state variable management to common_monitoring.
  monitoring_filter only indicates if filtering is activated.
  Fix out of range access in histogram.
  List is not used with the struct mca_monitoring_coll_data_t, so inherit only from opal_object_t.
  Remove useless dead code.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix invalid memory allocation.
  Initialize initial_filename to empty string to avoid invalid read in mca_base_var_register.
  Signed-off-by: Clement Foyer <[email protected]>
* Don't install the test scripts.
  Signed-off-by: George Bosilca <[email protected]>
  Signed-off-by: Clement Foyer <[email protected]>
* Fix missing procs in hashtable. Cache coll monitoring data.
  * Add MCA_PML_BASE_FLAG_REQUIRE_WORLD flag to the PML layer.
  * Cache monitoring data relative to collective operations on creation.
  * Remove double caching.
  * Use same proc name definition for hash table when inserting and when retrieving.
  Signed-off-by: Clement Foyer <[email protected]>
* Use intermediate variable to avoid invalid write while retrieving ranks in hashtable.
  Signed-off-by: Clement Foyer <[email protected]>
* Add missing release of the last element in flush_all. Add release of the hashtable in finalize.
  Signed-off-by: Clement Foyer <[email protected]>
* Use a linked list instead of a hashtable to keep track of communicator data.
  Add release of the structure at finalize time.
  Signed-off-by: Clement Foyer <[email protected]>
* Set world_rank from hashtable only if found.
  Signed-off-by: Clement Foyer <[email protected]>
* Use predefined symbol from opal system to print int.
  Signed-off-by: Clement Foyer <[email protected]>
* Move collective monitoring data to a hashtable.
  Add pvar to access the monitoring_coll_data.
  Move function headers to a private file only to be used in ompi/mca/common/monitoring.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix pvar registration.
  Use OMPI_ERROR instead of -1 as returned error value.
  Fix releasing of coll_data_t objects.
  Assign value only if data is found in the hashtable.
  Signed-off-by: Clement Foyer <[email protected]>
* Add automated check (with MPI_Tools) of monitoring.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix procs list caching in common_monitoring_coll_data_t.
  * Fix monitoring_coll_data type definition.
  * Use size(COMM_WORLD)-1 to determine max number of digits.
  Signed-off-by: Clement Foyer <[email protected]>
* Add linking to Fortran applications for LD_PRELOAD usage of monitoring_prof.
  Signed-off-by: Clement Foyer <[email protected]>
* Add PVAR's handles. Clean up code (visibility, add comments...). Start updating the documentation.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix coll operations monitoring.
  Update check_monitoring according to the added pvar.
  Fix monitoring array allocation.
  Signed-off-by: Clement Foyer <[email protected]>
* Documentation update.
  Update and then move the latex and README documentation to a more logical place.
  Signed-off-by: Clement Foyer <[email protected]>
* Aggregate monitoring COLL data to the generated matrix. Update documentation accordingly.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix monitoring_prof (bad variable.vector used, and wrong array in PMPI_Gather).
  Signed-off-by: Clement Foyer <[email protected]>
* Add reduce_scatter and reduce_scatter_block monitoring.
  Reduce memory footprint of monitoring_prof.
  Unify OSC related outputs.
  Signed-off-by: Clement Foyer <[email protected]>
* Add the use of a machine file for overhead benchmark.
  Signed-off-by: Clement Foyer <[email protected]>
* Check for out-of-bound write in histogram.
  Signed-off-by: Clement Foyer <[email protected]>
* Fix common_monitoring_cache object init for MPI_COMM_WORLD.
  Signed-off-by: Clement Foyer <[email protected]>
* Add RDMA benchmarks to test_overhead.
  Add error file output.
  Add MPI_Put and MPI_Get results analysis.
  Add overhead computation for complete sending (pingpong / 2).
  Signed-off-by: Clement Foyer <[email protected]>
* Add computation of average and median of overheads.
  Add comments and copyrights to the test_overhead script.
  Signed-off-by: Clement Foyer <[email protected]>
* Add technical documentation.
  Signed-off-by: Clement Foyer <[email protected]>
* Adapt to the new definition of communicators.
  Signed-off-by: Clement Foyer <[email protected]>
* Update expected output in test/monitoring/monitoring_test.c.
  Signed-off-by: Clement Foyer <[email protected]>
* Add dumping histogram in edge case.
  Signed-off-by: Clement Foyer <[email protected]>
* Add a reduce(pml_monitoring_messages_count, MPI_MAX) example.
  Signed-off-by: Clement Foyer <[email protected]>
* Add consistency in header inclusion.
  Include ompi/mpi/fortran/mpif-h/bindings.h only if needed.
  Add sanity check before emptying hashtable.
  Fix typos in documentation.
  Signed-off-by: Clement Foyer <[email protected]>
* misc monitoring fixes
  * test/monitoring: fix test when weak symbols are not available
  * monitoring: fix a typo and add a missing file in Makefile.am and have monitoring_common.h and monitoring_common_coll.h included in the distro
  * test/monitoring: cleanup all tests and make distclean a happy panda
  * test/monitoring: use gettimeofday() if clock_gettime() is unavailable
  * monitoring: silence misc warnings (#3)
  Signed-off-by: Gilles Gouaillardet <[email protected]>
* Cleanups.
  Signed-off-by: George Bosilca <[email protected]>
* Change int64_t to size_t.
  Keep size_t used across all monitoring components.
  Adapt the documentation.
  Remove useless MPI_Request and MPI_Status from monitoring_test.c.
  Signed-off-by: Clement Foyer <[email protected]>
* Add parameter for RMA test case.
  Signed-off-by: Clement Foyer <[email protected]>
* Clean the maximum bound computation for proc list dump.
  Use ptrdiff_t instead of OPAL_PTRDIFF_TYPE to reflect the changes from commit fa5cd0d.
  Signed-off-by: Clement Foyer <[email protected]>
* Add communicator-specific monitored collective data reset.
  Signed-off-by: Clement Foyer <[email protected]>
* Add monitoring scripts to the 'make dist'.
  Also install them in the build and the install directories.
  Signed-off-by: George Bosilca <[email protected]>
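The commit message mentions a histogram array indexed by the base-2 logarithm of message sizes, plus later fixes for out-of-range accesses. The component's actual bucket layout is not shown on this page, so the following is only a minimal sketch of the general idea. The names `HIST_BUCKETS`, `size_to_bucket` and `record_message`, the bucket count, and the choice of reserving slot 0 for empty messages are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical bucket count, chosen for illustration only. */
#define HIST_BUCKETS 32

/*
 * Map a message size to a histogram slot.
 * Slot 0 holds empty messages; slot k+1 holds sizes in [2^k, 2^(k+1)).
 * Sizes too large for the array are clamped into the last slot, which
 * guards against the kind of out-of-bound write the commit log mentions.
 */
static size_t size_to_bucket(size_t size)
{
    size_t bucket = 0;

    /* Count how many times the size can be halved: floor(log2(size)) + 1. */
    while (size > 0) {
        size >>= 1;
        ++bucket;
    }
    return (bucket < HIST_BUCKETS) ? bucket : HIST_BUCKETS - 1;
}

/* Record one message of the given size in the histogram. */
static void record_message(size_t hist[HIST_BUCKETS], size_t size)
{
    hist[size_to_bucket(size)]++;
}
```

With this layout, a 1024-byte message increments slot 11, and a zero-byte message increments slot 0, so exchanges consisting only of empty messages still leave a trace in the histogram.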
1 parent b1e639e commit d55b666

65 files changed, +8216 -684 lines changed

configure.ac

Lines changed: 4 additions & 0 deletions
@@ -1409,6 +1409,10 @@ AC_CONFIG_FILES([
     test/util/Makefile
 ])
 m4_ifdef([project_ompi], [AC_CONFIG_FILES([test/monitoring/Makefile])])
+m4_ifdef([project_ompi], [
+    m4_ifdef([MCA_BUILD_ompi_pml_monitoring_DSO_TRUE],
+        [AC_CONFIG_LINKS(test/monitoring/profile2mat.pl:test/monitoring/profile2mat.pl
+                         test/monitoring/aggregate_profile.pl:test/monitoring/aggregate_profile.pl)])])
 
 AC_CONFIG_FILES([contrib/dist/mofed/debian/rules],
                 [chmod +x contrib/dist/mofed/debian/rules])

ompi/mca/coll/base/coll_base_find_available.c

Lines changed: 19 additions & 30 deletions
@@ -2,7 +2,7 @@
  * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
  *                         University Research and Technology
  *                         Corporation. All rights reserved.
- * Copyright (c) 2004-2005 The University of Tennessee and The University
+ * Copyright (c) 2004-2017 The University of Tennessee and The University
  *                         of Tennessee Research Foundation. All rights
  *                         reserved.
  * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
@@ -46,9 +46,6 @@
 static int init_query(const mca_base_component_t * ls,
                       bool enable_progress_threads,
                       bool enable_mpi_threads);
-static int init_query_2_0_0(const mca_base_component_t * ls,
-                            bool enable_progress_threads,
-                            bool enable_mpi_threads);
 
 /*
  * Scan down the list of successfully opened components and query each of
@@ -105,6 +102,20 @@ int mca_coll_base_find_available(bool enable_progress_threads,
 }
 
 
+/*
+ * Query a specific component, coll v2.0.0
+ */
+static inline int
+init_query_2_0_0(const mca_base_component_t * component,
+                 bool enable_progress_threads,
+                 bool enable_mpi_threads)
+{
+    mca_coll_base_component_2_0_0_t *coll =
+        (mca_coll_base_component_2_0_0_t *) component;
+
+    return coll->collm_init_query(enable_progress_threads,
+                                  enable_mpi_threads);
+}
 /*
  * Query a component, see if it wants to run at all. If it does, save
  * some information. If it doesn't, close it.
@@ -138,33 +149,11 @@ static int init_query(const mca_base_component_t * component,
     }
 
     /* Query done -- look at the return value to see what happened */
-
-    if (OMPI_SUCCESS != ret) {
-        opal_output_verbose(10, ompi_coll_base_framework.framework_output,
-                            "coll:find_available: coll component %s is not available",
-                            component->mca_component_name);
-    } else {
-        opal_output_verbose(10, ompi_coll_base_framework.framework_output,
-                            "coll:find_available: coll component %s is available",
-                            component->mca_component_name);
-    }
-
-    /* All done */
+    opal_output_verbose(10, ompi_coll_base_framework.framework_output,
+                        "coll:find_available: coll component %s is %savailable",
+                        component->mca_component_name,
+                        (OMPI_SUCCESS == ret) ? "": "not ");
 
     return ret;
 }
 
-
-/*
- * Query a specific component, coll v2.0.0
- */
-static int init_query_2_0_0(const mca_base_component_t * component,
-                            bool enable_progress_threads,
-                            bool enable_mpi_threads)
-{
-    mca_coll_base_component_2_0_0_t *coll =
-        (mca_coll_base_component_2_0_0_t *) component;
-
-    return coll->collm_init_query(enable_progress_threads,
-                                  enable_mpi_threads);
-}
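The last hunk of coll_base_find_available.c folds two nearly identical opal_output_verbose calls into one by splicing an optional "not " into the format string. A minimal standalone sketch of that idiom follows. It uses snprintf in place of opal_output_verbose so that it compiles without Open MPI, and `SKETCH_SUCCESS` and `format_availability` are hypothetical names standing in for OMPI_SUCCESS and the logging call; neither comes from the commit.

```c
#include <stdio.h>

/* Hypothetical stand-in for OMPI_SUCCESS, so the sketch builds on its own. */
#define SKETCH_SUCCESS 0

/*
 * Write the availability message for a component into buf.
 * A single format string covers both outcomes: the second %s expands to
 * "" on success and to "not " on failure.
 * Returns the value of snprintf (the length the full message would need).
 */
static int format_availability(char *buf, size_t len,
                               const char *name, int ret)
{
    return snprintf(buf, len,
                    "coll:find_available: coll component %s is %savailable",
                    name,
                    (SKETCH_SUCCESS == ret) ? "" : "not ");
}
```

Calling `format_availability(buf, sizeof(buf), "monitoring", -1)` would fill `buf` with a message ending in "is not available". The single call keeps the two message variants from drifting apart, at the cost of a format string that is slightly harder to grep for.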

ompi/mca/coll/monitoring/Makefile.am

Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,53 @@
1+
#
2+
# Copyright (c) 2016 Inria. All rights reserved.
3+
# $COPYRIGHT$
4+
#
5+
# Additional copyrights may follow
6+
#
7+
# $HEADER$
8+
#
9+
10+
monitoring_sources = \
11+
coll_monitoring.h \
12+
coll_monitoring_allgather.c \
13+
coll_monitoring_allgatherv.c \
14+
coll_monitoring_allreduce.c \
15+
coll_monitoring_alltoall.c \
16+
coll_monitoring_alltoallv.c \
17+
coll_monitoring_alltoallw.c \
18+
coll_monitoring_barrier.c \
19+
coll_monitoring_bcast.c \
20+
coll_monitoring_component.c \
21+
coll_monitoring_exscan.c \
22+
coll_monitoring_gather.c \
23+
coll_monitoring_gatherv.c \
24+
coll_monitoring_neighbor_allgather.c \
25+
coll_monitoring_neighbor_allgatherv.c \
26+
coll_monitoring_neighbor_alltoall.c \
27+
coll_monitoring_neighbor_alltoallv.c \
28+
coll_monitoring_neighbor_alltoallw.c \
29+
coll_monitoring_reduce.c \
30+
coll_monitoring_reduce_scatter.c \
31+
coll_monitoring_reduce_scatter_block.c \
32+
coll_monitoring_scan.c \
33+
coll_monitoring_scatter.c \
34+
coll_monitoring_scatterv.c
35+
36+
if MCA_BUILD_ompi_coll_monitoring_DSO
37+
component_noinst =
38+
component_install = mca_coll_monitoring.la
39+
else
40+
component_noinst = libmca_coll_monitoring.la
41+
component_install =
42+
endif
43+
44+
mcacomponentdir = $(ompilibdir)
45+
mcacomponent_LTLIBRARIES = $(component_install)
46+
mca_coll_monitoring_la_SOURCES = $(monitoring_sources)
47+
mca_coll_monitoring_la_LDFLAGS = -module -avoid-version
48+
mca_coll_monitoring_la_LIBADD = \
49+
$(OMPI_TOP_BUILDDIR)/ompi/mca/common/monitoring/libmca_common_monitoring.la
50+
51+
noinst_LTLIBRARIES = $(component_noinst)
52+
libmca_coll_monitoring_la_SOURCES = $(monitoring_sources)
53+
libmca_coll_monitoring_la_LDFLAGS = -module -avoid-version
