symbol name pollution #3258

Merged
merged 4 commits into from
Jul 12, 2017

markalle
Contributor

Adding a test into "make check" that uses "nm" to look at the symbol names in the main libraries (libmpi, libmpi_mpifh, libmpi_usempi, libopen-rte, libopen-pal) that user apps are directly/indirectly linked against. It expects everything to have some prefix like ompi_, opal_, orte_, etc.

The second commit in this pull request adds "ompi_" onto a handful of symbols the test identified as bad. There are still a couple categories I didn't change that I found iffy: mca_, netpatterns_, and a few that seem related to libevent. I left those pieces untouched and for now have them being accepted by the testcase.
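
As a rough illustration of the prefix test described above, the core check could be sketched in C as follows. The actual test is a Perl script driven by "nm" output; the allow-list here is only an assumption pieced together from prefixes mentioned in this thread, and matching every prefix case-insensitively is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Illustrative allow-list only: prefixes mentioned in this thread.
 * The list used by the real nmcheck_prefix.pl may differ. */
static const char *const allowed_prefixes[] = {
    "ompi_", "opal_", "orte_", "orted_", "mpi_", "pmpi_", "pmix_", "mca_",
};

/* Return 1 if sym starts with an allowed prefix (case-insensitive), else 0. */
int has_allowed_prefix(const char *sym)
{
    size_t n = sizeof(allowed_prefixes) / sizeof(allowed_prefixes[0]);

    assert(sym != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(allowed_prefixes[i]);
        if (strncasecmp(sym, allowed_prefixes[i], len) == 0) {
            return 1;
        }
    }
    return 0;
}
```

With this list, a name such as sync_wait_mt or log_path would be reported as pollution, while ompi_comm_world or MPI_Init would pass.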

The third commit is just a minor enhancement to update-my-copyright.pl to add a --manual-list=file option.

@rhc54
Contributor

rhc54 commented Mar 29, 2017

Ummm...some of these changes aren't correct. The symbol should have just been made static instead of being left global and prefixed.

@markalle
Contributor Author

Ah yes, in general I erred on the side of leaving things global and adding prefixes. There were only a couple symbols where I switched it to static. I figured this way is less error-prone, but it is possible to add more statics instead.

Would you still go static for symbols that exist in two files, foo.h and foo.c? To my eye that would be a symbol that could potentially later be used from another file even if it's only used from one file now.

Member

@jsquyres jsquyres left a comment

@markalle Thanks for doing all this cleanup. I'd marginally prefer more statics than prefixing, just because we can always make something non-static later. I.e., I'd err on the side of less publics.
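
A minimal sketch of the two options being weighed here, static versus a prefixed global. The demo_ names are hypothetical and only log_path is borrowed from this PR:

```c
#include <assert.h>
#include <stddef.h>

/* Option 1: file-local. `static` gives the variable internal linkage, so
 * it is not exported and cannot collide with a symbol in a user's app. */
static char *log_path = NULL;

/* Option 2: needed from other files, so it stays global and carries a
 * project prefix (demo_ is a hypothetical stand-in for ompi_/opal_/orte_). */
int demo_set_log_path(char *path)
{
    assert(path != NULL);
    log_path = path;
    return 0;
}

/* Accessor so other files never touch the static variable directly. */
const char *demo_get_log_path(void)
{
    return log_path;
}
```

Dropping `static` later is a one-word change, whereas renaming a global touches every user, which is the asymmetry behind preferring fewer publics.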

@@ -89,28 +90,28 @@ static int extent_intercept_fn(MPI_Datatype type_c, MPI_Aint *file_extent,
 /* Data structure passed to the intercepts (see below). It is an OPAL
 list_item_t so that we can clean this memory up during
 MPI_FINALIZE. */
-typedef struct intercept_extra_state {
+typedef struct ompi_intercept_extra_state {
Member

I'm a little surprised that this type name would be exposed externally. Are we using it outside of this file?

Contributor Author

I'm pretty sure that one actually came from
OBJ_CLASS_DECLARATION(intercept_extra_state_t);
and maybe I could have done a more targeted change but I figured all the OBJ_ macros probably work together and would all likely need the same name change. Another option is to fix it at the OBJ_CLASS_DECLARATION macro:

#define OBJ_CLASS_DECLARATION(NAME) \
    extern opal_class_t NAME ## _class

maybe that should be extern opal_class_t opal_ ## NAME ## _class
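
To make the token-pasting idea concrete, here is a small sketch with simplified stand-ins for the OBJ_* macros. The DEMO_* macros and demo_class_t are hypothetical, and the real opal_class_t has more fields than shown:

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for opal_class_t. */
typedef struct demo_class_t {
    const char *cls_name;
} demo_class_t;

/* All three macros paste the same opal_ prefix, so they agree on the
 * name of the global without the call sites changing. */
#define DEMO_CLASS(NAME)             (&(opal_ ## NAME ## _class))
#define DEMO_CLASS_DECLARATION(NAME) extern demo_class_t opal_ ## NAME ## _class
#define DEMO_CLASS_INSTANCE(NAME)    demo_class_t opal_ ## NAME ## _class = { #NAME }

DEMO_CLASS_DECLARATION(intercept_extra_state_t);
DEMO_CLASS_INSTANCE(intercept_extra_state_t);

/* Return the name recorded in the class descriptor. */
const char *demo_class_name(void)
{
    assert(DEMO_CLASS(intercept_extra_state_t)->cls_name != NULL);
    return DEMO_CLASS(intercept_extra_state_t)->cls_name;
}
```

The exported symbol becomes opal_intercept_extra_state_t_class even though every user still writes intercept_extra_state_t.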

Contributor Author

At a glance, I think OBJ_CLASS, OBJ_CLASS_INSTANCE, and OBJ_CLASS_DECLARATION could all get an opal_ prefix in opal_object.h and then I think they'd all match.

Contributor

that would seem really weird - so instead of just putting "static" in front of the OBJ_CLASS_foo macro, we would put "opal_" in front of every class, regardless of what layer they are declared in?

Contributor Author

@markalle markalle Mar 31, 2017

I wrote a script to check and unless I did something wrong there are 300 occurrences of OBJ_CLASS_INSTANCE(name,,,) and every one of them has an OBJ_NEW(name) that occurs in a different file than the OBJ_CLASS_INSTANCE(). So I think that means they really are all globals.

I don't fully understand the OBJ_*() macros, but it looks like the OBJ_CLASS_INSTANCE() is where the global is created, and OBJ_NEW() is a user of that same global.

Oops, update: I did have a bug in the script... okay, so you're right, most are more isolated than that, that makes more sense: 207/300 have OBJ_NEW exclusively in the same file as OBJ_CLASS_INSTANCE.

Contributor Author

I added a check for OBJ_CLASS(name) as well as OBJ_NEW(name) outside the file where OBJ_CLASS_INSTANCE(name,,,) occurred. Now I'm getting:
150 self contained cases that could be static
17 self contained but used OBJ_CLASS_DECLARATION() "extern" for some reason
133 actually span multiple files

As far as coding for this I still think adding an opal_ prefix makes sense. It's what the 133 cases need, the others could in principle be switched to
static OBJ_CLASS_INSTANCE(name,,,)
and wouldn't strictly need the opal_ prefix, but it wouldn't hurt anything.
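
For the self-contained cases, the `static` form works because the instance macro expands to an ordinary variable definition. A sketch with hypothetical stand-ins (DEMO_* macros, demo_class_t, my_item_t) for the real OBJ_* machinery:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for opal_class_t. */
typedef struct demo_class_t {
    const char *cls_name;
    int cls_instances;
} demo_class_t;

#define DEMO_CLASS(NAME)          (&(NAME ## _class))
#define DEMO_CLASS_INSTANCE(NAME) demo_class_t NAME ## _class = { #NAME, 0 }

/* The macro expands to a plain variable definition, so a leading
 * `static` applies to it and keeps my_item_t_class file-local. */
static DEMO_CLASS_INSTANCE(my_item_t);

/* Stand-in for OBJ_NEW: only code in this file can reach the class. */
int demo_new_item(void)
{
    demo_class_t *cls = DEMO_CLASS(my_item_t);

    assert(cls->cls_name != NULL);
    return ++cls->cls_instances;
}
```

This only helps when every user of the class lives in the same file; the multi-file cases still need a global with an acceptable name.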

Contributor Author

I do agree the specific string opal_ is weird though... not sure what to do about that. I don't really want this to turn into a manual name change spanning all those files. Is there a var that indicates ompi_ / opal_ / orte_ etc.?

@@ -43,7 +44,7 @@ static opal_dss_buffer_type_t default_buf_type = OPAL_DSS_BUFFER_NON_DESC;
 /* variable group id */
 static int opal_dss_group_id = -1;

-mca_base_var_enum_value_t buffer_type_values[] = {
+mca_base_var_enum_value_t opal_buffer_type_values[] = {
Member

This one can probably also be static.

@@ -75,7 +76,7 @@ typedef struct ompi_wait_sync_t {
 (sync)->signaling = false; \
 }

-OPAL_DECLSPEC int sync_wait_mt(ompi_wait_sync_t *sync);
+OPAL_DECLSPEC int opal_sync_wait_mt(ompi_wait_sync_t *sync);
 static inline int sync_wait_st (ompi_wait_sync_t *sync)
Member

Just for symmetry -- even if the _st symbol isn't actually a problem -- should probably make both the _mt and _st versions be prefixed.

@@ -89,7 +90,7 @@ static opal_event_t int_handler;
 static opal_event_t epipe_handler;
 static opal_event_t sigusr1_handler;
 static opal_event_t sigusr2_handler;
-char *log_path = NULL;
+char *orte_log_path = NULL;
Member

Might be ok for this one to be static, too.

@@ -79,13 +80,13 @@ typedef struct {
 int tli_count_since_last_display;
 /* Do we want to display these? */
 bool tli_display;
-} tuple_list_item_t;
+} opal_tuple_list_item_t;
Member

Probably ok to be static.

@symbols = grep(!/^mpimsgq_dll_locations$/, @symbols);

@symbols = grep(!/^mca_/, @symbols); # I'm tempted to call these bad
@symbols = grep(!/^netpatterns_/, @symbols); # I'm tempted to call these bad
Member

netpatterns_ should definitely be bad.

I agree that mca_ is borderline (it unfortunately has its origins at the beginning of the project, and is likely deeply embedded throughout the code base).

@symbols = grep(!/^orted_/i, @symbols);
@symbols = grep(!/^mpi_/i, @symbols);
@symbols = grep(!/^pmpi_/i, @symbols);
@symbols = grep(!/^pmix_/i, @symbols);
Member

Is there a reason it's ok to expose pmix_ public symbols?

@symbols = grep(!/^pmpi_/i, @symbols);
@symbols = grep(!/^pmix_/i, @symbols);
@symbols = grep(!/^mpit_/, @symbols);
@symbols = grep(!/^ompit_/, @symbols);
Member

What symbols are mpit_ or ompit_?

Contributor Author

Looks like libmpi.so had
mpit_big_lock
mpit_init_count
mpit_lock
mpit_unlock
mpit_big_lock
mpit_init_count
mpit_lock
mpit_unlock
ompit_opal_to_mpit_error
ompit_var_type_to_datatype
so, looks like half of them are in some way thread related, so maybe I'll rename them to ompi_td_big_lock etc.

Member

I wonder if those are just poorly named. Perhaps they should all be ompi_. I.e., they have to do with the MPI_T implementation inside the OMPI layer.

Contributor Author

Oh, and I guess some of these are MPI_T_ related as well. So we could claim mpit_* for that, or name them mpi_t_. I don't really like how mpi_t_ looks like a fortran entrypoint, I'd kind of prefer all mpi_* be actual MPI calls. But all of MPI_T_ starts with MPI_ already...

Member

Yeah, we lost that battle in the Forum (i.e., the prefix is MPI_T_, even though it's ugly). 😦

But perhaps those mpit_ symbols should really be MPI_T_ or ompi_mpi_t_ (they look like symbols that are internal to OMPI's MPI_T implementation).

@symbols = grep(!/^event_enable_debug_output$/, @symbols);
@symbols = grep(!/^event_global_current_base_$/, @symbols);
@symbols = grep(!/^event_module_include$/, @symbols);
@symbols = grep(!/^eventops$/, @symbols);
Member

@rhc54 Can you confirm: I think that these should probably all be prefixed with opal_. I.e., we shouldn't be exposing any public symbols from libevent that aren't prefixed.

extern struct evthread_lock_callbacks opal_evthread_lock_fns;
extern struct evthread_condition_callbacks opal_evthread_cond_fns;
extern unsigned long (*opal_evthread_id_fn)(void);
extern int opal_evthread_lock_debugging_enabled;
Member

You should probably put all libevent-based renamings in opal/mca/event/libevent2022/libevent/opal_rename.h.

@markalle markalle force-pushed the pr/symbol_name_pollution branch from 63e09d7 to fee6f5a Compare March 31, 2017 02:01
@markalle
Contributor Author

markalle commented Mar 31, 2017

I think I've addressed all the above.

I didn't aggressively search for things that could have been static, but I changed the ones you mentioned above.

For OBJ_CLASS_*() in opal/class/opal_object.h where it creates a global var I made the var have opal_ prefixed onto its name, so opal_<NAME>_class. That let me revert some of the changes that had otherwise spiraled into a bunch of changes in some of the OBJ_CLASS_*() users to keep the names consistent.

I added more changes for mpit_, ompit_, and netpatterns_. Those changes touched a lot of files.

For libevent you were right, the opal_rename.h made that part way easier.


Question: the Travis CI build failed at distcheck saying
make[5]: *** No rule to make target `nmcheck_prefix', needed by `nmcheck_prefix.log'. Stop.
Is there something wrong with my Makefile.am for test/symbol_name/?

@jjhursey
Member

@markalle Yeah, I think the Makefile.am is not quite right. make distcheck is failing while trying to find the compilation target, and since it's a script there is nothing to compile. I'll have to refresh my knowledge of how to address that. I can help you take a look at it today - just ping me.

@jjhursey
Member

@markalle Give this patch a try:

diff --git a/test/symbol_name/Makefile.am b/test/symbol_name/Makefile.am
index ba3f8670..11668a65 100644
--- a/test/symbol_name/Makefile.am
+++ b/test/symbol_name/Makefile.am
@@ -7,6 +7,7 @@
 # $HEADER$
 #
 
-TESTS = nmcheck_prefix
+check_SCRIPTS = nmcheck_prefix nmcheck_prefix.pl
+TESTS = $(check_SCRIPTS)
 
 AM_TESTS_ENVIRONMENT = MYBASE='$(top_builddir)'; OMPI_LIBMPI_NAME=@OMPI_LIBMPI_NAME@; export MYBASE OMPI_LIBMPI_NAME;

-if(netpatterns_base_verbose > 0) { \
-netpatterns_base_err("[%s]%s[%s:%d:%s] ",\
+if(ompi_netpatterns_base_verbose > 0) { \
+ompi_netpatterns_base_err("[%s]%s[%s:%d:%s] ",\
 ompi_process_info.nodename, \
 OMPI_NAME_PRINT(OMPI_PROC_MY_NAME), \
 __FILE__, __LINE__, __func__); \
 netpatterns_base_err args; \
Member

Missed one: ompi_netpatterns_base_err

Contributor Author

I can see how I missed that in the macro, but what's weird is it didn't show up as "U netpatterns_base_err" in the nm output. Anyway I've changed it.

@rhc54
Contributor

rhc54 commented Mar 31, 2017

My concern is that this PR doesn't feel like the right approach. If we have symbols that weren't correctly prefixed, the solution should be to fix those specific symbols - not to automatically throw additional prefixes on top of every symbol. If you do, then we wind up having to search for symbols like "opal_orte_my_object" in the debugger - which is not only annoying, but non-intuitive.

So why don't we just fix the problems, as we have always done before?

@rhc54
Contributor

rhc54 commented Mar 31, 2017

I would change this PR to take a different route. Simply highlight to the community the symbols that aren't properly prefixed, and ask that the respective code "owners" fix it. This has worked multiple times in the past, and I see no evidence as to why it wouldn't work again. I'm happy to fix the ones in ORTE and relevant parts of OPAL.

If we all just do a little, the problem will be solved. The contribution here is the scripts that report which symbols we missed so we know what needs fixing.

This, IMO, would be a much preferable solution than automatically slapping prefixes on variables, even those that don't need it (which are likely to be the majority).

@markalle markalle force-pushed the pr/symbol_name_pollution branch from fee6f5a to d5ba69c Compare March 31, 2017 22:59
@markalle
Contributor Author

I pushed again with the test/symbol_name/Makefile.am change and the netpatterns_ fix Josh pointed out.

I agree with Ralph's concern that putting opal_ in front of everything that uses OBJ_CLASS_() is weird looking when the users might be ompi_ or orte_. But there are lots of uses of OBJ_CLASS_() that can't just become static either. It would be a lot of code changes to modify all the names in all those files (and to add static to all the ones that didn't really need to be global).

@rhc54
Contributor

rhc54 commented Mar 31, 2017

I disagree with your last statement - it is a trivial thing to do. We have a script that does it in seconds:

$ search_replace.pl <current_symbol_name> orte_<current_symbol_name>

Done. You could even have your symbol checking script execute it. What is so hard???

@jjhursey
Member

jjhursey commented Apr 3, 2017

I'm having trouble reproducing the CI failures. I did a make distcheck and a make check on a local checkout of Mark's branch and they passed fine. Here is what I'm running:

./autogen.pl
./configure --prefix=/tmp/jjhursey/bogus
make -j 20
make distcheck
make check

Does anyone have an idea about how I might be able to reproduce that so I can take a look at what's wrong with the build?

Contributor

@rhc54 rhc54 left a comment

Let's fix this the right way by renaming symbols, as we have always done. Force-prefixing is a bad way of solving the problem.

@markalle
Contributor Author

markalle commented Apr 5, 2017

That's fine, a large scale search and replace just requires more review to avoid errors, and is a little more work for people rebasing around this checkin. Without enough review, such a script might accidentally change MCA parameter names if their name happened to include the same string as a variable being changed. I don't think that will happen because of how the mca-var-register function is written, but that's the type of thing I'd worry about for a large search and replace.

For the mca_ family would you rather:

  1. leave them mca_
  2. change just the globals to ompi_/opal_/orte_mca_
  3. change them all, global or not

As an example, mca_pml is a global whose name would change, but mca_pml_cm_component_init is a static that doesn't necessarily need to change but could if we want it all to match (and it would be an extra step to keep a global search and replace from hitting the extras like that).

Another question: if we address mca_, should each symbol get its own prefix (ompi_/opal_/orte_) based on where it lives, so that, for example, my script would look up the header declaration of each symbol being changed to decide which prefix it should receive? Or can all MCA vars get a single prefix? Right now there's an asprintf(&name, "mca_%s_%s",) followed by opal_dl_lookup(,name) that wouldn't know which prefix to look for if it weren't uniform. I think things would work overall with multiple prefixes, but it's an extra step to automate the decision of what the right prefix is for every replacement. And although it's unlikely they'd differ, mca_foo and mca_foo_bar aren't guaranteed to get the same prefix.

@rhc54
Contributor

rhc54 commented Apr 5, 2017

Perhaps you could just post on gist (or a file here, or email) the list of variables you've identified so we can look them over? Not everything has to be automated - we just need to fix this once, and then have an automated method for reporting if/when it gets violated again.

@markalle
Contributor Author

markalle commented Apr 5, 2017

Okay, I made two lists:
mcasymbols.global.linked.txt = just the globals, and just from libmpi.so etc (not MCAs)
mcasymbols.all.txt = all libs, MCAs included, all mca_ whether global or static
https://gist.github.com/markalle/164fb4662f0f6c9d97aaf36335636daa

A simple search-and-replace that processes only the strings from the globals list might as a side effect hit a portion of the other symbols that are static or just in MCAs. Do we want the result to be a mix of ompi_mca_* and mca_* or do we want to make all the un-needed changes too so things match?

I don't yet have the output ready that includes a guess for each mca_ symbol whether it should have ompi_ or opal_ or orte_ etc.

@rhc54
Contributor

rhc54 commented Apr 5, 2017

Well, you cannot rename the mca_foo_component variables as those are what the MCA system is looking for when opening a dll. So maybe the best answer is to simply declare "mca" to be a reserved prefix and be done with it.
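
To illustrate why those names are effectively fixed: the lookup string is assembled at run time from the framework and component names, along the lines of the asprintf() call mentioned earlier in the thread. The sketch below is an assumption about the shape of that logic, not the actual MCA code; demo_component_symbol is hypothetical and the `_component` suffix is inferred from the mca_foo_component pattern above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "mca_<type>_<name>_component" into buf.
 * Returns 0 on success, -1 if the result did not fit.
 * The exact format string is an assumption for illustration. */
int demo_component_symbol(char *buf, size_t len,
                          const char *type, const char *name)
{
    int n;

    assert(buf != NULL && type != NULL && name != NULL);
    n = snprintf(buf, len, "mca_%s_%s_component", type, name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

If a component's struct were renamed to, say, ompi_mca_pml_cm_component, a lookup built this way would still ask the loaded library for mca_pml_cm_component and fail to find it.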

@bwbarrett
Member

I thought mca was already in our list of reserved namespacing? We've always treated it that way (or at least, did before I left 3 years ago...)

@markalle markalle force-pushed the pr/symbol_name_pollution branch from d5ba69c to 98a0aa9 Compare April 5, 2017 18:55
@markalle
Contributor Author

markalle commented Apr 5, 2017

I think I'm okay with declaring mca_ to be reserved.

@markalle markalle force-pushed the pr/symbol_name_pollution branch 7 times, most recently from 950e45d to 1e6629d Compare April 7, 2017 02:00
@markalle
Contributor Author

markalle commented Apr 7, 2017

I finally got the testcase to run on the LANL-distcheck and travis-ci systems. I don't really feel like I'm using automake correctly, but at least it's working.

Previous attempts at using DIST, EXTRA_DIST, and check_SCRIPTS didn't result in the script nmcheck_prefix.pl being located in the test directory. I used what might be considered a hack, and am launching ${srcdir}/nmcheck_prefix.pl and now it's able to find the test and run it.

@hppritcha
Member

botany bay is having a bad day. try again.
bot:lanl:retest

@ibm-ompi

The IBM CI (PGI Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/b5373acbf8ec2301067f9ae48996c582

@markalle markalle force-pushed the pr/symbol_name_pollution branch from 1e6629d to 4b4e457 Compare April 12, 2017 18:29
@markalle
Contributor Author

rebased again, some commit had used the old name for one of the changed symbols

I think I've addressed all the concerns, the OBJ_CLASS_*() aren't auto-adding any prefixes, the test is accepting "mca_" as an acceptable prefix. I didn't aggressively look for opportunities to make vars static instead of using the prefixes added in the second commit, but some of it has been made static.

@jjhursey
Member

@markalle What's the state of this PR? Did we decide to go a different route or is this ready to go (once rebased)?

@markalle
Contributor Author

I'd like to get this checkin in, but it touches so many files it goes stale fast. I can rebase it again but it'll become stale within a few days after that if we don't merge it right away.

I could use a different approach if people would be more comfortable: currently this is a single commit that's a combination of changes done by hand (changing vars to static) and scripted name changes. The result I think is too big for people to really review in detail.

I could probably split it out into two commits, one for the by-hand changing which would cover the small-ish number of vars where it was easy to tell it could just become static. Then the scripted changes where I could include the script used.

@bwbarrett
Member

Why is an iterative approach not the right one? You could do one framework at a time or one library or something that you can get a handle on. You're going to continue to be unsuccessful with an all-in-one approach. You also need something more than good intentions if this is important (ie, a test that can be run on every PR/commit).

@rhc54
Contributor

rhc54 commented Jun 28, 2017

From what I see, it looks okay to me - but I'd echo Brian's comments. At least split it up by library now that we've agreed on approach, and then get it thru the CI and commit.

@markalle
Contributor Author

I see value in splitting manual from scripted changes, but is there really value in splitting a list of symbols for scripted changes into multiple sets?

I'm re-doing it now, so if you want these symbol changes separated out from each other, now's the time. And do you want a separate pull request for each, as opposed to just separate commits?

(libmpi.so) :

coll_base_comm_get_reqs
comm_allgather_pml
comm_allreduce_pml
comm_bcast_pml
fcoll_base_coll_allgather_array
fcoll_base_coll_allgatherv_array
fcoll_base_coll_bcast_array
fcoll_base_coll_gather_array
fcoll_base_coll_gatherv_array
fcoll_base_coll_scatterv_array
fcoll_base_sort_iovec
mpit_big_lock
mpit_init_count
mpit_lock
mpit_unlock
netpatterns_base_err
netpatterns_base_verbose
netpatterns_cleanup_narray_knomial_tree
netpatterns_cleanup_recursive_doubling_tree_node
netpatterns_cleanup_recursive_knomial_allgather_tree_node
netpatterns_cleanup_recursive_knomial_tree_node
netpatterns_init
netpatterns_register_mca_params
netpatterns_setup_multinomial_tree
netpatterns_setup_narray_knomial_tree
netpatterns_setup_narray_tree
netpatterns_setup_narray_tree_contigous_ranks
netpatterns_setup_recursive_doubling_n_tree_node
netpatterns_setup_recursive_doubling_tree_node
netpatterns_setup_recursive_knomial_allgather_tree_node
netpatterns_setup_recursive_knomial_tree_node
ompit_opal_to_mpit_error
ompit_var_type_to_datatype
pml_v_output_close
pml_v_output_open

(libmpi_mpifh.so) :

intercept_extra_state_t_class

(libopen-rte.so) :

odls_base_default_wait_local_proc

(libopen-pal.so) :

_event_debug_mode_on
_evthread_cond_fns
_evthread_id_fn
_evthread_lock_debugging_enabled
_evthread_lock_fns
cmd_line_option_t_class
cmd_line_param_t_class
crs_base_self_checkpoint_fn
crs_base_self_continue_fn
crs_base_self_restart_fn
event_enable_debug_output
event_global_current_base_
event_module_include
eventops
sync_wait_mt
trigger_user_inc_callback
var_type_names
var_type_sizes

@rhc54
Contributor

rhc54 commented Jun 28, 2017

Truly? I honestly don't care - I was just suggesting a way for you to get things committed without them constantly becoming stale. One thing is certain: waiting for several weeks before circling back to it will never work.

@markalle markalle force-pushed the pr/symbol_name_pollution branch 2 times, most recently from 4a251cd to 9ca3e6c Compare June 30, 2017 01:40
@markalle
Contributor Author

Thanks, I'll take another stab or two at the large blob approach since I still think hitting this once is less of a nuisance than hitting it multiple times. But I agree piecemeal would work fine, and if this doesn't go through soon I'll switch to that approach (dropping the 4th commit above, and submitting a handful of smaller pull requests).

@markalle markalle force-pushed the pr/symbol_name_pollution branch 2 times, most recently from 1e9636d to a6490a3 Compare July 6, 2017 21:34
@markalle
Contributor Author

markalle commented Jul 6, 2017

I only just realized (because of PRBC) that there's an option --disable-dlopen that piles way more symbols into libmpi.so that would normally have been safely partitioned off in various mca_*.so. And possibly --enable-mca-static=name, which does so more individually.

So I've updated the nmcheck_prefix.pl test to be more generous with symbol name pollution when it appears to be coming from an MCA that was built into one of the main libs that would have normally been its own separate mca_*.so. Otherwise there would be a lot more symbols to change, and I think pollution at the MCA .so level is pretty safe.

@gpaulsen
Member

Let's discuss at the face-to-face and try to get this merged in before it becomes stale again.

@markalle
Contributor Author

Thanks, fwiw the big commit is now completely scripted so it's not too big a deal if it does go stale again, just takes a few minutes to rerun the script. But having said that, yeah, I'd love for it to go in.

markalle added 4 commits July 11, 2017 02:13
This checks the main libs that would be directly or indirectly linked
against the user's executable (libmpi.so, libmpi_mpifh.so, libmpi_usempi.so,
libopen-rte, libopen-pal) using "nm" and looking for symbols without ompi_,
opal_, mpi_, etc. prefixes.

Signed-off-by: Mark Allen <[email protected]>

As part of addressing symbol name pollution, I'm switching a few
vars/functions to static.

Signed-off-by: Mark Allen <[email protected]>

Along with using git status and related commands to find a list of
modified files to update the copyright on, this adds the option of
using a manually created list from a file (one filename per line).

Signed-off-by: Mark Allen <[email protected]>

Passed the below set of symbols into a script that added ompi_ to them all.

Note that if processing a symbol named "foo" the script turns
    foo  into  ompi_foo
but doesn't turn
    foobar  into  ompi_foobar

But beyond that the script is blind to C syntax, so it hits strings and
comments etc as well as vars/functions.

    coll_base_comm_get_reqs
    comm_allgather_pml
    comm_allreduce_pml
    comm_bcast_pml
    fcoll_base_coll_allgather_array
    fcoll_base_coll_allgatherv_array
    fcoll_base_coll_bcast_array
    fcoll_base_coll_gather_array
    fcoll_base_coll_gatherv_array
    fcoll_base_coll_scatterv_array
    fcoll_base_sort_iovec
    mpit_big_lock
    mpit_init_count
    mpit_lock
    mpit_unlock
    netpatterns_base_err
    netpatterns_base_verbose
    netpatterns_cleanup_narray_knomial_tree
    netpatterns_cleanup_recursive_doubling_tree_node
    netpatterns_cleanup_recursive_knomial_allgather_tree_node
    netpatterns_cleanup_recursive_knomial_tree_node
    netpatterns_init
    netpatterns_register_mca_params
    netpatterns_setup_multinomial_tree
    netpatterns_setup_narray_knomial_tree
    netpatterns_setup_narray_tree
    netpatterns_setup_narray_tree_contigous_ranks
    netpatterns_setup_recursive_doubling_n_tree_node
    netpatterns_setup_recursive_doubling_tree_node
    netpatterns_setup_recursive_knomial_allgather_tree_node
    netpatterns_setup_recursive_knomial_tree_node
    pml_v_output_close
    pml_v_output_open
    intercept_extra_state_t
    odls_base_default_wait_local_proc
    _event_debug_mode_on
    _evthread_cond_fns
    _evthread_id_fn
    _evthread_lock_debugging_enabled
    _evthread_lock_fns
    cmd_line_option_t
    cmd_line_param_t
    crs_base_self_checkpoint_fn
    crs_base_self_continue_fn
    crs_base_self_restart_fn
    event_enable_debug_output
    event_global_current_base_
    event_module_include
    eventops
    sync_wait_mt
    trigger_user_inc_callback
    var_type_names
    var_type_sizes

Signed-off-by: Mark Allen <[email protected]>
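
The whole-word rule described in the last commit message (foo becomes ompi_foo, foobar is left alone, and strings and comments are hit too) could be sketched as below. This is not the script used for the commit; prefix_whole_word is a hypothetical helper that treats [A-Za-z0-9_] as identifier characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int is_ident_char(int c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Copy src into dst, inserting prefix before every whole-word occurrence
 * of sym. Blind to C syntax: strings and comments are rewritten as well.
 * Returns the number of replacements, or -1 if dst is too small. */
int prefix_whole_word(const char *src, const char *sym, const char *prefix,
                      char *dst, size_t dst_len)
{
    size_t sym_len = strlen(sym), pre_len = strlen(prefix), out = 0, i = 0;
    int count = 0;

    assert(sym_len > 0 && dst_len > 0);
    while (src[i] != '\0') {
        /* A match must not be glued to identifier characters on either side. */
        int at_start = (i == 0) || !is_ident_char(src[i - 1]);
        if (at_start && strncmp(src + i, sym, sym_len) == 0 &&
            !is_ident_char(src[i + sym_len])) {
            if (out + pre_len + sym_len >= dst_len) return -1;
            memcpy(dst + out, prefix, pre_len);
            out += pre_len;
            memcpy(dst + out, sym, sym_len);
            out += sym_len;
            i += sym_len;
            count++;
        } else {
            if (out + 1 >= dst_len) return -1;
            dst[out++] = src[i++];
        }
    }
    dst[out] = '\0';
    return count;
}
```

Because an underscore counts as an identifier character, an already-prefixed ompi_foo is not matched again, so rerunning the rename after a rebase would not stack prefixes.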
@markalle markalle force-pushed the pr/symbol_name_pollution branch from a6490a3 to 552216f Compare July 11, 2017 06:13
Member

@jsquyres jsquyres left a comment

Looks good! Thanks for your patience!

@jsquyres jsquyres merged commit ccf1780 into open-mpi:master Jul 12, 2017
8 participants