-
Notifications
You must be signed in to change notification settings - Fork 901
symbol name pollution #3258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
symbol name pollution #3258
Conversation
Ummm...some of these changes aren't correct. The symbol should have just been made |
Ah yes, in general I erred on the side of leaving things global and adding prefixes. There were only a couple symbols where I switched it to static. I figured this way is less error-prone, but it is possible to add more statics instead. Would you still go static for symbols that exist in two files, foo.h and foo.c? To my eye that would be a symbol that could potentially later be used from another file even if it's only used from one file now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@markalle Thanks for doing all this cleanup. I'd marginally prefer more statics than prefixing, just because we can always make something non-static later. I.e., I'd err on the side of less publics.
@@ -89,28 +90,28 @@ static int extent_intercept_fn(MPI_Datatype type_c, MPI_Aint *file_extent, | |||
/* Data structure passed to the intercepts (see below). It is an OPAL | |||
list_item_t so that we can clean this memory up during | |||
MPI_FINALIZE. */ | |||
typedef struct intercept_extra_state { | |||
typedef struct ompi_intercept_extra_state { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little surprised that this type name would be exposed externally. Are we using it outside of this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure that one actually came from
OBJ_CLASS_DECLARATION(intercept_extra_state_t);
and maybe I could have done a more targeted change but I figured all the OBJ_ macros probably work together and would all likely need the same name change. Another option is to fix it at the OBJ_CLASS_DECLARATION macro:
#define OBJ_CLASS_DECLARATION(NAME)
extern opal_class_t NAME ## _class
maybe that should be extern opal_class_t opal_ ## NAME ## _class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At a glance, I think OBJ_CLASS OBJ_CLASS_INSTANCE and OBJ_CLASS_DECLARATION could all get an opal_ prefix in opal_object.h and then I think they'd all match
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would seem really weird - so instead of just putting "static" in front of the OBJ_CLASS_foo macro, we would put "opal_" in front of every class, regardless of what layer they are declared in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote a script to check and unless I did something wrong there are 300 occurrences of OBJ_CLASS_INSTANCE(name,,,) and every one of them has an OBJ_NEW(name) that occurs in a different file than the OBJ_CLASS_INSTANCE(). So I think that means they really are all globals.
I don't fully understand the OBJ_*() macros, but it looks like the OBJ_CLASS_INSTANCE() is where the global is created, and OBJ_NEW() is a user of that same global.
Oops, update: I did have a bug in the script... okay, so you're right, most are more isolated than that, that makes more sense: 207/300 have OBJ_NEW exclusively in the same file as OBJ_CLASS_INSTANCE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a check for OBJ_CLASS(name) as well as OBJ_NEW(name) outside the file where OBJ_CLASS_INSTANCE(name,,,) occurred. Now I'm getting:
150 self contained cases that could be static
17 self contained but used OBJ_CLASS_DELARATION() "extern" for some reason
133 actually span multiple files
As far as coding for this I still think adding an opal_
prefix makes sense. It's what the 133 cases need, the others could in principle be switched to
static OBJ_CLASS_INSTANCE(name,,,)
and wouldn't strictly need the opal_
prefix, but it wouldn't hurt anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree the specific string opal_
is weird though... not sure what to do about that. I don't really want this to turn into a manual name change spanning all those files. Is there a var that indicates ompi_ / opal_ / orte_
etc```?
opal/dss/dss_open_close.c
Outdated
@@ -43,7 +44,7 @@ static opal_dss_buffer_type_t default_buf_type = OPAL_DSS_BUFFER_NON_DESC; | |||
/* variable group id */ | |||
static int opal_dss_group_id = -1; | |||
|
|||
mca_base_var_enum_value_t buffer_type_values[] = { | |||
mca_base_var_enum_value_t opal_buffer_type_values[] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one can probably also be static.
opal/threads/wait_sync.h
Outdated
@@ -75,7 +76,7 @@ typedef struct ompi_wait_sync_t { | |||
(sync)->signaling = false; \ | |||
} | |||
|
|||
OPAL_DECLSPEC int sync_wait_mt(ompi_wait_sync_t *sync); | |||
OPAL_DECLSPEC int opal_sync_wait_mt(ompi_wait_sync_t *sync); | |||
static inline int sync_wait_st (ompi_wait_sync_t *sync) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for symmetry -- even if the _st
symbol isn't actually a problem -- should probably make both the _mt
and _st
versions be prefixed.
@@ -89,7 +90,7 @@ static opal_event_t int_handler; | |||
static opal_event_t epipe_handler; | |||
static opal_event_t sigusr1_handler; | |||
static opal_event_t sigusr2_handler; | |||
char *log_path = NULL; | |||
char *orte_log_path = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be ok for this one to be static, too.
orte/util/show_help.c
Outdated
@@ -79,13 +80,13 @@ typedef struct { | |||
int tli_count_since_last_display; | |||
/* Do we want to display these? */ | |||
bool tli_display; | |||
} tuple_list_item_t; | |||
} opal_tuple_list_item_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably ok to be static.
test/symbol_name/nmcheck_prefix.pl
Outdated
@symbols = grep(!/^mpimsgq_dll_locations$/, @symbols); | ||
|
||
@symbols = grep(!/^mca_/, @symbols); # I'm tempted to call these bad | ||
@symbols = grep(!/^netpatterns_/, @symbols); # I'm tempted to call these bad |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
netpatterns_
should definitely be bad.
I agree that mca_
is borderline (it unfortunately has its origins at the beginning of the project, and is likely deeply embedded throughout the code base).
test/symbol_name/nmcheck_prefix.pl
Outdated
@symbols = grep(!/^orted_/i, @symbols); | ||
@symbols = grep(!/^mpi_/i, @symbols); | ||
@symbols = grep(!/^pmpi_/i, @symbols); | ||
@symbols = grep(!/^pmix_/i, @symbols); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason it's ok to expose pmix_
public symbols?
test/symbol_name/nmcheck_prefix.pl
Outdated
@symbols = grep(!/^pmpi_/i, @symbols); | ||
@symbols = grep(!/^pmix_/i, @symbols); | ||
@symbols = grep(!/^mpit_/, @symbols); | ||
@symbols = grep(!/^ompit_/, @symbols); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What symbols are mpit_
or ompit_
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like libmpi.so had
mpit_big_lock
mpit_init_count
mpit_lock
mpit_unlock
mpit_big_lock
mpit_init_count
mpit_lock
mpit_unlock
ompit_opal_to_mpit_error
ompit_var_type_to_datatype
so, looks like half of them are in some way thread related, so maybe I'll rename them to ompi_td_big_lock etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if those are just poorly named. Perhaps they should all be ompi_
. I.e., they have to do with the MPI_T implementation inside the OMPI layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and I guess some of these are MPI_T_ related as well. So we could claim mpit_* for that, or name them mpi_t_. I don't really like how mpi_t_ looks like a fortran entrypoint, I'd kind of prefer all mpi_* be actual MPI calls. But all of MPI_T_ starts with MPI_ already...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we lost that battle in the Forum (i.e., the prefix is MPI_T_
, even though it's ugly). 😦
But perhaps those mpit_
symbols should really be MPI_T_
or ompi_mpi_t_
(they look like symbols that are internal to OMPI's MPI_T implementation).
test/symbol_name/nmcheck_prefix.pl
Outdated
@symbols = grep(!/^event_enable_debug_output$/, @symbols); | ||
@symbols = grep(!/^event_global_current_base_$/, @symbols); | ||
@symbols = grep(!/^event_module_include$/, @symbols); | ||
@symbols = grep(!/^eventops$/, @symbols); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rhc54 Can you confirm: I think that these should probably all be prefixed with opal_
. I.e., we shouldn't be exposing any public symbols from libevent that aren't prefixed.
extern struct evthread_lock_callbacks opal_evthread_lock_fns; | ||
extern struct evthread_condition_callbacks opal_evthread_cond_fns; | ||
extern unsigned long (*opal_evthread_id_fn)(void); | ||
extern int opal_evthread_lock_debugging_enabled; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should probably put all libevent-based renamings in opal/mca/event/libevent2022/libevent/opal_rename.h
.
63e09d7
to
fee6f5a
Compare
I think I've addressed all the above. I didn't aggressively search for things that could have been static, but I changed the ones you mentioned above. For I added more changes for For libevent you were right, the opal_rename.h made that part way easier. Question: the Travis CI build failed at distcheck saying |
@markalle Yeah, I think the |
@markalle Give this patch a try: diff --git a/test/symbol_name/Makefile.am b/test/symbol_name/Makefile.am
index ba3f8670..11668a65 100644
--- a/test/symbol_name/Makefile.am
+++ b/test/symbol_name/Makefile.am
@@ -7,6 +7,7 @@
# $HEADER$
#
-TESTS = nmcheck_prefix
+check_SCRIPTS = nmcheck_prefix nmcheck_prefix.pl
+TESTS = $(check_SCRIPTS)
AM_TESTS_ENVIRONMENT = MYBASE='$(top_builddir)'; OMPI_LIBMPI_NAME=@OMPI_LIBMPI_NAME@; export MYBASE OMPI_LIBMPI_NAME; |
ompi/patterns/net/netpatterns.h
Outdated
if(netpatterns_base_verbose > 0) { \ | ||
netpatterns_base_err("[%s]%s[%s:%d:%s] ",\ | ||
if(ompi_netpatterns_base_verbose > 0) { \ | ||
ompi_netpatterns_base_err("[%s]%s[%s:%d:%s] ",\ | ||
ompi_process_info.nodename, \ | ||
OMPI_NAME_PRINT(OMPI_PROC_MY_NAME), \ | ||
__FILE__, __LINE__, __func__); \ | ||
netpatterns_base_err args; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed one: ompi_netpatterns_base_err
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see how I missed that in the macro, but what's weird is it didn't show up as "U netpatterns_base_err" in the nm output. Anyway I've changed it.
My concern is that this PR doesn't feel like the right approach. If we have symbols that weren't correctly prefixed, the solution should be to fix those specific symbols - not to automatically throw additional prefixes on top of every symbol. If you do, then we wind up having to search for symbols like "opal_orte_my_object" in the debugger - which is not only annoying, but non-intuitive. So why don't we just fix the problems, as we have always done before? |
I would change this PR to take a different route. Simply highlight to the community the symbols that aren't properly prefixed, and ask that the respective code "owners" fix it. This has worked multiple times in the past, and I see no evidence as to why it wouldn't work again. I'm happy to fix the ones in ORTE and relevant parts of OPAL. If we all just do a little, the problem will be solved. The contribution here is the scripts that report which symbols we missed so we know what needs fixing. This, IMO, would be a much preferable solution than automatically slapping prefixes on variables, even those that don't need it (which are likely to be the majority). |
fee6f5a
to
d5ba69c
Compare
I pushed again with the test/symbol_name/Makefile.am change and the netpatterns_ fix Josh pointed out. I agree with Ralph's concern that putting |
I disagree with your last statement - it is a trivial thing to do. We have a script that does it in seconds: $ search_replace.pl <current_symbol_name> orte_<current_symbol_name> Done. You could even have your symbol checking script execute it. What is so hard??? |
I'm having trouble reproducing the CI failures. I did a ./autogen.pl
./configure --prefix=/tmp/jjhursey/bogus
make -j 20
make distcheck
make check Anyone have an idea about how I might be able to reproduce that so I can take a look at what's wrong with the build. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's fix this the right way by renaming symbols, as we have always done. Force-prefixing is a bad way of solving the problem.
That's fine, a large scale search and replace just requires more review to avoid errors, and is a little more work for people rebasing around this checkin. Without enough review, such a script might accidentally change MCA parameter names if their name happened to include the same string as a variable being changed. I don't think that will happen because of how the mca-var-register function is written, but that's the type of thing I'd worry about for a large search and replace. For the
As an example, Another question: if we address |
Perhaps you could just post on gist (or a file here, or email) the list of variables you've identified so we can look them over? Not everything has to be automated - we just need to fix this once, and then have an automated method for reporting if/when it gets violated again. |
Okay, I made two lists: A simple search-and-replace that processes only the strings from the globals list might as a side effect hit a portion of the other symbols that are static or just in MCAs. Do we want the result to be a mix of I don't yet have the output ready that includes a guess for each |
Well, you cannot rename the |
I thought mca was already in our list of reserved namespacing? We've always treated it that way (or at least, did before I left 3 years ago...) |
d5ba69c
to
98a0aa9
Compare
I think I'm okay with declaring |
950e45d
to
1e6629d
Compare
I finally got the testcase to run on the LANL-distcheck and travis-ci systems. I don't really feel like I'm using automake correctly, but at least it's working. Previous attempts at using DIST, EXTRA_DIST, and check_SCRIPTS didn't result in the script nmcheck_prefix.pl being located in the test directory. I used what might be considered a hack, and am launching ${srcdir}/nmcheck_prefix.pl and now it's able to find the test and run it. |
botany bay is having a bad day. try again. |
The IBM CI (PGI Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/b5373acbf8ec2301067f9ae48996c582 |
1e6629d
to
4b4e457
Compare
rebased again, some commit had used the old name for one of the changed symbols I think I've addressed all the concerns, the OBJ_CLASS_*() aren't auto-adding any prefixes, the test is accepting "mca_" as an acceptable prefix. I didn't aggressively look for opportunities to make vars static instead of using the prefixes added in the second commit, but some of it has been made static. |
@markalle What's the state of this PR? Did we decide to go a different route or is this ready to go (once rebased)? |
I'd like to get this checkin in, but it touches so many files it goes stale fast. I can rebase it again but it'll become stale within a few days after that if we don't merge it right away. I could use a different approach if people would be more comfortable: currently this is a single commit that's a combination of changes done by hand (changing vars to static) and scripted name changes. The result I think is too big for people to really review in detail. I could probably split it out into two commits, one for the by-hand changing which would cover the small-ish number of vars where it was easy to tell it could just become static. Then the scripted changes where I could include the script used. |
Why is an iterative approach not the right one? You could do one framework at a time or one library or something that you can get a handle on. You're going to continue to be unsuccessful with an all-in-one approach. You also need something more than good intentions if this is important (ie, a test that can be run on every PR/commit). |
From what I see, it looks okay to me - but I'd echo Brian's comments. At least split it up by library now that we've agreed on approach, and then get it thru the CI and commit. |
I see value in splitting manual from scripted changes, but is there really value in splitting a list of symbols for scripted changes into multiple sets? I'm re-doing it now, so if you want these symbols changes separated out from each other, now's the time. And you want a separate pull request for each as opposed to just separate commits? (libmpi.so) :
(libmpi_mpifh.so) :
(libopen-rte.so) :
(libopen-pal.so) :
|
Truly? I honestly don't care - I was just suggesting a way for you to get things committed without them constantly becoming stale. One thing is certain: waiting for several weeks before circling back to it will never work. |
4a251cd
to
9ca3e6c
Compare
Thanks, I'll take another stab or two at the large blob approach since I still think hitting this once is less of a nuisance than hitting it multiple times. But I agree piecemeal would work fine, and if this doesn't go through soon I'll switch to that approach (dropping the 4th commit above, and submitting a handful of smaller pull requests). |
1e9636d
to
a6490a3
Compare
I only just realized (because of PRBC) that there's an option --disable-dlopen that piles way more symbols into libmpi.so that would have normally been safely partitioned off in various mca_*.so. And possibly --enable-mca-static=name that does so more individually So I've updated the nmcheck_prefix.pl test to be more generous with symbol name pollution when it appears to be coming from an MCA that was built into one of the main libs that would have normally been its own separate mca_*.so. Otherwise there would be a lot more symbols to change, and I think pollution at the MCA .so level is pretty safe. |
Lets discuss at Face to face and try to get this merged in before it becomes stale again. |
Thanks, fwiw the big commit is now completely scripted so it's not too big a deal if it does go stale again, just takes a few minutes to rerun the script. But having said that, yeah, I'd love for it to go in. |
This checks the main libs that would be directly or indirectly linked against the users executable (libmpi.so, libmpi_mpifh.so, libmpi_usempi.so, libopen-rte, libopen-pal) using "nm" and looking for symbols without ompi_ opal_ mpi_ etc prefixes. Signed-off-by: Mark Allen <[email protected]>
As part of addressing symbol name pollution, I'm switching a few vars/functions to static. Signed-off-by: Mark Allen <[email protected]>
Along with using git status and related commands to find a list of modified files to update the copyright on, this adds the option of using a manually created list from a file (one filename per line). Signed-off-by: Mark Allen <[email protected]>
Passed the below set of symbols into a script that added ompi_ to them all. Note that if processing a symbol named "foo" the script turns foo into ompi_foo but doesn't turn foobar into ompi_foobar But beyond that the script is blind to C syntax, so it hits strings and comments etc as well as vars/functions. coll_base_comm_get_reqs comm_allgather_pml comm_allreduce_pml comm_bcast_pml fcoll_base_coll_allgather_array fcoll_base_coll_allgatherv_array fcoll_base_coll_bcast_array fcoll_base_coll_gather_array fcoll_base_coll_gatherv_array fcoll_base_coll_scatterv_array fcoll_base_sort_iovec mpit_big_lock mpit_init_count mpit_lock mpit_unlock netpatterns_base_err netpatterns_base_verbose netpatterns_cleanup_narray_knomial_tree netpatterns_cleanup_recursive_doubling_tree_node netpatterns_cleanup_recursive_knomial_allgather_tree_node netpatterns_cleanup_recursive_knomial_tree_node netpatterns_init netpatterns_register_mca_params netpatterns_setup_multinomial_tree netpatterns_setup_narray_knomial_tree netpatterns_setup_narray_tree netpatterns_setup_narray_tree_contigous_ranks netpatterns_setup_recursive_doubling_n_tree_node netpatterns_setup_recursive_doubling_tree_node netpatterns_setup_recursive_knomial_allgather_tree_node netpatterns_setup_recursive_knomial_tree_node pml_v_output_close pml_v_output_open intercept_extra_state_t odls_base_default_wait_local_proc _event_debug_mode_on _evthread_cond_fns _evthread_id_fn _evthread_lock_debugging_enabled _evthread_lock_fns cmd_line_option_t cmd_line_param_t crs_base_self_checkpoint_fn crs_base_self_continue_fn crs_base_self_restart_fn event_enable_debug_output event_global_current_base_ event_module_include eventops sync_wait_mt trigger_user_inc_callback var_type_names var_type_sizes Signed-off-by: Mark Allen <[email protected]>
a6490a3
to
552216f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thanks for your patience!
Adding a test into "make check" that uses "nm" to look at the symbol names in the main libraries (libmpi, libmpi_fh, libmpi_usempi, libopen-rte, libopen-pal) that user apps are directly/indirectly linked against. It expects everything to have some prefix like ompi_, opal_, orte_ etc.
The second commit in this pull request adds "ompi_" onto a handful of symbols the test identified as bad. There are still a couple categories I didn't change that I found iffy: mca_, netpatterns_, and a few that seem related to libevent. I left those pieces untouched and for now have them being accepted by the testcase.
The third commit is just a minor enhancement to update-my-copyright.pl to add a --manual-list=file option.