-
Notifications
You must be signed in to change notification settings - Fork 900
mca: Dynamic components link against project lib #4121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ompi/mca/bml/r2/Makefile.am
Outdated
@@ -37,6 +38,9 @@ mcacomponentdir = $(ompilibdir) | |||
mcacomponent_LTLIBRARIES = $(component_install) | |||
mca_bml_r2_la_SOURCES = $(r2_sources) | |||
mca_bml_r2_la_LDFLAGS = -module -avoid-version | |||
mca_bml_r2_la_LIBADD = $(top_builddir)/ompi/lib@[email protected] \ | |||
$(top_builddir)/orte/lib@[email protected] \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blah; First, we should find a way to macro-ize this, so there's only one thing in LIBADD.
But second, is a dependency on all three libraries the right thing here? It means that a bump in the major version of any one of the three will result in a load failure, which I guess is fine?
Note the WIP tag. Initial commit basically reversed this commit with a script. Initial PR posted to give folks an early chance to test. This PR needs some cleanup still including:
If you have more suggestions for changes feel free to keep the conversation going here or on the issue. |
@jjhursey I confirm all of mpi4py testsuite passes. I have two minor regressions, but they seem to be unrelated to this PR (these failures occur even if I dlopen libmpi). |
@jjhursey You really cannot do this - building "-no-orte" and using your own RTE will cause all of these components to fail. You can only be certain that the project the component belongs to will have built - you cannot assume any other project did. |
I'm not sure I completely agree with @rhc54. I agree that we probably shouldn't explicitly include the liborte / libopal paths in every OMPI makefile (or libopal in every ORTE makefile). But I think at configure time, you can know which liborte you're going to link with (at least, whether it's a relative path or out-of-tree libopen-rte), since you can know when you link libmpi. So that's solvable, but with more work than we have today. It's also worth noting that not all components have a build-time dependency on libopen-rte, so it's not clear that we should be linking libopen-rte into every component. Probably safest to do that during the initial pass, but definitely not necessary in general. |
I could try a version where each project level only links to it's project level library (and not that of the one(s) beneath it). I suspect that will still work for the driving use case since |
@bwbarrett I'm afraid that isn't true as the RTE enters the MPI layer solely thru the ompi/mca/rte framework. There is nothing in the configure line to tell you which RTE component is going to be selected, and only the RTE component actually links in the RTE library. |
This is why there are no RTE symbols anywhere in the OMPI layer - not in the base, nor in any component. |
@rhc54, ah, if we've actually completed that transition, then I agree with you. We shouldn't link libopen-rte.so into the OMPI modules (nor should we need to). Still an open question about open-pal.so, but it's not needed, so seems simpler to avoid it and just link in the project library. |
a61e2f7
to
822d749
Compare
I just re-posted the commit where each component only links against it's project's lib. I'm rebuilding now to confirm it still fixes the problem. |
I confirmed that this linking model works for the python binding example. |
Per teleconf - This is looking good. A few notes:
I'll try to polish this off today/tomorrow. |
@jjhursey I've rebuilt with your latest changes, mpi4py worked just fine. Do you still need help with the Java bindings? I can contribute a patch, but I have no clue how to test it. |
be33adf
to
f04dc6f
Compare
@jjhursey ...
opal/mca/btl/openib/Makefile.am:121: warning: mca_btl_openib_la_LIBADD was already defined in condition TRUE, which includes condition OPAL_cuda_support ...
opal/mca/btl/openib/Makefile.am:117: ... 'mca_btl_openib_la_LIBADD' previously defined here
configure.ac: installing 'config/ylwrap'
opal/mca/btl/smcuda/Makefile.am:57: warning: mca_btl_smcuda_la_LIBADD was already defined in condition TRUE, which includes condition OPAL_cuda_support ...
opal/mca/btl/smcuda/Makefile.am:52: ... 'mca_btl_smcuda_la_LIBADD' previously defined here
opal/mca/rcache/gpusm/Makefile.am:54: warning: mca_rcache_gpusm_la_LIBADD was already defined in condition TRUE, which includes condition OPAL_cuda_support ...
opal/mca/rcache/gpusm/Makefile.am:51: ... 'mca_rcache_gpusm_la_LIBADD' previously defined here
opal/mca/rcache/grdma/Makefile.am:53: warning: mca_rcache_grdma_la_LIBADD was already defined in condition TRUE, which includes condition OPAL_cuda_support ...
opal/mca/rcache/grdma/Makefile.am:50: ... 'mca_rcache_grdma_la_LIBADD' previously defined here
opal/mca/rcache/rgpusm/Makefile.am:52: warning: mca_rcache_rgpusm_la_LIBADD was already defined in condition TRUE, which includes condition OPAL_cuda_support ...
opal/mca/rcache/rgpusm/Makefile.am:49: ... 'mca_rcache_rgpusm_la_LIBADD' previously defined here
... |
@dalcinl Ah good catch. I'll fix that in the next series of pushes (missed the |
f04dc6f
to
95a0552
Compare
@dalcinl Unfortunately, I do not think we'll ever be able to remove that call to |
04e7fb3
to
d6f5de3
Compare
I just pushed an update to the Java bindings commit that removed the For the |
@matcabral Can you test the Java bindings on this branch, since it has PSM[2] implications? Thanks! |
I'll only modify that "never" a bit to be "for the next few years". AFAIK, the problem in the PSM library has been resolved, and so we only need it to protect against earlier versions until "everyone" has upgraded. |
@jsquyres I'm not sure you understood my question. Do you need to call |
Hi all, |
@jsquyres I just remembered the JAVA bindings are expected to not work with PSM: |
@matcabral Maybe that comment in the FAQ is actually outdated? OK, so now I think you should keep the |
Once I get the 👍 from the reviewers we can merge, and then consider the change for the release branches. |
@bwbarrett Is this too late for v3.0.0? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty minor nit-picks in the python.
contrib/libadd_mca_comp_update.py
Outdated
'oshmem' : ["$(top_builddir)/oshmem/liboshmem.la"], | ||
} | ||
# projects= {'ompi' : ["$(top_builddir)/ompi/lib@[email protected]"], | ||
# } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably remove this commented-out code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I'll fix.
print("") | ||
|
||
print("="*40); | ||
print("{:>3} : Files skipped".format(len(skipped_files))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why len(..)
here, but str(len(...))
elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably just need the len(
version here since I'm using the new .format
mode that doesn't need it to be converted to a string first. I'll update the others.
I think it's probably too late for 3.0.0, but since this doesn't break ABI, I think I'd be ok pulling it in for 3.0.1. Let's see how rc4 goes; if there are other major issues, we can re-evaluate. |
* This script will search for all of the `Makefile.am` files in each of the project-level components. Then it adds the project-level library to `mca_FRAMEWORK_COMPONENT_la_LIBADD`. - If the library is already in the LIBADD list then it's skipped. So it is safe to run multiple times on the same codebase. Signed-off-by: Joshua Hursey <[email protected]>
* Resolves open-mpi#3705 * Components should link against the project level library to better support `dlopen` with `RTLD_LOCAL`. * Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am` with the appropriate project level library: ``` MCA components in ompi/ $(top_builddir)/ompi/lib@[email protected] MCA components in orte/ $(top_builddir)/orte/lib@[email protected] MCA components in opal/ $(top_builddir)/opal/lib@[email protected] MCA components in oshmem/ $(top_builddir)/oshmem/liboshmem.la" ``` Note: The changes in this commit were automated by the script in the commit that proceeds it with the `libadd_mca_comp_update.py` script. Some components were not included in this change because they are statically built only. Signed-off-by: Joshua Hursey <[email protected]>
* See discussion on Issue open-mpi#3705 regarding why this is no longer needed. Signed-off-by: Joshua Hursey <[email protected]>
d6f5de3
to
49c40f0
Compare
@jsquyres I just re-pushed fixing the python items you mentioned. |
We'd love this to go into v3.0.0 (If there's going to be another RC of substantial content). If v3.0.0 ships, we're excited about that too. 🥇 v3.0.1 will work in that case. |
@rhc or @bwbarrett Any other comments on this PR before I merge? |
ok with me! |
@jjhursey, Works for me. |
Ouch - I have found one downside to this PR. Now every top-level compile will rerun thru every component and relink it, even if nothing changed in it or anywhere in the code base. Significantly slower build times. |
support
dlopen
withRTLD_LOCAL
.using the
mca_FRAMEWORK_COMPONENT_la_LIBADD
in theMakefile.am
.Note: The changes in this commit were automated. Some components
were not included because they are statically built only.