Fixing VERSION file for v5.0.0rc10 #11365
cec1cd5 to f5ebffb
f5ebffb to b94bc01
I tested the PR by compiling a hello world with the v4.1.x mpicc/mpifort and running it in a 5.0.x environment. With this PR, v5.0.x succeeds; without it, the run fails because the shared library cannot be found.
@janjust which Fortran interface did you use (since they all use the mpifort compiler)?
f90
My testing in
Which ran on the following examples:
*I removed spc_example from the list, as I couldn't get it to run correctly in my 4.1.4 version anyway. Looking at the output of the runs (which was too long to post here), all* the runs succeed, although there are warnings in the C examples (not in the Fortran ones). I must pass in LD_LIBRARY_PATH explicitly, even with --prefix or /absolute/path/to/mpirun; otherwise I get:
So I guess my conclusion is that the Fortran ABI is probably stable. The C interface I'm not sure about, given the warnings.
Yes, we need to dig further; I'm surprised we changed MCW's size; that needs investigating.
I used the C interfaces, a simple hello world example.
@janjust I'm not sure what you mean -- you say you used the C interfaces, but you showed an example Fortran program...?
nvm, Fortran and I are in different universes. I thought that there might be different MPI interfaces for Fortran programs, and that's what Brian was asking about? Either way, I used the above program and mpif90 to build and test it.
We have a bunch of top-level MPI interfaces:
This PR is attempting to resolve issue #11347. Linking here.
I think Tommy's example is using the mpif.h include file, but without explicit "external" statements or MPI_ADDRESS_KIND constants. Fortran will let you call subroutines it knows nothing about; it just hopes you know what you are doing and looks for the symbol during linking. For example, I think you could add a few extra integer arguments to MPI_Init there, and the compile and link would complete without error, but the program could fail mysteriously at run time.
It looks like PREDEFINED_COMMUNICATOR_PAD went from 512 to 1024. This is the source of the size warning.
Correction: #9097 is what changed it from 512 to 1024.
His example used
Ah yes; we all forgot about that. So here's the question (and I don't remember): did we need to do that, or was that just a precautionary bump in size back when we thought we were ok with breaking ABI between 4.x and 5.x? I.e., if we move the PAD back to 512, is it still big enough?
Ha, oops, I missed that line. Looks like the ompi_communicator_t struct is 538 bytes when I compile in main. 😞
I think Sessions put us over the edge. It might be time for another layer of indirection; I believe some of the big pieces in the communicator could survive being moved to an auxiliary structure that we reference via pointer. I don't have time this week to take on any of that work, but it does seem like that's our path forward.
@open-mpi/ompi-rm-5-0-x I filed #11373 to track this issue.
Now that #11373 has been merged to
👍 looks good to me
Before we commit this, can we either do the testing or remove the "must be tested" part of the commit message?
What's the status on the testing? I can do some compatibility testing if it hasn't been done so far.
bot:notacherrypick
Open MPI v5.0.0 shared libraries are ABI compatible with v4.1.x with a few subtle possible exceptions for Fortran.
In the rare case that you compile your application in such a way that the size of an integer for C is different than the size of an integer in Fortran, you'll need to rebuild and relink your application.
There are some additional Fortran API changes involving intents and asyncs, along with changing some interfaces from named to unnamed.
Resetting the age of all internal libraries to 0 for v5.0.0.
Refer to https://docs.open-mpi.org/en/v5.0.x/version-numbering.html#shared-library-version-number for policy
Signed-off-by: Geoffrey Paulsen <[email protected]>
Retesting. Hope to merge today assuming all goes well.
b94bc01 to 6aaa8b5
Testing succeeded. Only updated commit comment (no code change) in force push.
@gpaulsen, did you build any test libraries? Wei (@wzamazon) did some testing on a related issue here. I'll relay his results here:
But I don't think we install We found this because we were attempting to run (closed source) Ansys Fluent using Open MPI 5.0.x, but found its .so needed these files.
No, I built tests from examples, and our internal test harness with tip of v4.0.x and then I
OK, I did some testing as well and confirmed ompi 4.x is NOT directly linking those libraries as I reported in my last post. Sorry for the confusion (Ansys Fluent, however, is, but we will follow up with them separately). I have double-checked all the executables in the examples folders, and with this PR they all run successfully.
For posterity, it's worth noting that ldd does the full recursion of the library chain. So if you run ldd on a binary that links against libmpi.so, you will see libopen-rte and libopen-pal appear, as they are pulled in as dependencies. I did check that mpicc and ompi-c.pc in versions since 3.0.0 have not included libopen-rte or libopen-pal for dynamic linking. So while Fluent isn't backwards compatible, I don't think that is actually due to Open MPI; we shouldn't be held responsible for them linking against internal libraries.