Skip to content

Fixed C/Fortran linking to include module parameters in Psend/Precv_init #9543

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

EmmanuelBRELLE
Copy link
Contributor

Signed-off-by: Emmanuel Brelle [email protected]

Suggestion to fix #9534

@ompiteam-bot
Copy link

Can one of the admins verify this patch?

Copy link
Member

@jsquyres jsquyres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks correct (good thing we caught it before 5.0.0 -- I'm guessing that the info arg was added to these APIs after the OMPI prototype was completed...?). However, this PR raises two additional questions:

  1. I notice that there are no partitioned APIs in the TKR mpi module bindings (i.e., ompi/mpi/fortran/use-mpi-tkr). Are we doing that intentionally, or is that an oversight? I think it's an oversight, and it really should be fixed before 5.0.0.
  2. I notice that the implementation of ompi/mpi/fortran/use-mpi-f08/parrived_f08.F90 is different than the others; it calls the PMPI subroutine from the mpi module, rather than directly invoking the back-end ompi_parrived_f() function (which this PR fixes adding the missing declaration for). Practically, there's likely no difference in how it operates at run time. But it's an odd disparity in the implementation in an already-confusing set of Fortran bindings.

@jsquyres
Copy link
Member

ok to test

@jsquyres
Copy link
Member

@mdosanjh Can you please comment on my questions in #9543 (review)?

@jsquyres
Copy link
Member

@ibm-ompi
Copy link

The IBM CI (GNU/Scale) build failed! Please review the log, linked below.

Gist: https://gist.github.com/2050d8bbd31f966b5ba4ea2a1215716e

@ibm-ompi
Copy link

The IBM CI (XL) build failed! Please review the log, linked below.

Gist: https://gist.github.com/91d045f80eb8905915c5112c6780d3c3

@ibm-ompi
Copy link

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/03fa7e42c4a924e8277bfaa0216e06d2

@mdosanjh
Copy link
Contributor

This PR looks correct (good thing we caught it before 5.0.0 -- I'm guessing that the info arg was added to these APIs after the OMPI prototype was completed...?). However, this PR raises two additional questions:

  1. I notice that there are no partitioned APIs in the TKR mpi module bindings (i.e., ompi/mpi/fortran/use-mpi-tkr). Are we doing that intentionally, or is that an oversight? I think it's an oversight, and it really should be fixed before 5.0.0.

  2. I notice that the implementation of ompi/mpi/fortran/use-mpi-f08/parrived_f08.F90 is different than the others; it calls the PMPI subroutine from the mpi module, rather than directly invoking the back-end ompi_parrived_f() function (which this PR fixes adding the missing declaration for). Practically, there's likely no difference in how it operates at run time. But it's an odd disparity in the implementation in an already-confusing set of Fortran bindings.

Both of these are likely oversights on my end; my fortran experience/knowledge is minimal.

@EmmanuelBRELLE EmmanuelBRELLE force-pushed the FIX-9534-Info-in-partitioned branch from dd214df to a4d4e2f Compare October 15, 2021 13:33
@EmmanuelBRELLE
Copy link
Contributor Author

@EmmanuelBRELLE Got some legit CI failures -- can you check? See https://jenkins.open-mpi.org/jenkins/job/open-mpi.build.autogen_options/9173/AUTOGEN_OPTIONS=--no-oshmem/console, for example.

Sorry, fixed in a4d4e2f

@jsquyres
Copy link
Member

bot:ompi:retest

@ibm-ompi
Copy link

The IBM CI (XL) build failed! Please review the log, linked below.

Gist: https://gist.github.com/75d7d62be842dd4da5c2443adc65aab5

implicit none
INTEGER, INTENT(IN) :: request
INTEGER, INTENT(IN) :: partition
LOGICAL, INTENT(OUT) :: flag
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is my bad -- I set you down a wrong path here.

After seeing the last CI failure, I know/remember why the mpi_f08 MPI_Parrived() f08 subroutine is implemented as a call to PMPI_Parrived: it's because of the LOGICAL parameter.

Here's the short version:

  1. BIND(C,...) is Fortran's language interop mechanism to C. It's convenient in a lot of ways to make Fortran and C play nice with each other.
  2. Fortran's LOGICAL type does not have a corresponding C type, mainly because the Fortran values for .TRUE. and .FALSE. are not mandated by the language, and vary from compiler to compiler (!).
  3. Hence, you cannot make a subroutine that has a LOGICAL parameter be BIND(C...).

That's why whoever implemented the mpi_f08 MPI_PARRIVED function used the sneaky trick of calling PMPI_Parrived() instead of calling ompi_parrived_f(). What the MPI_PARRIVED mpi_f08 implementation was missing was a comment saying, "Hey, we did this on purpose" so that we wouldn't have exactly this situation (someone -- me -- says, "hey, that's weird -- why are you calling a PMPI function from mpi_f08?). The mpi_f08 MPI_TEST is implemented this way, for example, and has a handy comment:

! See note in mpi-f-interfaces-bind.h for why we include an
! interface here and call a PMPI_* subroutine below.

Can you make the mpi_f08 MPI_PARRIVED like the mpi_f08 MPI_TEST (including the comment)? You can then remove ompi_parrived_f from here, since it can't be both BIND(C...) and include a LOGICAL param.

If you care for a longer explanation for this issue, see here:

! Most of the "wrapper" subroutines in the mpi_f08 module (i.e., all
! the ones prototyped in this file) are simple routines that simply
! invoke a back-end ompi_*_f() subroutine, which is BIND(C)-bound to a
! back-end C function. Easy-peasy.
!
! However, a bunch of MPI Fortran subroutines use LOGICAL dummy
! parameters, and Fortran disallows passing LOGICAL parameters to
! BIND(C) routines (because the .TRUE. and .FALSE. values are not
! standardized (!)). Hence, for these
! subroutines-with-LOGICAL-params, we have to be creative on how to
! invoke the back-end ompi_*_f() C function. There are 2 cases:
! 1. If the Fortran interface has a LOGICAL parameter and no
! TYPE(MPI_Status) parameter, the individual wrapper implementation
! files (e.g., finalized_f08.F90) use the "mpi" module to get a
! interface for the subroutine and call the PMPI_* name of the
! function, which then invokes the corresponding function in the
! ompi/mpi/fortran/mpif-h directory.
!
! This is a bit of a hack: the "mpi" module will provide the right
! Fortran interface so that the compiler can verify that we're passing
! the right types (e.g., convert MPI handles from comm to
! comm%MPI_VAL). But here's the hack part: when we pass *unbounded
! arrays* of handles (e.g., the sendtypes and recvtypes arrays
! MPI_Alltoallw), we declare that the corresponding ompi_*_f()
! subroutine takes a *scalar*, and then we pass sendtypes(0)%MPI_VAL.
!
! >>>THIS IS A LIE!<<< We're passing a scalar to something that
! expects an array.
!
! However, remember that Fortran passes by reference. So the compiler
! will pass a pointer to sendtypes(0)%MPI_VAL (i.e., the first integer
! in the array). And since the mpi_f08 handles were cleverly designed
! to be exactly equivalent to a single INTEGER, an array of mpi_f08
! handles is exactly equivalent to an array of INTEGERS. So passing
! an address to the first MPI_VAL is exactly the same as passing an
! array of INTEGERS.
!
! Specifically: the back-end C function (in *.c files in
! ompi/mpi/fortran/mpif-h) gets an (MPI_Fint*), and it's all good.
!
! The key here is that there is a disconnect between Fortran and C:
! we're telling the Fortran compiler what the C interface is, and
! we're lying. But just a little bit. No one gets hurt.
!
! Yes, this is a total hack. But Craig Rasumussen tells me that this
! is actually quite a common hack in the Fortran developer world, so
! we shouldn't feel bad about using it. Shrug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I didn't read the comments, my bad. Declaration removed and comment updated in 3d49403

@EmmanuelBRELLE EmmanuelBRELLE force-pushed the FIX-9534-Info-in-partitioned branch from a4d4e2f to 3d49403 Compare October 18, 2021 08:09
@jsquyres
Copy link
Member

@EmmanuelBRELLE Can you add a comment to use-mpi-f08/parrived_f08.F90 similar to the one in use-mpi-f08/finalized_f08.F90 so that Future People don't end up having this conversation again?

@EmmanuelBRELLE EmmanuelBRELLE force-pushed the FIX-9534-Info-in-partitioned branch from 3d49403 to cfe0906 Compare October 19, 2021 08:00
@EmmanuelBRELLE
Copy link
Contributor Author

@EmmanuelBRELLE Can you add a comment to use-mpi-f08/parrived_f08.F90 similar to the one in use-mpi-f08/finalized_f08.F90 so that Future People don't end up having this conversation again?

done in cfe0906

@jsquyres jsquyres merged commit 6611a63 into open-mpi:master Oct 19, 2021
@jsquyres
Copy link
Member

@EmmanuelBRELLE Thanks! Can you cherry pick to v5.0.x?

@EmmanuelBRELLE
Copy link
Contributor Author

@EmmanuelBRELLE Thanks! Can you cherry pick to v5.0.x?
None of the work for partitioned communications (for example ompi/part/persist) has been move to v5.0.x yet.
This patch cannot be applied. I guess commit such as d79e13d (pushed by @mdosanjh) and 91c4c29 (that you pushed) must be cherry-picked first.

@EmmanuelBRELLE EmmanuelBRELLE deleted the FIX-9534-Info-in-partitioned branch February 11, 2022 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MPI_Precv_init() and MPI_Psend_init() - Missing info parameter
5 participants