-
Notifications
You must be signed in to change notification settings - Fork 900
ompi_fortran_check_ignore_tkr.m4: fix fortran test errors #9812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@ThemosTsikas Per your most recent config.log (#9795 (comment)), I see the "use...only" test failed. Can you tell me if our test is invalid, or if it is supposed to work in Fortran?
The
But if I compile these with gfortran 10.2.0, I get success:
I'm not trying to say "gfortran is right!" I'm just illustrating that I thought this test was correct because the GNU, Intel, and Absoft Fortran compilers pass this test. I am absolutely not a Fortran expert. Can you provide some clarity? |
Thank you Jeff, I am going to take it to Malcolm Cohen and it would help if I could understand what this test is trying to achieve. It doesn't seem to check that the intended semantics are respected. Would it help if I included here the Note from the Fortran Standard regarding C interoperability of global variables and common blocks?
The rules under "Interoperation with C global variables" (18.9) include
That second paragraph makes me suspect that this is why our compiler rejects the code but I need to look closer. For now, I need to understand what is actually meant to be achieved by this coding, as there may be a better way to express it. |
The test is just trying to check that I'm not entirely sure why I chose to use common blocks for this test; we don't use common blocks anywhere in the Open MPI Fortran code (these tests were written years ago). I suppose that I could just have something like: MODULE aaa
INTEGER :: common1
REAL :: common2
INTEGER :: global_aaa
END MODULE aaa MODULE bbb
INTEGER :: common1
COMPLEX :: common2
INTEGER :: global_bbb
END MODULE bbb I.e., don't use I'm really not sure why I used |
Perhaps the way forward is to relax the test to not use binding labels or COMMON. These are "global entities" in Fortran and the following rules apply to them. I am inclined to think that your existing test violates them in the use of the name cmmon_ as can be seen by adding the line
|
Fix bugs in the Fortran test code: 1. Called with "type(*)" instead of "type(*), dimension(*)" 2. Subroutine type was "real" when it should have been "complex" 3. Don't use a common block + BIND in the "use...only" test. Per open-mpi#9812 (comment), this seems to violate the Fortran standard. Instead, just use conflicting variables in the aaa and bbb modules, and trust "use...only" to avoid the conflicts. Signed-off-by: Jeff Squyres <[email protected]>
dfce12f
to
b784f3f
Compare
@ThemosTsikas Ok, I updated this PR and pushed a new unofficial tarball made from this PR: https://aws.open-mpi.org/~jsquyres/unofficial/openmpi-gitclone-pr9812-2.tar.bz2. Could you give it a try? |
Post pr9812-2 config.log and nagdbg* files here. Progress but not success. |
I think LOC needs to be C_LOC and pointer arithmetic must be done on the C side, to comply with standard Fortran. |
@ThemosTsikas Can you test if you can compile these two files together with your compiler into an executable, and running that executable produces a module alignment_mod
type, BIND(C) :: test_mpi_handle
integer :: MPI_VAL
end type test_mpi_handle
type(test_mpi_handle), target :: t1
type(test_mpi_handle), target :: t2
end module alignment_mod
program falignment
use alignment_mod
external align
call align(t1, t2)
end program falignment #include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
void align_(char *t1, char *t2)
{
FILE *fp = fopen("conftestval", "w");
if (!fp) exit(1);
ptrdiff_t x;
if (t1 > t2) {
x = t1 - t2;
} else {
x = t2 - t1;
}
fprintf(fp, "%d\n", (int) x);
fclose(fp);
} |
@jsquyres I just caught up on all my on-leave email and: ❯ nagfor -V
NAG Fortran Compiler Release 7.1(Hanzomon) Build 7101
Product NPMI671NA for Apple Intel Mac OSX 64-bit
Copyright 1990-2020 The Numerical Algorithms Group Ltd., Oxford, U.K.
❯ clang --version
Apple clang version 13.0.0 (clang-1300.0.29.30)
Target: x86_64-apple-darwin20.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
❯ clang -c align.c
❯ nagfor falignment.f90 align.o
NAG Fortran Compiler Release 7.1(Hanzomon) Build 7101
[NAG Fortran Compiler normal termination]
❯ ./a.out
❯ cat conftestval
4 ETA: I get the same result with gcc-gfortran 11.2 and Intel Fortran 2022.0 with clang. |
Thanks Matthew, I get the same here. |
@ThemosTsikas @mathomp4 Excellent; thanks for testing. I ended up going a slightly different direction, but your test gives me hope that it'll succeed, anyway. Here's a new tarball to test: https://aws.open-mpi.org/~jsquyres/unofficial/openmpi-gitclone-pr9812-3.tar.bz2 |
Much better progress. configure succeeds, enabling use-mpi-f08. But make falls over because bad options were passed to the compiler. Would you like to get a trial licence for the NAG Fortran Compiler so that you can try this yourself? You will be sent a trial key automatically by email. |
I have time this week to look into this because I'm technically on vacation; I got a key and will look at it this afternoon. It would be best if NAG could add Open MPI to its regular testing (Absoft does this, for example). This is how vendor involvement / support typically works in the Open MPI community: those who care (i.e., the vendors themselves) provide the work. In Absoft's case, for example, the run automated testing that reports up to the Open MPI community development database so that we, the Open MPI dev community, know when we have broken something with their compiler. Absoft then helps us debug and fix the issue. I only cite Absoft because they're another Fortran compiler vendor; the same is generally true for all vendors in the Open MPI community (e.g., network and server vendors). |
That is a good idea, I will look into implementing it. |
@ThemosTsikas What version of the NAG compiler are you using? The version I downloaded from https://www.nag.com/content/getting-started-nag-fortran-compiler is:
When I run through Open MPI's
The users mailing list specifically mentioned the NAG compiler v7.2 as being the first F2008-compliant release (https://www.mail-archive.com/[email protected]/msg34686.html). Do I need a different version/build of NAG? |
There is no 7.2, that was an error. The latest is 7.1 and is (claimed to be) F2008 compliant. You can pick up the latest bug-fixed Builds from http://monet.nag.co.uk/compiler/r71download/. The main website's Build is updated less frequently. Build 7102 should exhibit the behaviour I described. |
@ThemosTsikas Ok, I got the 7102 build and got farther:
Here's the simplest example I could come up with showing (part of) what the MODULE mpi_types
type, BIND(C) :: MPI_Comm
integer :: MPI_VAL
end type MPI_Comm
END MODULE mpi_types
MODULE mpi
use mpi_types
! This is one type of MPI_UNWEIGHTED
integer MPI_UNWEIGHTED(1)
common/mpi_fortran_unweighted/MPI_UNWEIGHTED
interface
subroutine PMPI_Cart_create(old_comm, ndims, dims, periods, reorder, &
comm_cart, ierror)
integer, intent(in) :: old_comm
integer, intent(in) :: ndims
integer, dimension(*), intent(in) :: dims
logical, dimension(*), intent(in) :: periods
logical, intent(in) :: reorder
integer, intent(out) :: comm_cart
integer, intent(out) :: ierror
end subroutine PMPI_Cart_create
end interface
END MODULE mpi
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
MODULE mpi_f08_types
use mpi_types
! This is the other type for MPI_UNWEIGHTED
integer, dimension(1), bind(C, name="mpi_fortran_unweighted_") :: MPI_UNWEIGHTED
END MODULE mpi_f08_types
MODULE mpi_f08_interfaces
interface MPI_Cart_create
subroutine MPI_Cart_create_f08(comm_old,ndims,dims,periods,reorder,comm_cart,ierror)
use :: mpi_f08_types, only : MPI_Comm
implicit none
TYPE(MPI_Comm), INTENT(IN) :: comm_old
INTEGER, INTENT(IN) :: ndims, dims(ndims)
LOGICAL, INTENT(IN) :: periods(ndims), reorder
TYPE(MPI_Comm), INTENT(OUT) :: comm_cart
INTEGER, OPTIONAL, INTENT(OUT) :: ierror
end subroutine MPI_Cart_create_f08
end interface MPI_Cart_create
END MODULE mpi_f08_interfaces
MODULE mpi_f08
use mpi_f08_types
use mpi_f08_interfaces
END MODULE mpi_f08
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
subroutine MPI_Cart_create_f08(comm_old,ndims,dims,periods,reorder,comm_cart,ierror)
! This appears to be where the problem arises.
! We use two modules that (ultimately) have conflicting types for
! MPI_UNWEIGHTED.
use :: mpi_f08_types, only : MPI_Comm
use :: mpi, only : PMPI_Cart_create
implicit none
TYPE(MPI_Comm), INTENT(IN) :: comm_old
INTEGER, INTENT(IN) :: ndims
INTEGER, INTENT(IN) :: dims(ndims)
LOGICAL, INTENT(IN) :: periods(ndims), reorder
TYPE(MPI_Comm), INTENT(OUT) :: comm_cart
INTEGER, OPTIONAL, INTENT(OUT) :: ierror
integer :: c_ierror
call PMPI_Cart_create(comm_old%MPI_VAL,ndims,dims,periods,&
reorder,comm_cart%MPI_VAL,c_ierror)
if (present(ierror)) ierror = c_ierror
end subroutine MPI_Cart_create_f08
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! This part is just a sample program so that I can invoke an interface
! from the mpi_f08 module.
PROGRAM test
use mpi_f08
implicit none
type(MPI_Comm) :: comm_a, comm_b
integer :: ndims
integer, dimension(3) :: dims
logical, dimension(3) :: periods
logical :: reorder
ndims = 3
dims(1) = 1
dims(2) = 2
dims(3) = 3
periods(1) = .true.
periods(2) = .true.
periods(3) = .true.
reorder = .true.
call MPI_Cart_create_f08(comm_a, ndims, dims, periods, reorder, comm_b)
END PROGRAM test With the following #include <stdio.h>
#pragma weak PMPI_CART_CREATE = ompi_cart_create_f
#pragma weak pmpi_cart_create = ompi_cart_create_f
#pragma weak pmpi_cart_create_ = ompi_cart_create_f
#pragma weak pmpi_cart_create__ = ompi_cart_create_f
#pragma weak PMPI_Cart_create_f = ompi_cart_create_f
#pragma weak PMPI_Cart_create_f08 = ompi_cart_create_f
typedef int MPI_Fint;
typedef int ompi_fortran_logical_t;
void ompi_cart_create_f(MPI_Fint *old_comm, MPI_Fint *ndims, MPI_Fint *dims,
ompi_fortran_logical_t *periods, ompi_fortran_logical_t *reorder,
MPI_Fint *comm_cart, MPI_Fint *ierr)
{
printf("In ompi_cart_create_f\n");
} I can compile+link the above Fortran and C with gfortran 10.2 and ifort 19.0.4:
But
The error message appears to be incorrect: The error message also does not tell exactly where the conflict occurred; I suspect that it's in I know it's convoluted, but is that valid Fortran? (I really hope so, because we need this...) |
You notice the line? if errors happen subsequent to that, they are errors by the underlying C compiler or linker. In this case it is the C compiler. This is a nagfor bug as we should never generate bad C. I will log it for fixing. |
Sorry, I have just woken up here. Isn’t this the same issue we had earlier at the configure stage, the one about global identifiers? |
Isn't the way forward to not repeat global identifiers but reuse them, like so:
|
And to show the decoupling of names better:
|
You know, I made a statement to you recently that we didn't use common blocks anywhere in OMPI code recently; oops -- I guess that was wrong. But this is an interesting point: I just tested, and gcc 4.8.5 (which is as far back as Open MPI v5.0.x supports) supports: use, intrinsic :: iso_c_binding
integer, dimension(1), bind(C, name="foo") :: MPI_UNWEIGHTED So perhaps we should ditch all of our common blocks and replace them with BIND(C). The common blocks were only so that we could compare (in C code) to know when users passed in sentinel Fortran constants (i.e., the constants themselves, not just equivalent constant values). Seems like this functionality is exactly what BIND(C) is for. Let me go run with that... |
Fingers crossed here. |
Fix bugs in the Fortran test code: 1. Called with "type(*)" instead of "type(*), dimension(*)" 2. Subroutine type was "real" when it should have been "complex" 3. Don't use a common block + BIND in the "use...only" test. Per open-mpi#9812 (comment), this seems to violate the Fortran standard. Instead, just use conflicting variables in the aaa and bbb modules, and trust "use...only" to avoid the conflicts. Signed-off-by: Jeff Squyres <[email protected]>
LOC() is not a universal Fortran function (perhaps it's a GNU extension that at least some Fortran compilers have adopted? It doesn't seem to exist in the NAG compiler, at least). Additionally, doing the pointer arithmetic is not technically valid Fortran. So make the test for mpi_f08 handle alignment be similar to the other alignment tests: make instances of the variable in a common block and use C to compute the alignment. Signed-off-by: Jeff Squyres <[email protected]>
Instead of solely relying on common blocks for the MPI Fortran sentinels (e.g., MPI_BOTTOM), also use a bind(C) statement to bind the back-end symbol name to a specific name. There are two benefits to this: 1. We don't have to have 4 back-end symbols (i.e., CAPS, lower, one_underscore_, and two_underscore__) 2. The mpi_f08 and mpi modules can share the same back-end symbol (because they'll both bind(C) to the same back-end symbol) However, keep definition of MPI_STATUS[ES]_IGNORE in its own separate header files/modules because they are different types in the mpi module vs. the mpi_f08 module. We cannot include the mpi module in the mpi_f08 module, or will get confused by seeing MPI_STATUS[ES]_IGNORE of two different types (even if you "use :: mpi, only : ..." and do not include MPI_STATUS[ES]_IGNORE in the imports list). Fixing point 2 avoided some subtle linker issues (see open-mpi#9812 (comment) for an explanation). Signed-off-by: Jeff Squyres <[email protected]>
The CPPFLAGS may not be understood by the Fortran compiler. Signed-off-by: Jeff Squyres <[email protected]>
Remove some extra "use" imports that weren't actually used (i.e., remove harmless kruft). Signed-off-by: Jeff Squyres <[email protected]>
Signed-off-by: Jeff Squyres <[email protected]>
Signed-off-by: Jeff Squyres <[email protected]>
Two changes that are related to each other: 1. Since Wtick/Wtime are simple interfaces (not wrappers) to the back-end C MPI_Wtick/MPI_Wtime functions, just declare them as bind(C) in the mpi module and then have the mpi_f08 module use just those 2 symbols in the mpi_f08 module (ditto for the PMPI versions). 2. Similar for the special "sentinel" values (e.g., MPI_BOTTOM). No longer generate them via gen-mpi-mangling.pl; just BIND(C) them in the mpi module and then "use" them in the mpi_f08 module. Signed-off-by: Jeff Squyres <[email protected]>
Since mpi_f08 now references the mpi module, we need to provide the compiler flag to find the mpi module in the build tree. Signed-off-by: Jeff Squyres <[email protected]>
The mpi and mpi_f08 versions of MPI_Sizeof have a critical difference: ierr is required in the mpi subroutines, but optional in mpi_f08 subroutines. Make sure these specific subroutines therefore have different names. For the mpi_f08 version, add "_opt" in the specific subroutine name. Signed-off-by: Jeff Squyres <[email protected]>
69eb335
to
4de6c2a
Compare
I spent a bunch of time on this and was unable to bring it to completion. I have pushed the latest commits that I have on this branch, but I stress that they do not work yet. I only pushed here so that the work was not accidentally lost. We have a somewhat complicated scheme for trying to share code between the I think that I am coming to the conclusion that this is too complicated, and should be simplified somehow. I'm not entirely sure what the Right way is to do that, and my window of availability to work on this just got drastically reduced. Here's the current problems with the commits as they currently are on this PR:
That is a correct error message: That being said, |
The IBM CI (GNU/Scale) build failed! Please review the log, linked below. Gist: https://gist.github.com/1d1543d7bdb4ae684e8714ce5c0c8a6f |
@hppritcha and I chatted on the phone about the MPI_STATUS[ES]_IGNORE issue: it seems like we can just Still need to dig into the nagfor error from above -- there's likely some unintentional using of both |
Fix two bugs in the Fortran test code:
Signed-off-by: Jeff Squyres [email protected]
Refs #9795