Skip to content

need a way to disable REAL16 support with configure #8616

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jeffhammond opened this issue Mar 15, 2021 · 33 comments
Closed

need a way to disable REAL16 support with configure #8616

jeffhammond opened this issue Mar 15, 2021 · 33 comments

Comments

@jeffhammond
Copy link
Contributor

jeffhammond commented Mar 15, 2021

I am trying to build e604107 on an AArch64 system with NVHPC Fortran 21.2-0.

It appears that REAL16 support in NVFortran isn't what Open-MPI is looking for, but I do not care about REAL16 support and would prefer to just not have it in the build.

I see this is not a new problem (#63) but cannot figure out how to disable REAL16 now.

$ make V=1 VERBOSE=1
/bin/sh ../../../../../libtool  --tag=FC   --mode=compile nvfortran   -c -o psizeof_f.lo  psizeof_f.f90
libtool: compile:  nvfortran -c psizeof_f.f90  -o .libs/psizeof_f.o
NVFORTRAN-S-0087-Non-constant expression where constant expression required (psizeof_f.f90: 641)
  0 inform,   0 warnings,   1 severes, 0 fatal for pmpi_sizeof_real16_scalar
NVFORTRAN-S-0087-Non-constant expression where constant expression required (psizeof_f.f90: 651)
  0 inform,   0 warnings,   1 severes, 0 fatal for pmpi_sizeof_real16_r1
NVFORTRAN-S-0087-Non-constant expression where constant expression required (psizeof_f.f90: 661)
  0 inform,   0 warnings,   1 severes, 0 fatal for pmpi_sizeof_real16_r2
NVFORTRAN-S-0087-Non-constant expression where constant expression required (psizeof_f.f90: 671)
  0 inform,   0 warnings,   1 severes, 0 fatal for pmpi_sizeof_real16_r3
NVFORTRAN-S-0087-Non-constant expression where constant expression required (psizeof_f.f90: 681)
  0 inform,   0 warnings,   1 severes, 0 fatal for pmpi_sizeof_real16_r4
NVFORTRAN-S-0087-Non-constant expression where constant expression required (psizeof_f.f90: 691)
  0 inform,   0 warnings,   1 severes, 0 fatal for pmpi_sizeof_real16_r5
NVFORTRAN-S-0087-Non-constant expression where constant expression required (psizeof_f.f90: 701)
  0 inform,   0 warnings,   1 severes, 0 fatal for pmpi_sizeof_real16_r6
NVFORTRAN-S-0087-Non-constant expression where constant expression required (psizeof_f.f90: 711)
  0 inform,   0 warnings,   1 severes, 0 fatal for pmpi_sizeof_real16_r7
make: *** [Makefile:2369: psizeof_f.lo] Error 1
$ grep -i real16 psizeof_f.f90
      SUBROUTINE PMPI_Sizeof_real16_scalar(x, size, ierror)
        USE, INTRINSIC :: iso_fortran_env, ONLY: REAL16
      REAL(REAL16)::x
      END SUBROUTINE PMPI_Sizeof_real16_scalar
      SUBROUTINE PMPI_Sizeof_real16_r1(x, size, ierror)
        USE, INTRINSIC :: iso_fortran_env, ONLY: REAL16
      REAL(REAL16), DIMENSION(*)::x
      END SUBROUTINE PMPI_Sizeof_real16_r1
      SUBROUTINE PMPI_Sizeof_real16_r2(x, size, ierror)
        USE, INTRINSIC :: iso_fortran_env, ONLY: REAL16
      REAL(REAL16), DIMENSION(1,*)::x
      END SUBROUTINE PMPI_Sizeof_real16_r2
      SUBROUTINE PMPI_Sizeof_real16_r3(x, size, ierror)
        USE, INTRINSIC :: iso_fortran_env, ONLY: REAL16
      REAL(REAL16), DIMENSION(1,1,*)::x
      END SUBROUTINE PMPI_Sizeof_real16_r3
      SUBROUTINE PMPI_Sizeof_real16_r4(x, size, ierror)
        USE, INTRINSIC :: iso_fortran_env, ONLY: REAL16
      REAL(REAL16), DIMENSION(1,1,1,*)::x
      END SUBROUTINE PMPI_Sizeof_real16_r4
      SUBROUTINE PMPI_Sizeof_real16_r5(x, size, ierror)
        USE, INTRINSIC :: iso_fortran_env, ONLY: REAL16
      REAL(REAL16), DIMENSION(1,1,1,1,*)::x
      END SUBROUTINE PMPI_Sizeof_real16_r5
      SUBROUTINE PMPI_Sizeof_real16_r6(x, size, ierror)
        USE, INTRINSIC :: iso_fortran_env, ONLY: REAL16
      REAL(REAL16), DIMENSION(1,1,1,1,1,*)::x
      END SUBROUTINE PMPI_Sizeof_real16_r6
      SUBROUTINE PMPI_Sizeof_real16_r7(x, size, ierror)
        USE, INTRINSIC :: iso_fortran_env, ONLY: REAL16
      REAL(REAL16), DIMENSION(1,1,1,1,1,1,*)::x
      END SUBROUTINE PMPI_Sizeof_real16_r7

Reproduce

You need to use GCC for C/C++ and NVIDIA Fortran to reproduce this.

$ ../configure CC=gcc CXX=g++ FC=nvfortran  --disable-man-pages  && make -j100
commit 1da3f93c7a0b8acff53637a3510cb745e71e1b16 (HEAD -> master, origin/master, origin/HEAD)
Merge: 23833935f3 a2cd6a95e2
Author: Jeff Squyres <[email protected]>
Date:   Tue Mar 16 10:26:37 2021 -0400

    Merge pull request #8626 from mkurnosov/scatter-bmtree-fix-tmpbuf
    
    coll/base: reduce memory consumption in Scatter
@jeffhammond
Copy link
Contributor Author

I will add that this Fortran compiler does not support REAL*16 (https://forums.developer.nvidia.com/t/real-16/134046) and configure knows this, so I am confused why it's trying to compile such code anyways.

configure:52398: WARNING: *** Fortran REAL*16 does not have expected size!
configure:52400: WARNING: *** Expected 16, got 4
configure:52402: WARNING: *** Disabling MPI support for Fortran REAL*16
configure:53055: checking for C type matching bit representation of REAL*16
configure:53683: result: skipped (no REAL*16)
configure:53693: WARNING: MPI_REAL16 and MPI_COMPLEX32 support have been disabled

@bosilca
Copy link
Member

bosilca commented Mar 15, 2021

Definitively not new, but your reference dates from 2014 and it related to a different topic (the MPI_Op not being correctly implemented and not the type support itself).

That put aside, the F90 code is autogenerated (gen-mpi-sizeof.pl) and the REAL16 part should not be inside. I tried on few machines without being able to replicate, but I do not have access to an AArch64 with the NVidia compilers. Could this file be a leftover from a prior build ? If not can you remove ompi/mpi/fortran/mpif-h/sizeof_f.f90 in build directory, and do a make V=1. We would need to know how gen-mpi-sizeof.pl is invoked.

@jeffhammond
Copy link
Contributor Author

This is a brand new build on a system I started using today so it's clean.

NVHPC compilers are free. You can download and install using the shell commands provided on https://developer.nvidia.com/nvidia-hpc-sdk-downloads. You do not need GPUs attached to use them.

@jeffhammond
Copy link
Contributor Author

jeffhammond commented Mar 15, 2021

Is REAL16 even standard Fortran? The 16-byte REAL is REAL128 not REAL16.

@jeffhammond
Copy link
Contributor Author

jeffhammond commented Mar 15, 2021

The problem is that real2 is enabled, if that is the thing that translates to REAL16.

make[3]: Entering directory '~/MPI/ompi-github/build-gcc-nvf-arm/ompi/include'
../../../ompi/mpi/fortran/base/gen-mpi-sizeof.pl \
    --header=mpif-sizeof.h --ierror=mandatory \
    --maxrank=7 \
    --generate=1 \
    --real2=1 \
    --real16=0 \
    --complex4=0 \
    --complex32=0
../../../ompi/mpi/fortran/base/gen-mpi-mangling.pl \
    --caps 0 \
    --plain 0 \
    --single 1 \
    --double 0
ln -s ../../../opal/include/opal/opal_portable_platform.h mpi_portable_platform.h

@jeffhammond
Copy link
Contributor Author

As #6205 appears to be the cause here, maybe @kawashima-fj can take a look. I just need a way to disable real2 support completely in Fortran. If --enable-alt-short-float does that, it is not clear from the description configure --help provides.

@bosilca
Copy link
Member

bosilca commented Mar 15, 2021

There seems to be a real issue with the generator, as in the command line you provided the real16 is clearly turned off, so the code should not have been generated (independently of the real2 support).

@jeffhammond
Copy link
Contributor Author

REAL16 here means 16-bit real not 16-byte real. The problem is that Open-MPI is enabling 2-byte floats when the NVF compiler doesn't support them, because they are not standard Fortran.

@bosilca
Copy link
Member

bosilca commented Mar 15, 2021

You're right, short float enables the float16. What is configure saying about the support of short float ?

@ggouaillardet
Copy link
Contributor

@jeffhammond can you give this a try

configure --enable-mpi-ext=affinity,cuda,pcollreq ...

(the default is --enable-mpi-ext=all feel free not to include some of these extensions is they cause any issue on your platform)

@jeffhammond
Copy link
Contributor Author

Having extensions that require non-standard Fortran language features enabled by default seems like a bad decision. How many Fortran compilers support 2-byte floats?

@ggouaillardet
Copy link
Contributor

First things first: did this help?

At first glance, it seems the configury logic of this extension fails to disqualify it (or at least the Fortran extension),
so that sounds like a bug that will be fixed.
(fwiw, installing nvidia compilers require root access - i currently do not have, so I am trying to setup a Linux VM on M1 ...)

@jeffhammond
Copy link
Contributor Author

I install NVHPV compilers without root every day.

  1. wget the tar
  2. tar -xaf the tar
  3. ./install
  4. tell it to install in $HOME/someplace

That's all.

I'll test configure later. Had to be AFK before I could try it.

@ggouaillardet
Copy link
Contributor

thanks for the info! nvidia website says to sudo install, so I simply assumed root was needed ...

@ggouaillardet
Copy link
Contributor

it seems there is something wrong with our logic:

we only test Fortran REAL*2 and if it works, assumes that
USE, INTRINSIC :: iso_fortran_env, ONLY: REAL16
will work too.

with nvfortran, the former is supported, but the latter is not, and compilation fails ...

@bosilca
Copy link
Member

bosilca commented Mar 16, 2021

It will not be easy to change this, because this would force us to change the meaning of our defines and REAL16 is already used for REAL*16. But if we want to have this change in, we should amend it globally before the 5.0.

@jeffhammond
Copy link
Contributor Author

I would argue that you should stop using iso_fortran_env, ONLY: REAL16 since that is not a thing (it's not a standard thing, at least) and implement 16-bit float support in Fortran using REAL*2, since that appears to be more widely supported, even if none of the REAL*N types are standard Fortran.

@bosilca
Copy link
Member

bosilca commented Mar 16, 2021

@jeffhammond until we come up with a proper patch you can just use:

diff --git a/ompi/mpi/fortran/mpif-h/Makefile.am b/ompi/mpi/fortran/mpif-h/Makefile.am
index b8225bf889..7d03514767 100644
--- a/ompi/mpi/fortran/mpif-h/Makefile.am
+++ b/ompi/mpi/fortran/mpif-h/Makefile.am
@@ -121,7 +121,7 @@ sizeof_f.f90:
            --impl=$@ --ierror=mandatory --mpi \
            --maxrank=$(OMPI_FORTRAN_MAX_ARRAY_RANK) \
            --generate=$(OMPI_FORTRAN_BUILD_SIZEOF) \
-           --real2=$(OMPI_HAVE_FORTRAN_REAL2) \
+           --real2=0 \
            --real16=$(OMPI_HAVE_FORTRAN_REAL16) \
            --complex4=$(OMPI_HAVE_FORTRAN_COMPLEX4) \
            --complex32=$(OMPI_HAVE_FORTRAN_COMPLEX32)

@jeffhammond
Copy link
Contributor Author

Yeah I did that already in all the Fortran interface directories.

@jeffhammond
Copy link
Contributor Author

@ggouaillardet regarding --enable-mpi-ext=, it would be nice to know what all is doing. Right now, I have no way to know that anything related to 16-bit floats is enabled automatically.

$ ../configure --help | grep "enable-mpi-ext" -A3
  --enable-mpi-ext=LIST   Comma-separated list of extensions that should be
                          built. Possible values: ompi_mpiext_list. Example:
                          "--enable-mpi-ext=foo,bar" will enable building the
                          MPI extensions "foo" and "bar". If LIST is empty or
                          the special value "all", then all available MPI
                          extensions will be built (default: all).

@jeffhammond
Copy link
Contributor Author

@ggouaillardet disabling extensions is not a solution here...

$ git clean -dfx ; ../configure CC=gcc CXX=g++ FC=nvfortran --enable-mpi-ext=none --disable-man-pages && make -j100
...
  FC       psizeof_f.lo
NVFORTRAN-S-0087-Non-constant expression where constant expression required (psizeof_f.f90: 641)
  0 inform,   0 warnings,   1 severes, 0 fatal for pmpi_sizeof_real16_scalar
NVFORTRAN-S-0087-Non-constant expression where constant expression required (psizeof_f.f90: 651)
  0 inform,   0 warnings,   1 severes, 0 fatal for pmpi_sizeof_real16_r1
NVFORTRAN-S-0087-Non-constant expression where constant expression required (psizeof_f.f90: 661)
  0 inform,   0 warnings,   1 severes, 0 fatal for pmpi_sizeof_real16_r2
NVFORTRAN-S-0087-Non-constant expression where constant expression required (psizeof_f.f90: 671)
  0 inform,   0 warnings,   1 severes, 0 fatal for pmpi_sizeof_real16_r3
NVFORTRAN-S-0087-Non-constant expression where constant expression required (psizeof_f.f90: 681)
  0 inform,   0 warnings,   1 severes, 0 fatal for pmpi_sizeof_real16_r4
NVFORTRAN-S-0087-Non-constant expression where constant expression required (psizeof_f.f90: 691)
  0 inform,   0 warnings,   1 severes, 0 fatal for pmpi_sizeof_real16_r5
NVFORTRAN-S-0087-Non-constant expression where constant expression required (psizeof_f.f90: 701)
  0 inform,   0 warnings,   1 severes, 0 fatal for pmpi_sizeof_real16_r6
NVFORTRAN-S-0087-Non-constant expression where constant expression required (psizeof_f.f90: 711)
  0 inform,   0 warnings,   1 severes, 0 fatal for pmpi_sizeof_real16_r7
make[3]: *** [Makefile:2369: psizeof_f.lo] Error 1
make[3]: Leaving directory '/tmp/ompi/build/ompi/mpi/fortran/mpif-h/profile'
make[2]: *** [Makefile:3139: all-recursive] Error 1
make[2]: Leaving directory '/tmp/ompi/build/ompi/mpi/fortran/mpif-h'
make[1]: *** [Makefile:2674: all-recursive] Error 1
make[1]: Leaving directory '/tmp/ompi/build/ompi'
make: *** [Makefile:1434: all-recursive] Error 1

@jeffhammond
Copy link
Contributor Author

@ggouaillardet The NVHPC download page has been updated to make it clear that sudo is only required when necessary to write to the specified directory.

@ggouaillardet
Copy link
Contributor

@jeffhammond a patchless workaround is to configure --enable-alt-short-float=long ...
(so configure will find that the size of Fortran (2) and C (8) types differ and will set OMPI_HAVE_FORTRAN_REAL2=0 and hence completely disable short float support.

@bosilca @kawashima-fj There are two issues here:

  • there is no trivial way to unconditionnaly disable short float
  • Open MPI uses REAL16 from ISO_FORTRAN_ENV, and though I can name one compiler that does support it, I did not see that in the standard (and have asked Dr Fortran to comment on that), so this is something we will likely have to change.

@bosilca
Copy link
Member

bosilca commented Mar 17, 2021

@ggouaillardet I can name a second compiler that supports it despite not being a standardized type. I can see 2 solutions:

  1. We could name it conditional which would force us to check for both REAL*2 and iso:REAL16 types and generate the code accordingly. Unfortunately, this would create a naming clash in the code base, because up to now everything with REAL16 was indeed talking about 16 bytes. We can name it iso_real16.
  2. We don't generate code for iso:REAL16 and instead we generate it for REAL*2 (which we do not do right now).

@ggouaillardet
Copy link
Contributor

@bosilca this non standard iso:REAL16 thing is only impacting MPI_Sizeof().
Besides some of us thinking this is a controversial subroutine, I am not even sure the standard mandates us to support a non standard (if I understand correctly) REAL*2 argument.

So a third option is to simply get rid of short float in the Fortran MPI_Sizeof() subroutines.
In that case, the patch is pretty trivial

diff --git a/ompi/mpi/fortran/base/gen-mpi-sizeof.pl b/ompi/mpi/fortran/base/gen-mpi-sizeof.pl
index 194e7b0e7f..3670744967 100755
--- a/ompi/mpi/fortran/base/gen-mpi-sizeof.pl
+++ b/ompi/mpi/fortran/base/gen-mpi-sizeof.pl
@@ -153,7 +153,7 @@ sub generate {
 for my $size (qw/8 16 32 64/) {
     queue_sub("integer(int${size})", "int${size}", "int${size}");
 }
-for my $size (qw/16 32 64 128/) {
+for my $size (qw/32 64 128/) {
     if (!($size == 16 && $mpi_real2 == 0) &&
         !($size == 128 &&$mpi_real16 == 0)) {
         queue_sub("real(real${size})", "real${size}", "real${size}");

and unless we want to keep the dead code since iso:REAL16 might be a thing at some point in time, also apply

diff --git a/ompi/mpi/fortran/base/gen-mpi-sizeof.pl b/ompi/mpi/fortran/base/gen-mpi-sizeof.pl
index 3670744967..7e6ef55229 100755
--- a/ompi/mpi/fortran/base/gen-mpi-sizeof.pl
+++ b/ompi/mpi/fortran/base/gen-mpi-sizeof.pl
@@ -154,12 +154,10 @@ for my $size (qw/8 16 32 64/) {
     queue_sub("integer(int${size})", "int${size}", "int${size}");
 }
 for my $size (qw/32 64 128/) {
-    if (!($size == 16 && $mpi_real2 == 0) &&
-        !($size == 128 &&$mpi_real16 == 0)) {
+    if (!($size == 128 &&$mpi_real16 == 0)) {
         queue_sub("real(real${size})", "real${size}", "real${size}");
     }
-    if (!($size == 16 && $mpi_complex4 == 0) &&
-        !($size == 128 && $mpi_complex32 == 0)) {
+    if (!($size == 128 && $mpi_complex32 == 0)) {
         queue_sub("complex(real${size})", "complex${size}", "real${size}");
     }
 }

I tested these patches with nvhpc compilers, and it worked out of the box (read the mpiext/shortfloat extension was successfully built)

@jeffhammond
Copy link
Contributor Author

I think you want to support REAL*2 in MPI_SIZEOF until we can delete that unnecessary procedure. The REAL16 stuff should only exist if the compiler supports it.

@bosilca
Copy link
Member

bosilca commented Mar 17, 2021

Right, it might be simpler to drop iso:real16 from sizeof. But then we should replace iso:real16 with REAL*2 in sizeof as suggested by @jeffhammond

@ggouaillardet
Copy link
Contributor

a PR is coming, I took the long road and will use iso:real16 if available, and use real*2 and complex*4 otherwise.

@bosilca
Copy link
Member

bosilca commented Mar 17, 2021

I was thinking about this, but the fortran documentation is so scattered and incomplete that I had a hard time understanding the difference between the two types. However some of the documentation I found (mostly GNU, Intel and Oracle) seem to indicate that the two types iso:real16 and REAL*2 do not have to be equivalent, the former being an optional 16 bits floating point representation while the latter real of at least 2 bytes.

ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue Mar 17, 2021
Though this has not landed into the Fortran standard,
some compilers support REAL16 in the ISO_FORTRAN_ENV
module.
Use it when available, and fallback to REAL*2
and COMPLEX*4 otherwise.

Thanks Jeff Hammond for reporting this issue

Ref. open-mpi#8616

Signed-off-by: Gilles Gouaillardet <[email protected]>
@ggouaillardet
Copy link
Contributor

In that case I can do both REAL*2 and iso:real16 when supported by the compiler.

ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue Mar 18, 2021
Though this has not landed into the Fortran standard,
some compilers support REAL16 in the ISO_FORTRAN_ENV
module.
Use it when available, and fallback to REAL*2
and COMPLEX*4 otherwise.

Thanks Jeff Hammond for reporting this issue

Ref. open-mpi#8616

Signed-off-by: Gilles Gouaillardet <[email protected]>
ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue Mar 18, 2021
Though this has not landed into the Fortran standard,
some compilers support REAL16 in the ISO_FORTRAN_ENV
module.
Use it when available, and fallback to REAL*2
and COMPLEX*4 otherwise.

Thanks Jeff Hammond for reporting this issue

Ref. open-mpi#8616

Signed-off-by: Gilles Gouaillardet <[email protected]>
ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue Mar 19, 2021
Though this has not landed into the Fortran standard,
some compilers support REAL16 in the ISO_FORTRAN_ENV
module.
Use it when available, and fallback to REAL*2
and COMPLEX*4 otherwise.

Thanks Jeff Hammond for reporting this issue

Ref. open-mpi#8616

Signed-off-by: Gilles Gouaillardet <[email protected]>

(cherry picked from commit 3a6fc24)
@ggouaillardet
Copy link
Contributor

@jeffhammond Fixed in master via #8637

As a side note, feel free to ask the nvhpc compiler team to support the non standard (yet ?) REAL16 in ISO_FORTRAN_ENV (standards are good, but so is symmetry :-))

@awlauria
Copy link
Contributor

Merged into v5.0.x here:
#8653

@awlauria
Copy link
Contributor

Closing. Unless this is desired on the v4 series, in which case please reopen and do the appropriate cherry-picks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants