Add FP16 datatypes #6205

kawashima-fj · 2018-12-19T08:15:53Z

This PR adds the following datatypes for FP16 (half-precision floating point) if a compiler supports the corresponding types.

MPIX_SHORT_FLOAT for C/C++ short float and _Float16
MPIX_C_SHORT_FLOAT_COMPLEX for C short float _Complex
MPIX_CXX_SHORT_FLOAT_COMPLEX for C++ std::complex<short float>
MPI_REAL2 for Fortran REAL*2 (and REAL(kind=2))
MPI_COMPLEX4 for Fortran COMPLEX*4 (and COMPLEX(kind=2))

Datatypes with MPIX_ prefix are available through the MPI extension.

short float has been proposed to the C WG and the C++ WG in ISO/IEC.

The background is described in an issue and a slide deck in the MPI Forum.

MPICH has MPIX_C_FLOAT16. @artpol84 and I are talking with MPI guys about using the same name.

This PR is still WIP. I would welcome comments. The following features will be implemented soon.

MPI_SIZEOF
MPI_MATCH_SIZE
MPIX_SHORT_FLOAT in the mpi_f08_ext module

@bosilca Do you have any comments?

@artpol84 @Sergei-Lebedev Could you add your HCOLL support commit to this PR (with your Signed-off-by)?

If there are no problems, I want to merge this PR next month.

ompi/datatype/ompi_datatype_internal.h

kawashima-fj · 2018-12-23T07:31:36Z

@bosilca @jsquyres Thanks for the review, but I believe this PR does not break the ABI for MPI programs (mpi.h, mpif.h, mpi.mod, mpi_f08.mod, ...).

The OMPI_DATATYPE_MPI_* macros, whose values I changed, are used as values of ompi_datatype_t::id and as indices into the following arrays.

The value of ompi_datatype_t::id is not exposed to MPI programs.

ompi_datatype_t::d_f_to_c_index, which you are concerned about, is set in the MOOG macro and the ompi/include/mpif-values.pl file. I didn't change existing values; I only added new ones.

I mentioned the ABI compatibility issue in my commit messages.

On the other hand, this PR breaks the ABI for MCA components (configure --devel-headers). Do we care about it?

I removed the ABI break label. If I missed something, please let me know and re-add the label.
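
The argument in the comment above, that the internal datatype ids and the Fortran handle indices are two separate numbering spaces, can be sketched with a toy model. This is not Open MPI code: every type name, function name, and number below is a hypothetical stand-in chosen for illustration. It only shows why renumbering a private id does not disturb lookups that go through an application-visible handle value.

```c
#include <stddef.h>

/* Toy model only: every name and number here is a hypothetical stand-in,
 * not an actual Open MPI identifier or value. */
typedef struct {
    const char *name;
    int id;            /* internal id, private to the library */
    int f_to_c_index;  /* Fortran handle value, visible to applications */
} toy_datatype_t;

/* Internal ids after a new type was inserted in the middle:
 * in this toy model TOY_ID_FLOAT moved from 1 to 2. */
enum { TOY_ID_INT = 0, TOY_ID_SHORT_FLOAT = 1, TOY_ID_FLOAT = 2 };

/* Fortran handle values: existing ones untouched, the new one appended. */
static const toy_datatype_t toy_types[] = {
    { "INT",         TOY_ID_INT,         7 },
    { "FLOAT",       TOY_ID_FLOAT,       8 },
    { "SHORT_FLOAT", TOY_ID_SHORT_FLOAT, 9 },
};

/* Look a datatype up by its application-visible handle value.
 * Returns NULL when the handle is unknown. */
static const toy_datatype_t *toy_from_fortran_handle(int handle)
{
    size_t n = sizeof(toy_types) / sizeof(toy_types[0]);
    for (size_t i = 0; i < n; i++) {
        if (toy_types[i].f_to_c_index == handle) {
            return &toy_types[i];
        }
    }
    return NULL;
}
```

In this model an application compiled against handle value 8 still reaches the FLOAT entry after the renumbering, because only the private `id` field changed. Code that was compiled against the old internal ids (the analogue of MCA components built with development headers) would be affected.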

bosilca · 2018-12-26T04:34:47Z

@kawashima-fj I thought that the OMPI_DATATYPE_MPI_ values must be in sync with the handles in the mpif-values.pl. I might be wrong, but I wanted a second pair of eyes on this before we break the Fortran layer.

kawashima-fj · 2018-12-26T08:44:01Z

@bosilca OMPI_DATATYPE_MPI_* and mpif-values.pl do not have the same values in master, although mpif-values.pl and MOOG do.

In any case, I'll revert the changes to the values of OMPI_DATATYPE_MPI_* if you and/or the community so desire.

kawashima-fj · 2018-12-27T09:10:33Z

The remaining features are now implemented, except for the mpi-f08-ext bindings, which require #6210.

If there are no problems, I want to merge this PR in mid-Jan.

kawashima-fj · 2019-01-28T01:11:20Z

bot:ompi:retest

kawashima-fj · 2019-01-30T04:09:51Z

I have completed my tests. I'll merge this PR this week unless someone has a negative comment.

jsquyres

This is a really nice piece of work. Excellent sequence of commits; thank you for breaking it down!

That being said, I echo the concerns discussed in the MPI Forum in Dec 2018: we're basing this off C types that do not yet exist (and may never exist). That's probably ok from the "MPIX" point of view, but this is a ton of code that may get ripped out someday if short float (and friends) and/or MPI_SHORT_FLOAT (and friends) ultimately do not come to fruition.

It would be one thing if this was entirely an MPI extension, but the vast majority of the code is outside of ompi/mpiext because it has to integrate with the datatype and op infrastructure. That gives me a little pause.

I don't have a strong objection to this, especially since some vendors obviously see some benefit from this (and I assume have customers who want it?). But it does... give me pause.

configure.ac

ompi/datatype/ompi_datatype_internal.h

config/opal_check_alt_short_float.m4

ompi/datatype/ompi_datatype_internal.h

ompi/datatype/ompi_datatype_module.c

ompi/mca/coll/portals4/coll_portals4_component.c

ompi/op/op.c

.gitignore

ompi/mpiext/shortfloat/README.txt

ompi/mpiext/shortfloat/c/mpiext_shortfloat_c.h.in

kawashima-fj · 2019-01-31T11:19:28Z

@jsquyres Thanks a lot for your review! I added replies to your non-trivial review comments. For the trivial ones, I agree with you and I'll update the code.

I also attended the MPI Forum meeting in Dec 2018 from Japan via WebEx. I share the same concern, but at least Fujitsu and Mellanox need FP16 support in Open MPI. I'll delay the merge because I want to hear OMPI developers' opinions.

jsquyres · 2019-01-31T11:32:56Z

I think the only meaningful thing left from my review was the configure test update. Easily fixed.

Let's put a timeout on getting comments back from other OMPI developers (e.g., about whether we want to add all this code for a type that is not yet standardized) -- this PR has waited quite a long time, mainly because devs [like me] took forever to look at the details. If there's no deadline, people get caught up in other work and miss PRs like this.

kawashima-fj · 2019-01-31T11:53:00Z

I propose Feb. 7th for the deadline. OK?

All, could you comment if you have opinions? I am about to merge FP16 (half-precision floating point) datatype support. The corresponding C/C++ types are not yet standardized, but they have been proposed in ISO/IEC WGs. The background is described in an issue and a slide deck in the MPI Forum. Links to related pages are listed in my page.

jsquyres · 2019-01-31T12:03:06Z

I forwarded your note to the devel mailing list.

One thing I forgot to ask: what is MPICH doing in terms of MPIX_ for half precision? If possible, it would be nice if our MPIX_ names/meanings could be the same as theirs.

kawashima-fj · 2019-01-31T12:11:03Z

@jsquyres Thanks. I should have mailed the devel list, not only posted on GitHub.
MPICH has MPIX_C_FLOAT16 for C _Float16. It is compatible with this PR.

`MPIX_C_FLOAT16` is defined as a synonym for `MPIX_SHORT_FLOAT` if the C compiler supports `_Float16`, which is defined in ISO/IEC JTC 1/SC 22/WG 14 N1945 (ISO/IEC TS 18661-3:2015). This name and meaning are the same as those of MPICH. This may be a transitional datatype until the MPI Forum decides on a proper name for the type.
Signed-off-by: KAWASHIMA Takahiro <[email protected]>

Signed-off-by: KAWASHIMA Takahiro <[email protected]>

`short float` support in the Intel C++ Compiler (group of C and C++ compilers), at least in versions 18.0 and 19.0, is half-baked. It can compile declarations of `short float` variables and `sizeof(short float)` expressions but cannot compile operations on `short float` variables. In this situation, `AC_CHECK_TYPES(short float)` defines `HAVE_SHORT_FLOAT` as 1 and compilation errors occur in `ompi/mca/op/base/op_base_functions.c`. As a tentative workaround, we disable `short float` support when using the Intel C++ Compiler.
Signed-off-by: KAWASHIMA Takahiro <[email protected]>
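
The gap described in the commit message above, where a type check passes but operations on the type fail to compile, can be illustrated with a small probe. This is only a sketch, not the configure test used in the PR. The names `fp16_probe_t`, `probe_sizeof`, and `probe_arithmetic` are hypothetical, and the sketch assumes the compiler defines `__FLT16_MANT_DIG__` when `_Float16` is usable; when the macro is absent it falls back to `float` so the example still builds.

```c
#include <stddef.h>

/* Hypothetical probe. Assumption: __FLT16_MANT_DIG__ is defined when the
 * compiler can really use _Float16; otherwise a float stand-in is used. */
#if defined(__FLT16_MANT_DIG__)
typedef _Float16 fp16_probe_t;
#define FP16_PROBE_NATIVE 1
#else
typedef float fp16_probe_t;
#define FP16_PROBE_NATIVE 0
#endif

/* What a declaration/sizeof-only check effectively verifies:
 * the type name is accepted and has a size. */
static size_t probe_sizeof(void)
{
    return sizeof(fp16_probe_t);
}

/* What an operation check verifies: the compiler must also generate
 * arithmetic on the type. volatile keeps the operations from being
 * folded away at compile time. */
static int probe_arithmetic(void)
{
    volatile fp16_probe_t a = (fp16_probe_t)1.5f;
    volatile fp16_probe_t b = (fp16_probe_t)2.0f;
    fp16_probe_t sum = a + b;    /* 3.5 is exactly representable in FP16 */
    fp16_probe_t prod = a * b;   /* 3.0 is exactly representable in FP16 */
    return (float)sum == 3.5f && (float)prod == 3.0f;
}
```

A compiler that accepts only the first function but rejects the second would show the same behavior the commit message reports: the type check succeeds, and the failure appears later when real operations are compiled.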

kawashima-fj · 2019-02-01T08:06:57Z

I updated the PR to reflect @jsquyres's review.

jsquyres

As mentioned above, I have minor reservations about basing a big chunk of infrastructure on C/C++ datatypes that are not yet standardized. That being said, I'm still overall in favor of this PR.

kawashima-fj · 2019-02-08T05:52:00Z

OK, nobody else has commented. I take it that OMPI developers either have no negative comments other than @jsquyres's or don't care. Two developers approved the PR, so I'll merge it.

artpol84 · 2019-02-11T15:34:32Z

@kawashima-fj Thanks for the great work!

artpol84 · 2019-02-11T15:36:10Z

@raffenet FYI

ggouaillardet · 2019-02-22T11:15:51Z

@kawashima-fj this PR broke Open MPI compilation on OS-X

the root cause is that there is no object file in ompi/mpiext/shortfloat/* and OSX refuses to create an empty archive (linux has no issue with that)

$ make V=1
Making all in c
/bin/sh ../../../../libtool  --tag=CC   --mode=link gcc-8  -g -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -finline-functions -fno-strict-aliasing -mcx16  -module -avoid-version -Wl,-flat_namespace  -o libmpiext_shortfloat_c.la    -lz 
libtool: link: ar cru .libs/libmpiext_shortfloat_c.a 
ar: no archive members specified
usage:  ar -d [-TLsv] archive file ...
	ar -m [-TLsv] archive file ...
	ar -m [-abiTLsv] position archive file ...
	ar -p [-TLsv] archive [file ...]
	ar -q [-cTLsv] archive file ...
	ar -r [-cuTLsv] archive file ...
	ar -r [-abciuTLsv] position archive file ...
	ar -t [-TLsv] archive [file ...]
	ar -x [-ouTLsv] archive [file ...]
make[1]: *** [libmpiext_shortfloat_c.la] Error 1
make: *** [all-recursive] Error 1

@jsquyres do you know an elegant way to fix this?

jsquyres · 2019-02-22T16:17:21Z

I'm looking at https://github.com/open-mpi/ompi/blob/master/ompi/mpiext/shortfloat/c/Makefile.am and I don't see any .c files listed. Is that correct?

Is that .la file there solely because the mpiext system requires a .la file?

kawashima-fj · 2019-02-25T01:52:34Z

@jsquyres Yes. The extension is needed only for its header and module files, but the mpiext system requires .la files.

The OMPI_EXT_MAKE_LISTS macro in config/ompi_ext.m4 adds ompi/mpiext/COMPONENT/BINDING/libmpiext_COMPONENT_BINDING.la to the list of the OMPI_MPIEXT_C_LIBS output variable and ompi/Makefile.am uses the output variable. Removing .la from ompi/mpiext/shortfloat/c/Makefile.am causes the following make error.

make[2]: Entering directory '/home/tkawa/src/openmpi-master/build/ompi'
make[2]: *** No rule to make target '../ompi/mpiext/shortfloat/c/libmpiext_shortfloat_c.la', needed by 'libmpi.la'.  Stop.

I could not find an elegant way. The pcollreq extension (for MPIX_-prefixed persistent collectives) has a dummy function. I can take the same approach in this extension to work around the error.

@ggouaillardet This extension is not built and the error does not occur unless a C FP16 type (_Float16) is usable or it is explicitly enabled by --enable-alt-short-float=.... Did you enable it by --enable-alt-short-float=...?

ggouaillardet · 2019-02-25T02:09:42Z

@kawashima-fj I was using OS X Mojave (x86 arch) with the default clang compiler LLVM version 10.0.0 (clang-1000.11.45.5)

to my surprise, this compiler does support _Float16 out of the box (fwiw, it does not support short float)

a simple workaround is to add some C files with a dummy global subroutine or global variable.
The right fix is likely not to generate a library in such a case, but since this extension is aimed at landing in the main codebase, we might simply want to take the above shortcut for now. Please let me know if you want me to issue a PR for that.
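
The shortcut mentioned in the comment above amounts to giving the archive at least one object file. A minimal sketch of such a placeholder source file follows; the function name `toy_shortfloat_placeholder` is hypothetical and is not the symbol used by pcollreq or by the eventual fix.

```c
/* Hypothetical placeholder source file. Its only purpose is to produce one
 * object file so that ar has a member to put into the archive. */

/* Prototype first: the build log above uses -Wmissing-prototypes and
 * -Wstrict-prototypes, so a bare definition would trigger a warning. */
int toy_shortfloat_placeholder(void);

int toy_shortfloat_placeholder(void)
{
    return 0;  /* never meant to be called by real code */
}
```

A global function is used rather than a static one so that the symbol is actually emitted into the object file.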

These dummy functions are required for the following reasons.
- The `libmpiext_shortfloat_{c,mpifh,usempif08}.la` files must be built because the `OMPI_EXT_MAKE_LISTS` macro in the `config/ompi_ext.m4` file adds the files to the lists of the `OMPI_MPIEXT_{C,MPIFH,USEMPIF08}_LIBS` output variables and the following files use the output variables.
  * `ompi/Makefile.am`
  * `ompi/mpi/fortran/mpif-h/Makefile.am`
  * `ompi/mpi/fortran/use-mpi-f08/Makefile.am`
- The ar command of OS X refuses to create an archive file which does not contain any object files.
The `usempi` binding is not affected because `OMPI_MPIEXT_USEMPIF_LIBS` is not used anywhere by nature. Generally it only includes `mpifh`.
See open-mpi#6205 (comment)
Signed-off-by: KAWASHIMA Takahiro <[email protected]>

kawashima-fj · 2019-02-25T02:35:55Z

@ggouaillardet OK, I see. LLVM (Clang) 6 and 7 support _Float16 even on CPUs without FP16 support. This will be changed in the upcoming LLVM 8.

I've created the shortcut in #6429.

If NOLIB_<component> or NOLIB_<component>_<suffix> is set, do not require ompi/mpiext/<component>/<lang>/libmpiext_<component>_<suffix>.la. Allow some extensions to be built on OS X since the creation of archives with no files is not permitted. Refs. open-mpi#6205
Signed-off-by: Gilles Gouaillardet <[email protected]>

The shortfloat extension is made only of header files, and hence does not require a library to be built. Refs. open-mpi#6205
Signed-off-by: Gilles Gouaillardet <[email protected]>

Do not require an archive when the OMPI_MPIEXT_<ext>_HAVE_OBJECT macro is defined to 0. See `ompi/mpiext/example/configure.m4`. Allow some extensions to be built on OS X since the creation of archives with no files is not permitted. Refs. open-mpi#6205
Signed-off-by: Gilles Gouaillardet <[email protected]>
Signed-off-by: KAWASHIMA Takahiro <[email protected]>

The shortfloat extension is made only of header files, and hence does not require a library to be built. Refs. open-mpi#6205
Signed-off-by: Gilles Gouaillardet <[email protected]>
Signed-off-by: KAWASHIMA Takahiro <[email protected]>

Do not require an archive when the OMPI_MPIEXT_<ext>_HAVE_OBJECT macro is defined to 0. See `ompi/mpiext/example/configure.m4`. Allow some extensions to be built on OS X since the creation of archives with no files is not permitted. Refs. open-mpi#6205
Signed-off-by: Gilles Gouaillardet <[email protected]>
Signed-off-by: KAWASHIMA Takahiro <[email protected]>
Signed-off-by: Jeff Squyres <[email protected]>

jladd-mlnx · 2020-06-25T14:42:43Z

@kawashima-fj Congrats on your HPL-AI score 🥇 !! Out of curiosity, did you use this code in your exaflop busting run?

kawashima-fj · 2020-06-26T09:43:23Z

@jladd-mlnx Thank you. We are proud of the awards achieved with Open MPI-based Fujitsu MPI. HPL-AI for Fugaku was developed by RIKEN and I don't know the details. I asked some people at Fujitsu and RIKEN but nobody had the answer. My colleague will contact a developer of Fugaku HPL-AI. When I find out, I'll share it.

kawashima-fj · 2020-07-07T05:04:29Z

@jladd-mlnx - @Shinji-Sumimoto contacted the developers of Fugaku HPL-AI. They used Fujitsu MPI, which is based on this code, but did not use this FP16 MPI datatype.

They said they first tried to communicate FP16 data as unsigned short using MPI because they wanted to compile the same code on machines without FP16 support. Later they rewrote the code to use a low-level communication API (uTofu) to tune communication performance.
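
The first approach described above, carrying FP16 values as 16-bit unsigned integers so the code builds without a native FP16 type, can be sketched as a pair of software conversions. This is not the Fugaku HPL-AI code, which is not shown in this thread. The function names are hypothetical, and the sketch assumes the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 mantissa bits), a 32-bit IEEE 754 `float`, and round-to-nearest-even.

```c
#include <stdint.h>
#include <string.h>

/* Convert a float to IEEE 754 binary16 bits (round-to-nearest-even).
 * Hypothetical helper name; assumes float is IEEE 754 binary32. */
static uint16_t float_to_half_bits(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    uint32_t sign = (x >> 16) & 0x8000u;
    uint32_t exp = (x >> 23) & 0xFFu;
    uint32_t mant = x & 0x7FFFFFu;

    if (exp == 0xFFu) {                      /* Inf or NaN */
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x0200u : 0u));
    }
    int32_t e = (int32_t)exp - 127 + 15;     /* re-bias the exponent */
    if (e >= 31) {                           /* too large: becomes Inf */
        return (uint16_t)(sign | 0x7C00u);
    }
    if (e <= 0) {                            /* FP16 subnormal or zero */
        if (e < -10) {
            return (uint16_t)sign;           /* rounds to signed zero */
        }
        mant |= 0x800000u;                   /* restore the implicit 1 */
        uint32_t shift = (uint32_t)(14 - e); /* 14..24 */
        uint32_t half = mant >> shift;
        uint32_t rem = mant & ((1u << shift) - 1u);
        uint32_t halfway = 1u << (shift - 1u);
        if (rem > halfway || (rem == halfway && (half & 1u))) {
            half++;
        }
        return (uint16_t)(sign | half);
    }
    uint32_t half = ((uint32_t)e << 10) | (mant >> 13);
    uint32_t rem = mant & 0x1FFFu;
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u))) {
        half++;                              /* a carry into the exponent is correct */
    }
    return (uint16_t)(sign | half);
}

/* Convert IEEE 754 binary16 bits back to a float (always exact). */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = ((uint32_t)h & 0x8000u) << 16;
    uint32_t exp = ((uint32_t)h >> 10) & 0x1Fu;
    uint32_t mant = (uint32_t)h & 0x3FFu;
    uint32_t x;

    if (exp == 0u) {
        if (mant == 0u) {
            x = sign;                        /* signed zero */
        } else {                             /* subnormal: normalize it */
            uint32_t shift = 0u;
            while ((mant & 0x400u) == 0u) {
                mant <<= 1;
                shift++;
            }
            mant &= 0x3FFu;
            x = sign | ((127u - 15u - shift + 1u) << 23) | (mant << 13);
        }
    } else if (exp == 31u) {                 /* Inf or NaN */
        x = sign | 0x7F800000u | (mant << 13);
    } else {                                 /* normal number */
        x = sign | ((exp - 15u + 127u) << 23) | (mant << 13);
    }
    float f;
    memcpy(&f, &x, sizeof f);
    return f;
}
```

A buffer of such `uint16_t` values can be transferred with the standard `MPI_UNSIGNED_SHORT` datatype. One consequence of this design is that it suits data movement only: a reduction such as `MPI_SUM` on `MPI_UNSIGNED_SHORT` would add the bit patterns as integers rather than as FP16 values.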

jladd-mlnx · 2020-07-07T14:27:55Z

@kawashima-fj , @Shinji-Sumimoto - Thank you very much for your detailed response; it makes perfect sense. Again, congratulations on your HPL and HPL-AI scores.

kawashima-fj added enhancement ⚠️ WIP-DNM! Target: main labels Dec 19, 2018

kawashima-fj self-assigned this Dec 19, 2018

kawashima-fj force-pushed the pr/fp16 branch from 9cd7d54 to 1314516 Compare December 19, 2018 08:29

kawashima-fj mentioned this pull request Dec 20, 2018

Use mpi_f08 module in mpi_f08_ext module #6210

Merged

kawashima-fj force-pushed the pr/fp16 branch 2 times, most recently from 780de96 to 6405ec8 Compare December 20, 2018 15:21

bosilca approved these changes Dec 22, 2018

jsquyres added 😳 Backward compat break Target: v5.0.x labels Dec 22, 2018

jsquyres added this to the v5.0.0 milestone Dec 22, 2018

kawashima-fj removed the 😳 Backward compat break label Dec 23, 2018

kawashima-fj force-pushed the pr/fp16 branch from 6405ec8 to 1827858 Compare December 23, 2018 17:30

kawashima-fj force-pushed the pr/fp16 branch from 1827858 to 43a046e Compare December 27, 2018 07:47

bwbarrett removed the Target: v5.0.x label Jan 7, 2019

kawashima-fj force-pushed the pr/fp16 branch from 43a046e to 0cf6afc Compare January 23, 2019 06:20

kawashima-fj removed the ⚠️ WIP-DNM! label Jan 24, 2019

jsquyres reviewed Jan 30, 2019

kawashima-fj added 3 commits February 1, 2019 14:55

README: Add description of shortfloat MPI extension

9b54967

Signed-off-by: KAWASHIMA Takahiro <[email protected]>

kawashima-fj force-pushed the pr/fp16 branch from cfcd167 to ef4c47d Compare February 1, 2019 06:03

jsquyres approved these changes Feb 1, 2019

kawashima-fj merged commit 8bbd201 into open-mpi:master Feb 8, 2019

kawashima-fj mentioned this pull request Feb 15, 2019

Missing C declaration for MPI_REAL2 and implementation for MPI_COMPLEX4 #2653

Closed

kawashima-fj deleted the pr/fp16 branch February 15, 2019 01:26

kawashima-fj mentioned this pull request Feb 25, 2019

mpiext/shortfloat: Work around empty archives #6429

Closed

jeffhammond mentioned this pull request Mar 15, 2021

need a way to disable REAL16 support with configure #8616

Closed
