-
Notifications
You must be signed in to change notification settings - Fork 902
Add FP16 datatypes #6205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add FP16 datatypes #6205
Conversation
9cd7d54
to
1314516
Compare
780de96
to
6405ec8
Compare
@bosilca @jsquyres Thanks for review, but I believe this PR does not break ABI for MPI programs (
The value of
I mentioned the ABI compatibility issue in my commit messages. On the other side, this PR breaks ABI for MCA components ( I removed the |
6405ec8
to
1827858
Compare
@kawashima-fj I thought that the OMPI_DATATYPE_MPI_ values must be in sync with the handles in the mpif-values.pl. I might be wrong, but I wanted a second pair of eyes on this before we break the Fortran layer. |
@bosilca In any case, I'll revert changes of values of |
1827858
to
43a046e
Compare
Now remaining features are implemented except the If there are no problems, I want to merge this PR in mid-Jan. |
bot:ompi:retest |
I have completed my tests. I'll merge this PR in this week unless someone has any negative comment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really nice piece of work. Excellent sequence of commits; thank you for breaking it down!
That being said, I echo the concerns discussed in the MPI Forum in Dec 2018: we're basing this off C types that do not yet exist (and may never exist). That's probably ok from the "MPIX" point of view, but this is a ton of code that may get ripped out someday if short float
(and friends) and/or MPI_SHORT_FLOAT
(and friends) ultimately do not come to fruition.
It would be one thing if this was entirely an MPI extension, but the vast majority of the code is outside of ompi/mpiext
because it has to integrate with the datatype and op infrastructure. That gives me a little pause.
I don't have a strong objection to this, especially since some vendors obviously see some benefit from this (and I assume have customers who want it?). But it does... give me pause.
@jsquyres Thanks a lot for your review! I added comments to your non-trivial reviews. For trivial ones, I agree with you and I'll update the code. I also attended the MPI Forum meeting in Dec 2018 from Japan via WebEx. I also have the same concern but at least Fujitsu and Mellanox need FP16 support in Open MPI. I'll delay merge and want to hear OMPI developer's opinions. |
I think the only thing meaningful thing left from my review was the configure test update. Easily fixed. Let's put a timeout on getting comments back from other OMPI developers (e.g., about whether we want to add all this code for a type that is not yet standardized) -- this PR has waited quite a long time, mainly because devs [like me] took forever to look at the details. If there's no deadline, people get caught up in other work and miss PR's like this. |
I propose Feb. 7th for the deadline. OK? All, could you comment if you have opinions? I am about to merge FP16 (half precision floating point) datatype support. Corresponding C/C++ types are not yet standardized but they are proposed in ISO/IEC WGs. The background is described in a issue and a slide in the MPI Forum. Links to related pages are listed in my page. |
I forwarded your note to the devel mailing list. One thing I forgot to ask: what is MPICH doing in terms of MPIX_ for half precision? If possible, it would be nice if our MPIX_ names/meanings could be the same as theirs. |
@jsquyres Thanks. I should have mailed to devel list, not only GitHub. |
`MPIX_C_FLOAT16` is defined as a synonym for `MPIX_SHORT_FLOAT` if the C compiler supports `_Float16`, which is defined in ISO/IEC JTC 1/SC 22/WG 14 N1945 (ISO/IEC TS 18661-3:2015). This name and meaning are same as that of MPICH. This may be a transitional datatype until the MPI Forum decides a proper name for the type. Signed-off-by: KAWASHIMA Takahiro <[email protected]>
Signed-off-by: KAWASHIMA Takahiro <[email protected]>
`short float` support of the Intel C++ Compiler (group of C and C++ compilers), at least versions 18.0 and 19.0, is half-baked. It can compile declarations of `short float` variables and expressions of `sizeof(short float)` but cannot compile operations of `short float` variables. In this situation, `AC_CHECK_TYPES(short float)` defines `HAVE_SHORT_FLOAT` as 1 and compilation errors occur in `ompi/mca/op/base/op_base_functions.c`. To avoid this error tentatively, we disable `short float` support when using the Intel C++ Compiler. Signed-off-by: KAWASHIMA Takahiro <[email protected]>
I updated the PR to reflect @jsquyres's review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned above, I have minor reservations about basing a big chunk of infrastructure on C/C++ datatypes that are not yet standardized. That being said, I'm still overall in favor of this PR.
Ok, nobody else has a comment. I understand OMPI developers have no negative comments other than @jsquyres's one or don't care. Two developers approved the PR. So I'll merge the PR. |
@kawashima-fj Thanks for the great work! |
@raffenet FYI |
@kawashima-fj this PR broke Open MPI compilation on OS-X the root cause is that there is no object file in
@jsquyres do you know an elegant way for fixing this ? |
I'm looking at https://github.com/open-mpi/ompi/blob/master/ompi/mpiext/shortfloat/c/Makefile.am and I don't see any Is that |
@jsquyres Yes. The extension is required only for header and module files but the mpiext system requires The
I could not find an elegant way. The @ggouaillardet This extension is not built and the error does not occur unless a C FP16 type ( |
@kawashima-fj I was using OS X Mojave (x86 arch) with the default clang compiler to my surprise, this compiler does support a simple workaround is to add some C files with a dummy global subroutine or global variable. |
These dummy functions are required for the following reason. - The `libmpiext_shortfloat_{c,mpifh,usempif08}.la` files must be built because the `OMPI_EXT_MAKE_LISTS` macro in the `config/ompi_ext.m4` file adds the files to the lists of the `OMPI_MPIEXT_{C,MPIFH,USEMPIF08}_LIBS` output variables and the following files use the output variable. * `ompi/Makefile.am` * `ompi/mpi/fortran/mpif-h/Makefile.am` * `ompi/mpi/fortran/use-mpi-f08/Makefile.am` - The ar command of OS X refuses to create an archive file which does not contain any object files. The `usempi` binding is not affected because `OMPI_MPIEXT_USEMPIF_LIBS` is not used anywhere by nature. Generally it only includes `mpifh`. See open-mpi#6205 (comment) Signed-off-by: KAWASHIMA Takahiro <[email protected]>
@ggouaillardet Ok, I see. LLVM (Clang) 6 and 7 supports I've created the shortcut in #6429. |
if NOLIB_<component> or NOLIB_<component>_<suffix> is set, do not require ompi/mpiext/<component>/<lang>/libmpiext_<component>_<suffix>.la Allow some extensions to be built on OS X since the creation of archives with no files is not permitted. Refs. open-mpi#6205 Signed-off-by: Gilles Gouaillardet <[email protected]>
the shortfloat extension is only made of header files, and hence do not require a library to be built. Refs. open-mpi#6205 Signed-off-by: Gilles Gouaillardet <[email protected]>
Do not require an archive when the OMPI_MPIEXT_<ext>_HAVE_OBJECT macro is defined to 0. See `ompi/mpiext/example/configure.m4`. Allow some extensions to be built on OS X since the creation of archives with no files is not permitted. Refs. open-mpi#6205 Signed-off-by: Gilles Gouaillardet <[email protected]> Signed-off-by: KAWASHIMA Takahiro <[email protected]>
the shortfloat extension is only made of header files, and hence do not require a library to be built. Refs. open-mpi#6205 Signed-off-by: Gilles Gouaillardet <[email protected]> Signed-off-by: KAWASHIMA Takahiro <[email protected]>
Do not require an archive when the OMPI_MPIEXT_<ext>_HAVE_OBJECT macro is defined to 0. See `ompi/mpiext/example/configure.m4`. Allow some extensions to be built on OS X since the creation of archives with no files is not permitted. Refs. open-mpi#6205 Signed-off-by: Gilles Gouaillardet <[email protected]> Signed-off-by: KAWASHIMA Takahiro <[email protected]> Signed-off-by: Jeff Squyres <[email protected]>
the shortfloat extension is only made of header files, and hence do not require a library to be built. Refs. open-mpi#6205 Signed-off-by: Gilles Gouaillardet <[email protected]> Signed-off-by: KAWASHIMA Takahiro <[email protected]>
@kawashima-fj Congrats on your HPL-AI score 🥇 !! Out of curiosity, did you use this code in your exaflop busting run? |
@jladd-mlnx Thank you. We are proud of awards achieved with Open MPI-based Fujitsu MPI. HPL-AI for Fugaku is developed by RIKEN and I don't know the detail. I asked some people in Fujitsu and RIKEN but nobody has the answer. My colleague will contact a developer of Fugaku HPL-AI. When it turns out, I'll share it. |
@jladd-mlnx - @Shinji-Sumimoto had contact with developers of Fugaku HPL-AI. They used Fujitsu MPI which is based on this code but did not use this FP16 MPI datatype. They said, they first tried to communicate FP16 data as |
@kawashima-fj , @Shinji-Sumimoto - Thank you very much for your detailed response; it makes perfect sense. Again, congratulations on your HPL and HPL-AI scores. |
This PR adds the following datatypes for FP16 (half-precision floating point) if a compiler supports the corresponding types.
MPIX_SHORT_FLOAT
for C/C++short float
and_Float16
MPIX_C_SHORT_FLOAT_COMPLEX
for Cshort float _Complex
MPIX_CXX_SHORT_FLOAT_COMPLEX
for C++std::complex<short float>
MPI_REAL2
for FortranREAL*2
(andREAL(kind=2)
)MPI_COMPLEX4
for FortranCOMPLEX*4
(andCOMPLEX(kind=2)
)Datatypes with
MPIX_
prefix are available through the MPI extension.short float
is proposed in the C WG and the C++ WG in ISO/IEC.The background is described in a issue and a slide in the MPI Forum.
MPICH has
MPIX_C_FLOAT16
. @artpol84 and I are talking with MPI guys to use a same name.This PR is still WIP. I want comments. The following features will be implemented soon.
MPI_SIZEOF
MPI_MATCH_SIZE
MPIX_SHORT_FLOAT
in thempi_f08_ext
module@bosilca Do you have any comments?
@artpol84 @Sergei-Lebedev Could you add your HCOLL support commit to this PR (with your Signed-off-by)?
If there are no problems, I want to merge this PR next month.