OPAL assembly handling on multiple archs now fails on 2.1.0 #3442
@hjelmn Looking thru the output, it appears the problem is a failure to identify the environment so the variadic macros can complete:
|
On ARM64 the architecture was correctly detected. Configure complains about missing support for ARM64 because we have no ARM64.asm file and the compiler apparently does not support atomic builtins, which suggests we are expected to use opal/include/opal/sys/arm64/atomic.h. However, all the functions defined in this file seem to be missing, which suggests that somehow OPAL_GCC_INLINE_ASSEMBLY has not been correctly set. This is indeed the case, as in opal/include/opal/sys/atomic.h we force OPAL_GCC_INLINE_ASSEMBLY for C++ compilers. We could try to fix the assembly inclusion from C++ compilers, or we can drop support for the non-standardized C++ API. |
As an experiment I set OPAL_GCC_INLINE_ASSEMBLY in opal/include/opal/sys/arm64/atomic.h and it builds. (A similar patch is already present for powerpc.) |
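For readers following along, here is a minimal sketch of the kind of preprocessor gating described above. It is not the actual Open MPI header: every `DEMO_*` name is hypothetical, the assumption that the macro is forced to 0 for C++ translation units is inferred from the discussion, and a compiler builtin stands in for the real per-architecture assembly.

```c
#include <stdint.h>

/* Hypothetical stand-in for what configure detected for the C compiler. */
#ifndef DEMO_C_GCC_INLINE_ASSEMBLY
#define DEMO_C_GCC_INLINE_ASSEMBLY 1
#endif

/* Assumed behaviour: C++ translation units get inline assembly switched
 * off, regardless of what configure found for the C compiler. */
#if defined(__cplusplus)
#define DEMO_GCC_INLINE_ASSEMBLY 0
#else
#define DEMO_GCC_INLINE_ASSEMBLY DEMO_C_GCC_INLINE_ASSEMBLY
#endif

#if DEMO_GCC_INLINE_ASSEMBLY
#define DEMO_HAVE_ATOMIC_ADD_32 1
/* A real arch header would use inline assembly here; a GCC/Clang builtin
 * keeps this sketch portable. */
static inline int32_t demo_atomic_add_32(volatile int32_t *addr, int32_t delta)
{
    return __atomic_add_fetch(addr, delta, __ATOMIC_SEQ_CST);
}
#else
/* With the macro at 0 the function is simply never defined, so any
 * caller that reaches this header fails to build. */
#define DEMO_HAVE_ATOMIC_ADD_32 0
#endif
```

Compiled as C, the function exists; compiled as C++ under the assumed gating, it disappears, which matches the "functions seem to be missing" symptom.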
There is no guarantee that the C++ compiler supports the same type of assembly as the C compiler. It is happening for you, but to be on the safe side we should add a check during configure time. |
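As a rough illustration of what such a configure-time check could compile, here is a small probe that uses GCC-style extended inline assembly without any architecture-specific instructions. The function names are hypothetical, and this is only a sketch of the idea, not the test Open MPI's configure actually runs.

```c
/* Probe: does this compiler accept GCC-style extended inline assembly?
 * The empty template emits no instructions, so the probe is independent
 * of the target architecture. */
static inline void demo_compiler_barrier(void)
{
    __asm__ __volatile__("" : : : "memory");
}

int demo_probe_inline_asm(void)
{
    int value = 42;
    /* "+r" asks for value as a read/write register operand; since the
     * template is empty, the value itself is left unchanged. */
    __asm__ __volatile__("" : "+r"(value) : : "memory");
    demo_compiler_barrier();
    return value;
}
```

If the same source is fed to both the C and the C++ compiler and only one of them accepts it, the two cannot be assumed to share an inline assembly dialect.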
Agreed; to be clear, Debian is only using GCC/G++ at this time (though I did test with g++ 7). |
We decided to isolate the internals of Open MPI from C++. I don't remember whether this change made it onto master but I am sure it isn't on v2.x or v2.0.x. |
This should have clued me in to the isolation code being in there:
Hmm, I wonder why it isn't working as expected on aarch64. |
I don't understand how the isolation layer is supposed to work when cxx_glue.h is included from all the CXX files and it pulls in errorhandler.h (which pulls in the entire spaghetti of header files down to our opal_object.h, where we need the atomics). |
Per fc0eeb4, should |
Yeah, we'll remove the C++ bindings... someday. Probably not soon, though. 😈 |
Ok, 2.1.1rc1 builds fine on all archs except x32 when --without-cma is used where appropriate. Work is needed on the x32 patch: is it using 64-bit quantities? (x32 should use 32-bit quantities but the larger register set from x86_64.) |
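To make the x32 point concrete, here is a small sketch of how the ABI can be told apart from plain x86_64 at compile time. `__x86_64__` and `__ILP32__` are the predefined macros GCC and Clang set for x32; the `DEMO_*` and `demo_*` names are hypothetical and not part of any patch in this thread.

```c
/* x32 targets the x86_64 instruction set but with 32-bit pointers and
 * longs, so both macros are defined at once. */
#if defined(__x86_64__) && defined(__ILP32__)
#define DEMO_ABI_X32 1
#else
#define DEMO_ABI_X32 0
#endif

/* Compile-time sanity check: on x32, pointers must be 4 bytes wide. */
_Static_assert(!DEMO_ABI_X32 || sizeof(void *) == 4,
               "x32 pointers are expected to be 32-bit");

static inline int demo_is_x32(void)
{
    return DEMO_ABI_X32;
}

/* Width in bits of a pointer-sized quantity for the current ABI. */
static inline unsigned demo_pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * 8);
}
```

Assembly that operates on pointer-sized values would need to pick its operand size from a check like this rather than from `__x86_64__` alone.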
@amckinstry Am I reading your comment correctly: you applied If so, where do we find the full build output from the s390x? I see on https://buildd.debian.org/status/package.php?p=openmpi&suite=experimental the last 10-20 lines of the build that shows that there's some linker error when compiling |
Looking into this now. |
This commit removes a nonexistent function that was causing build problems under certain environments. Reference open-mpi#3442 Signed-off-by: Nathan Hjelm <[email protected]>
Does #3589 help? |
This commit removes a nonexistent function that was causing build problems under certain environments. Reference open-mpi#3442 Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from commit ee9093c) Signed-off-by: Nathan Hjelm <[email protected]>
@amckinstry Have you had a chance to check out Nathan's reply #3442 (comment)? |
I haven't. I've just kicked off a build with the commit ee9093c which I'll push to experimental (assuming it builds on x86) |
I've tested the commit in #3442 (log: https://buildd.debian.org/status/fetch.php?pkg=openmpi&arch=s390x&ver=2.1.1-3&stamp=1496331745&raw=0). Unfortunately, s390 is still failing. |
We don't have atomics for s390 and it looks like the gcc builtins are not enabled in your build. Try adding --enable-builtin-atomics to your configure line. |
--enable-builtin-atomics works fine for s390. Thanks. |
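For context, here is a minimal sketch of what relying on compiler builtins instead of hand-written assembly can look like on an architecture with no dedicated atomics file. The wrapper name is hypothetical and this is not Open MPI's implementation; it only uses the `__atomic` builtins that GCC and Clang provide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compare-and-set built on the compiler's __atomic builtins: stores
 * `desired` only if *addr still equals `expected`, and reports whether
 * the store happened. No architecture-specific assembly is involved. */
static inline bool demo_cmpset_32(volatile int32_t *addr,
                                  int32_t expected, int32_t desired)
{
    return __atomic_compare_exchange_n(addr, &expected, desired,
                                       false, /* strong variant */
                                       __ATOMIC_SEQ_CST,
                                       __ATOMIC_SEQ_CST);
}
```

Because the compiler generates the target's instructions itself, the same source works on s390x as on any other target the compiler supports.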
Hi.
Open MPI 2.0.2 builds on multiple architectures within Debian:
https://buildd.debian.org/status/package.php?p=openmpi
However, changes in 2.1.0 mean many are now broken:
https://buildd.debian.org/status/package.php?p=openmpi&suite=experimental
(Ignore the "2.1.0rc2"; that's a Debian-specific version name; the codebase is 2.1.0 without arch changes.)
E.g. arm64 no longer compiles; mips64el compiles but mipsel does not.