Skip to content

OPAL assembly handling on multiple archs now fails on 2.1.0 #3442

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
amckinstry opened this issue May 3, 2017 · 22 comments
Closed

OPAL assembly handling on multiple archs now fails on 2.1.0 #3442

amckinstry opened this issue May 3, 2017 · 22 comments

Comments

@amckinstry
Copy link

Hi.

Openmpi 2.0.2 builds on multiple architectures within Debian:
https://buildd.debian.org/status/package.php?p=openmpi

However changes in 2.1.0 means many are now broken:
https://buildd.debian.org/status/package.php?p=openmpi&suite=experimental

(Ignore the "2.1.0rc2"; thats a Debian-specific version name ; the codebase in 2.1.0 without arch changes)
eg. Arm64 no longer compiles. mips64el compiles but not mipsel.

@rhc54
Copy link
Contributor

rhc54 commented May 3, 2017

@hjelmn Looking thru the output, it appears the problem is a failure to identify the environment so the variadic macros can complete:

In file included from ../../../opal/class/opal_object.h:128:0,
                 from ../../../ompi/errhandler/errhandler.h:36,
                 from cxx_glue.h:18,
                 from intercepts.cc:29:
../../../opal/threads/thread_usage.h: In function 'int32_t opal_thread_add_32(volatile int32_t*, int32_t)':
../../../opal/threads/thread_usage.h:109:55: error: 'opal_atomic_add_32' was not declared in this scope
         return opal_atomic_add_ ## suffix (addr, delta);                \
                                                       ^
../../../opal/threads/thread_usage.h:143:1: note: in expansion of macro 'OPAL_THREAD_DEFINE_ATOMIC_ADD'
 OPAL_THREAD_DEFINE_ATOMIC_ADD(int32_t, 32)
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ../../../opal/threads/thread_usage.h:28:0,
                 from ../../../opal/class/opal_object.h:128,
                 from ../../../ompi/errhandler/errhandler.h:36,
                 from cxx_glue.h:18,
                 from intercepts.cc:29:
../../../opal/threads/thread_usage.h: In function 'size_t opal_thread_add_size_t(volatile size_t*, size_t)':
../../../opal/include/opal/sys/atomic.h:484:96: error: 'opal_atomic_add_64' was not declared in this scope
 #define opal_atomic_add_size_t(addr, delta) ((size_t) opal_atomic_add_64((int64_t*) addr, delta))
                                                                                                ^
../../../opal/threads/thread_usage.h:109:16: note: in expansion of macro 'opal_atomic_add_size_t'
         return opal_atomic_add_ ## suffix (addr, delta);                \
                ^~~~~~~~~~~~~~~~
../../../opal/threads/thread_usage.h:144:1: note: in expansion of macro 'OPAL_THREAD_DEFINE_ATOMIC_ADD'
 OPAL_THREAD_DEFINE_ATOMIC_ADD(size_t, size_t)
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ../../../opal/class/opal_object.h:128:0,
                 from ../../../ompi/errhandler/errhandler.h:36,
                 from cxx_glue.h:18,
                 from intercepts.cc:29:
../../../opal/threads/thread_usage.h: In function 'bool opal_thread_cmpset_bool_32(volatile int32_t*, int32_t, int32_t)':
../../../opal/threads/thread_usage.h:119:85: error: 'opal_atomic_cmpset_32' was not declared in this scope
         return opal_atomic_cmpset_ ## suffix ((volatile type *) addr, compare, value); \
                                                                                     ^
../../../opal/threads/thread_usage.h:145:1: note: in expansion of macro 'OPAL_THREAD_DEFINE_ATOMIC_CMPSET'
 OPAL_THREAD_DEFINE_ATOMIC_CMPSET(int32_t, int32_t, 32)
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../../opal/threads/thread_usage.h: In function 'bool opal_thread_cmpset_bool_ptr(volatile intptr_t*, void*, void*)':
../../../opal/threads/thread_usage.h:119:85: error: 'opal_atomic_cmpset_ptr' was not declared in this scope
         return opal_atomic_cmpset_ ## suffix ((volatile type *) addr, compare, value); \
                                                                                     ^
../../../opal/threads/thread_usage.h:146:1: note: in expansion of macro 'OPAL_THREAD_DEFINE_ATOMIC_CMPSET'
 OPAL_THREAD_DEFINE_ATOMIC_CMPSET(void *, intptr_t, ptr)
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../../opal/threads/thread_usage.h: In function 'int32_t opal_thread_swap_32(volatile int32_t*, int32_t)':
../../../opal/threads/thread_usage.h:134:76: error: 'opal_atomic_swap_32' was not declared in this scope
         return opal_atomic_swap_ ## suffix ((volatile type *) ptr, newvalue); \
                                                                            ^
../../../opal/threads/thread_usage.h:147:1: note: in expansion of macro 'OPAL_THREAD_DEFINE_ATOMIC_SWAP'
 OPAL_THREAD_DEFINE_ATOMIC_SWAP(int32_t, int32_t, 32)
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../../opal/threads/thread_usage.h: In function 'void* opal_thread_swap_ptr(volatile intptr_t*, void*)':
../../../opal/threads/thread_usage.h:134:76: error: 'opal_atomic_swap_ptr' was not declared in this scope
         return opal_atomic_swap_ ## suffix ((volatile type *) ptr, newvalue); \
                                                                            ^
../../../opal/threads/thread_usage.h:148:1: note: in expansion of macro 'OPAL_THREAD_DEFINE_ATOMIC_SWAP'
 OPAL_THREAD_DEFINE_ATOMIC_SWAP(void *, intptr_t, ptr)
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Makefile:1867: recipe for target 'intercepts.lo' failed
make[3]: *** [intercepts.lo] Error 1
make[3]: *** Waiting for unfinished jobs....
In file included from ../../../opal/class/opal_object.h:128:0,
                 from ../../../ompi/errhandler/errhandler.h:36,
                 from cxx_glue.h:18,
                 from file.cc:19:
../../../opal/threads/thread_usage.h: In function 'int32_t opal_thread_add_32(volatile int32_t*, int32_t)':
../../../opal/threads/thread_usage.h:109:55: error: 'opal_atomic_add_32' was not declared in this scope
         return opal_atomic_add_ ## suffix (addr, delta);                \
                                                       ^
../../../opal/threads/thread_usage.h:143:1: note: in expansion of macro 'OPAL_THREAD_DEFINE_ATOMIC_ADD'
 OPAL_THREAD_DEFINE_ATOMIC_ADD(int32_t, 32)
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ../../../opal/threads/thread_usage.h:28:0,
                 from ../../../opal/class/opal_object.h:128,
                 from ../../../ompi/errhandler/errhandler.h:36,
                 from cxx_glue.h:18,
                 from file.cc:19:
../../../opal/threads/thread_usage.h: In function 'size_t opal_thread_add_size_t(volatile size_t*, size_t)':
../../../opal/include/opal/sys/atomic.h:484:96: error: 'opal_atomic_add_64' was not declared in this scope
 #define opal_atomic_add_size_t(addr, delta) ((size_t) opal_atomic_add_64((int64_t*) addr, delta))
                                                                                                ^
../../../opal/threads/thread_usage.h:109:16: note: in expansion of macro 'opal_atomic_add_size_t'
         return opal_atomic_add_ ## suffix (addr, delta);                \
                ^~~~~~~~~~~~~~~~
../../../opal/threads/thread_usage.h:144:1: note: in expansion of macro 'OPAL_THREAD_DEFINE_ATOMIC_ADD'
 OPAL_THREAD_DEFINE_ATOMIC_ADD(size_t, size_t)
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ../../../opal/class/opal_object.h:128:0,
                 from ../../../ompi/errhandler/errhandler.h:36,
                 from cxx_glue.h:18,
                 from file.cc:19:
../../../opal/threads/thread_usage.h: In function 'bool opal_thread_cmpset_bool_32(volatile int32_t*, int32_t, int32_t)':
../../../opal/threads/thread_usage.h:119:85: error: 'opal_atomic_cmpset_32' was not declared in this scope
         return opal_atomic_cmpset_ ## suffix ((volatile type *) addr, compare, value); \
                                                                                     ^
../../../opal/threads/thread_usage.h:145:1: note: in expansion of macro 'OPAL_THREAD_DEFINE_ATOMIC_CMPSET'
 OPAL_THREAD_DEFINE_ATOMIC_CMPSET(int32_t, int32_t, 32)
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../../opal/threads/thread_usage.h: In function 'bool opal_thread_cmpset_bool_ptr(volatile intptr_t*, void*, void*)':
../../../opal/threads/thread_usage.h:119:85: error: 'opal_atomic_cmpset_ptr' was not declared in this scope
         return opal_atomic_cmpset_ ## suffix ((volatile type *) addr, compare, value); \
                                                                                     ^
../../../opal/threads/thread_usage.h:146:1: note: in expansion of macro 'OPAL_THREAD_DEFINE_ATOMIC_CMPSET'
 OPAL_THREAD_DEFINE_ATOMIC_CMPSET(void *, intptr_t, ptr)
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../../opal/threads/thread_usage.h: In function 'int32_t opal_thread_swap_32(volatile int32_t*, int32_t)':
../../../opal/threads/thread_usage.h:134:76: error: 'opal_atomic_swap_32' was not declared in this scope
         return opal_atomic_swap_ ## suffix ((volatile type *) ptr, newvalue); \

@bosilca
Copy link
Member

bosilca commented May 3, 2017

On the ARM64 the architecture was correctly detected. The configure complains about missing support for ARM64, because we ARM64.asm file and the compiler apparently does not support atomic builtins, which suggests we are expected to use opal/include/opal/sys/arm64/atomic.h. However, all the functions defined in this file seems to be missing, which suggest that somehow OPAL_GCC_INLINE_ASSEMBLY has not been correctly set.

This is indeed correct as in opal/include/opal/sys/atomic.h we force OPAL_GCC_INLINE_ASSEMBLY for C++ compilers. We could try to fix the assembly inclusion from C++ compilers, or we can drop support for the non-standardized C++ API.

@amckinstry
Copy link
Author

Asn an experiment I set OPAL_GCC_INLINE_ASSEMBLY in opal/include/opal/sys/arm64/atomic.h and it builds. (Similar patch already present for powerpc)

@bosilca
Copy link
Member

bosilca commented May 4, 2017

There is no guarantee that the C++ compiler supports the same type of assembly as the C compiler. It is happening for you, but to be on the safe side we should add a check during configure time.

@amckinstry
Copy link
Author

Agreed; to be clear, Debian is only using GCC/G++ at this time (though I did test with g++ 7).

@hjelmn
Copy link
Member

hjelmn commented May 4, 2017

We decided to isolate the internals of Open MPI from C++. I don't remember whether this change made it onto master but I am sure it isn't on v2.x or v2.0.x.

@hjelmn
Copy link
Member

hjelmn commented May 4, 2017

Hmm, or I am wrong. I did PR it to v2.0.x and v2.x. :D #2163 #2162

@hjelmn
Copy link
Member

hjelmn commented May 4, 2017

This should have clued me in to the isolation code being in there:

                 from cxx_glue.h:18,

Hmm, I wonder why it isn't working as expected on aarch64.

@jsquyres jsquyres added this to the v2.1.1 milestone May 4, 2017
@bosilca
Copy link
Member

bosilca commented May 4, 2017

I don't understand how the isolation layer is supposed to work when cxx_glue.h is included from all the CXX files and it pulls the errorhandler.h (this pull the entire spaghetti header files down to our opal_object.h where we need the atomics).

@jsquyres
Copy link
Member

jsquyres commented May 4, 2017

Per fc0eeb4, should ompi_cxx_dummy_fn_t be used in cxx_glue.h instead of ompi_errhandler_generic_handler_fn_t? (assumedly with appropriate casting in cxx_glue.cc)

@hjelmn
Copy link
Member

hjelmn commented May 4, 2017

@bosilca I thought I had prevented pulling everything. It was looking ok on my Mac (it didn't compile before the glue). Looks like we just need to do a little more to make it work in this case. Trying to de the minimum necessary since @jsquyres will git rm ompi/mpi/cxx at some point.

@jsquyres
Copy link
Member

jsquyres commented May 4, 2017

Yeah, we'll remove the C++ bindings... someday. Probably not soon, though. 😈

@jjhursey
Copy link
Member

jjhursey commented May 4, 2017

@hjelmn Yeah I noticed this a couple days ago too and added a comment here to one of your PRs. I haven't had a chance to look into it any further though. A user was trying to use the C++ bindings in 2.1 and contacted me about the state of the patches.

@amckinstry
Copy link
Author

Ok, 2.1.1rc1 builds fine on all archs except x32 when --without-cma is used where appropriate. Work is needed on the x32 patch (its using 64-bit quantities ? (x32 should use 32-bit quantities but the larger register set from x86_64).
I think this should not be a blocker for the 2.1.1 release.

@hppritcha hppritcha modified the milestones: v2.1.2, v2.1.1 May 10, 2017
@jsquyres
Copy link
Member

@amckinstry Am I reading your comment correctly: you applied --without-cma to all the builds, but you're still seeing a failure on the 32 bit machines (i.e., s390x). Is that right?

If so, where do we find the full build output from the s390x? I see on https://buildd.debian.org/status/package.php?p=openmpi&suite=experimental the last 10-20 lines of the build that shows that there's some linker error when compiling opal_wrapper. But seeing the rest of the build log would be helpful.

@hjelmn
Copy link
Member

hjelmn commented May 23, 2017

Looking into this now.

hjelmn added a commit to hjelmn/ompi that referenced this issue May 26, 2017
This commit removes a nonexistent function that was causing build
problems under certain environments.

Reference open-mpi#3442

Signed-off-by: Nathan Hjelm <[email protected]>
@hjelmn
Copy link
Member

hjelmn commented May 26, 2017

Does #3589 help?

hjelmn added a commit to hjelmn/ompi that referenced this issue May 26, 2017
This commit removes a nonexistent function that was causing build
problems under certain environments.

Reference open-mpi#3442

Signed-off-by: Nathan Hjelm <[email protected]>
(cherry picked from commit ee9093c)
Signed-off-by: Nathan Hjelm <[email protected]>
hjelmn added a commit to hjelmn/ompi that referenced this issue May 26, 2017
This commit removes a nonexistent function that was causing build
problems under certain environments.

Reference open-mpi#3442

Signed-off-by: Nathan Hjelm <[email protected]>
(cherry picked from commit ee9093c)
Signed-off-by: Nathan Hjelm <[email protected]>
hjelmn added a commit to hjelmn/ompi that referenced this issue May 26, 2017
This commit removes a nonexistent function that was causing build
problems under certain environments.

Reference open-mpi#3442

Signed-off-by: Nathan Hjelm <[email protected]>
(cherry picked from commit ee9093c)
Signed-off-by: Nathan Hjelm <[email protected]>
@jsquyres
Copy link
Member

jsquyres commented Jun 1, 2017

@amckinstry Have you had a chance to check out Nathan's reply #3442 (comment)?

@amckinstry
Copy link
Author

I haven't. I've just kicked off a build with the commit ee9093c which I'll push to experimental (assuming it builds on x86)

@amckinstry
Copy link
Author

i've tested the commit in #3442 (log: https://buildd.debian.org/status/fetch.php?pkg=openmpi&arch=s390x&ver=2.1.1-3&stamp=1496331745&raw=0) Unfortunately s390 is still failing.

hjelmn added a commit to hjelmn/ompi that referenced this issue Jun 20, 2017
This commit removes a nonexistent function that was causing build
problems under certain environments.

Reference open-mpi#3442

Signed-off-by: Nathan Hjelm <[email protected]>
(cherry picked from commit ee9093c)
Signed-off-by: Nathan Hjelm <[email protected]>
@hjelmn
Copy link
Member

hjelmn commented Jun 22, 2017

We don't have atomics for s390 and it looks like the gcc builtins are not enabled in your build. Try adding --enable-builtin-atomics to your configure line.

@amckinstry
Copy link
Author

--enable-builtin-atomics works fine for s390. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants