Skip to content

Mysterious 32 bit MTT build failures on Absort #3610

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jsquyres opened this issue May 30, 2017 · 7 comments · Fixed by #3634
Closed

Mysterious 32 bit MTT build failures on Absort #3610

jsquyres opened this issue May 30, 2017 · 7 comments · Fixed by #3634
Assignees
Labels

Comments

@jsquyres
Copy link
Member

We're seeing odd MPI Install failures on 32 bit Absoft failures, but the log messages are not showing what the actual error is.

@cagoelz Can you shed any light, perchance, on what is failing in these builds?

https://mtt.open-mpi.org/index.php?do_redir=2446

@cagoelz
Copy link

cagoelz commented May 30, 2017

The failures occur building ompi_datatype_external.lo:

  CC       ompi_datatype_external.lo
In file included from ompi_datatype_external.c:29:
../../ompi/communicator/communicator.h:265: error: size of array ‘padding’ is too large
make: *** [ompi_datatype_external.lo] Error 1

The code around ../../ompi/communicator/communicator.h:265 is

#define PREDEFINED_COMMUNICATOR_PAD (sizeof(void*) * 64)

struct ompi_predefined_communicator_t {
    struct ompi_communicator_t comm;
    char padding[PREDEFINED_COMMUNICATOR_PAD - sizeof(ompi_communicator_t)];
};

The size of ompi_communicator_t is larger than 256 bytes?

@bosilca
Copy link
Member

bosilca commented May 30, 2017

git blame 50aa143. On 32 bits architectures, as the void* size is 32 bits, the opal_hash_table_t becomes too large for the PREDEFINED_COMMUNICATOR_PAD. We might have an issue with our ABI ...

@jsquyres
Copy link
Member Author

Ah! Good point.

I guess we intended to allow 64 pointers worth of padding. But in hindsight, that's a poor measure. Should we just listed a number of bytes, instead? (e.g., 64*8=512)

Sidenote: I'm not worried about our ABI between 32 and 64 bit builds.

@bosilca
Copy link
Member

bosilca commented May 31, 2017

I was not worried about ABI 32 vs. 64 bit builds (it has no sense), but we will now have to increase the size of the PREDEFINED_COMMUNICATOR_PAD, which will translate into a ABI breakage.

@jsquyres
Copy link
Member Author

Yes, we will break ABI for 32 bit builds. But if we get this into 3.0 (and beyond), I'm not worried about it.

Do you agree?

@bosilca
Copy link
Member

bosilca commented May 31, 2017

I see what you're saying: if we change to a fixed length, equal to the size of 64*8=512, we will not change the size on 64 bits builds, so we will only break the AI on 32 bits builds. Sounds good. 👍

jsquyres added a commit to jsquyres/ompi that referenced this issue Jun 1, 2017
Convert the predefined MPI object padding to a fixed number of bytes
(vs. a multiple of sizeof(void*)) so that the padding is the same size
between 32 and 64 bit builds.  I.e., we won't have a situation where
we've run out of padding in 32 bit builds but still have more space
available in 64 bit builds.

Fixes open-mpi#3610

Signed-off-by: Jeff Squyres <[email protected]>
@jsquyres
Copy link
Member Author

jsquyres commented Jun 1, 2017

@cagoelz I didn't say it earlier: thanks for the detailed compilation output! 😄

bosilca pushed a commit that referenced this issue Jun 1, 2017
Convert the predefined MPI object padding to a fixed number of bytes
(vs. a multiple of sizeof(void*)) so that the padding is the same size
between 32 and 64 bit builds.  I.e., we won't have a situation where
we've run out of padding in 32 bit builds but still have more space
available in 64 bit builds.

Fixes #3610

Signed-off-by: Jeff Squyres <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants