
5.0.7 build failure (base/sshmem_base_open.c:34:39: error: initialization of ‘void *’ from ‘long unsigned int’ makes pointer from integer without a cast) #13103

Closed
branfosj opened this issue Feb 16, 2025 · 8 comments · Fixed by #13105

@branfosj

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

5.0.7

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Building from source tarball (https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.7.tar.bz2)

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

n/a

Please describe the system on which you are running

  • Operating system/version: RHEL 8.10
  • Computer hardware: Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz (icelake)
  • Network type: Mellanox Infiniband (ConnectX-6)

Details of the problem

  • GCC: 14.2.0
  • hwloc: 2.11.2
  • libevent: 2.1.12
  • libfabric: 2.0.0
  • PMIx: 5.0.6
  • UCX: 1.18.0
  • UCC: 1.3.0
  • PRRTE: 3.0.8
./configure --prefix=/rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/OpenMPI/5.0.7-GCC-14.2.0 \
  --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu \
  --with-cuda=/dev/shm/branfosj/build-up-EL8/OpenMPI/5.0.7/GCC-14.2.0/openmpi-5.0.7//opal/mca/cuda \
  --with-show-load-errors=no --enable-mpirun-prefix-by-default --enable-shared \
  --with-hwloc=/rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/hwloc/2.11.2-GCCcore-14.2.0 \
  --with-libevent=/rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/libevent/2.1.12-GCCcore-14.2.0 \
  --with-ofi=/rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/libfabric/2.0.0-GCCcore-14.2.0 \
  --with-pmix=/rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/PMIx/5.0.6-GCCcore-14.2.0 \
  --with-ucx=/rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/UCX/1.18.0-GCCcore-14.2.0 \
  --with-ucc=/rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/UCC/1.3.0-GCCcore-14.2.0 \
  --with-prrte=/rds/projects/2017/branfosj-rse/easybuild/EL8-ice/software/PRRTE/3.0.8-GCCcore-14.2.0

Then the build (make -j 8) fails with:

Making all in mca/sshmem
make[2]: Entering directory '/dev/shm/branfosj/build-up-EL8/OpenMPI/5.0.7/GCC-14.2.0/openmpi-5.0.7/oshmem/mca/sshmem'
  CC       base/sshmem_base_close.lo
  CC       base/sshmem_base_select.lo
  CC       base/sshmem_base_open.lo
  CC       base/sshmem_base_wrappers.lo
base/sshmem_base_open.c:34:39: error: initialization of ‘void *’ from ‘long unsigned int’ makes pointer from integer without a cast [-Wint-conversion]
   34 | void *mca_sshmem_base_start_address = UINTPTR_MAX;
      |                                       ^~~~~~~~~~~
make[2]: *** [Makefile:1513: base/sshmem_base_open.lo] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory '/dev/shm/branfosj/build-up-EL8/OpenMPI/5.0.7/GCC-14.2.0/openmpi-5.0.7/oshmem/mca/sshmem'
make[1]: *** [Makefile:1924: all-recursive] Error 1
make[1]: Leaving directory '/dev/shm/branfosj/build-up-EL8/OpenMPI/5.0.7/GCC-14.2.0/openmpi-5.0.7/oshmem'
make: *** [Makefile:1539: all-recursive] Error 1

If I reverse the change to oshmem/mca/sshmem/base/sshmem_base_open.c from #12889, I do not see the failure. That is, the build succeeds with this patch applied:

--- openmpi-5.0.7/oshmem/mca/sshmem/base/sshmem_base_open.c 2025-02-14 16:51:30.988684227 +0000
+++ openmpi-5.0.6/oshmem/mca/sshmem/base/sshmem_base_open.c 2024-11-15 14:18:09.472756350 +0000
@@ -31,7 +31,17 @@
  * globals
  */
 
-void *mca_sshmem_base_start_address = UINTPTR_MAX;
+/**
+ * if 32 bit we set sshmem_base_start_address to 0
+ * to let OS allocate segment automatically
+ */
+#if UINTPTR_MAX == 0xFFFFFFFF
+void *mca_sshmem_base_start_address = (void*)0;
+#elif defined(__aarch64__)
+void* mca_sshmem_base_start_address = (void*)0xAB0000000000;
+#else
+void* mca_sshmem_base_start_address = (void*)0xFF000000;
+#endif
 
 char * mca_sshmem_base_backing_file_dir = NULL;
 

Should the fix be this revert, or something else, such as:

void *mca_sshmem_base_start_address = (void*)UINTPTR_MAX;
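A minimal, self-contained C sketch of why the explicit cast is needed. UINTPTR_MAX from <stdint.h> is an integer constant (on x86_64 glibc its type is unsigned long, the "long unsigned int" in the error), so initializing a void * with it is an implicit integer-to-pointer conversion. GCC 14 turned -Wint-conversion into an error by default, which is presumably why this surfaced with GCC 14.2.0. The names example_start_address, start_address_is_unset, and check_start_address_sentinel are hypothetical, not Open MPI's, and the sketch assumes the all-ones value is meant as a "not set" sentinel, which the thread does not confirm.

```c
#include <assert.h>   /* assert(), used by check_start_address_sentinel() */
#include <stdbool.h>  /* bool */
#include <stdint.h>   /* UINTPTR_MAX, uintptr_t */

/* Hypothetical stand-in for mca_sshmem_base_start_address.
 * Without the (void *) cast, GCC 14 rejects this initializer with
 * -Wint-conversion; an integer constant cast to a pointer type is
 * allowed in a static initializer. */
static void *example_start_address = (void *)UINTPTR_MAX;

/* Hypothetical helper: true when the address still holds the all-ones
 * value, i.e. (under this sketch's assumption) nobody has set it. */
static bool start_address_is_unset(const void *addr)
{
    return (uintptr_t)addr == UINTPTR_MAX;
}

/* Converting back to uintptr_t recovers the original value on GCC targets,
 * where pointer/integer conversions preserve the bit pattern. */
static void check_start_address_sentinel(void)
{
    assert(start_address_is_unset(example_start_address));
    assert(!start_address_is_unset((void *)(uintptr_t)0xFF000000));
}
```

Calling check_start_address_sentinel() from any test harness exercises both the sentinel and a non-sentinel address.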
@opoplawski
Contributor

I'm seeing the same build failure with the Fedora openmpi package.

@tonycurtis

Same with 4.1.8.

@bosilca
Member

bosilca commented Feb 17, 2025

The fix is as simple as suggested here: an explicit cast. To be technically correct, one more change is needed (the second hunk below), but since we no longer support 32-bit platforms it should not really matter.

diff --git a/oshmem/mca/sshmem/base/sshmem_base_open.c b/oshmem/mca/sshmem/base/sshmem_base_open.c
index 1f0d1eb761..06411b3852 100644
--- a/oshmem/mca/sshmem/base/sshmem_base_open.c
+++ b/oshmem/mca/sshmem/base/sshmem_base_open.c
@@ -31,7 +31,7 @@
  * globals
  */
 
-void *mca_sshmem_base_start_address = UINTPTR_MAX;
+void *mca_sshmem_base_start_address = (void*)UINTPTR_MAX;
 
 char * mca_sshmem_base_backing_file_dir = NULL;
 
@@ -49,7 +49,7 @@ mca_sshmem_base_register (mca_base_register_flag_t flags)
                                  "base",
                                  "start_address",
                                  "Specify base address for shared memory region",
-                                 MCA_BASE_VAR_TYPE_UNSIGNED_LONG_LONG,
+                                 MCA_BASE_VAR_TYPE_UNSIGNED_LONG,
                                  NULL,
                                  0,
                                  MCA_BASE_VAR_FLAG_SETTABLE,
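A short C sketch of why the second hunk matters, assuming the variable is registered with a pointer to mca_sshmem_base_start_address as its storage (that argument is not visible in the diff) and that the MCA variable system writes a value of the registered type into that storage. On ILP32 and LP64 platforms unsigned long has the same size as void *, whereas unsigned long long is at least 64 bits, so on a 32-bit build it would be twice the size of the pointer. On LP64 (e.g. x86_64 Linux) both types are 8 bytes, which is consistent with the remark that the change only matters on 32-bit platforms. The helper storage_matches_pointer is hypothetical.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

/* On ILP32 and LP64 targets, unsigned long is exactly pointer-sized.
 * (This would fail on LLP64, e.g. 64-bit Windows.) */
static_assert(sizeof(unsigned long) == sizeof(void *),
              "unsigned long is not pointer-sized on this target");

/* unsigned long long is always at least 64 bits, so on a 32-bit target
 * it is twice as wide as a void *. */
static_assert(sizeof(unsigned long long) * CHAR_BIT >= 64,
              "unsigned long long narrower than 64 bits");

/* Hypothetical helper: nonzero when a value of the given size can be stored
 * into a void * object without overrunning it or leaving bytes unwritten. */
static int storage_matches_pointer(size_t storage_size)
{
    return storage_size == sizeof(void *);
}
```

On a typical 32-bit build, storage_matches_pointer(sizeof(unsigned long long)) would return 0, which is the mismatch the UNSIGNED_LONG change avoids.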

@lahwaacz
Contributor

Quoting @bosilca: "The fix is as simple as suggested here, an explicit cast."

Considering that a similar issue has now happened in two releases in a row, the fix is not simple. We had to wait 3 months for a fix for #12924 to be released (in 5.0.7), and I don't expect a shorter timeline for this issue.

Please adjust your release engineering processes so that either these type conversion issues are caught promptly in the testing or even development phase (it takes just a modern compiler and/or a stricter set of compiler flags), or a bugfix release can follow shortly after such an incident. The community deserves better software quality than this.

@bosilca
Member

bosilca commented Feb 18, 2025

"deserve" ? 😂

@cwlmco

cwlmco commented Feb 18, 2025

I agree with "Please adjust your release engineering processes". The Intel compilers can be downloaded for free, so why not test-compile the code with them before each release as a standard step? Code that compiles with multiple compilers is likely to be better code.

@jsquyres
Member

@lahwaacz is not wrong. We should have caught this.

@jsquyres jsquyres added this to the v5.0.8 milestone Feb 18, 2025
@jsquyres jsquyres reopened this Feb 19, 2025
@jsquyres
Member

Need to keep this open until it's fixed on the v5.0.x and v4.1.x branches.

7 participants