Description
Background information
I received an off-list bug report from Richard Warren of the HDF5 group that affects both ompio and romio (although in very different ways). The issue appears when a single file access operation accesses more than 2GB. A reproducer was provided by the HDF5 group.
In ompio, the fbtl component triggers an error message about invalid arguments. Debugging the issue revealed that it only appears for individual operations (although HDF5 calls collective I/O, ompio recognizes that the communicator has size 1 and calls the individual I/O operation instead), and that it is caused by two improper conversions from size_t to int. A fix is coming shortly.
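To illustrate the class of problem (this is not the actual ompio fix, only a minimal sketch): a byte count held in a size_t cannot be stored in an int once it exceeds INT_MAX, so a request larger than 2GB has to be either range-checked or split into int-sized pieces. The helper names below (fits_in_int, next_chunk, count_chunks) are hypothetical and do not exist in Open MPI, and the sketch assumes a 32-bit int.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if a byte count can be stored in an int
 * without losing information. */
static int fits_in_int(size_t nbytes)
{
    return nbytes <= (size_t)INT_MAX;
}

/* Hypothetical helper: size of the next piece when a large request is
 * split into pieces that each fit in an int. */
static int next_chunk(size_t remaining)
{
    return fits_in_int(remaining) ? (int)remaining : INT_MAX;
}

/* Hypothetical helper: number of int-sized pieces needed to cover
 * `total` bytes. */
static size_t count_chunks(size_t total)
{
    size_t pieces = 0;

    while (total > 0) {
        int chunk = next_chunk(total);
        assert(chunk > 0);          /* guaranteed because total > 0 */
        total -= (size_t)chunk;
        pieces++;
    }
    return pieces;
}
```

With these helpers, a 3GB request, (size_t)3 << 30, does not pass fits_in_int and would be covered by two pieces, whereas a plain (int) cast of the same value would silently produce an invalid length.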
In romio, the issue is different and has to do with setting the status after the operation has finished. A suggested fix was provided by the HDF5 group. The gist of it is that in ompi/mca/io/romio/src/io_romio321_module.c we need to change the last argument of MPIR_Status_set_bytes from int nbytes to MPI_Count nbytes, and call MPI_Status_set_elements_x instead of MPI_Status_set_elements, e.g.
    int MPIR_Status_set_bytes(ompi_status_public_t *status,
                              struct ompi_datatype_t *datatype, MPI_Count nbytes)
    {
        MPI_Status_set_elements_x(status, MPI_CHAR, nbytes);
        return MPI_SUCCESS;
    }
I will, however, have to ask somebody who is more familiar with the romio integration to look at it (e.g. @ggouaillardet?).
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
master, v4.x, and probably v3.x