individual read/write operations exceeding 2GB fail in ompio
due to improper conversions from size_t to int in two different
locations. This commit fixes an issue reported by Richard Warren
from the HDF5 group.
Fixes Issue open-mpi#7045
Cherry-picked from commit a130f56
Signed-off-by: Edgar Gabriel <[email protected]>
Background information
I received an off-list bug report from Richard Warren of the HDF5 group that impacts both ompio and romio (although very differently). The issue appears when a single file access operation accesses more than 2GB. A reproducer was provided by the HDF5 group.
In ompio, the fbtl component triggers an error message about invalid arguments. Debugging revealed that the error only appears for individual operations (although HDF5 calls collective I/O, ompio recognizes that the communicator has size 1 and calls the individual I/O operation instead), and that it is caused by two improper conversions from size_t to int. A fix is coming shortly.
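To illustrate the class of bug described above, here is a minimal sketch of guarding a size_t length before it is narrowed to int. The helper names `len_to_int_checked` and `next_chunk` are hypothetical and are not taken from the ompio sources; the actual fix in ompio may look quite different.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not from ompio): convert a size_t length to int
 * only when the value is representable. Returns 0 on success and -1 if
 * the length exceeds INT_MAX, in which case *out is left untouched. */
static int len_to_int_checked(size_t len, int *out)
{
    if (len > (size_t)INT_MAX) {
        return -1;
    }
    *out = (int)len;
    return 0;
}

/* Hypothetical helper (not from ompio): size of the next piece when a
 * large request has to be split into parts that each fit into an int. */
static size_t next_chunk(size_t remaining)
{
    return remaining > (size_t)INT_MAX ? (size_t)INT_MAX : remaining;
}
```

A request of 3 GiB, i.e. `(size_t)3 << 30`, is rejected by `len_to_int_checked`, whereas an unchecked `(int)len` cast would silently produce an implementation-defined value that can end up being reported as an invalid argument further down the stack.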
In romio, the issue is different and has to do with setting the status after the operation has finished. A suggested fix was provided by the HDF5 group. The gist of it is that in ompi/mca/io/romio/src/io_romio321_module.c we need to change `int nbytes` to `MPI_Count nbytes` as an argument to `MPIR_Status_set_bytes`, and call `MPI_Status_set_elements_x` instead of `MPI_Status_set_elements`. However, I will have to ask somebody who is more familiar with the romio integration to look at it (e.g. @ggouaillardet?).
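The effect of widening the byte count in the status setter can be sketched with a toy model. This is not the romio code: `toy_count` and `toy_status` are hypothetical stand-ins for `MPI_Count` and `MPI_Status`, and `toy_status_set_bytes` merely mimics the shape of a setter that takes a wide count. The point is only that a 64-bit count parameter carries a value above 2GB through unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for MPI_Count: a signed integer wide enough for byte counts
 * above 2GB. Hypothetical, not the real MPI typedef. */
typedef int64_t toy_count;

/* Stand-in for MPI_Status, reduced to the one field this sketch needs. */
typedef struct {
    toy_count nbytes;
} toy_status;

/* Mimics a status setter whose count parameter is wide instead of int.
 * A NULL status plays the role of "ignore the status". */
static void toy_status_set_bytes(toy_status *status, toy_count nbytes)
{
    if (status != NULL) {
        status->nbytes = nbytes;
    }
}

/* Read the stored byte count back, again as a wide value. */
static toy_count toy_status_get_bytes(const toy_status *status)
{
    return status->nbytes;
}
```

With an `int nbytes` parameter, a 3 GiB transfer could not be stored faithfully at this step. With the wide parameter, the value read back equals the value written.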
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
master, v4.x, and probably v3.x