HDF5 bug when writing > 2GB in a single call #7045

Closed
edgargabriel opened this issue Oct 5, 2019 · 0 comments

edgargabriel commented Oct 5, 2019

Background information

I received an off-list bug report from Richard Warren of the HDF5 group that affects both ompio and romio (although in very different ways). The issue appears when a single file access operation touches more than 2GB. The HDF5 group provided a reproducer.

In ompio, the fbtl component triggers an error message about invalid arguments. Debugging revealed that the error only appears for individual operations (although HDF5 calls collective I/O, ompio recognizes that the communicator has size 1 and calls the individual I/O operation instead), and that it was due to two improper conversions from size_t to int. A fix is coming shortly.
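To illustrate the class of bug described above, here is a minimal sketch of a checked narrowing conversion. It is not the ompio code or the actual fix; `checked_narrow` and `calls_needed` are hypothetical helpers, and the sketch assumes a platform where int is 32 bits wide.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper, not part of ompio: narrow a byte count to int
 * only when the value is representable. Returns 0 on success and -1
 * when the count would not fit. */
static int checked_narrow(uint64_t nbytes, int *out)
{
    if (nbytes > (uint64_t)INT_MAX) {
        return -1;  /* caller must split the request or use a wider type */
    }
    *out = (int)nbytes;
    return 0;
}

/* Hypothetical helper: number of calls needed when each call may move
 * at most max_chunk bytes. */
static uint64_t calls_needed(uint64_t nbytes, uint64_t max_chunk)
{
    assert(max_chunk > 0);
    return nbytes / max_chunk + (nbytes % max_chunk != 0);
}
```

With these helpers, a 3GB request is rejected by `checked_narrow` rather than silently converted, and `calls_needed` reports that it would take two calls if each call were limited to 2147483647 bytes.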

In romio, the issue is different and has to do with setting the status after the operation has finished. A suggested fix was provided by the HDF5 group. The gist of it is that in ompi/mca/io/romio/src/io_romio321_module.c we need to change the nbytes argument of MPIR_Status_set_bytes from int to MPI_Count and call MPI_Status_set_elements_x instead of MPI_Status_set_elements, e.g.

int MPIR_Status_set_bytes(ompi_status_public_t *status,
                          struct ompi_datatype_t *datatype, MPI_Count nbytes)
{
    MPI_Status_set_elements_x (status, MPI_CHAR, nbytes);
    return MPI_SUCCESS;
}
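The reason the parameter type matters can be shown without MPI. The following is only a toy model: `toy_status`, `toy_status_set_bytes`, and `representable_as_int` are hypothetical names, and a 64-bit integer stands in for MPI_Count, which is an assumption about the platform.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a status object; the real MPI_Status
 * layout is implementation-specific. */
typedef struct {
    int64_t byte_count;
} toy_status;

/* Wide setter: takes a 64-bit count, mirroring the idea of passing
 * MPI_Count rather than int. */
static void toy_status_set_bytes(toy_status *status, int64_t nbytes)
{
    assert(status != NULL);
    status->byte_count = nbytes;
}

/* Returns 1 when an int parameter could have carried nbytes unchanged. */
static int representable_as_int(int64_t nbytes)
{
    return nbytes >= INT_MIN && nbytes <= INT_MAX;
}
```

A 3GB transfer stored through the wide setter keeps its exact byte count, while `representable_as_int` returns 0 for the same value on a platform with 32-bit int, which is why an int-typed nbytes cannot report such a transfer correctly.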

However, I will have to ask somebody who is more familiar with the romio integration to look at it (e.g. @ggouaillardet?).

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

master, v4.x, and probably v3.x

@edgargabriel edgargabriel added this to the v4.0.3 milestone Oct 5, 2019
@edgargabriel edgargabriel self-assigned this Oct 5, 2019
edgargabriel added a commit to edgargabriel/ompi that referenced this issue Oct 22, 2019
individual read/write operations exceeding 2GB fail in ompio
due to improper conversions from size_t to int in two different
locations. This commit fixes an issue reported by Richard Warren
from the HDF5 group.

Fixes Issue open-mpi#7045

Cherry-picked from commit a130f56

Signed-off-by: Edgar Gabriel <[email protected]>
cniethammer pushed a commit to cniethammer/ompi that referenced this issue May 10, 2020