
accelerator framework: cuda component assumes application initializes CUDA prior to calling MPI_init #11084


Closed
hppritcha opened this issue Nov 16, 2022 · 9 comments

Comments

@hppritcha
Member

hppritcha commented Nov 16, 2022

I'm not sure if this is a bug or a feature, but while testing the accelerator framework on a system with NVIDIA GPUs, I noticed that the osu_mbw_mr test (from version 5.8 of the OSU benchmarks) fails when using CUDA-allocated buffers. The problem is that the cuda component doesn't get initialized until after calling MPI_Init, so the address checker code is not employed. The other mpi/pt2pt tests initialize CUDA before calling MPI_Init and don't see this problem.

I've not tested the rocm component, so I don't know if it would have the same issue.

@hppritcha
Member Author

@edgargabriel The system I use for rocm testing is down for an extended time. You may want to check whether the osu_mbw_mr test works with the rocm component.

@edgargabriel
Member

@hppritcha The test passes for rocm memory. I'm not sure whether it makes a difference, but I am using OSU benchmarks version 5.9.

@hppritcha
Member Author

My rocm machine came back online, and I confirmed what @edgargabriel observed: osu_mbw_mr with D D args works nominally.

@janjust
Contributor

janjust commented Nov 29, 2022

@Akshay-Venkatesh @open-mpi/cuda Guys, can you chime in on this issue?
Is it a requirement for CUDA to always be initialized BEFORE MPI_Init? The new cuda framework changed the previously delayed initialization, and that causes issues for some benchmarks and will certainly cause issues for some users.

@Akshay-Venkatesh
Contributor

> Is it a requirement for CUDA to always be initialized BEFORE MPI_Init?

@janjust It shouldn't be a requirement.

I've tested the mbw_mr test with ompi-5x and UCX and haven't encountered an issue.

@hppritcha Did you use UCX pml or something else? What command do you run and what exact cuda errors are you seeing?

FWIW, in UCX, we don't require cuda device initialization before MPI_Init but it is recommended as it helps with some topology decisions.
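One common way to follow the recommendation above is to choose a device from the node-local rank before MPI_Init runs. The helper below is a minimal sketch of that mapping and is not taken from this thread: `pick_device` is a hypothetical name, and the round-robin mapping is an assumption about how an application might spread ranks across GPUs.

```c
#include <stdlib.h>

/* Hypothetical helper: map a node-local rank (given as a string, as it
 * would arrive from an environment variable) to a device index.
 * Returns -1 when the input cannot be used, so the caller can fall back
 * to whatever default device selection it prefers. */
static int pick_device(const char *local_rank_str, int device_count)
{
    if (local_rank_str == NULL || device_count <= 0) {
        return -1;
    }

    char *end = NULL;
    long rank = strtol(local_rank_str, &end, 10);

    /* Reject empty strings, trailing garbage, and negative ranks. */
    if (end == local_rank_str || *end != '\0' || rank < 0) {
        return -1;
    }

    /* Assumed policy: round-robin over the visible devices. */
    return (int)(rank % device_count);
}
```

An application could call `pick_device(getenv("OMPI_COMM_WORLD_LOCAL_RANK"), count)` with `count` obtained from `cudaGetDeviceCount`, pass a non-negative result to `cudaSetDevice`, and only then call `MPI_Init`. `OMPI_COMM_WORLD_LOCAL_RANK` is the variable set by Open MPI's launcher; other launchers use different names.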

@janjust
Contributor

janjust commented Nov 29, 2022

@edgargabriel

> FWIW, in UCX, we don't require cuda device initialization before MPI_Init but it is recommended as it helps with some topology decisions.

I wonder if this is what you were referring to.

@wckzhang
Contributor

@Akshay-Venkatesh The reason for this is that some OMPI initialization code relies on being within a CUDA context. Previously there was a delayed initialization for some items, like stream creation for asynchronous memcpys. We can re-add that delayed initialization.
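The difference between eager and delayed initialization described here can be modeled in a few lines of plain C. This is only a sketch under stated assumptions, not Open MPI's implementation: `accel_state_t`, `accel_setup`, `accel_init_eager`, `accel_ensure_ready`, and the `mock_context_present` flag are all hypothetical, and the flag stands in for a real query of the CUDA driver (for example via `cuCtxGetCurrent`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for "does the caller have a CUDA context yet?".
 * A real component would ask the driver instead of reading a flag. */
static bool mock_context_present = false;

/* Hypothetical component state for resources that need a context,
 * such as the streams used for asynchronous copies. */
typedef struct {
    bool streams_ready;  /* true once context-dependent setup succeeded */
    int  init_attempts;  /* number of times setup was attempted */
} accel_state_t;

/* Context-dependent setup: succeeds only if a context already exists. */
static bool accel_setup(accel_state_t *s)
{
    assert(s != NULL);
    s->init_attempts++;
    if (!mock_context_present) {
        return false;
    }
    s->streams_ready = true;
    return true;
}

/* Eager variant: setup is tried exactly once, during library init.
 * If the application has not touched CUDA yet, it stays unready. */
static void accel_init_eager(accel_state_t *s)
{
    (void)accel_setup(s);
}

/* Delayed variant: every entry point that needs the streams calls this
 * first, so setup is retried until a context finally exists. */
static bool accel_ensure_ready(accel_state_t *s)
{
    assert(s != NULL);
    if (s->streams_ready) {
        return true;
    }
    return accel_setup(s);
}
```

In this model, an application that creates its context only after library init leaves the eager variant permanently unready, while the delayed variant succeeds on the first call made after the context appears.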

@wckzhang
Contributor

PR to delay initialization: #11253

@hppritcha
Member Author

Closed via #11253.
