accelerator framework: cuda component assumes application initializes CUDA prior to calling MPI_init #11084
Comments
@edgargabriel the system I use for rocm testing is down for an extended time. You may want to see if the osu_mbw_mr test works okay with the rocm component.
@hppritcha the test passes for rocm memory. Not sure whether it makes a difference, but I am using OSU benchmarks version 5.9.
My rocm machine came back online, and I confirmed what @edgargabriel observed: osu_mbw_mr with D D args works nominally.
@Akshay-Venkatesh @open-mpi/cuda Can you chime in on this issue?
@janjust It shouldn't be a requirement. I've tested the mbw_mr test with ompi-5x and UCX and haven't encountered an issue. @hppritcha Did you use the UCX PML or something else? What command do you run, and what exact CUDA errors are you seeing? FWIW, UCX doesn't require CUDA device initialization before MPI_Init, but it is recommended since it helps with some topology decisions.
I wonder if this is what you were referring to.
@Akshay-Venkatesh The reason for this is that some OMPI initialization code relies on being inside a CUDA context. Previously there was delayed initialization for some items, like stream creation for asynchronous memcpy operations. We can re-add that delayed initialization.
PR to delay initialization: #11253
Closed via #11253
I'm not sure if this is a bug or a feature, but while testing the accelerator framework work on a system with NVIDIA GPUs, I noticed that the osu_mbw_mr test (from version 5.8 of the OSU benchmarks) fails when using CUDA-allocated buffers. The problem is that the cuda component doesn't get initialized until after MPI_Init is called, so the address checker code is not employed. The other mpi/pt2pt tests initialize CUDA before calling MPI_Init and don't see this problem.
I've not tested the rocm component, so I don't know whether it has the same issue.
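For reference, the pattern the other pt2pt tests follow is to touch the CUDA runtime before MPI_Init, which forces context creation early. A minimal sketch of that ordering (error handling omitted; this is an illustration, not the benchmarks' actual code, and it requires a GPU and an MPI installation to run):

```c
#include <cuda_runtime.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    /* Establish a CUDA context before MPI_Init so the cuda component
     * finds an initialized runtime when it starts up. */
    cudaSetDevice(0);   /* select a device ...                 */
    cudaFree(0);        /* ... and force primary-context creation */

    MPI_Init(&argc, &argv);
    /* ... cudaMalloc the message buffers and run the benchmark loop ... */
    MPI_Finalize();
    return 0;
}
```

cudaFree(0) is a common idiom for triggering context creation without allocating anything; any runtime API call that touches the device would do.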