Skip to content

comm_split_type HW_GUIDED fix MPI_UNDEFINED handling #10702

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 7, 2022

Conversation

jjhursey
Copy link
Member

No description provided.

@jjhursey
Copy link
Member Author

@dalcinl can you take this PR for a spin and let me know if it addresses the issue you mentioned [here]? Thanks!

@dalcinl
Copy link
Contributor

dalcinl commented Aug 23, 2022

@jjhursey Here you have the logs, I believe everything is OK regarding this PR.

However, I got PRTE error when trying to run with 5 processes (IIRC, GitHub Action workers have just 2 virtual cores), with oversubscription turned on via $HOME/.prte/mca-params.conf [link].

Run mpiexec -n 5 python mpi4py/test/runtests.py -v -f
[fv-az163-551:165345] PRTE ERROR: Unable to map job in file rmaps_rr.c at line 184
Error: Process completed with exit code 59.

This new issue is most likely unrelated to this PR, but anyway you may be the right person to notify. This is probably a regression after the recent PRTE updates.

@dalcinl
Copy link
Contributor

dalcinl commented Aug 23, 2022

@jjhursey Forgot to mention: PRs related to the new comm split types (this one and the previous ones) should be also merged into branch v5.0.x, right? Maybe you should label them appropriately? Or are they already merged?

@jjhursey jjhursey mentioned this pull request Aug 23, 2022
14 tasks
@jjhursey
Copy link
Member Author

For the runtime issue, I added a note to Issue #10698 on which the team is currently looking at these type of issues.

As far as v5.0, I'll ask to see if the RMs want to let it in. If so then I'll PR over the full set of fixes and let you know.

@jjhursey
Copy link
Member Author

FYI: I created a PR for v5.0.x that cherry-picks all of the changes so far. Once this PR is merged then I'll work with the v5 RMs to get it into the v5.0 release cycle.

@rhc54
Copy link
Contributor

rhc54 commented Aug 23, 2022

However, I got PRTE error when trying to run with 5 processes (IIRC, GitHub Action workers have just 2 virtual cores), with oversubscription turned on

Should be fixed in openpmix/prrte#1463

@dalcinl
Copy link
Contributor

dalcinl commented Aug 25, 2022

Gentle request to move this PR forward.

@jjhursey
Copy link
Member Author

@bosilca this is ready to review when you have a chance. Thanks

@jjhursey jjhursey requested a review from hppritcha September 1, 2022 14:11
@jjhursey
Copy link
Member Author

jjhursey commented Sep 1, 2022

@bosilca Can you review? I'd like to get this in so that we can properly test the UNGUIDED PR #10739. @hppritcha I added you as a reviewer in case George doesn't have time. Thanks

@jjhursey jjhursey force-pushed the fix-split-guided-undef branch from 4b6a069 to 6ddcc58 Compare September 6, 2022 19:25
@bosilca bosilca merged commit 174b095 into open-mpi:main Sep 7, 2022
@jjhursey jjhursey deleted the fix-split-guided-undef branch September 7, 2022 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants