Investigate and document current behavior of "aggressive" mode #11735
This is an old link to section 10.8.21 -- it may or may not be correct: https://ompi--8329.org.readthedocs.build/en/8329/faq/running-mpi-apps.html

Split out from #10480
Here is what I found so far by looking into the ompi source code:
This value is then used to set the opal_progress_yield_when_idle value https://github.com/open-mpi/ompi/blob/main/ompi/instance/instance.c#L765
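As a rough illustration of what that value controls (a sketch only, not OMPI's actual code -- the function and variable names here are hypothetical), the idea behind `opal_progress_yield_when_idle` is: when a progress pass completes no events and yield-when-idle is enabled, the process gives up the CPU so co-scheduled peers on an oversubscribed node can run.

```python
import os

# Hypothetical sketch of the yield-when-idle idea; names are illustrative.
YIELD_WHEN_IDLE = True  # in OMPI this would come from the MCA parameter


def progress_once():
    """Pretend to poll the communication devices; return completed events."""
    return 0  # nothing completed on this pass


def progress_loop(iterations=3):
    yields = 0
    for _ in range(iterations):
        if progress_once() == 0 and YIELD_WHEN_IDLE:
            os.sched_yield()  # give up the rest of the timeslice
            yields += 1
    return yields


print(progress_loop())  # with no events ever completing, yields every pass
```

With yield-when-idle disabled, the same loop would spin-poll at full speed, which is the "aggressive" behavior the docs describe.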
I could not find another place where either value is used.
After looking into this a bit more here is how I understand the logic:
I read through the documentation of v5.0.x, and I don't see the documentation making wrong statements with respect to oversubscription. In addition, I am honestly not sure whether the 'explanation' of why something behaves a certain way (vs. how the user can influence the setting) warrants a 'blocking' label. Unless there are objections, I would like to i) remove the blocking label and downgrade it to a lower level, and ii) close this ticket sometime next week.
@edgargabriel If the current statements in the docs are correct, cool -- I agree: remove the blocker label. The issue was that there were so many things that had changed about run-time behavior that it warranted a check to ensure that the docs were a) correct, and/or b) should be expanded/clarified/whatever. If we want to additionally expand the current docs (with more explanations, examples, ...etc.) and those aren't critical / could be added at any time after v5.0.0, cool -- we can do that, too.
@jsquyres I agree. I am 99.9% sure that we don't make an outrageously wrong statement related to oversubscription and the 'aggressive' mode control. If nothing else, I would argue that this issue does not warrant holding up the 5.0 release. Improvements are always possible.
Hahaha -- I just removed critical, but I see that you were the one who downgraded from blocker to critical. When you read through the text, did you see anything that you would obviously improve? If so, could you just jot down a few bullet points for someone in the future to come through and make those changes? Otherwise, if the text is currently ok and/or you don't see any obvious improvements, then I think we should close this issue as completed.
I think the only item that stood out to me was that the description focuses on mpirun/prrte-related behavior and makes only a few statements on direct launch. This was, however, also the reason I thought that just dropping a line about oversubscription + direct launch into one of the paragraphs would not help, and would just disturb the current flow.
This is a common misconception, so I'll just explain it a bit here (it is included in the PRRTE documentation as well). HWLOC cannot provide any info on the number of slots because a "slot" is not a hardware-related concept. It is simply a number indicating how many processes the user is allowed to execute within this allocation on the given node. The number can be set to equal the number of CPUs (cores or hwts) on the node for dedicated allocations (i.e., where nodes are not shared) if the sys admin elects to do so. This is often the case, which is why people conflate the two concepts.

"Oversubscribed" therefore has nothing to do with the number of CPUs on the node. It simply indicates that you are running more processes on the node than the allocated limit. Unmanaged systems have no mechanism for detecting and/or controlling such behavior, but even managed systems can be oversubscribed. This is why PRRTE has to provide the "oversubscribed" flag. We chose to have PMIx convey it so that there would be a "standardized" way of getting the info.

However, note that it would be highly unlikely that you would be "oversubscribed" during a direct launch -- Slurm would definitely refuse to start more procs than allocated slots, thereby preventing even the possibility of operating oversubscribed. I don't know about other environments, but I very much doubt that any of them would allow you to direct launch an oversubscribed job.
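The slots-vs-CPUs distinction above can be sketched in a few lines (an illustrative toy, not PRRTE's implementation -- the function name is made up): "oversubscribed" compares the process count on a node against the allocated slot count, not against the CPU count.

```python
# Illustrative only: oversubscription is defined against slots, not CPUs.
def is_oversubscribed(procs_on_node, allocated_slots):
    return procs_on_node > allocated_slots


# A node with 8 cores but only 4 allocated slots: launching 6 procs is
# oversubscribed even though there are fewer procs than CPUs.
cores, slots, procs = 8, 4, 6
print(procs > cores)                    # fewer procs than CPUs
print(is_oversubscribed(procs, slots))  # but more procs than slots
```

This is why, as noted above, a sys admin setting slots equal to the CPU count makes the two checks coincide, and why people conflate them.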
As far as I can see, the documentation is correct, so I am closing this ticket as complete.
Thank you Ralph for the explanation! |
Is "aggressive" mode really determined by the slot count provided by PRRTE? Or is it determined by a query to hwloc with reference to the number of processes per node? It just surprises me that this part of OMPI is controlled by PRRTE instead of something more generic that might work with, say, Slurm direct launch via srun. (from @jjhursey)