Skip to content

v4.0.x: usnic fixes and optimizations #7043

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jsquyres
Copy link
Member

@jsquyres jsquyres commented Oct 5, 2019

See individual commit messages for more details.

Rename "get_nsec()" to "get_ticks()" to more accurately reflect that
this function has no correlation to wall clock time at all.

Signed-off-by: Jeff Squyres <[email protected]>
(cherry picked from commit ce2910a)
New MCA parameter: btl_usnic_ack_iteration_delay.  Set this to the
number of times through the usNIC component progress function before
sending a standalone ACK (vs. piggy-backing the ACK on any other send
going to the target peer).

Use "ticks" language to clarify that we're really counting the number
of times through the usNIC component DATA_CHANNEL completion check (to
check for incoming messages) -- it has no relation to wall clock time
whatsoever.

Also slightly change the channel-checking scheme in usNIC component
progress: only check the PRIORITY channel once (vs. checking it once,
not finding anything, and then falling through the progress_2() where we
check PRIORITY again and then check the DATA channel).

As before, if our "progress" libevent fires, increment the tick
counter enough to guarantee that all endpoints that need an ACK will
get triggered to send standalone ACKs the next time through progress,
if necessary.

Signed-off-by: Jeff Squyres <[email protected]>
(cherry picked from commit 968b1a5)
Significantly increase the default retrans timeout.  If the
retrans timeout is too soon, we can end up in a retransmission storm
where the logic will continually re-transmit the same frames during a
single run through the usNIC progress function (because the timer for
a single frame expires before we have run through re-transmitting all
the frames pending re-transmission).

Signed-off-by: Jeff Squyres <[email protected]>
(cherry picked from commit 3cc95d8)
New MCA param: btl_usnic_max_resends_per_iteration.  This is the max
number of resends we'll do in a single pass through usNIC component
progress.  This prevents progress from getting stuck in an endless
loop of retransmissions (i.e., if more retransmissions are triggered
during the sending of retransmissions).  Specifically: we need to
leave the resend loop to allow receives to happen (which may ACK
messages we have sent previously, and therefore cause pending resends
to be moot).

Signed-off-by: Jeff Squyres <[email protected]>
(cherry picked from commit 27e3040)
Move the prefix area from the head to the body in relevant size
computations.  This fixes a problem in high traffic situations where
usNIC may have sent from unregistered memory.

Signed-off-by: Jeff Squyres <[email protected]>
(cherry picked from commit fe7f772)
@jsquyres jsquyres added this to the v4.0.2 milestone Oct 5, 2019
@gpaulsen gpaulsen modified the milestones: v4.0.2, v4.0.3 Oct 7, 2019
@gpaulsen
Copy link
Member

gpaulsen commented Oct 7, 2019

Sorry, too late for v4.0.2, moving to v4.0.3

It was previously accidentally set to 0.

Signed-off-by: Jeff Squyres <[email protected]>
(cherry picked from commit 132e4ca)
Signed-off-by: Jeff Squyres <[email protected]>
(cherry picked from commit 3080033)
Copy link

@nrescobar nrescobar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good.

Note: I am a coworker of Jeff's at Cisco.

@jsquyres
Copy link
Member Author

@gpaulsen @hppritcha Can you approve and merge? I have no one left in the OMPI community who can review usNIC stuff, so I got a co-worker to review it for me (but he doesn't have write perms to the OMPI repo, so his review "doesn't count" in terms of github's requirements).

Thanks.

Copy link
Member

@bwbarrett bwbarrett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little curious about the long description of making the timeout longer, and then the lack of description on making it shorter, but I assume you have your reasons and it's not wrong.

@jsquyres
Copy link
Member Author

Yes, I probably could/should have made that explanation better.

The real reason was that I merged the commits on master before getting the code review. In the code review, it came out that it really wasn't necessary to increase the timeout -- 5 miliseconds is long enough already. The real reason for (effectively) that timeout being too short was the lack of a cap on retransmitting during a single iteration of usNIC progress. I.e., when sending many large messages to a very large number of peers, we could get into a situation where the retransmit queue got so long that it effectively took longer to retransmit everything than the retrans timers on the sends. Hence, by the time we finished retransmitting everything, at least some of the timers had already expired, which would then lead to immediately retransmitting again, ...etc. Meaning: the retrans timer wasn't the issue. The amount of work we allowed to happen during a single iteration of usNIC progress was the problem.

@gpaulsen
Copy link
Member

Thanks @bwbarrett for reviewing.
@hppritcha and I haven't yet opened the v4.0.x for merging, but once we open that, we'll get this in.

@hppritcha
Copy link
Member

Ready to merge imho

@hppritcha hppritcha merged commit 106109a into open-mpi:v4.0.x Oct 22, 2019
@jsquyres jsquyres deleted the pr/v4.0.x/usnic-fixes-and-optimizations branch July 17, 2022 12:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants