Skip to content

oob/tcp: correctly fallback to the loopback interface #6315

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

ggouaillardet
Copy link
Contributor

vir* interfaces are ignored but they prevented from falling
back to the loopback interface if needed.

Thanks Patrick Begou for reporting this

Signed-off-by: Gilles Gouaillardet [email protected]

vir* interfaces are ignored but they prevented from falling
back to the loopback interface if needed.

Thanks Patrick Begou for reporting this

Signed-off-by: Gilles Gouaillardet <[email protected]>
@rhc54
Copy link
Contributor

rhc54 commented Jan 29, 2019

I have no real objection to this PR, but I don't understand how your comment has anything to do with it. We still ignore vir interfaces after this change. FWICT, the only way this change impacts anything is if opal_ifcount is returning a value greater than 1 when only the loopback device is present - which seems incorrect. Or are you saying that someone has virtual interfaces active, but no real ones, and so opal_ifcount is coming back with an incorrect number?

@ggouaillardet
Copy link
Contributor Author

if a machine only has lo and virbr0, opal_ifcount() is 2 so lo is skipped by oob/tcp, and since we always skip virbr0, we end up with no available interface and mpirun fails.

this PR fixes that.

Can you please suggest a better wording ?

@rhc54
Copy link
Contributor

rhc54 commented Jan 29, 2019

I think perhaps we should ask a different question. If opal_ifcount() is including virtual interfaces, then that implies that we support those - does it not? I'm guessing the TCP BTL might want those virtual interfaces to connect from the host machine to a VM, yes? If so, then shouldn't the oob/tcp also support them so that we can connect to a daemon on the VM?

In other words, is the correct answer that:

  1. do not skip virtual interfaces
  2. add the ipv4loops array to the end of the ip4conn argv array
  3. allow the oob/tcp component to attempt to connect to the peer using the entire array of options. I'm assuming that the virtual interfaces will reject the "self" connection for mpirun, and so only the loopback will actually work

Another alternative would be to keep both argv arrays (loopbacks separate from non-loopback devices). If the connection is being made to a daemon on my own node, then always just use the loopback array. If the connection is to a daemon on some other node, then use the non-loopback array.

Again, it all seems based on why opal_ifcount() is including virtual interfaces. If it shouldn't be doing so, then that would seem the correct fix. If it should be doing so, then oob/tcp probably shouldn't be ignoring them.

@ggouaillardet
Copy link
Contributor Author

all I can say for now is oob/tcp does explicitly discard virtual interfaces, though btl/tcp uses them.
I cannot remember the rationale for oob/tcp (git blame points to a2919174 but the commit message does not mention this part).

@jsquyres @bosilca any thoughts ?

@ggouaillardet
Copy link
Contributor Author

Now I remember ... we disable the virtual interfaces because it is not uncommon they all share the same IP on all the nodes of a cluster. In some cases, we need to use it so we should add yet an other MCA param.

@rhc54
Copy link
Contributor

rhc54 commented Jan 29, 2019

IIRC, I skipped the virtual interfaces because we would pick them up, but they don't support loopback. Thus, mpirun couldn't use them to send to itself. You raise another good point as well - the problem is that we might need to talk to a daemon in a local VM but also need to talk to daemons on other host machines that include the same virtual interface.

I guess the question is: how is the btl/tcp component sorting this out? Can we use their logic here? No point in recreating the wheel, and I'm a little concerned that we might not allow a daemon-to-daemon connection when we would allow proc-to-proc communications.

@jsquyres jsquyres requested review from bwbarrett and bosilca January 29, 2019 15:43
@jsquyres
Copy link
Member

We talked about this on the webex today.

  1. The consensus seemed to be that if we try to "fix" this (even if we just change the default values for if include/exclude for BTL and OOB), we'll fix it for this guy and likely break it for someone else.
  2. For v4.0.x, the solution then feels like we should just document the MCA params that fix it for this user.
  3. For v5.0.x, we should finish the "reachable" framework -- that's a pretty definitive way to determine whether a peer is reachable via a given interface or not. Then we don't have to rely on heuristics that may or may not work for everyone.

@jsquyres
Copy link
Member

@bosilca We agreed on the call today to not merge this PR. See #6315 (comment). Do you feel differently?

@bosilca
Copy link
Member

bosilca commented Jan 29, 2019

No I'm fine either way. It's just that we have here a simple fix for most 'trivial' cases but we prefer to delay without a clear deadline a potential solution .

@ggouaillardet
Copy link
Contributor Author

@jsquyres My understanding of your previous comment is this PR might break something, and unless I am missing something, I think it does not.

Here is the issue the user is facing and this PR fixes

  • 3 interfaces on his laptop : lo, virbr0, and wifi
  • oob/tcp always discards virbr0 (this is hardcoded), and since opal_ifcount() == 3, lo is discarded too
  • oob/tcp (only) uses the wifi interface, and he can mpirun hostname on his laptop.

Now the users takes the train and the wifi interface goes down.

  • oob/tcp still discards virbr0 since this is hardcoded
  • opal_ifcount() == 2, so lo is discarded too
  • oob/tcp ends up with no available interface and mpirun cannot start.

If the user did not have virbr0, then oob/tcp would have used lo (since opal_ifcount() == 1) and mpirun hostname would have always worked regardless the status of his wifi connection.

My analysis is oob/tcp cannot use opal_ifcount() to determine whether lo should be used or discarded, and this PR fixes that and only that. Also, I believe it does not break anything else.

  • opal_ifcount() is not updated (long story short, btl/tcp is behavior is not altered with to vir* interfaces, and in this specific case it does not even matter since btl/vader will be used).
  • virbr0 is still never used by oob/tcp (which means mpirun cannot start a MPI app on a VM/container only accessible from the host via virbr0). This is currently unsupported (vir* interfaces are never used by oob/tcp and this is hardcoded), and this PR does not address this.

I already provided a workaround to the user
(either shutdown virbr0 or mpirun --mca oob_tcp_if_include lo when the wifi is down),
but this is not very user-friendly.

I definitely agree we have to improve things in a near future, and unless we have a plan/roadmap/deadline for that or this PR does break something, I'd rather merge it and backport it to the release branches in order to fix the issue reported by the user (no more, no less).

@rhc54
Copy link
Contributor

rhc54 commented Jan 30, 2019

Guess I'm confused. As Jeff documented, we do have a plan/roadmap to go forward - it consists of documenting the very simple MCA parameter solution to the problem. Remember, we are talking about a near-zero population of people experiencing the problem. The MCA solution for v4.0.x is just fine and low risk. The problem with this PR is that we won't know that we broke somebody else until they start complaining about it - and then it's too late. We've been there - this problem of how to handle loopback and virtual interfaces has continued to haunt us over the years. Every time we fix it, we break it.

Fixing it for v5 will take more time, but we have agreement on doing it and a rough timetable. It is the long-term correct solution.

@ggouaillardet
Copy link
Contributor Author

I think I got it now, the short term plan is to do nothing in existing release branches, but instead document

use --mca oob_tcp_if_include lo if the only available interfaces are lo and vir*.

@rhc54
Copy link
Contributor

rhc54 commented May 28, 2019

We should go ahead and close this as we are not going to use it. Someone can do the recommended documentation if so inclined.

@rhc54 rhc54 closed this May 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants