TCP BTL wireup fails when the networking is "strange", as seen on BigRed. #57

Closed
ompiteam opened this issue Oct 1, 2014 · 9 comments

@ompiteam
Contributor

ompiteam commented Oct 1, 2014

The "new" TCP wireup code introduced in r17450 fails on BigRed (PPC64); see the MTT permalink (http://www.open-mpi.org/mtt/index.php?do_redir=846). As best I can tell, the problem is caused by the IP alias setup on BigRed's global ethernet device (eth1 and eth1:1). To run over ethernet on BigRed you need to use these MCA parameters, since the other ethernet device is only wired to other nodes within a single rack:

-mca oob_tcp_include eth1
-mca pml ob1
-mca btl tcp,self
-mca btl_tcp_if_include eth1

The above works on the 1.2 branch and on the trunk prior to r17450. With an allocation within a single rack, -mca btl_tcp_if_include eth0 works on any OMPI version.
Things also work if we use IP over Myrinet via -mca btl_tcp_if_include myri0.
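
For anyone scripting these runs, here is a minimal sketch of how the MCA parameters above could be assembled into one command line for a chosen interface. The helper name `build_mpirun_command`, the process count, and the program path `./a.out` are hypothetical placeholders and not part of the original report; only the `-mca` parameters come from it.

```python
import shlex


def build_mpirun_command(interface, nprocs, program):
    """Assemble an mpirun argument list that pins OOB and BTL TCP to one interface."""
    # These four MCA parameters are the ones listed in the report above.
    mca_params = [
        ("oob_tcp_include", interface),
        ("pml", "ob1"),
        ("btl", "tcp,self"),
        ("btl_tcp_if_include", interface),
    ]
    argv = ["mpirun", "-np", str(nprocs)]
    for name, value in mca_params:
        argv += ["-mca", name, value]
    argv.append(program)  # hypothetical test program
    return argv


# Assumed values for illustration: 4 processes running ./a.out over eth1.
cmd = build_mpirun_command("eth1", 4, "./a.out")
print(shlex.join(cmd))
```

Passing "eth0" or "myri0" as the interface produces the other two variants mentioned above.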

Here is the output of /sbin/ifconfig on one of the compute nodes:

eth0      Link encap:Ethernet  HWaddr 00:11:25:C9:23:96  
          inet addr:10.1.2.156  Bcast:10.1.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:107672277 errors:0 dropped:0 overruns:0 frame:0
          TX packets:38001239 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:12364537258 (11791.7 Mb)  TX bytes:5680957916 (5417.7 Mb)
          Interrupt:33 Memory:a0030000-a0040000 

eth1      Link encap:Ethernet  HWaddr 00:11:25:C9:23:97  
          inet addr:10.2.2.156  Bcast:10.2.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:263224319 errors:0 dropped:0 overruns:0 frame:0
          TX packets:164937792 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1801179733170 (1717738.8 Mb)  TX bytes:158800164724 (151443.6 Mb)
          Interrupt:34 Memory:a0010000-a0020000 

eth1:1    Link encap:Ethernet  HWaddr 00:11:25:C9:23:97  
          inet addr:149.165.233.59  Bcast:149.165.233.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          Interrupt:34 Memory:a0010000-a0020000 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:242182780 errors:0 dropped:0 overruns:0 frame:0
          TX packets:242182780 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:568226732655 (541903.2 Mb)  TX bytes:568226732655 (541903.2 Mb)

myri0     Link encap:Ethernet  HWaddr 00:60:DD:47:D7:1E  
          inet addr:10.4.2.156  Bcast:10.4.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:5693807 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5775378 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:16170960666 (15421.8 Mb)  TX bytes:17390161910 (16584.5 Mb)
          Interrupt:40 

This is a regression from the 1.2 branch, thus I mark this as critical.
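
To make the alias setup in the ifconfig output above easier to spot, here is a rough sketch that groups IPv4 addresses by physical device, treating a label such as eth1:1 as an alias of eth1. It only parses the old-style ifconfig text format shown above, uses an abbreviated copy of that output as sample input, and the function name `addresses_by_device` is made up for illustration. It says nothing about how Open MPI itself enumerates interfaces.

```python
import re
from collections import defaultdict

# Abbreviated copy of the ifconfig output above (header and inet lines only).
SAMPLE = """\
eth1      Link encap:Ethernet  HWaddr 00:11:25:C9:23:97
          inet addr:10.2.2.156  Bcast:10.2.255.255  Mask:255.255.0.0
eth1:1    Link encap:Ethernet  HWaddr 00:11:25:C9:23:97
          inet addr:149.165.233.59  Bcast:149.165.233.255  Mask:255.255.255.0
myri0     Link encap:Ethernet  HWaddr 00:60:DD:47:D7:1E
          inet addr:10.4.2.156  Bcast:10.4.255.255  Mask:255.255.0.0
"""


def addresses_by_device(ifconfig_text):
    """Map each physical device name to a list of (label, ipv4_address) pairs."""
    grouped = defaultdict(list)
    label = None
    for line in ifconfig_text.splitlines():
        header = re.match(r"^(\S+)\s+Link encap:", line)
        if header:
            label = header.group(1)
            continue
        addr = re.search(r"inet addr:(\S+)", line)
        if addr and label is not None:
            # "eth1:1" is an alias label; the device is the part before ":".
            device = label.split(":", 1)[0]
            grouped[device].append((label, addr.group(1)))
    return dict(grouped)


devices = addresses_by_device(SAMPLE)
# Devices carrying more than one IPv4 address have at least one alias.
aliased = {dev: addrs for dev, addrs in devices.items() if len(addrs) > 1}
print(aliased)
```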

@ompiteam ompiteam self-assigned this Oct 1, 2014
@ompiteam ompiteam added this to the Future milestone Oct 1, 2014
@ompiteam ompiteam added the bug label Oct 1, 2014
@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Imported from trac issue 1505. Created by timattox on 2008-09-17T16:00:55, last modified: 2010-01-26T11:17:50

@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by timattox on 2008-09-23 11:43:27:

Since we (the developers) don't have other machines available to test my theory that IP aliasing causes the failure, we are dropping this to just a major. Hopefully I will get a chance to walk through the code to see how it is failing before we release 1.3.

@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by timattox on 2008-09-24 16:00:36:

Note to whoever looks into this: opal_ifinit() in opal/util/if.c may be a place to start adding some debugging output... although it wasn't changed by r17450.

@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by timattox on 2008-10-22 14:58:31:

I won't have time to work on this before we release 1.3, so I am moving it to 1.3.1.

@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by jsquyres on 2008-11-11 13:07:10:

FWIW, Jon Mason at Chelsio ran IMB-MPI1 on a cluster with 2 IP aliases and it all seemed to work fine for him.

Here are the interfaces:

[root@r1-iw ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:30:48:78:A0:16 
          inet addr:10.192.176.121  Bcast:10.192.176.255  Mask:255.255.255.0
          inet6 addr: fe80::230:48ff:fe78:a016/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3187462 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3107424 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:3044098816 (2.8 GiB)  TX bytes:3101506292 (2.8 GiB)
          Base address:0x2000 Memory:c9000000-c9020000

lo        Link encap:Local Loopback 
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:2278857 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2278857 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:14554156386 (13.5 GiB)  TX bytes:14554156386 (13.5 GiB)

rnic0     Link encap:Ethernet  HWaddr 00:07:43:05:17:11 
          inet addr:192.168.1.121  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::207:43ff:fe05:1711/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:669315 errors:0 dropped:0 overruns:0 frame:0
          TX packets:670854 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:10000
          RX bytes:3093950750 (2.8 GiB)  TX bytes:3098535465 (2.8 GiB)
          Interrupt:16 Memory:c8201000-c8201fff

rnic0:iw  Link encap:Ethernet  HWaddr 00:07:43:05:17:11 
          inet addr:192.168.2.121  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          Interrupt:16 Memory:c8201000-c8201fff

rnic1     Link encap:Ethernet  HWaddr 00:07:43:05:06:94 
          inet addr:192.168.3.121  Bcast:192.168.3.255  Mask:255.255.255.0
          inet6 addr: fe80::207:43ff:fe05:694/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:666969 errors:0 dropped:0 overruns:0 frame:0
          TX packets:669669 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:10000
          RX bytes:3072544717 (2.8 GiB)  TX bytes:3094301135 (2.8 GiB)
          Interrupt:16 Memory:c9101000-c9101fff

rnic1:iw  Link encap:Ethernet  HWaddr 00:07:43:05:06:94 
          inet addr:192.168.4.121  Bcast:192.168.4.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          Interrupt:16 Memory:c9101000-c9101fff

[root@r1-iw ~]# ping r2-iw
PING r2-iw (192.168.1.122) 56(84) bytes of data.
64 bytes from r2-iw (192.168.1.122): icmp_seq=0 ttl=64 time=0.059 ms
64 bytes from r2-iw (192.168.1.122): icmp_seq=1 ttl=64 time=0.047 ms

--- r2-iw ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.047/0.053/0.059/0.006 ms, pipe 2
[root@r1-iw ~]# ping r2-rdma
PING r2-rdma (192.168.2.122) 56(84) bytes of data.
64 bytes from r2-rdma (192.168.2.122): icmp_seq=0 ttl=64 time=0.073 ms
64 bytes from r2-rdma (192.168.2.122): icmp_seq=1 ttl=64 time=0.040 ms
64 bytes from r2-rdma (192.168.2.122): icmp_seq=2 ttl=64 time=0.042 ms

--- r2-rdma ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.040/0.051/0.073/0.017 ms, pipe 2

@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by timattox on 2009-02-10 16:00:15:

Based on the previous comment, the title of this ticket is probably wrong.
The problem might not have anything to do with IP aliases, so I'm changing it to a more generic title.

For our BigRed machine, a workaround is to use -mca btl_tcp_if_include myri0.
I won't have time to look into this any further, and since IU has a workaround,
I'm "disowning" the ticket.

@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by bosilca on 2009-07-07 22:34:45:

Now that we have the ability to define more precisely what a private IP address is (see #1821), I wonder whether this can be fixed by providing the correct private addresses on the nodes. I don't have access to BigRed, but if somebody can test the following MCA parameter, this might allow us to close this ticket.

--mca opal_net_private_ipv4 "10.1.0.0/16;10.2.0.0/16;172.16.0.0/12;192.168.0.0/16;169.254.0.0/16"
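
As an illustration of what the list above covers, here is a small sketch that checks the BigRed addresses from the first comment against each CIDR block in the parameter value. It uses Python's standard `ipaddress` module and is only a model of CIDR matching, not Open MPI's implementation; the helper name `matching_network` is made up for this example.

```python
import ipaddress

# Value of the MCA parameter suggested above.
PRIVATE_IPV4 = "10.1.0.0/16;10.2.0.0/16;172.16.0.0/12;192.168.0.0/16;169.254.0.0/16"

# Addresses taken from the BigRed ifconfig output in the first comment.
NODE_ADDRESSES = {
    "eth0": "10.1.2.156",
    "eth1": "10.2.2.156",
    "eth1:1": "149.165.233.59",
    "myri0": "10.4.2.156",
}


def matching_network(address, network_list):
    """Return the first listed CIDR block containing address, or None."""
    ip = ipaddress.ip_address(address)
    for cidr in network_list.split(";"):
        net = ipaddress.ip_network(cidr)
        if ip in net:
            return str(net)
    return None


classification = {
    name: matching_network(addr, PRIVATE_IPV4)
    for name, addr in NODE_ADDRESSES.items()
}
for name, net in classification.items():
    print(f"{name:8s} {NODE_ADDRESSES[name]:16s} {net or 'not in list'}")
```

With this list, eth0 and eth1 land in separate /16 blocks, while the eth1:1 alias and myri0 (10.4.2.156) match none of the entries.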

@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by jjhursey on 2010-01-26 11:17:50:

Will most likely not have time to investigate this in the near term. Moving to Future so it does not get lost.

rhc54 pushed a commit that referenced this issue Oct 31, 2014
Disable sendi optimization for GPU buffers
@bwbarrett bwbarrett assigned bwbarrett and unassigned ompiteam Oct 19, 2018
@bwbarrett
Member

With #7134, this should no longer be a problem. Since BigRed has been dead for a decade, there's no good way to test this specific bug. I'm going to close this ticket, since we believe it is fixed and can't prove it either way.
