RPI 3B+ state of /sys/class/net/eth0/carrier stuck at 1 #1100

Closed
brubbel opened this issue Jan 23, 2019 · 22 comments

@brubbel

brubbel commented Jan 23, 2019

TL;DR: The eth0 carrier state flag gets stuck at '1' if /sys/class/net/eth0/carrier is polled within 1 second after unplugging the network cable. Anything that depends on the flag then fails (e.g. route).

Hardware (3 devices were tested, on 2 different networks):

  1. RPI3B+ (rev. a020d3), new raspbian (kernel 4.14.79-v7+).
  2. Raspberry pi official power supply (5.1V, 2.5A)
  3. peripherals: ethernet cable, usb keyboard (dell), hdmi screen (dell).

Setup

  1. OS (2018-11-13-raspbian-stretch-lite.img) is written to SD-card, ssh is not enabled.
  2. ethernet cable is plugged in before boot.
  3. local login.
  4. type cat /sys/class/net/eth0/carrier (do not press Enter yet!)
  5. unplug ethernet cable
  6. press Enter within 1 second after unplugging.
  7. (A reboot is needed to retry.)

Result
The /sys/class/net/eth0/carrier flag gets stuck at '1'.
Also, the 'route' command hangs, as the OS probably doesn't know the interface is down (route -n works).

Expected result
The /sys/class/net/eth0/carrier flag becomes '0'

Situation where this issue is relevant
A script may be checking the carrier flag to failover to WiFi, for example every 10 seconds. However, every once in a while a race condition can occur where this bug is triggered on disconnecting the cable.
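
A minimal sketch of such a polling check, in Python. The helper name `read_carrier` and the `sysfs_root` parameter are hypothetical and exist only for illustration; the `/sys/class/net/<iface>/carrier` path is the one from this report. Reading the file fails with "Invalid argument" (EINVAL) while the interface is not running, so the sketch returns `None` for that case instead of treating it as "no carrier".

```python
import errno
from pathlib import Path
from typing import Optional


def read_carrier(iface: str = "eth0", sysfs_root: str = "/sys/class/net") -> Optional[bool]:
    """Return True/False for carrier up/down, or None if the state is unavailable."""
    path = Path(sysfs_root) / iface / "carrier"
    try:
        text = path.read_text().strip()
    except OSError as exc:
        # The kernel returns EINVAL while the interface is not running.
        if exc.errno == errno.EINVAL:
            return None
        raise
    return text == "1"
```

On an affected board, a failover script built on this check would keep seeing `True` after the cable is pulled, which is exactly the failure described here.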

@brubbel
Author

brubbel commented Jan 23, 2019

@XECDesign
Contributor

This is also affecting dhcpcd, which fails to switch over to wifi if the cable is unplugged.

In my case, I can reproduce this by booting with the ethernet cable plugged in and running ip monitor link dev eth0. Then unplugging or plugging the cable back in shows no activity.

If I boot with the cable unplugged, it works as expected.

The issue is present in 4.14 and 4.19.

I tried a patch that adds tasklet_schedule(&dev->bh); back into lan78xx_open, but that made no difference.

@brubbel
Author

brubbel commented Jan 28, 2019

One remarkable thing is that ETHTOOL_GLINK (SIOCETHTOOL ioctl) reports the correct link status, so the carrier flag is evidently not read from the hardware in real time.
I suppose that the driver caches the carrier state for performance reasons.

e.g. ethtool eth0
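
For reference, a rough Python sketch of the same ETHTOOL_GLINK query that `ethtool eth0` performs for its "Link detected" line. The function names `build_ifreq` and `link_detected` are made up for this example; the two constants are the values from `<linux/sockios.h>` and `<linux/ethtool.h>`. Treat it as an illustration of the ioctl, not as a tested replacement for ethtool.

```python
import array
import fcntl
import socket
import struct

SIOCETHTOOL = 0x8946        # from <linux/sockios.h>
ETHTOOL_GLINK = 0x0000000A  # from <linux/ethtool.h>


def build_ifreq(iface: str, data_addr: int) -> bytes:
    """Pack a struct ifreq: 16-byte name followed by a pointer to the ethtool data."""
    name = iface.encode()
    if len(name) > 15:
        raise ValueError("interface name too long")
    # Pad to 40 bytes, the size of struct ifreq on 64-bit Linux (32 on 32-bit).
    return struct.pack("16sP", name, data_addr).ljust(40, b"\0")


def link_detected(iface: str = "eth0") -> bool:
    """Ask the driver for the current link state via ETHTOOL_GLINK."""
    # struct ethtool_value { __u32 cmd; __u32 data; }
    value = array.array("I", [ETHTOOL_GLINK, 0])
    addr, _ = value.buffer_info()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        fcntl.ioctl(sock.fileno(), SIOCETHTOOL, build_ifreq(iface, addr))
    return bool(value[1])
```

Comparing `link_detected("eth0")` against the contents of /sys/class/net/eth0/carrier is one way to spot the mismatch described in this issue.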

@XECDesign
Contributor

When you run ethtool, it reads the appropriate register, but only reports the value without tracking any changes. The driver has a different code path to handle carrier state changes, which is where the issue is. I haven't looked into how it works, but it's either an interrupt or polling mechanism that doesn't get triggered.

@JamesH65
Contributor

Looking at https://github.com/raspberrypi/linux/blob/rpi-4.14.y/drivers/net/usb/lan78xx.c#L1179, this is the section of code that handles the link being dropped. I wonder if there needs to be some sort of flushing of the various SKBs etc. that might be in flight. For example, there was a recent fix to the end of this function, https://github.com/raspberrypi/linux/blob/rpi-4.14.y/drivers/net/usb/lan78xx.c#L1237, that makes sure required processing is done when the link comes up. Complete guess though.

What I don't understand is why the polling seems to kill it. That's just a read as far as I can see, so I would not expect it to cause this. Unless there's a mutex issue somewhere, perhaps?

@JamesH65
Contributor

I don't seem to be able to replicate this on 4.14.90 using any of the mechanisms above. Don't think there is anything unusual in my kernel setup, non-tainted, so a pure 4.14.90.

@brubbel
Author

brubbel commented Jan 29, 2019

Bug is still there. Just tested with 4.14.90-v7+ (upgraded from 4.14.79-v7+).
Link speed is 1000Mb/s when connected, if this may be relevant.

When cable is unplugged:

Linux raspberrypi 4.14.90-v7+ #1183 SMP Fri Dec 21 14:03:50 GMT 2018 armv7l GNU/Linux

pi@raspberrypi:~ $ sudo ethtool eth0
Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Link partner advertised link modes:  10baseT/Half 10baseT/Full 
                                             100baseT/Half 100baseT/Full 
        Link partner advertised pause frame use: Symmetric Receive-only
        Link partner advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbag
        Wake-on: g
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: no
pi@raspberrypi:~ $ cat /sys/class/net/eth0/carrier
1

@JamesH65
Contributor

Odd. Not sure why I am not seeing it then. Tried the cat /sys/class/net/eth0/carrier as soon as I removed the ethernet cable, along with @XECDesign's ip monitor test, and all worked fine. I'll try again.

@brubbel
Author

brubbel commented Jan 29, 2019

Strange. Since the rpi-update (and downgrade to original kernel), carrier is always stuck at 1, unless I boot without cable. Will check a second RPI in a moment.

@brubbel
Author

brubbel commented Jan 29, 2019

Second RPI (rpi-update to 4.14.94-v7+), cable is unplugged.

cat /sys/class/net/eth0/carrier
switches between
cat: /sys/class/net/eth0/carrier: Invalid argument
and
1

@brubbel
Author

brubbel commented Jan 29, 2019

More info (back to a fresh SD card). I have the impression that things became worse after the rpi-update.

Boot with cable: carrier stuck (only unbind/bind the driver solves it, and it stays solved)
Boot without cable: works! (carrier switches nicely as it should do)
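
The unbind/bind cycle can be scripted through the generic sysfs driver interface. The following is a sketch only: the function name `rebind_driver` and the `sysfs_root` parameter are hypothetical, it assumes the interface exposes the usual `device` and `driver` symlinks, and it must run as root on a real system.

```python
import os


def rebind_driver(iface: str = "eth0", sysfs_root: str = "/sys/class/net") -> str:
    """Unbind and re-bind the driver behind iface; returns the device ID used."""
    # Resolve both paths first: the net device entry vanishes once unbound.
    device = os.path.realpath(os.path.join(sysfs_root, iface, "device"))
    driver = os.path.realpath(os.path.join(device, "driver"))
    device_id = os.path.basename(device)  # e.g. a USB interface ID
    for action in ("unbind", "bind"):
        with open(os.path.join(driver, action), "w") as f:
            f.write(device_id)
    return device_id
```

Note that the interface disappears briefly between the two writes, so anything holding eth0 (dhcpcd, for instance) sees it go away and come back.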

pi@raspberrypi:~ $ cat /etc/rpi-issue 
Raspberry Pi reference 2018-11-13
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 7e0c786c641ba15990b5662f092c106beed40c9f, stage2

pi@raspberrypi:~ $ vcgencmd version
Nov  4 2018 16:31:07 
Copyright (c) 2012 Broadcom
version ed5baf9520a3c4ca82ba38594b898f0c0446da66 (clean) (release)

pi@raspberrypi:~ $ uname -a
Linux raspberrypi 4.14.79-v7+ #1159 SMP Sun Nov 4 17:50:20 GMT 2018 armv7l GNU/Linux

@XECDesign
Contributor

I doubt it makes a difference, but as a data point - I have wifi connected at the same time. I could also give you my sd card tomorrow if it's not reproducible on your setup.

@brubbel
Author

brubbel commented Jan 29, 2019

Just tested: the desktop version (2018-11-13-raspbian-stretch.img) does not show the problem.
The lite version (2018-11-13-raspbian-stretch-lite.img) does.

I have the impression that 'something' is happening too fast after boot in the lite version, which can explain why an unbind/bind cycle solves the issue.
Could it be that the LAN7515 chip is not (completely) ready yet before the kernel initiates communication?

@XECDesign
Contributor

I was using the desktop version.

@brubbel
Author

brubbel commented Jan 29, 2019

I suppose it isn't possible to recompile/insmod the driver without recompiling the kernel?

@XECDesign
Contributor

It is, but it's a bit of a hassle. Install raspberrypi-kernel-headers and compile the module as if it were out of tree. https://www.kernel.org/doc/Documentation/kbuild/modules.txt

@brubbel
Author

brubbel commented Mar 18, 2019

Confirmed: happens also on 3B+, "Raspbian Stretch desktop 2018-11-13", 4.14.98-v7+.
Network does not (always) switch to WiFi when cable is unplugged. Manual ifconfig eth0 down solves this.
I didn't look further into the matter yet though.

@maxnet

maxnet commented Apr 11, 2019

I have a theory that this is caused by the following:

  1. the phy code tells the NIC to generate link status change notification "interrupts" right away on boot. lan78xx_probe() -> lan78xx_phy_init() -> phy_connect_direct() -> phy_start_interrupts() -> lan88xx_phy_config_intr()
  2. however it only opens USB communication to listen for and handle such "interrupts" much later when the network interface is brought up (by dhcpcd) and lan78xx_open() gets called

By the time it gets to 2 the phy has already detected carrier, and we missed the interrupt about it.
Since the interrupt was not acknowledged, it does not send any new ones for future events either.
(When the interface is first brought up, the driver attempts to set up the connection regardless of whether it received the interrupt, so the missing interrupt isn't really a problem then, but you do notice it when the link goes down.)
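
To make the theory concrete, here is a toy Python model of a latched interrupt source. It is not the lan78xx driver or the PHY code; the class `LatchedPhy` and the helper `unplug_events_seen` are invented for illustration, and the model assumes the behaviour described above: once an interrupt is latched and not acknowledged, no further interrupts are generated.

```python
class LatchedPhy:
    """Toy PHY: raises one interrupt, then stays silent until it is acknowledged."""

    def __init__(self):
        self.pending = False         # latched, not yet acknowledged
        self.delivered = 0           # interrupts actually seen by the host
        self.host_listening = False  # True once the interface has been opened

    def link_change(self):
        if self.pending:
            return                   # still latched: no new interrupt is generated
        self.pending = True
        if self.host_listening:
            self.delivered += 1
            self.pending = False     # host handles and acknowledges it

    def open(self, ack_stale: bool):
        self.host_listening = True
        if ack_stale:
            self.pending = False     # clear anything latched before we listened


def unplug_events_seen(ack_stale: bool) -> int:
    phy = LatchedPhy()
    phy.link_change()    # link comes up at boot, before the interface is opened
    phy.open(ack_stale)  # interface is brought up later
    phy.link_change()    # cable is unplugged
    return phy.delivered
```

In this model `unplug_events_seen(False)` returns 0 (the unplug is never reported), while `unplug_events_seen(True)` returns 1, which is the effect an acknowledge-on-open step is meant to have.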

Seems to work better with the following hack:

diff -ur linux-rpi-4.19.y.orig/drivers/net/usb/lan78xx.c linux-rpi-4.19.y/drivers/net/usb/lan78xx.c
--- linux-rpi-4.19.y.orig/drivers/net/usb/lan78xx.c	2019-04-10 14:31:00.000000000 +0200
+++ linux-rpi-4.19.y/drivers/net/usb/lan78xx.c	2019-04-11 17:19:04.360246346 +0200
@@ -2692,6 +2692,9 @@
 				  "intr submit %d\n", ret);
 			goto done;
 		}
+		/* ack any previous interrupts we may have missed. */
+		if (net->phydev->drv->ack_interrupt)
+			net->phydev->drv->ack_interrupt(net->phydev);
 	}
 
 	lan78xx_init_stats(dev);

But will leave a proper fix to someone else.
Not sure what the CLEAN way to handle this is.

@MichaelM223

Is this being worked on at the moment?
I'm having problems with the link going down not being detected, and hence the Pi (3B+) won't fail over to a 3G dongle (the default route for eth0 does not change).

I built a kernel with the 'hack' mentioned above for the lan78xx module, but unfortunately that does not change anything either.

@Rubs1er

Rubs1er commented Aug 12, 2020

Has anyone found a solution?
/sys/class/net/eth0/carrier is stuck at 1 if booted with the cable plugged in.
I also noticed that
after bringing eth0 down and up again,
/sys/class/net/eth0/carrier is stuck at 0.

5.4.51-v7+
Edit: the problem exists on the 3B+ but doesn't exist on the 4B.

@pelwell
Contributor

pelwell commented Dec 16, 2020

A possible fix for this is in the current rpi-update kernel, which has just switched to rpi-5.10.y.

@brubbel brubbel closed this as completed Dec 19, 2020
@ruimo

ruimo commented Apr 23, 2021

Hi, I found that exactly the same issue still exists, and rpi-update does not fix it. Any suggestions?

Procedure:

  1. Burn the SD card using rpi-imager, selecting other => Raspberry Pi OS Lite (32-bit).
  2. Turn on the Raspberry Pi with the burned SD card.
  3. ping raspberrypi.local works correctly.
  4. Unplug the ethernet cable.
  5. Wait a minute.
  6. Plug the ethernet cable back in.
  7. ping raspberrypi.local no longer works. Pinging the IP address shown in step 3 also does not work.

Environment:
Raspberry Pi 3
Raspberry Pi OS Lite (32bit)

rpi-update shows the following version:
*** depmod 5.10.31-v7+
*** depmod 5.10.31-v7l+
*** depmod 5.10.31-v8+
*** depmod 5.10.31+
