-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Pi3b+: "kevent 4 may have been dropped" with lan78xx #2447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Bugger, thought the kevent issue was ancient history. IIRC, it is because an event is stuck on the system event queue and then another one turns up before it is serviced. But it's not been reported for a couple of years, and that was on the smsc75xx driver. Not sure this is the same thing though, might be a systemd initialisation issue. |
The system I am developing and compiling the raspberrypi kernel for is not using systemd at all but is based on buildroot without systemd. |
OK, do you get the same problem with Raspbian? |
Haven‘t tested it yet with Raspbian. However, this is the kernel defconfig I am using: |
I have the same issue with Compute Module 3. But it looks like a hardware issue. It's not happening on another setup which is identical. |
I am also using Buildroot to make my own image, and I managed to resolve this issue by disabling VLAN support in kernel.
After disabling VLAN support ( Firmware is ae9a493932e47e08cabb25a2728037298075fd00 and userland is a343dcad1dae4e93f4bfb99496697e207f91027e. |
@danijelt @jens-maus We have committed three or four fixes to VLAN support in the lan78xx driver since 4.14.33, so its probably worth getting the latest kernel and seeing if it has fixed the kevent issue with VLAN enabled. There are also other lan78xx fixes in there, so well worth trying to see if the kevent issues has gone. If you do find it has gone, please close this issue. |
@danijelt @jens-maus If no further reports/updates are forthcoming, I'm inclined to close this report. |
We have released a stable production image without VLAN and I'm currently occupied with another project. I'll keep this in mind during the next update and report here if kevent/VLAN issue persists. |
I also saw this some times on my Pi 3 B+. Currently the Pi is completely dead, on console screen the
messages go wild meaning every second about 50 of those lines are shown. Only one option: hard reset of the system. Resulting in fsck errors and so on. Raspbian Stretch by the way. 1 Gbit/s ethernet port. |
@bcutter Are you using the latest kernel? You can apt update, to get an official release, or rpi-update to get the latest bleeding edge one. You might need rpi-update, but backup first. There have been a number of kernel updates so worth trying. |
Currently on 4.14.34-v7+. Latest official release. I try to avoid using rpi-update, cause of the risk of running into other issues (maybe fix one and maybe grab some others - no perfect chance/risk ratio :-))... maybe my opinion will change temporarily if this kevent4 keeps on appearing. |
I'm also experiencing this issue occasionally, 4.14.48-v7+. |
Updated to 4.14.48-v7+ recently, will monitor future behaviour. |
Please note, the 3B error reporting kevent 0 is, I think, a different issue - please keep this thread to 3B+. For those people using 3B+, do you still see the issue after a rpi-update? Report above seems to indicate it has gone away with the latest, but others say no, but I would like to confirm that it is still present, using the latest kernel on the 3B+. |
Also, can I just confirm this only happens during startup? Or have people encountered this error in general usage? |
I always experienced this during normal usage a few hours or even days after startup. |
I'm no longer able to reproduce it, but when I could, it was by running |
Just some notes from a a quick look at the driver code. kevent 4 is EVENT_LINK_RESET. This is only fired off in two places, first during opening the driver, and secondly when an interrupt is received (via USB, so not a 'direct line') from the PHY. I believe the only time an interrupt is received requires the link to be reset so this makes sense. The action taken when the kevent is processed is fairly minimal, most work done is in lan78xx_link_reset, which does a bunch of phy/register reads and writes. Nothing I've seen that would take any real amounts of time. Note though that register reads all go via USB, so they could be delayed if the other end is busy. |
Been giving the latest kernel some stress testing on a debug enabled lan78xx driver. Running stress-ng --cpu 4, so loading the CPU's right up, then making the ethernet go up and down as quickly as possible (ip link set dev eth0 down/up), whilst monitoring for any kevents. No errors at all after well over an hour of this. Will leave it running for a few more hours. |
Cannot get this to go wrong. Run test scripts for hours, no issues. Clearly not exercising the right runes. If anyone seeing this has anything unusual in their networking setup, please let me know here. But for the moment I have to look at some other stuff. |
The only two „special“ things I have are:
Hope this helps a bit. |
I have tried with latest 4.14.50 kernel (commit 3b01f05):
Still having the same problem when I enable VLAN support:
My setup: Buildroot 2017.05, systemd 232, NetworkManager 1.8.0. kevent error appears appears after systemd-hostnamed service is started and kernel notifies that VLAN 0 is added. |
@danijelt Does this only occur during startup? Does it ever cause any problems or does everything still work OK? |
Configuring the VLAN mask and setting up multicast subscriptions both use worker threads VLAN takes data that is precomputed, and multicast works out the settings in the worker, so being requested twice before having been called for the first time should have no side effect (minor assumption that the flag is cleared before the worker task is executed). |
@JamesH65 Yes, only during startup. Once it boots (if kevent 4 doesn't happen), I don't have any problems anymore. I don't do anything special with the network, just bring it up at boot and leave it that way. I'm not even touching anything VLAN related, adding VLAN 0 is probably NetworkManager's (or kernel's) default behaviour. |
commit 30a41ed upstream. With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 30a41ed upstream. With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
The bInterval is set to 4 (i.e. 8 microframes => 1ms) and the only bit that the driver pays attention to is "link was reset". If there's a flapping status bit in that endpoint data, (such as if PHY negotiation needs a few tries to get a stable link) then polling at a slower rate would act as a de-bounce. See: #2447
The bInterval is set to 4 (i.e. 8 microframes => 1ms) and the only bit that the driver pays attention to is "link was reset". If there's a flapping status bit in that endpoint data, (such as if PHY negotiation needs a few tries to get a stable link) then polling at a slower rate would act as a de-bounce. See: #2447
[ Upstream commit 30a41ed ] With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 30a41ed ] With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 30a41ed ] With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 30a41ed ] With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 30a41ed ] With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 30a41ed ] With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 30a41ed ] With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 30a41ed ] With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 30a41ed ] With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 30a41ed ] With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 30a41ed ] With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 30a41ed ] With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 30a41ed ] With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
commit 30a41ed32d3088cd0d682a13d7f30b23baed7e93 upstream. With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> (cherry picked from commit 689d5be)
[ Upstream commit 30a41ed ] With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 30a41ed ] With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 30a41ed32d3088cd0d682a13d7f30b23baed7e93 ] With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 30a41ed ] With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 30a41ed ] With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
The bInterval is set to 4 (i.e. 8 microframes => 1ms) and the only bit that the driver pays attention to is "link was reset". If there's a flapping status bit in that endpoint data, (such as if PHY negotiation needs a few tries to get a stable link) then polling at a slower rate would act as a de-bounce. See: #2447
The bInterval is set to 4 (i.e. 8 microframes => 1ms) and the only bit that the driver pays attention to is "link was reset". If there's a flapping status bit in that endpoint data, (such as if PHY negotiation needs a few tries to get a stable link) then polling at a slower rate would act as a de-bounce. See: #2447
[ Upstream commit 30a41ed ] With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
commit 30a41ed32d3088cd0d682a13d7f30b23baed7e93 upstream. With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> (cherry picked from commit 9c31ac781950dedfb4e4dd22a8f49768d10e8096) Signed-off-by: Jack Vogel <[email protected]>
The bInterval is set to 4 (i.e. 8 microframes => 1ms) and the only bit that the driver pays attention to is "link was reset". If there's a flapping status bit in that endpoint data, (such as if PHY negotiation needs a few tries to get a stable link) then polling at a slower rate would act as a de-bounce. See: #2447
The bInterval is set to 4 (i.e. 8 microframes => 1ms) and the only bit that the driver pays attention to is "link was reset". If there's a flapping status bit in that endpoint data, (such as if PHY negotiation needs a few tries to get a stable link) then polling at a slower rate would act as a de-bounce. See: #2447
[ Upstream commit 30a41ed ] With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] raspberrypi/linux#2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/[email protected] Fixes: 792aec4 ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Every second or third boot with a 4.14.26 kernel where the LAN78XX support had been enabled I receive the following notices in
dmesg
:Sometimes the system still continues and is able to setup eth0 properly. However, every now and then the system is not able to bring up eth0 and thus the boot up process is interrupted.
Any idea where this might come from or how to debug this situation?
The text was updated successfully, but these errors were encountered: