RTL8153 resets (under load) #5239

Open
gongsearch opened this issue Nov 4, 2022 · 6 comments
Comments

@gongsearch
Describe the bug

Using an RTL8153 USB 3 Ethernet adapter (on a USB 3 port) and putting it under load makes the adapter reset: kernel warning -> Tx timeout -> device reset; see the log below.

If new device firmware is supplied at boot time (via the initrd), networking does not recover after the reset (see Additional context). With the adapter's built-in firmware, the reset causes a network dropout for some time, after which networking nearly always recovers.

This sometimes also happens when there is little traffic. Putting the device under load triggers it reproducibly within a short time.

Steps to reproduce the behaviour

  • Use the following gigabit Ethernet adapter (I tried different vendors) on a USB 3 port (directly or via a powered hub): 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
  • Put the device under load (iperf3 -c 192.168.178.1 --bidi -tinf)
  • Wait a moment. In my setup the error occurs nearly always within 10 minutes.

Device(s)

Raspberry Pi 4 Mod. B

System

cat /etc/rpi-issue
Raspberry Pi reference 2022-01-28
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, fbe448ccdc995d295d24c7596e5f0ef62cc2488f, stage2

vcgencmd version
Oct 26 2022 11:09:21
Copyright (c) 2012 Broadcom
version c72ad6b26ff40c91ef776b847436094ee63fabee (clean) (release) (start_cd)

uname -a
Linux server 5.15.76-v8+ #1596 SMP PREEMPT Mon Oct 31 17:15:15 GMT 2022 aarch64 GNU/Linux

Logs

2022-11-04T14:09:31 server kernel:[75215.928853] ------------[ cut here ]------------
2022-11-04T14:09:31 server kernel:[75215.928881] NETDEV WATCHDOG: eth1 (r8152): transmit queue 0 timed out
2022-11-04T14:09:31 server kernel:[75215.928941] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:478 dev_watchdog+0x398/0x3a0
2022-11-04T14:09:31 server kernel:[75215.928963] Modules linked in: ip6t_REJECT nf_reject_ipv6 xt_comment ip6_tables xt_connmark ip6t_rpfilter nf_conntrack_netlink xt_recent xt_TCPMSS xt_tcpmss ipt_REJECT nf_reject_ipv4 xt_addrtype xt_multiport xt_set xt_tcpudp xt_hashlimit xt_mark xt_conntrack ipt_rpfilter nft_counter nft_chain_nat xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat xt_NFLOG xt_LOG nf_log_syslog nf_tables ip_set_hash_ip ip_set_hash_net ip_set overlay nfnetlink_log nfnetlink binfmt_misc btrfs blake2b_generic xor xor_neon zstd_compress rtc_ds1307 regmap_i2c raid6_pq ch341 raspberrypi_hwmon ftdi_sio sg i2c_bcm2835 cdc_acm usbserial uio_pdrv_genirq uio ip6t_eui64 ax88179_178a wireguard libchacha20poly1305 chacha_neon poly1305_neon ip6_udp_tunnel udp_tunnel libcurve25519_generic libchacha macvlan pppoe pppox ppp_generic slhc 8021q garp stp llc jitterentropy_rng fuse ip_tables x_tables ipv6
2022-11-04T14:09:31 server kernel:[75215.929148] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.76-v8+ #1596
2022-11-04T14:09:31 server kernel:[75215.929155] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
2022-11-04T14:09:31 server kernel:[75215.929160] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
2022-11-04T14:09:31 server kernel:[75215.929166] pc : dev_watchdog+0x398/0x3a0
2022-11-04T14:09:31 server kernel:[75215.929172] lr : dev_watchdog+0x398/0x3a0
2022-11-04T14:09:31 server kernel:[75215.929178] sp : ffffffc008003d10
2022-11-04T14:09:31 server kernel:[75215.929181] x29: ffffffc008003d10 x28: ffffff81003fba80 x27: 0000000000000004
2022-11-04T14:09:31 server kernel:[75215.929191] x26: 0000000000000140 x25: 00000000ffffffff x24: 0000000000000000
2022-11-04T14:09:31 server kernel:[75215.929200] x23: ffffffe412736000 x22: ffffff8102ec13dc x21: ffffff8102ec1000
2022-11-04T14:09:31 server kernel:[75215.929209] x20: ffffff8102ec1480 x19: 0000000000000000 x18: 0000000000000000
2022-11-04T14:09:31 server kernel:[75215.929218] x17: ffffff9decb0b000 x16: ffffffc008004000 x15: ffffffffffffffff
2022-11-04T14:09:31 server kernel:[75215.929226] x14: ffffffe41229b8a8 x13: 74756f2064656d69 x12: ffffffe4127c6660
2022-11-04T14:09:31 server kernel:[75215.929235] x11: 0000000000000003 x10: ffffffe4127ae620 x9 : ffffffe4114ee89c
2022-11-04T14:09:31 server kernel:[75215.929244] x8 : 0000000000017fe8 x7 : 0000000000000003 x6 : 0000000000000000
2022-11-04T14:09:31 server kernel:[75215.929252] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000103
2022-11-04T14:09:31 server kernel:[75215.929260] x2 : 0000000000000102 x1 : ec86af8fad0c9b00 x0 : 0000000000000000
2022-11-04T14:09:31 server kernel:[75215.929269] Call trace:
2022-11-04T14:09:31 server kernel:[75215.929272] dev_watchdog+0x398/0x3a0
2022-11-04T14:09:31 server kernel:[75215.929279] call_timer_fn+0x38/0x1d8
2022-11-04T14:09:31 server kernel:[75215.929287] run_timer_softirq+0x284/0x520
2022-11-04T14:09:31 server kernel:[75215.929292] __do_softirq+0x1a8/0x4ec
2022-11-04T14:09:31 server kernel:[75215.929297] irq_exit+0x110/0x150
2022-11-04T14:09:31 server kernel:[75215.929304] handle_domain_irq+0x9c/0xe0
2022-11-04T14:09:31 server kernel:[75215.929311] gic_handle_irq+0xac/0xe8
2022-11-04T14:09:31 server kernel:[75215.929315] call_on_irq_stack+0x28/0x54
2022-11-04T14:09:31 server kernel:[75215.929321] do_interrupt_handler+0x60/0x70
2022-11-04T14:09:31 server kernel:[75215.929326] el1_interrupt+0x30/0x78
2022-11-04T14:09:31 server kernel:[75215.929332] el1h_64_irq_handler+0x18/0x28
2022-11-04T14:09:31 server kernel:[75215.929337] el1h_64_irq+0x78/0x7c
2022-11-04T14:09:31 server kernel:[75215.929341] arch_cpu_idle+0x18/0x28
2022-11-04T14:09:31 server kernel:[75215.929347] default_idle_call+0x54/0x19c
2022-11-04T14:09:31 server kernel:[75215.929355] do_idle+0x254/0x268
2022-11-04T14:09:31 server kernel:[75215.929360] cpu_startup_entry+0x2c/0x80
2022-11-04T14:09:31 server kernel:[75215.929365] rest_init+0xe4/0xf8
2022-11-04T14:09:31 server kernel:[75215.929370] arch_call_rest_init+0x18/0x24
2022-11-04T14:09:31 server kernel:[75215.929379] start_kernel+0x6b0/0x6e8
2022-11-04T14:09:31 server kernel:[75215.929385] __primary_switched+0xbc/0xc4
2022-11-04T14:09:31 server kernel:[75215.929392] ---[ end trace 84a57842132e9547 ]---
2022-11-04T14:09:31 server kernel:[75215.929419] r8152 2-2:1.0 eth1: Tx timeout
2022-11-04T14:09:31 server kernel:[75215.933022] r8152 2-2:1.0 eth1: Tx status -2
2022-11-04T14:09:31 server kernel:[75215.933561] r8152 2-2:1.0 eth1: Tx status -2
2022-11-04T14:09:31 server kernel:[75215.933952] r8152 2-2:1.0 eth1: Tx status -2
2022-11-04T14:09:31 server kernel:[75215.934382] r8152 2-2:1.0 eth1: Tx status -2
2022-11-04T14:09:34 server kernel:[75218.486112] usb 2-2: reset SuperSpeed USB device number 3 using xhci_hcd
2022-11-04T14:09:34 server kernel:[75218.505742] usb 2-2: device firmware changed
2022-11-04T14:09:34 server kernel:[75218.514059] r8152 2-2:1.0 eth1: Get ether addr fail
2022-11-04T14:09:34 server kernel:[75218.514750] usb 2-2: USB disconnect, device number 3
2022-11-04T14:09:34 server kernel:[75218.797356] usb 2-2: new SuperSpeed USB device number 9 using xhci_hcd
2022-11-04T14:09:34 server kernel:[75218.819783] usb 2-2: New USB device found, idVendor=0bda, idProduct=8153, bcdDevice=30.00
2022-11-04T14:09:34 server kernel:[75218.819811] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=6
2022-11-04T14:09:34 server kernel:[75218.819823] usb 2-2: Product: USB 10/100/1000 LAN
2022-11-04T14:09:34 server kernel:[75218.819834] usb 2-2: Manufacturer: Realtek
2022-11-04T14:09:34 server kernel:[75218.819844] usb 2-2: SerialNumber: 000001
2022-11-04T14:09:34 server kernel:[75218.958249] usb 2-2: reset SuperSpeed USB device number 9 using xhci_hcd
2022-11-04T14:09:34 server kernel:[75219.026741] r8152 2-2:1.0: load rtl8153a-4 v2 02/07/20 successfully
2022-11-04T14:09:34 server kernel:[75219.059149] r8152 2-2:1.0 eth1: v1.12.13
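
To get a rough idea of how often the sequence above occurs, the relevant messages can be counted in a saved kernel log. The following is only a minimal sketch: the helper name `count_r8152_events` and the idea of saving the log to a file are assumptions, and the search patterns are simply copied from the message texts in the log above.

```shell
#!/bin/sh
# Count r8152 watchdog timeouts, USB resets and "firmware changed" messages
# in a saved kernel log. count_r8152_events is a hypothetical helper name;
# the patterns are taken from the messages in the log above.
count_r8152_events() {
    log_file="$1"
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are no matches, so "|| true" keeps the function usable under set -e.
    timeouts=$(grep -c 'NETDEV WATCHDOG: .*(r8152): transmit queue .* timed out' "$log_file" || true)
    resets=$(grep -c 'reset SuperSpeed USB device' "$log_file" || true)
    fw_changed=$(grep -c 'device firmware changed' "$log_file" || true)
    echo "timeouts=$timeouts resets=$resets firmware_changed=$fw_changed"
}
```

It could be used, for example, as `dmesg > /tmp/kernel.log; count_r8152_events /tmp/kernel.log` (the file path is arbitrary).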

Additional context

I already tried the following, but the error stayed:

  • different powered hubs, and connecting directly to the Pi's USB port
  • supplying new firmware for the device at boot (initrd), which made things even worse. I think this is because of "usb 2-2: device firmware changed", which seems to result in another reset. Without supplying the firmware, networking mostly recovers after the reset. The log above was taken with initrd-supplied firmware.
  • different scaling governors
  • different USB/PCI power-saving settings
  • different ethtool settings for offloading, RX buffer size, etc.
  • different IRQ affinity settings
  • different network cables; I also checked the cabling
  • ...

I am using the Pi as my home router. The LAN port is connected to my DSL modem, and the USB adapter connects my local network.

I have run out of ideas and this is really driving me nuts, so I decided to open this report, as it might be a kernel-related bug.

@gbraad
Contributor

gbraad commented Nov 24, 2022

Have the same problem with Fedora. Most likely related to autosuspend.

@gongsearch
Author

Do you use Fedora on a Pi, or should we try to address this more generally?

@lsahn-gh
Contributor

lsahn-gh commented Dec 8, 2022

On Raspberry Pi 3 with Ubuntu 22.10 arm64, even the terminal doesn't work. Nothing prints out...

@micw

micw commented Oct 18, 2023

Seems to be a general issue with that adapter on Linux. I have one in my ThinkPad dock and another as a USB-C adapter. Both reset randomly, especially under load :-(

Edit: I tried a lot of workarounds (USB quirks, a different kernel driver) with no success. Today I found https://forum.endeavouros.com/t/ethernet-keeps-deactivating-and-reactivating-randomly/18851/60, which looks promising.

@bijwaard

bijwaard commented Nov 2, 2023

I can trigger a similar crash with the following subnet scan:

nmap -sn 192.168.1.38/24

This is on a NanoPi NEO3 (Linux 5.10.63-rockchip64 #21.08.2 SMP PREEMPT Wed Sep 8 10:57:23 UTC 2021 aarch64 GNU/Linux). This is quite an old kernel; I will try again with a newer kernel when time permits.

[6287945.531197] ------------[ cut here ]------------
[6287945.531279] NETDEV WATCHDOG: eth1 (r8152): transmit queue 0 timed out
[6287945.531465] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:468 dev_watchdog+0x398/0x3a0
[6287945.531478] Modules linked in: pps_ldisc r8152 pps_gpio nf_log_ipv6 ip6t_REJECT nf_reject_ipv6 cpufreq_dt xt_hl ip6_tables ip6t_rt nf_log_ipv4 nf_log_common ipt_REJECT nf_reject_ipv4 xt_LOG nft_limit zram xt_limit xt_addrtype xt_tcpudp nft_chain_nat xt_MASQUERADE xt_conntrack nft_compat nft_counter nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink drm drm_panel_orientation_quirks sunrpc ip_tables x_tables autofs4 realtek dwmac_rk stmmac_platform stmmac pcs_xpcs gpio_syscon
[6287945.531916] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.63-rockchip64 #21.08.2
[6287945.531927] Hardware name: FriendlyElec NanoPi NEO3 (DT)
[6287945.531945] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[6287945.531964] pc : dev_watchdog+0x398/0x3a0
[6287945.531981] lr : dev_watchdog+0x398/0x3a0
[6287945.531991] sp : ffff800011bebd20

@micw

micw commented Nov 2, 2023

I tried the workaround of setting the power modes:

echo "on" > /sys/bus/usb/devices/5-1/power/control
echo "-1" > /sys/bus/usb/devices/5-1.1/power/autosuspend_delay_ms

It seemed to work, but today I had another series of resets, so this does not seem to be the solution.
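
For anyone who wants to repeat this test on another machine: the bus paths 5-1 and 5-1.1 are specific to one system. The sketch below looks the adapter up by its USB ID instead (0bda:8153, as reported at the top of this issue) and writes "on" to its power/control attribute. The names `disable_autosuspend` and `SYSFS_USB` are made up for this example, and it only automates the workaround above, which did not stop the resets here.

```shell
#!/bin/sh
# Disable USB runtime suspend for every device with a given vendor:product ID.
# disable_autosuspend and SYSFS_USB are hypothetical names for this sketch;
# idVendor, idProduct and power/control are standard USB sysfs attributes.
SYSFS_USB="${SYSFS_USB:-/sys/bus/usb/devices}"

disable_autosuspend() {
    vendor="$1"
    product="$2"
    for dev in "$SYSFS_USB"/*; do
        # Interface entries such as 2-2:1.0 have no idVendor file; skip them.
        [ -r "$dev/idVendor" ] || continue
        [ "$(cat "$dev/idVendor")" = "$vendor" ] || continue
        [ "$(cat "$dev/idProduct")" = "$product" ] || continue
        # "on" keeps the device permanently active (no runtime suspend).
        echo on > "$dev/power/control"
        echo "autosuspend disabled for $dev"
    done
}
```

Run as root, for example `disable_autosuspend 0bda 8153`. The setting does not survive a reboot or a re-enumeration of the device, so it would have to be reapplied after each reset.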

5 participants