[RPi4 4GB] xHCI host controller not responding, assume dead #3404

Closed
JANogueira opened this issue Jan 13, 2020 · 81 comments

Comments

@JANogueira

Describe the bug
After boot-up, when a service that uses the USB interface is activated (Network UPS Tools, for example), the xHCI interface crashes and the USB devices get disconnected. Recovery is only possible after a system reboot.

List of USB devices when the system boots up:

Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 003: ID 0463:ffff MGE UPS Systems UPS
Bus 001 Device 002: ID 2109:3431 VIA Labs, Inc. Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

After enabling the service:

Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
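
The two listings above can be compared mechanically. The following is a minimal sketch, not something posted in the thread: it assumes the text comes from `lsusb` and that devices are identified by their vendor:product ID; the function names `parse_lsusb` and `missing_devices` are made up for illustration.

```python
import re

# Matches lines such as "Bus 001 Device 003: ID 0463:ffff MGE UPS Systems UPS".
LSUSB_LINE = re.compile(
    r"Bus (?P<bus>\d+) Device (?P<dev>\d+): "
    r"ID (?P<id>[0-9a-f]{4}:[0-9a-f]{4})\s*(?P<name>.*)"
)


def parse_lsusb(text):
    """Return {"vendor:product": description} for every lsusb line in text."""
    devices = {}
    for line in text.splitlines():
        match = LSUSB_LINE.search(line)
        if match:
            devices[match.group("id")] = match.group("name").strip()
    return devices


def missing_devices(before, after):
    """Sorted IDs present in the first listing but absent from the second."""
    return sorted(parse_lsusb(before).keys() - parse_lsusb(after).keys())


# The two listings quoted in the report.
BEFORE = """\
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 003: ID 0463:ffff MGE UPS Systems UPS
Bus 001 Device 002: ID 2109:3431 VIA Labs, Inc. Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
"""
AFTER = """\
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
"""

if __name__ == "__main__":
    print(missing_devices(BEFORE, AFTER))
```

For these listings the missing entries are the UPS (0463:ffff) and the VIA Labs hub (2109:3431) it hangs off; only the two root hubs survive.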

To reproduce
After system boot, start Network UPS tools (UPS connected through USB, having a poll frequency of 30 seconds), and the xHCI interface will crash shortly after, leading to all USB devices being disconnected.

System

Details:
Raspberry Pi 4 Model B Rev 1.1

# cat /etc/os-release | head -4
NAME=HassOS
VERSION="3.8 (RaspberryPi 4 64bit)"
ID=hassos
VERSION_ID=3.8
# uname -a
Linux hassio 4.19.93-v8 #1 SMP PREEMPT Sun Jan 12 18:33:10 UTC 2020 aarch64 Hassio/OS
# cat /proc/cpuinfo | tail -3
Revision        : c03111
Serial          : 1000000013df512e
Model           : Raspberry Pi 4 Model B Rev 1.1
#
# df                                                                                                                                                                                                                                     
Filesystem           1K-blocks      Used Available Use% Mounted on                                                                                                                                                                       
/dev/root                92032     92032         0 100% /                                                                                                                                                                                
devtmpfs               1915496         0   1915496   0% /dev                                                                                                                                                                             
tmpfs                  1948808         0   1948808   0% /dev/shm                                                                                                                                                                         
tmpfs                  1948808       708   1948100   0% /run                                                                                                                                                                             
tmpfs                  1948808         0   1948808   0% /sys/fs/cgroup                                                                                                                                                                   
tmpfs                  1948808       708   1948100   0% /etc/machine-id                                                                                                                                                                  
/dev/mmcblk0p7           91099     18656     65562  22% /mnt/overlay                                                                                                                                                                     
/dev/mmcblk0p7           91099     18656     65562  22% /root/.docker                                                                                                                                                                    
/dev/mmcblk0p7           91099     18656     65562  22% /etc/modprobe.d                                                                                                                                                                  
/dev/mmcblk0p7           91099     18656     65562  22% /etc/modules-load.d                                                                                                                                                              
/dev/mmcblk0p7           91099     18656     65562  22% /etc/docker                                                                                                                                                                      
/dev/mmcblk0p7           91099     18656     65562  22% /etc/dropbear                                                                                                                                                                    
/dev/mmcblk0p7           91099     18656     65562  22% /etc/udev/rules.d
/dev/mmcblk0p7           91099     18656     65562  22% /root/.ssh
/dev/mmcblk0p1           32686      3650     29036  11% /mnt/boot
/dev/mmcblk0p7           91099     18656     65562  22% /etc/hostname
/dev/mmcblk0p7           91099     18656     65562  22% /etc/systemd/timesyncd.conf
/dev/mmcblk0p7           91099     18656     65562  22% /etc/NetworkManager/system-connections
/dev/mmcblk0p7           91099     18656     65562  22% /etc/hosts
/dev/mmcblk0p8       122172044   6834280 109113732   6% /mnt/data
/dev/zram2               15856        40     14676   0% /tmp
/dev/zram1               31728       128     29312   0% /var
/dev/mmcblk0p7           91099     18656     65562  22% /var/lib/bluetooth
/dev/mmcblk0p8       122172044   6834280 109113732   6% /var/lib/docker
/dev/mmcblk0p7           91099     18656     65562  22% /var/log/journal
/dev/mmcblk0p7           91099     18656     65562  22% /var/lib/systemd
/dev/mmcblk0p7           91099     18656     65562  22% /var/lib/NetworkManager
overlay              122172044   6834280 109113732   6% /mnt/data/docker/overlay2/c79789ef74f2520ce9bf8308f4f0a0a0c2e2d1453a2836027ab47f3c629d6263/merged
overlay              122172044   6834280 109113732   6% /var/lib/docker/overlay2/c79789ef74f2520ce9bf8308f4f0a0a0c2e2d1453a2836027ab47f3c629d6263/merged
shm                      65536         0     65536   0% /mnt/data/docker/containers/4383e4a9b4bfd33f30377b389a5dd8e96f92522bfb5f5e75b5268393aca9af5f/mounts/shm
shm                      65536         0     65536   0% /var/lib/docker/containers/4383e4a9b4bfd33f30377b389a5dd8e96f92522bfb5f5e75b5268393aca9af5f/mounts/shm
overlay              122172044   6834280 109113732   6% /mnt/data/docker/overlay2/22cdad6842ade8567501c09ce194d03f072af3910c18cb54da5175d19b0e3aa6/merged
overlay              122172044   6834280 109113732   6% /var/lib/docker/overlay2/22cdad6842ade8567501c09ce194d03f072af3910c18cb54da5175d19b0e3aa6/merged
shm                      65536         0     65536   0% /mnt/data/docker/containers/d383f5286c098e038869aa411fbc087895e9852c34db2276b293d5493076e82f/mounts/shm
shm                      65536         0     65536   0% /var/lib/docker/containers/d383f5286c098e038869aa411fbc087895e9852c34db2276b293d5493076e82f/mounts/shm
overlay              122172044   6834280 109113732   6% /mnt/data/docker/overlay2/8a1741030a19b420c34768b69144ab241b11b00ec67083b86228cee3844dd40a/merged
overlay              122172044   6834280 109113732   6% /var/lib/docker/overlay2/8a1741030a19b420c34768b69144ab241b11b00ec67083b86228cee3844dd40a/merged
overlay              122172044   6834280 109113732   6% /mnt/data/docker/overlay2/69458a8f8866e9605cd4a15109eb3d1b2a81e7b2b76c81b8f7a1d1f3f7241ec1/merged
overlay              122172044   6834280 109113732   6% /var/lib/docker/overlay2/69458a8f8866e9605cd4a15109eb3d1b2a81e7b2b76c81b8f7a1d1f3f7241ec1/merged
shm                      65536        64     65472   0% /mnt/data/docker/containers/b9ddddacc42db28bed967a218c67d490d3b89e3bf3ac4bf980d0f027af739f41/mounts/shm
shm                      65536        64     65472   0% /var/lib/docker/containers/b9ddddacc42db28bed967a218c67d490d3b89e3bf3ac4bf980d0f027af739f41/mounts/shm
overlay              122172044   6834280 109113732   6% /mnt/data/docker/overlay2/563028cbdc5e65467e4299d65e42050ca216a6252d1e144e8491723a1434ad1e/merged
overlay              122172044   6834280 109113732   6% /var/lib/docker/overlay2/563028cbdc5e65467e4299d65e42050ca216a6252d1e144e8491723a1434ad1e/merged
overlay              122172044   6834280 109113732   6% /mnt/data/docker/overlay2/a00df7501c5d7ef5be8316129829c2b57b7fef415914170006b3172bb43ebaaa/merged
overlay              122172044   6834280 109113732   6% /var/lib/docker/overlay2/a00df7501c5d7ef5be8316129829c2b57b7fef415914170006b3172bb43ebaaa/merged
overlay              122172044   6834280 109113732   6% /mnt/data/docker/overlay2/a72a5bc9321264232980140c9bc6686e89db9c4a83a1da17c8755811f160de6a/merged
overlay              122172044   6834280 109113732   6% /var/lib/docker/overlay2/a72a5bc9321264232980140c9bc6686e89db9c4a83a1da17c8755811f160de6a/merged
overlay              122172044   6834280 109113732   6% /mnt/data/docker/overlay2/7624b7f0f58cc8d32d9d8702d26ae71c27fcd0026f27a6623b15c0ec13fd499a/merged
overlay              122172044   6834280 109113732   6% /var/lib/docker/overlay2/7624b7f0f58cc8d32d9d8702d26ae71c27fcd0026f27a6623b15c0ec13fd499a/merged
shm                      65536        48     65488   0% /mnt/data/docker/containers/376a3e6a98ad20b772ae7f38b2051750d0d38d2f876be82d1cc17a81ae801a05/mounts/shm
shm                      65536        48     65488   0% /var/lib/docker/containers/376a3e6a98ad20b772ae7f38b2051750d0d38d2f876be82d1cc17a81ae801a05/mounts/shm
shm                      65536        60     65476   0% /mnt/data/docker/containers/8043d49fda6b8c96b4cfd0fd77a8499b6f70e5a4ad7a66fc37eb417c1ac6a936/mounts/shm
shm                      65536        60     65476   0% /var/lib/docker/containers/8043d49fda6b8c96b4cfd0fd77a8499b6f70e5a4ad7a66fc37eb417c1ac6a936/mounts/shm
shm                      65536        68     65468   0% /mnt/data/docker/containers/ef4df3ae0e79c33285baec87cd55c0b9d28d150ffda2a19e799979073cd1297a/mounts/shm
shm                      65536        68     65468   0% /var/lib/docker/containers/ef4df3ae0e79c33285baec87cd55c0b9d28d150ffda2a19e799979073cd1297a/mounts/shm
shm                      65536        68     65468   0% /mnt/data/docker/containers/f73b6c980b13d97d27fb314c6224f8ed0ef66be8e8039b69436273dc328925f5/mounts/shm
shm                      65536        68     65468   0% /var/lib/docker/containers/f73b6c980b13d97d27fb314c6224f8ed0ef66be8e8039b69436273dc328925f5/mounts/shm
shm                      65536        56     65480   0% /mnt/data/docker/containers/6cbb56501e93036eebc86c0c4bfc38a133d2a62ab458290f5005c592a572e7e3/mounts/shm
shm                      65536        56     65480   0% /var/lib/docker/containers/6cbb56501e93036eebc86c0c4bfc38a133d2a62ab458290f5005c592a572e7e3/mounts/shm
overlay              122172044   6834280 109113732   6% /mnt/data/docker/overlay2/9737f2fb6b35435b15a046da2583412dd66e3559ab0fde67c9fd7c2e67336aa4/merged
overlay              122172044   6834280 109113732   6% /var/lib/docker/overlay2/9737f2fb6b35435b15a046da2583412dd66e3559ab0fde67c9fd7c2e67336aa4/merged
shm                      65536         0     65536   0% /mnt/data/docker/containers/05216e59fa3a97c660472ae072c417cab1743086a3bb61f40ff167f2caaa03ba/mounts/shm
shm                      65536         0     65536   0% /var/lib/docker/containers/05216e59fa3a97c660472ae072c417cab1743086a3bb61f40ff167f2caaa03ba/mounts/shm
overlay              122172044   6834280 109113732   6% /mnt/data/docker/overlay2/ce9f300e071672f49be4e085525ee1e19e381ff4e6bb869e12f5cad245018136/merged
overlay              122172044   6834280 109113732   6% /var/lib/docker/overlay2/ce9f300e071672f49be4e085525ee1e19e381ff4e6bb869e12f5cad245018136/merged
shm                      65536        68     65468   0% /mnt/data/docker/containers/f3c374354b066023eef0662df5fee6b5112d69eba4e2c18daff4223e387d00fa/mounts/shm
shm                      65536        68     65468   0% /var/lib/docker/containers/f3c374354b066023eef0662df5fee6b5112d69eba4e2c18daff4223e387d00fa/mounts/shm
# cat /proc/swaps
Filename                                Type            Size    Used    Priority
/dev/zram0                              partition       974400  0       -2
# 
  • Which model of Raspberry Pi? Raspberry Pi 4B 4GB
  • Which OS and version (cat /etc/rpi-issue)? HassOS 3.8
  • Which firmware version (vcgencmd version)?
  • Which kernel version (uname -a)? Linux hassio 4.19.93-v8 #1 SMP PREEMPT Sun Jan 12 18:33:10 UTC 2020 aarch64 Hassio/OS

Logs
dmesg output:

[    0.197545] usbcore: registered new interface driver usbfs
[    0.197606] usbcore: registered new interface driver hub
[    0.197718] usbcore: registered new device driver usb
[    0.487955] usbcore: registered new interface driver r8152
[    0.488025] usbcore: registered new interface driver lan78xx
[    0.488474] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1
[    0.495448] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 4.19
[    0.495481] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    0.495507] usb usb1: Product: xHCI Host Controller
[    0.495528] usb usb1: Manufacturer: Linux 4.19.88-v8 xhci-hcd
[    0.495549] usb usb1: SerialNumber: 0000:01:00.0
[    0.496073] hub 1-0:1.0: USB hub found
[    0.496671] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2
[    0.496706] xhci_hcd 0000:01:00.0: Host supports USB 3.0 SuperSpeed
[    0.497145] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 4.19
[    0.497175] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    0.497200] usb usb2: Product: xHCI Host Controller
[    0.497220] usb usb2: Manufacturer: Linux 4.19.88-v8 xhci-hcd
[    0.497242] usb usb2: SerialNumber: 0000:01:00.0
[    0.497711] hub 2-0:1.0: USB hub found
[    0.499380] usbcore: registered new interface driver uas
[    0.499478] usbcore: registered new interface driver usb-storage
[    0.499601] usbcore: registered new interface driver usbserial_generic
[    0.499647] usbserial: USB Serial support registered for generic
[    0.507353] usbcore: registered new interface driver usbhid
[    0.507361] usbhid: USB HID core driver
[    0.830300] usb 1-1: new high-speed USB device number 2 using xhci_hcd
[    0.980982] usb 1-1: New USB device found, idVendor=2109, idProduct=3431, bcdDevice= 4.20
[    0.981020] usb 1-1: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[    0.981041] usb 1-1: Product: USB2.0 Hub
[    0.982796] hub 1-1:1.0: USB hub found
[    1.278306] usb 1-1.3: new full-speed USB device number 3 using xhci_hcd
[    1.615672] usbcore: registered new interface driver brcmfmac
[    1.999992] usb 1-1.3: New USB device found, idVendor=0463, idProduct=ffff, bcdDevice= 0.01
[    2.000016] usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[    2.000027] usb 1-1.3: Product: 5E
[    2.000036] usb 1-1.3: Manufacturer: EATON
[    3.967565] hid-generic 0003:0463:FFFF.0001: hiddev96,hidraw0: USB HID v1.10 Device [EATON 5E] on usb-0000:01:00.0-1.3/input0
[  172.256228] xhci_hcd 0000:01:00.0: xHCI host not responding to stop endpoint command.
[  172.272306] xhci_hcd 0000:01:00.0: Host halt failed, -110
[  172.272321] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
[  172.272369] xhci_hcd 0000:01:00.0: HC died; cleaning up
[  172.272442] usb 1-1: USB disconnect, device number 2
[  172.272467] usb 1-1.3: USB disconnect, device number 3
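
The failure signature in the log above is easy to search for. The sketch below is illustrative only: it assumes the log text has the usual `[seconds.micros]` dmesg prefix and that the two message strings seen in this thread are the ones worth matching. The name `first_xhci_death` is hypothetical.

```python
import re

# Kernel messages seen in this thread when the controller gives up.
DEATH_MARKERS = (
    "xHCI host controller not responding, assume dead",
    "HC died; cleaning up",
)
# dmesg prefix, e.g. "[  172.272321]".
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def first_xhci_death(dmesg_text):
    """Return the kernel timestamp (seconds since boot) of the first xHCI
    death message, or None if there is no such message or it carries no
    timestamp."""
    for line in dmesg_text.splitlines():
        if "xhci_hcd" in line and any(marker in line for marker in DEATH_MARKERS):
            match = TIMESTAMP.search(line)
            return float(match.group(1)) if match else None
    return None


# Excerpt from the dmesg output in the report.
SAMPLE = """\
[    3.967565] hid-generic 0003:0463:FFFF.0001: hiddev96,hidraw0: USB HID v1.10 Device [EATON 5E] on usb-0000:01:00.0-1.3/input0
[  172.256228] xhci_hcd 0000:01:00.0: xHCI host not responding to stop endpoint command.
[  172.272306] xhci_hcd 0000:01:00.0: Host halt failed, -110
[  172.272321] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
[  172.272369] xhci_hcd 0000:01:00.0: HC died; cleaning up
"""

if __name__ == "__main__":
    print(first_xhci_death(SAMPLE))
```

On a live system the text could come from something like `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`, which may need root depending on the kernel's dmesg restrictions.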

Additional context
This bug report was opened here because HassOS 3.8 runs the default Raspbian kernel.
This issue was initially reported here:
home-assistant/operating-system#526

USB cable was replaced to ensure that it was not faulty.

If more logs are needed, please let me know

@timg236
Contributor

timg236 commented Jan 13, 2020

It might be worth trying different versions of the VL805 firmware. There’s a thread about it here
https://www.raspberrypi.org/forums/viewtopic.php?t=260879

@JANogueira
Author

Thank you. I will give it a try and share the outcome.

@JANogueira
Author

Well, on Raspbian all went well, but not in hassio.
The USB crash is still happening in hassio.

Before shutting down Raspbian I made sure that the latest VL805 firmware was properly installed ('sudo vl805' reported 0137ad).

Do you know if the EEPROM bin file depends on any other optimization on the Linux kernel side, or should it fix the USB behaviour by itself?

Thanks in advance.

@JANogueira
Author

Please, disregard my last post.
I was doing it wrong while trying to flash the EEPROM.

Took additional actions and all good apparently. 3 hours have passed and all is stable.
I will observe this for a couple of days more just to be sure, and share the outcome.

Many thanks once again!

@JANogueira
Author

Just to confirm that everything is still working perfectly.
0137ad solved the issue. USB devices are all stable and I had a small increase on the sdcard I/O.

@hardwareadictos

Hey!! Having exactly the same issue on VL805 FW version: 000137ad

Are you still having issues? Can you provide more info about what you did to solve it?

@hardwareadictos

Here are some logs:

log_xhci_rpi4.txt

@JANogueira
Author

Hi @hardwareadictos,

It worked for about 1 week with VL805 FW 0137ad, but now I am facing the same issue again.
Not fully solved unfortunately...

@JANogueira JANogueira reopened this Jan 29, 2020
@hardwareadictos

I opened an issue for that because I don't know if we have the same issue (I suspect we do):

#3438

Don't hesitate to participate. Thanks for answering 😊

@hardwareadictos

This could explain some things: c74b1b5

I don't know if that change is already included in the latest kernel...

@pelwell
Contributor

pelwell commented Jan 30, 2020

That patch allows for the possibility of loading VL805 firmware from system RAM rather than EEPROM, which has to be re-enabled after a PCI reset. The VPU firmware is not making use of that facility, so when it comes to your problem that commit is a red herring.

@hardwareadictos

That patch allows for the possibility of loading VL805 firmware from system RAM rather than EEPROM, which has to be re-enabled after a PCI reset. The VPU firmware is not making use of that facility, so when it comes to your problem that commit is a red herring.

I'm not an expert, that's why I said "could" and not "actually". It was also the only recent commit referencing xHCI.

That issue doesn't happen on previous releases.

What's your opinion on that?

Thanks in advance :)

@pelwell
Contributor

pelwell commented Jan 30, 2020

Perhaps the on-board voltage is marginal - try adding over_voltage=1 to config.txt and rebooting.
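
For anyone scripting that change, here is a minimal sketch of editing the file contents idempotently. It is an illustration under assumptions, not an official tool: `set_config_option` is a made-up helper that works on the text only, and the location of config.txt varies (commonly /boot/config.txt on Raspbian; the df output earlier in this thread shows the boot partition mounted at /mnt/boot on HassOS). It also appends at the end of the file, so if the file ends inside a conditional section such as [pi4], the setting will fall under that section.

```python
def set_config_option(config_text, key, value):
    """Return config_text with key=value set.

    Every existing uncommented "key=..." line is rewritten in place;
    if there is none, the setting is appended at the end.
    """
    new_line = f"{key}={value}"
    lines = config_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # leave commented-out settings alone
        if stripped.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = "dtparam=audio=on\n#over_voltage=3\n"
    print(set_config_option(example, "over_voltage", 1), end="")
```

Running the helper twice gives the same result as running it once, which makes it safe to call from a provisioning script. A reboot is still needed for the new value to take effect.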

@hardwareadictos

Perhaps the on-board voltage is marginal - try adding over_voltage=1 to config.txt and rebooting.

Thank you! Applied. I will test it for some days and report back.

@JANogueira
Author

thank you @pelwell
Also applied this to my hassio installation. I will test it as well for some days and report back.

Cheers

@hardwareadictos

24 hours. No issues so far. Next report on Monday, but seems that was the issue...

@hardwareadictos

One week without issues, I consider this issue fixed :)

@JANogueira
Author

I also confirm that after 1 week, all is ok.
But not sure if over voltage should be the way to go. This is a workaround and not a fix imho.

@hardwareadictos

Agreed. It was working before those voltage-related changes in the kernel. Maybe someone can give more feedback about that. But for the moment all is working as usual.

@JamesH65
Contributor

JamesH65 commented Feb 8, 2020

I also confirm that after 1 week, all is ok.
But not sure if over voltage should be the way to go. This is a workaround and not a fix imho.

Indeed, I expect a future firmware release will fix this, if the most recent one hasn't already done so.

@JANogueira
Author

I am currently running on 0137ad

@timg236
Contributor

timg236 commented Feb 8, 2020

The VideoCore firmware controls DVFS so the update would be there rather than the XHCI firmware. Might be worth trying rpi-update (assuming you are familiar with rpi-update / risks of using early firmware)

@DrJohnM61

I am having the same issue after updating to the latest release. If I disconnect the [powered] USB3 hub, boot, and then connect the hub, I can see the disks for a few minutes (they show in 'blkid' and 'df' and I can also 'ls' the mount points and see my files). Then they disconnect. Looking at dmesg, the same error as others have reported is being produced:
[ 77.881799] xhci_hcd 0000:01:00.0: WARNING: Host System Error
[ 82.887053] xhci_hcd 0000:01:00.0: xHCI host not responding to stop endpoint command.
[ 82.887070] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
[ 82.887236] xhci_hcd 0000:01:00.0: HC died; cleaning up
[ 82.887275] xhci_hcd 0000:01:00.0: xHCI host not responding to stop endpoint command.

I am currently running on 0137ad

I have tried over_voltage=1 (in fact I have tried several over_voltage values: 1, 2 and 3)
I have performed a rpi-update
CPU/GPU is not overclocked

$ cat /proc/cpuinfo | tail -3
Revision : c03112
Serial : 10000000b9ae9a7d
Model : Raspberry Pi 4 Model B Rev 1.2

$ cat /etc/os-release | head -4
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"

$ uname -a
Linux PiHoleNAS 4.19.102-v7l+ #1295 SMP Thu Feb 6 15:49:36 GMT 2020 armv7l GNU/Linux

dmesg2.txt

I am an old DEC 10/TOPS-10, VAX/VMS and PDP11/RSX11M+ assembler programmer. Unix is rather new to me. I apologise in advance if I come across as a noob.

@popcornmix
Collaborator

@DrJohnM61 do you know what firmware/kernel you were on previously?
Does force_turbo=1 help?
Does switching to firmware from here help?

@DrJohnM61

@DrJohnM61 do you know what firmware/kernel you were on previously?
Does force_turbo=1 help?
Does switching to firmware from here help?

IIRC the prior version was the 5th Feb release

I tried the force_turbo=1 but does not seem to make any difference.
[ 13.045392] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
[ 13.045464] xhci_hcd 0000:01:00.0: HC died; cleaning up

Will try the firmware alternative in the morning (it's late now here in London)

@DrJohnM61

@DrJohnM61 do you know what firmware/kernel you were on previously?
Does force_turbo=1 help?
Does switching to firmware from here help?

I copied the files that your link pointed to into the /boot directory. Not sure if I was supposed to do anything after that (I am a Unix noob). Rebooted and the problem with the USB 3 connected hub and devices is still there. I note that 'uname -a' showed no difference in firmware version, so I suspect I am missing a step.

Went back to page one of the blog that you directed me to and executed the 'sudo rpi-update 4b2c270' command. Rebooted and then power cycled and the problem has gone away, so clearly (IMHO), this seems to be an issue that has been introduced in a later firmware version.

I will note that somewhere between the 4b2c270 version and the latest, I could not boot the RP4 with the connected USB hub/disks and see them but would have to boot without the hub connected and then plug it in after the boot had completed. This change happened after implementing the firmware change that allowed overclocking. I had for some time looked at forums to find a combination of voltage and wait times to try and fix the issue. However, the latest version of the firmware (4.19.102-v7l+ #1295 SMP) just did not allow the external powered USB hub or disks to stably connect (and even if the USB HUB was connected after boot, the USB ports would successively shut down, causing problems with my RAID sets, until the whole HUB disconnected).

Here is a snip from the log showing the USB disconnects (post force_turbo setting):
[ 14.305428] sd 2:0:0:0: [sdc] tag#20 sense submit err -19 uas-tag 1 inflight: s-st a-out s-out a-cmd s-cmd
[ 14.305433] sd 2:0:0:0: [sdc] tag#20 CDB: opcode=0x41 41 00 89 e4 81 00 00 03 00 00
[ 14.335885] xhci_hcd 0000:01:00.0: WARN Can't disable streams for endpoint 0x81, streams are being disabled already
[ 14.337628] usb 2-1.4.3: USB disconnect, device number 6
[ 14.338786] sd 2:0:0:0: [sdc] Synchronizing SCSI cache
[ 14.355410] print_req_error: I/O error, dev sdc, sector 2313453824
[ 14.375412] sd 3:0:0:0: [sdd] tag#25 sense submit err -19 uas-tag 1 inflight: s-st a-out s-out a-cmd s-cmd
[ 14.375420] sd 3:0:0:0: [sdd] tag#25 CDB: opcode=0x41 41 00 89 e4 80 00 00 01 00 00
[ 14.445403] sd 3:0:0:0: [sdd] tag#25 sense submit err -19 uas-tag 1 inflight: s-st a-out s-out a-cmd s-cmd
[ 14.445412] sd 3:0:0:0: [sdd] tag#25 CDB: opcode=0x41 41 00 89 e4 80 00 00 01 00 00
[ 14.455411] print_req_error: I/O error, dev sdc, sector 2313454592
[ 14.455425] print_req_error: I/O error, dev sdc, sector 2313454848
[ 14.455434] print_req_error: I/O error, dev sdc, sector 2313455616
[ 14.525404] sd 3:0:0:0: [sdd] tag#25 sense submit err -19 uas-tag 1 inflight: s-st a-out s-out a-cmd s-cmd
[ 14.525412] sd 3:0:0:0: [sdd] tag#25 CDB: opcode=0x41 41 00 89 e4 80 00 00 01 00 00
[ 14.595450] sd 3:0:0:0: [sdd] tag#25 sense submit err -19 uas-tag 1 inflight: s-st a-out s-out a-cmd s-cmd
[ 14.595457] sd 3:0:0:0: [sdd] tag#25 CDB: opcode=0x41 41 00 89 e4 80 00 00 01 00 00
[ 14.675454] sd 3:0:0:0: [sdd] tag#25 sense submit err -19 uas-tag 1 inflight: s-st a-out s-out a-cmd s-cmd
[ 14.675461] sd 3:0:0:0: [sdd] tag#25 CDB: opcode=0x41 41 00 89 e4 80 00 00 01 00 00
[ 14.745449] sd 3:0:0:0: [sdd] tag#25 sense submit err -19 uas-tag 1 inflight: s-st a-out s-out a-cmd s-cmd
[ 14.745458] sd 3:0:0:0: [sdd] tag#25 CDB: opcode=0x41 41 00 89 e4 80 00 00 01 00 00
[ 14.805398] sd 3:0:0:0: [sdd] tag#25 sense submit err -19 uas-tag 1 inflight: s-st a-out s-out a-cmd s-cmd
[ 14.805405] sd 3:0:0:0: [sdd] tag#25 CDB: opcode=0x41 41 00 89 e4 80 00 00 01 00 00
[ 14.855429] sd 2:0:0:0: [sdc] Synchronize Cache(10) failed: Result: hostbyte=0x07 driverbyte=0x00
[ 14.865412] sd 3:0:0:0: [sdd] tag#25 sense submit err -19 uas-tag 1 inflight: s-st a-out s-out a-cmd s-cmd
[ 14.865419] sd 3:0:0:0: [sdd] tag#25 CDB: opcode=0x41 41 00 89 e4 80 00 00 01 00 00
[ 14.925448] sd 3:0:0:0: [sdd] tag#25 sense submit err -19 uas-tag 1 inflight: s-st a-out s-out a-cmd s-cmd
[ 14.925457] sd 3:0:0:0: [sdd] tag#25 CDB: opcode=0x41 41 00 89 e4 80 00 00 01 00 00
[ 14.975986] xhci_hcd 0000:01:00.0: WARN Can't disable streams for endpoint 0x81, streams are being disabled already
[ 14.977293] usb 2-1.4.4: USB disconnect, device number 7
[ 14.978401] sd 3:0:0:0: [sdd] Synchronizing SCSI cache
[ 14.978422] print_req_error: I/O error, dev sdd, sector 2313453568
[ 15.075452] print_req_error: I/O error, dev sdd, sector 2313453824
[ 15.485545] sd 3:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=0x07 driverbyte=0x00
[ 15.605939] xhci_hcd 0000:01:00.0: WARN Can't disable streams for endpoint 0x81, streams are being disabled already
[ 17.785614] Buffer I/O error on dev md0, logical block 0, lost sync page write
[ 17.785622] EXT4-fs (md0): I/O error while writing superblock
[ 17.866157] Buffer I/O error on dev md0, logical block 976690672, async page read
[ 17.887111] Buffer I/O error on dev md0, logical block 976690672, async page read
[ 18.037901] md0: detected capacity change from 4000525058048 to 0
[ 18.037951] md: md0 stopped.
[ 18.225997] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.

Happy to try different firmware if someone can explain how to install it after pulling from a file share.

@hardwareadictos

CPU/GPU is not overclocked

Maybe your power supply isn't giving you enough power, then.

@EliaTolin

Any news?
I have the same issue.

@DerKleinePunk

I have the same problem!

@percysnoodle

I'm seeing the same problem.

@Nazgile94

Nazgile94 commented Dec 1, 2022

Me too, with an external HDD enclosure.

PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
Linux raspberrypi 5.15.76-v8+ #1597 SMP PREEMPT Fri Nov 4 12:16:41 GMT 2022 aarch64 GNU/Linux

raspbian 64 bit

openmediavault 6
Version
6.0.46-5 (Shaitan)
Processor
BCM2835
Kernel
Linux 5.15.76-v8+

The device disconnects and the share disappears; I need to reset the enclosure and the Pi.
On an x86 Linux machine, everything works.


1.12.2022, 22:57:05
kernel: [ 509.452860] BTRFS warning (device sdb1): Skipping commit of aborted transaction.
1.12.2022, 22:57:05
kernel: [ 509.621599] sd 0:0:0:1: [sdb] Synchronize Cache(10) failed: Result: hostbyte=0x07 driverbyte=DRIVER_OK
1.12.2022, 22:57:05
kernel: [ 509.452828] BTRFS info (device sdb1): forced readonly
1.12.2022, 22:57:04
kernel: [ 508.913535] sd 0:0:0:1: [sdb] tag#16 uas_eh_abort_handler 0 uas-tag 1 inflight: IN
1.12.2022, 22:57:04
kernel: [ 508.933539] scsi host0: uas_eh_device_reset_handler start
1.12.2022, 22:57:04
kernel: [ 509.062290] usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
1.12.2022, 22:57:04
kernel: [ 509.082480] usb 2-1: device firmware changed
1.12.2022, 22:57:04
kernel: [ 508.913555] sd 0:0:0:1: [sdb] tag#16 CDB: opcode=0x85 85 08 0e 00 00 00 01 00 00 00 00 00 00 00 ec 00
1.12.2022, 22:57:04
kernel: [ 509.090735] sd 0:0:0:1: Device offlined - not ready after error recovery
1.12.2022, 22:57:04
kernel: [ 509.090849] usb 2-1: USB disconnect, device number 2
1.12.2022, 22:57:04
kernel: [ 509.090717] scsi host0: uas_eh_device_reset_handler FAILED err -19
1.12.2022, 22:56:03
kernel: [ 447.196946] xhci_hcd 0000:01:00.0: Looking for event-dma 0000000441e40da0 trb-start 0000000441e40db0 trb-end 0000000441e40db0 seg-start 0000000441e40000 seg-end 0000000441e40ff0
1.12.2022, 22:56:03
kernel: [ 447.129915] xhci_hcd 0000:01:00.0: WARN Successful completion on short TX

@EliaTolin

@Nazgile94 same problem.

Unmounting and mounting again does not help; it only works after a reboot.

@P33M
Contributor

P33M commented Dec 1, 2022

duplicate of #5060

@P33M P33M closed this as completed Dec 1, 2022
@barart

barart commented Sep 19, 2023

This seems closed, but I'm another one with this problem.

@cyberplant

I've been having this issue and it's now "fixed" by adding the quirks to the cmdline: 0634:5602:u (Crucial X8 1TB)

It's a lot slower, but at least it doesn't hang everyday!!!
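
The comment gives only the quirk value. On the kernel command line this kind of quirk is normally passed as `usb-storage.quirks=VID:PID:u`, where the `u` flag makes the device fall back from UAS to the plain usb-storage driver, which is consistent with the slowdown described. The sketch below is an illustration, not cyberplant's actual change: `add_usb_storage_quirk` is a made-up helper that operates on the single line of text found in cmdline.txt, and the example command line is invented.

```python
def add_usb_storage_quirk(cmdline, vid_pid, flags="u"):
    """Return the kernel command line with a usb-storage quirk added.

    cmdline -- contents of cmdline.txt (one line of space-separated options)
    vid_pid -- "vendor:product" in hex, as printed by lsusb
    flags   -- quirk flags; "u" tells usb-storage to ignore UAS for the device
    """
    prefix = "usb-storage.quirks="
    entry = f"{vid_pid}:{flags}"
    options = cmdline.split()
    for i, option in enumerate(options):
        if option.startswith(prefix):
            # Merge with quirks that are already present, without duplicates.
            existing = [e for e in option[len(prefix):].split(",") if e]
            if entry not in existing:
                existing.append(entry)
            options[i] = prefix + ",".join(existing)
            break
    else:
        options.append(prefix + entry)
    return " ".join(options)


if __name__ == "__main__":
    # Hypothetical command line, for illustration only.
    print(add_usb_storage_quirk("console=tty1 root=/dev/mmcblk0p2 rootwait", "0634:5602"))
```

The result must stay on one line in cmdline.txt, and a reboot is needed before the quirk applies.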

I was using an external HP SSD and it worked great, but it was veeery slow, so I replaced with an SSD with an external bay and my problems started. I blamed the cheap bay, replaced with a good one. Still the same. Replaced with an external well known SSD (Crucial) and the same!!

I've created a "watchdog" on my NAS, I connected my RPi to a Shelly switch, so when the HomeAssistant that's running there doesn't reply for a while, my script turns it off and on again after some seconds. This worked fine, but still had one or two reboots per day.
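
The watchdog script itself was not posted. As a rough sketch of the idea described above (poll a service, and power-cycle the Pi through a smart switch after repeated failures), something like the following could work. Everything here is an assumption: `is_alive`, `power_off` and `power_on` are hypothetical callables the caller must supply (for example an HTTP request to Home Assistant and requests to the switch), and the thresholds are arbitrary defaults.

```python
import time


def watchdog(is_alive, power_off, power_on, *, max_failures=3,
             poll_seconds=60.0, off_seconds=10.0,
             sleep=time.sleep, cycles=None):
    """Poll is_alive(); after max_failures consecutive failures, power-cycle.

    is_alive, power_off and power_on are caller-supplied callables
    (hypothetical: e.g. an HTTP health check and smart-switch requests).
    cycles limits the number of polls; None means run forever.
    Returns the number of power cycles performed.
    """
    failures = 0
    resets = 0
    polls = 0
    while cycles is None or polls < cycles:
        polls += 1
        if is_alive():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                power_off()
                sleep(off_seconds)  # keep the Pi off for a few seconds
                power_on()
                resets += 1
                failures = 0
        sleep(poll_seconds)
    return resets


if __name__ == "__main__":
    # Dry run with fake callables: three misses in a row trigger one reset.
    answers = iter([True, False, False, False, True])
    print(watchdog(lambda: next(answers),
                   lambda: print("switch off"),
                   lambda: print("switch on"),
                   sleep=lambda seconds: None, cycles=5))
```

Requiring several consecutive failures avoids power-cycling on a single slow response, and the watchdog has to run on a different machine than the Pi it supervises, as in the setup described above.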

@dhjackal

I take it that this issue is still occurring, seeing as I appear to be having it.

@dhjackal

> I've been having this issue, and it's now "fixed" by adding the quirk to the cmdline: 0634:5602:u (Crucial X8 1TB)
>
> It's a lot slower, but at least it doesn't hang every day!!!
>
> I was using an external HP SSD and it worked great, but it was veeery slow, so I replaced with an SSD with an external bay and my problems started. I blamed the cheap bay, replaced with a good one. Still the same. Replaced with an external well known SSD (Crucial) and the same!!
>
> I've created a "watchdog" on my NAS, I connected my RPi to a Shelly switch, so when the HomeAssistant that's running there doesn't reply for a while, my script turns it off and on again after some seconds. This worked fine, but still had one or two reboots per day.

Would you mind sharing your solution.....step by step, and the "watchdog" script, with us? Would be terribly grateful 'ol chap.

@dhjackal

Can I quantify / qualify.....is this a hardware, software, firmware / hardware combo or other issue completely???? Anyone? The reason I ask is a) it's not clear from the thread exactly where the problem lies....b) what the actual cause of the problem is (apart from an external USB device connected to a Raspberry Pi) c) most importantly (and selfishly in my case) my Raspberry Pi is still within its 30-day return period, SO if it's a hardware issue I'll take the easier, softer solution and send the thing back. Thanks

@nikita-fuchs

nikita-fuchs commented May 28, 2024 via email

@dhjackal

> In my case it helped to give my SSD an external power supply.

Cool. Thanks. Happy to try this. Easy enough to try and remedy with a spare powered USB hub I have hiding around here somewhere......now where did I leave it??? :o)

@dhjackal

I'm having mixed results....as usual with Raspberry Pis....I didn't enter into this project to tinker, but well, that's what I always end up doing. Still, learning. Anyway:

Powering the USB SSD via a hub seems to work....BUT agonizingly not if I put any pressure (load) on the mount, filesystem / disk for long - so it probably isn't working at all....I see the following messages in dmesg

[ 561.128679] device offline error, dev sda, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 2
[ 561.128750] EXT4-fs (sda1): shut down requested (2)
[ 561.128784] Aborting journal on device sda1-8.
[ 561.128837] device offline error, dev sda, sector 247728128 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
[ 561.128857] device offline error, dev sda, sector 247728128 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
[ 561.128871] Buffer I/O error on dev sda1, logical block 30965760, lost sync page write
[ 561.128902] JBD2: I/O error when updating journal superblock for sda1-8.
[ 561.183827] sd 0:0:0:0: [sda] Synchronizing SCSI cache
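
When comparing runs (with and without the powered hub, under load or idle), it can help to pull just the "device offline error" entries out of the log. A small sketch, assuming log text in the format shown above; the function name is made up. As far as I understand it, this message means the kernel had already marked the device as offline when the write was issued, which is not the same thing as a bad sector being reported by the disk, but treat that reading as an assumption.

```python
import re

# Matches e.g. "[ 561.128679] device offline error, dev sda, sector 0 op ..."
OFFLINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+device offline error, "
    r"dev (?P<dev>\w+), sector (?P<sector>\d+)"
)


def offline_errors(log_text: str):
    """Return (timestamp, device, sector) for each 'device offline error'.

    Works whether the log entries are one per line or run together on a
    single line, since it scans the whole text.
    """
    return [
        (float(m["ts"]), m["dev"], int(m["sector"]))
        for m in OFFLINE_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    import sys
    # Usage: dmesg | python3 this_script.py
    for ts, dev, sector in offline_errors(sys.stdin.read()):
        print(f"{ts:>12.6f}  {dev}  sector {sector}")
```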

@nikita-fuchs

nikita-fuchs commented May 28, 2024 via email

@dhjackal

Some info ;

uname -a
Linux Malta 6.6.28+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.28-1+rpt1 (2024-04-22) aarch64 GNU/Linux

`
pi@Malta:~ $ sudo rpi-eeprom-update
BOOTLOADER: up to date
CURRENT: Fri May 17 11:26:58 UTC 2024 (1715945218)
LATEST: Fri May 17 11:26:58 UTC 2024 (1715945218)
RELEASE: latest (/lib/firmware/raspberrypi/bootloader-2711/latest)
Use raspi-config to change the release.

VL805_FW: Using bootloader EEPROM
VL805: up to date
CURRENT: 000138c0
LATEST: 000138c0
`

pi@Malta:~ $ sudo rpi-eeprom-config
[all]
BOOT_UART=0
WAKE_ON_GPIO=1
POWER_OFF_ON_HALT=0

Official PSU, externally powered hub for SSD.

@dhjackal

> This reads more like you should run a thorough check of your hard drive.

Interesting. It was cheap, from Amazon....probably. You get what you pay for. I'll go down the hardware rabbit hole and see where that takes me. Thanks for the response.

@electropolis

Is there any solution after all ?

@barart

barart commented May 29, 2024

> Is there any solution after all ?

No, just the workarounds mentioned above. They keep the USB from crashing, but they make it slower and affect the lifespan of the disks 🤷🏻‍♂️ I don't think a real fix (which needs a kernel update) is going to happen anytime soon; RPi5 has similar issues too

@cyberplant

> > Is there any solution after all ?
>
> No, just the workarounds mentioned above. They keep the USB from crashing, but they make it slower and affect the lifespan of the disks 🤷🏻‍♂️ I don't think a real fix (which needs a kernel update) is going to happen anytime soon; RPi5 has similar issues too

How is this affecting the lifespan of the disks? I see my system is a lot slower than before and takes much more time to boot. I think even more than the external HP disk I had before, which was like 10x slower than the Crucial!

But right now it has been running since May 13th (when I rebooted it for an upgrade)!

@dhjackal

> Is there any solution after all ?

Changing the external SSD and USB connector "seems" to be working for me. For now at least. My setup now isn't ideal but it's only for testing atm.....I've reverted back to a second SD card and an adapter connected to a USB hub. I went through a multitude of swapping in and out and this worked, so I moved on.

@electropolis

> > Is there any solution after all ?
>
> Changing the external SSD and USB connector "seems" to be working for me. For now at least. My setup now isn't ideal but it's only for testing atm.....I've reverted back to a second SD card and an adapter connected to a USB hub. I went through a multitude of swapping in and out and this worked, so I moved on.

I also found a table that lists the adapters that work reliably on USB 3.0 with the RPi4, and I figured out that mine wasn't actually working, so I had to buy another one.

@David112x

Alright, well: even in 6.6.64-v8+ (custom-built kernel from Raspberry Pi's sources with BBRv3 support) on the latest Raspbian, this is still an issue. It seems that using usb-storage rather than uas is a fix, but usb-storage isn't nearly as good performance-wise, so I'll be trying newer kernel versions to see how it goes.
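
To confirm which of the two drivers an adapter actually ended up with, `lsusb -t` prints the bound driver per interface. A small sketch that extracts it, assuming the usual `lsusb -t` line layout (`Dev N, If N, Class=Mass Storage, Driver=..., speed`); the function names are made up, and the layout may differ slightly between usbutils versions.

```python
import re
import subprocess
from typing import List, Tuple

# Matches mass-storage interface lines of `lsusb -t`, e.g.
#   |__ Port 2: Dev 3, If 0, Class=Mass Storage, Driver=uas, 5000M
STORAGE_LINE = re.compile(
    r"Dev (?P<dev>\d+), If \d+, Class=Mass Storage, Driver=(?P<driver>[^,]*),"
)


def storage_drivers(tree_text: str) -> List[Tuple[int, str]]:
    """Return (device number, driver) for every mass-storage interface.

    Device numbers are only unique per bus, so the result is a list in
    output order rather than a dict keyed by device number.
    """
    return [(int(m["dev"]), m["driver"]) for m in STORAGE_LINE.finditer(tree_text)]


def current_storage_drivers() -> List[Tuple[int, str]]:
    """Run `lsusb -t` on this machine and parse its output."""
    out = subprocess.run(
        ["lsusb", "-t"], capture_output=True, text=True, check=True
    ).stdout
    return storage_drivers(out)
```

A result of `uas` means the quirk is not in effect for that device; `usb-storage` means the fallback driver is in use.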

@cyberplant

Thanks @David112x for confirming that. I've been using usb-storage for some months already without issues but with very slow performance.

I'm waiting for an NVMe HAT for my new RPi5 so I can move HASS to it and forget about these issues. Hope I won't have similar issues (but with another kernel module) 😛

@David112x

David112x commented Dec 26, 2024

> Thanks @David112x for confirming that. I've been using usb-storage for some months already without issues but with very slow performance.
>
> I'm waiting for an NVMe HAT for my new RPi5 so I can move HASS to it and forget about these issues. Hope I won't have similar issues (but with another kernel module) 😛

Yeah so I tried using newer kernels and I found that:
A. The adapter no longer goes into sleep states, meaning that hard drives will not power down and suspend
B. UAS doesn't have any more issues and the XHCI controller doesn't crash

So in most cases these things are fine, but in my case I'm using a 2.5in hard drive which is NOT enterprise grade, so having it on 24/7 without suspending at all is a problem because it could cut down its lifespan. The hard drive I use is from 10 years ago and has barely had any use, sure, but it's still a good idea to take preventive measures, so I'll have to look into whatever made this a thing and revert it.

Hopefully I'll find out if the suspend thing is related to the XHCI controller crashes (and imo the drivers for the controller should be able to handle this instead of having UAS compensate for it)

@David112x

Update: The HDD does suspend, just doesn't suspend when I use the command sudo smartctl -s standby,now /dev/sda, which used to work in previous kernel versions, so my theory is that the HDD goes into standby/suspend in addition to the controller/drive enclosure's controller.
