USB control messages interleaving data errors - Pi 4 #3054

Open
jfthibert opened this issue Jul 5, 2019 · 17 comments

@jfthibert
Contributor

In some cases USB control messages interleaved with bulk transfers seem to be getting the wrong data on Pi 4. Instead of the expected content the capture shows a copy of the previous control data.

For example, in the following usbmon capture the second control read returns 1d when it should have returned 00. This causes errors in the driver, which receives unexpected values.

c8ae4300 3921710804 S Ci:1:004:0 s c0 02 0000 00b2 0001 1 <
c8ae4300 3921710904 C Ci:1:004:0 0 1 = 1d
c8ae4300 3921710934 S Ci:1:004:0 s c0 00 0000 0005 0001 1 <
dcec9840 3921710981 C Bi:1:004:4 0 48128 = 4700311d 1c802eff 722dc883 37b72db7 c5b50207 da001811 00083f9f ffffd7b9
dcec9840 3921711374 S Bi:1:004:4 -115 48128 <
c8ae4300 3921711400 C Ci:1:004:0 0 1 = 1d

A similar capture on a PC shows the expected behavior:

ffff899c6e272180 1633275323 S Ci:3:002:0 s c0 02 0000 00b2 0001 1 <
ffff899c6e272180 1633275412 C Ci:3:002:0 0 1 = 26
ffff899c6e272180 1633275435 S Ci:3:002:0 s c0 00 0000 0005 0001 1 <
ffff899d47c4f780 1633275439 C Bi:3:002:4 0 48128 = 471fff10 ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
ffff899d47c4f780 1633275548 S Bi:3:002:4 -115 48128 <
ffff899c6e272180 1633275554 C Ci:3:002:0 0 1 = 00

This problem can be reproduced by doing both bulk transfers and control transfers using an ATSC USB adapter (I've been using a WinTV dualHD) so that eventually the setup stage of a control message is done before a bulk transfer while the data stage happens afterward.
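
For readers unfamiliar with usbmon text output, the setup packets in the captures above can be decoded with a short script. This is a minimal sketch, assuming the field layout seen in those captures (URB tag, timestamp, event type, address, then `s` followed by bmRequestType, bRequest, wValue, wIndex and wLength in hex). The names `Setup`, `parse_control_submission` and `describe` are made up for illustration.

```python
from typing import NamedTuple


class Setup(NamedTuple):
    bm_request_type: int
    b_request: int
    w_value: int
    w_index: int
    w_length: int


def parse_control_submission(line: str) -> Setup:
    """Extract the setup packet from a usbmon control submission ("S") line."""
    fields = line.split()
    # Assumed layout: tag, timestamp, event, address, "s", then five hex fields.
    if (len(fields) < 10 or fields[2] != "S"
            or not fields[3].startswith("C") or fields[4] != "s"):
        raise ValueError("not a control submission carrying a setup packet")
    return Setup(*(int(value, 16) for value in fields[5:10]))


def describe(setup: Setup) -> str:
    """Render the bmRequestType bit fields and the request in readable form."""
    direction = "device-to-host" if setup.bm_request_type & 0x80 else "host-to-device"
    kind = ("standard", "class", "vendor", "reserved")[(setup.bm_request_type >> 5) & 0x3]
    recipients = {0: "device", 1: "interface", 2: "endpoint", 3: "other"}
    recipient = recipients.get(setup.bm_request_type & 0x1F, "reserved")
    return (f"{direction} {kind} request 0x{setup.b_request:02x} to {recipient}, "
            f"wValue=0x{setup.w_value:04x}, wIndex=0x{setup.w_index:04x}, "
            f"wLength={setup.w_length}")


line = "c8ae4300 3921710804 S Ci:1:004:0 s c0 02 0000 00b2 0001 1 <"
print(describe(parse_control_submission(line)))
```

Both control requests in the captures decode as one-byte vendor reads from the device (bmRequestType 0xc0). They differ in bRequest and wIndex, so each is expected to return its own value.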

The same configuration works correctly with a Pi 3B using the same kernel version as well as the same USB adapter.

  • Which model of Raspberry Pi? e.g. Pi3B+, PiZeroW
    Issue happens on Pi 4

  • Which OS and version (cat /etc/rpi-issue)?
    Raspberry Pi reference 2019-06-20
    Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 150e25c4f8123a4c9c63e8dca1b4737fa6c1135c, stage2

  • Which firmware version (vcgencmd version)?
    Jul 5 2019 13:09:40
    Copyright (c) 2012 Broadcom
    version 8d1e59694b5cc50b7ecd7e3f3944db8090523006 (clean) (release) (start)

  • Which kernel version (uname -a)?
    Linux raspberrypi 4.19.57-v7l+ #1244 SMP Thu Jul 4 18:48:07 BST 2019 armv7l GNU/Linux

I have also tried with the current release versions as well as kernel 5.1.

@pelwell
Contributor

pelwell commented Jul 8, 2019

Is this on a 4GB Pi 4? If so, you can artificially restrict the RAM to 3GB by adding total_mem=3072 to config.txt. Knowing whether that fixes the problem will help to narrow down the location of the bug.

@jfthibert
Contributor Author

Thanks for the suggestion, this happens on a 2GB Pi 4.

@pelwell
Contributor

pelwell commented Jul 8, 2019

Thanks. That's good because it means I didn't break it, and bad because we probably have to rely on somebody else to fix it.

@P33M
Contributor

P33M commented Jul 15, 2019

Can you try with rpi-update firmware and report back?

@jfthibert
Contributor Author

I just updated and I'm still seeing the same issue.

$ vcgencmd version
Jul 15 2019 17:30:48
Copyright (c) 2012 Broadcom
version 99f678cd2ad635ab64c6e41a74e372bf57694899 (clean) (release) (start)

@P33M
Contributor

P33M commented Jul 16, 2019

It was worth a try. Is the symptom always the same, in that you always get faulty control endpoint data corresponding to the previous control transaction (and not some arbitrary one in the past)?

@jfthibert
Contributor Author

Yes, the symptoms have been the same in all the bad cases I have captured so far. The data is a duplicate of the previous control transaction, and it happens when a bulk transfer falls between the request and the data. If I disable the bulk transfers, all control transactions seem to be error free. Let me know if any specific tests would be useful to capture.
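
The pattern described here can be searched for mechanically in a longer usbmon log. The following is a rough heuristic sketch, not a definitive detector: it flags a control read whose completion arrives after other traffic was seen between its submission and completion, and whose data equals that of the previous control read even though the setup packet differs. A match is only a candidate, since two different reads can legitimately return the same value. The function name `find_suspect_control_reads` is hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def find_suspect_control_reads(lines: Iterable[str]) -> List[str]:
    """Return completion lines of control reads that look like stale duplicates."""
    pending = {}  # URB tag -> [setup fields, saw_other_traffic]
    previous: Optional[Tuple[tuple, tuple]] = None  # (setup, data) of last control read
    suspects = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("S", "C"):
            continue  # skip blank lines, "..." elisions and error events
        tag, event, kind = fields[0], fields[2], fields[3].split(":")[0]
        if kind != "Ci":
            # Any other traffic counts as interleaved with every pending control read.
            for entry in pending.values():
                entry[1] = True
            continue
        if event == "S":
            if len(fields) >= 10 and fields[4] == "s":
                pending[tag] = [tuple(fields[5:10]), False]
            continue
        entry = pending.pop(tag, None)
        if entry is None or "=" not in fields:
            continue  # completion without a matching submission or without data
        setup, interleaved = entry
        data = tuple(fields[fields.index("=") + 1:])
        if interleaved and previous and data == previous[1] and setup != previous[0]:
            suspects.append(line)
        previous = (setup, data)
    return suspects
```

To use it on a saved capture, pass the file's lines, for example `find_suspect_control_reads(open("capture.txt"))`, where `capture.txt` is a placeholder name.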

@jfthibert
Contributor Author

I have done a few more experiments. The problem is even easier to reproduce on another device using isochronous transfers:

db8f3540 1740280692 S Ci:004:00 s c0 00 0000 0005 0001 1 <
db8f3540 1740280761 C Ci:004:00 0 1 = 00
db8f3540 1740280796 S Ci:004:00 s c0 02 0000 001c 0001 1 <
db8f3540 1740281052 C Ci:004:00 0 1 = 8b
db8f3540 1740281088 S Ci:004:00 s c0 00 0000 0005 0001 1 <
...
db9fd000 1740281309 S Zi:004:04 -115 60160 <
db8f3540 1740281425 C Ci:004:00 0 1 = 8b

I've ordered a PCIe board with the VL805 to check whether I see the same behavior.

@jfthibert
Contributor Author

I have confirmed I see proper behavior when using a VL805 on PC (03:00.0 USB controller: VIA Technologies, Inc. VL805 USB 3.0 Host Controller (rev 01) (prog-if 30 [XHCI]))

ffff9b5aeeedfb40 3513868822 S Ci:003:00 s c0 02 0000 00b2 0001 1 <
ffff9b5aeeedfb40 3513868924 C Ci:003:00 0 1 = 05
ffff9b5aeeedfb40 3513868935 S Ci:003:00 s c0 00 0000 0005 0001 1 <
ffff9b5b863553c0 3513868957 C Bi:003:04 0 48128 = 47003119 8c45b870 5b49402a ce1e5542 8a26173c 19bbcddd 7dd2c2e2 5942f50b
ffff9b5b863553c0 3513869049 S Bi:003:04 -115 48128 <
ffff9b5aeeedfb40 3513869053 C Ci:003:00 0 1 = 00

I haven't been able to capture any bad cases so far. It seems the problem is specific to the Pi 4 or to its specific version of the controller.

@pelwell
Contributor

pelwell commented Jul 28, 2019

Using the VL805-based PC card, what does sudo lspci -xxx | grep -A8 VIA report? You may have to install the pciutils package.

@jfthibert
Contributor Author

Here is the dump of the start of the PCI config space on the PC card:

03:00.0 USB controller: VIA Technologies, Inc. VL805 USB 3.0 Host Controller (rev 01)
00: 06 11 83 34 07 04 10 00 01 30 03 0c 10 00 00 00
10: 04 00 d0 f7 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 06 11 83 34
30: 00 00 00 00 80 00 00 00 00 00 00 00 0a 01 00 00
40: 00 00 00 00 00 01 00 00 09 a0 28 03 04 00 00 00
50: 00 35 01 00 00 00 00 00 00 00 00 00 06 11 83 34
60: 30 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
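
To compare this dump against one taken on another machine, the identifying fields can be pulled out of the hex rows. This is a minimal sketch that assumes the `lspci -xxx` row layout shown above ("offset: sixteen hex bytes") and the standard PCI configuration header offsets (vendor ID at 0x00, device ID at 0x02, revision at 0x08, class code at 0x09 to 0x0b). The function names are hypothetical.

```python
import string
from typing import Dict


def parse_lspci_hex(dump: str) -> bytes:
    """Collect the config-space bytes from the hex rows of an lspci -xxx dump."""
    config = bytearray()
    for line in dump.splitlines():
        offset, sep, rest = line.strip().partition(": ")
        # Hex rows start with a pure-hex offset; the device title line does not.
        if sep and offset and all(ch in string.hexdigits for ch in offset):
            config.extend(bytes.fromhex(rest))
    return bytes(config)


def decode_header(config: bytes) -> Dict[str, int]:
    """Decode the identification fields at the start of the PCI header."""
    return {
        "vendor_id": int.from_bytes(config[0:2], "little"),
        "device_id": int.from_bytes(config[2:4], "little"),
        "revision": config[8],
        "prog_if": config[9],
        "subclass": config[10],
        "base_class": config[11],
    }


dump = """03:00.0 USB controller: VIA Technologies, Inc. VL805 USB 3.0 Host Controller (rev 01)
00: 06 11 83 34 07 04 10 00 01 30 03 0c 10 00 00 00
10: 04 00 d0 f7 00 00 00 00 00 00 00 00 00 00 00 00"""
for name, value in decode_header(parse_lspci_hex(dump)).items():
    print(f"{name}: 0x{value:02x}")
```

For the dump above this yields vendor 0x1106, device 0x3483, revision 0x01 and class code 0x0c/0x03/0x30, which agrees with the "(rev 01) (prog-if 30 [XHCI])" shown in the earlier lspci line.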

@jfthibert
Contributor Author

I have just tried using the latest VL805 firmware (0137ab) and the issue persists. Looking for ideas on what to try next.

@jfthibert
Contributor Author

It turns out this problem is specific to the 32-bit kernel. Running the v8/aarch64 kernel, I was no longer able to reproduce the issue.
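
Since the result depends on which kernel is running, it may help to record the kernel architecture alongside each capture. This is a small sketch, assuming that `platform.machine()` reflects what `uname -m` reports (`armv7l` for the 32-bit kernel quoted earlier, `aarch64` for the 64-bit one). A process started under a 32-bit personality could see a different value. The function names are hypothetical.

```python
import platform


def is_64bit_arm_kernel(machine: str) -> bool:
    """Return True for the machine string a 64-bit ARM kernel reports."""
    return machine == "aarch64"


def kernel_summary() -> str:
    """Describe the running kernel release and whether it is 64-bit ARM."""
    machine = platform.machine()
    width = "64-bit ARM" if is_64bit_arm_kernel(machine) else "not 64-bit ARM"
    return f"{platform.release()} ({machine}, {width})"


print(kernel_summary())
```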

@smallint

Not sure if my setup is being hit by the same issue, but USB bulk transfers cause trouble with my ATSC USB adapter (TBS 5990). Disabling bulk transfers resolves the issue. It does not occur on x86 kernels. The symptom is that the device is logged as "stalled" and takes a long time to recover. It can be reproduced.

  • Kernel version
    Linux rpi4 4.19.73-1-ARCH #1 SMP PREEMPT Sat Sep 21 15:09:04 UTC 2019 armv7l GNU/Linux
    
  • Firmware
    Sep 20 2019 18:15:38 
    Copyright (c) 2012 Broadcom
    version 438bd10f818f4fcb6fbe27aaec6b2ac3de84a2de (clean) (release) (start)
    

I can provide any logs or outputs; I just don't know which would be of help.

@jfthibert
Contributor Author

Can you give the 64-bit kernel a try? It might also be caused by invalid data on control messages.

@smallint

Can you give the 64-bit kernel a try?

Not easily. Are there any pre-compiled packages available or an OS image I can just use?

It might also be caused by invalid data on control messages.

That sounds reasonable. I have tested again with the latest kernel and the problem persists, although it no longer happens immediately as it did with older kernels.

@pelwell
Contributor

pelwell commented Sep 25, 2019

You can install a suitable 64-bit kernel using sudo rpi-update. You then need to add arm_64bit=1 to config.txt, otherwise it will choose the 32-bit kernel out of preference (this provides a get-out-of-jail if it doesn't work).
