sc16is7xx driver: performance degradation in 5.10 vs. 4.19 #4351
Can you confirm my understanding that a self-built rpi-5.10.y (not upstream 5.10), the only change being a forward-port of the 4.19 sc16is7xx driver, behaves much better than an equivalent "official" rpi-5.10.y build? |
And is that the sc16is7xx driver as found in rpi-4.19.y or the unmodified upstream version? |
A fork of rpi-5.10.y with your patches would be ideal for testing. |
Yes, it is the rpi-4.19.y version with only a few line updates to account for the related changes. The actual code is here.
If the patch is the diff between the forward-ported version (in the link above) and the current rpi-5.10.y, then all the updates made to the driver in 5.10.y over the last year will be gone. I can do that, but is that really useful? |
Thanks - there are 15 commits that separate rpi-5.10.y from rpi-4.19.y:
The commented-out lines cause merge conflicts when reverted and are unlikely to be the cause of the problem, so I propose we ignore them for now. If you don't mind doing some tests yourself - you are set up for it, whereas I'll struggle to scrape together the hardware - try this, after first getting the latest rpi-5.10.y - I'm on b657cd2:
|
In case you haven't started yet, you get a slightly more comprehensible git log if you change step 6 from:
to:
I've edited the instructions above to avoid confusion. |
Sorry for taking so long: I had trashed the kernel yesterday because of a mistake and had to rebuild from last week's backup, plus I had some issues with the SD card and needed to replace it. Anyway, I will do the analysis above. I just noticed that if I try to hot-swap the driver with:
I get this in the
It seems like the driver is not releasing the IRQ, and when reinserting the module it fails to reacquire it. Should I raise this as a separate issue? |
Yes please - just a brief issue. It will end up being a separate patch, ideally upstreamed, and having a separate issue makes it easier to track/harder to forget. |
This is what the bisect analysis is pointing to:
|
Thanks - that's a great help, although it suggests the problem is not going to be so easy to solve properly. Can you try just with that one commit reverted? It isn't a clean revert, so revert all those local patches first, then apply this patch:
From 069db9c083b5eeed1773ab83510158766af1c431 Mon Sep 17 00:00:00 2001
From: Phil Elwell <[email protected]>
Date: Wed, 19 May 2021 17:15:38 +0100
Subject: [PATCH] Revert "sc16is7xx: Use threaded IRQ"
This reverts commit 6393ff1c4435acc343b1481f5b834b918cb42b12.
---
drivers/tty/serial/sc16is7xx.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index 8e3b83fac33c..c5a65b89d26b 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -328,6 +328,7 @@ struct sc16is7xx_port {
unsigned char buf[SC16IS7XX_FIFO_SIZE];
struct kthread_worker kworker;
struct task_struct *kworker_task;
+ struct kthread_work irq_work;
struct mutex efr_lock;
struct sc16is7xx_one p[];
};
@@ -713,9 +714,9 @@ static bool sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno)
return true;
}
-static irqreturn_t sc16is7xx_irq(int irq, void *dev_id)
+static void sc16is7xx_ist(struct kthread_work *ws)
{
- struct sc16is7xx_port *s = (struct sc16is7xx_port *)dev_id;
+ struct sc16is7xx_port *s = to_sc16is7xx_port(ws, irq_work);
mutex_lock(&s->efr_lock);
@@ -730,6 +731,13 @@ static irqreturn_t sc16is7xx_irq(int irq, void *dev_id)
}
mutex_unlock(&s->efr_lock);
+}
+
+static irqreturn_t sc16is7xx_irq(int irq, void *dev_id)
+{
+ struct sc16is7xx_port *s = (struct sc16is7xx_port *)dev_id;
+
+ kthread_queue_work(&s->kworker, &s->irq_work);
return IRQ_HANDLED;
}
@@ -1240,6 +1248,7 @@ static int sc16is7xx_probe(struct device *dev,
mutex_init(&s->efr_lock);
kthread_init_worker(&s->kworker);
+ kthread_init_work(&s->irq_work, sc16is7xx_ist);
s->kworker_task = kthread_run(kthread_worker_fn, &s->kworker,
"sc16is7xx");
if (IS_ERR(s->kworker_task)) {
@@ -1338,16 +1347,14 @@ static int sc16is7xx_probe(struct device *dev,
* In case the interrupt controller doesn't support that, we fall
* back to a non-shared falling-edge trigger.
*/
- ret = devm_request_threaded_irq(dev, irq, NULL, sc16is7xx_irq,
- IRQF_TRIGGER_LOW | IRQF_SHARED |
- IRQF_ONESHOT,
- dev_name(dev), s);
+ ret = devm_request_irq(dev, irq, sc16is7xx_irq,
+ IRQF_TRIGGER_LOW | IRQF_SHARED | IRQF_ONESHOT,
+ dev_name(dev), s);
if (!ret)
return 0;
- ret = devm_request_threaded_irq(dev, irq, NULL, sc16is7xx_irq,
- IRQF_TRIGGER_FALLING | IRQF_ONESHOT,
- dev_name(dev), s);
+ ret = devm_request_irq(dev, irq, sc16is7xx_irq,
+ IRQF_TRIGGER_FALLING, dev_name(dev), s);
if (!ret)
return 0;
--
2.25.1
If you find that doesn't work, the next step is to delete/comment out the first call to |
It's interesting:
"lower_body" are connected to When the performance is bad (ex. rpi-5.10.y) it looks like this: After applying the patch and commenting that lines the stats are like this: Not sure if this is useful, but at the moment all communication is performed in a single thread that is sequentially executing first the actions on the "upper_body", then on "lower_body". |
Do we have any news on this topic? I'm having a very similar problem on an RPi3+ (aarch64), compared to an RPi1 (arm). |
Working: RPi1 (CM):
Not-working: RPi3 (CM): (missing bytes)
|
ping |
Investigating problems for random devices not used by Pi products is undertaken when time allows and the spirit moves us. The sc16is7xx driver in rpi-5.10.y is very close to the upstream driver, the only differences being:
You can try reverting them and see if the matter improves. If not, it's an upstream issue. |
P.S. The recent improvements in the upstream kernel have come from @mcdermj, who may have some insight on the problem. |
I've tried the driver approach without getting a solution; all my hints point to the tty layer and the kernel, which is somewhat above what I can dive into or try. |
The only change I made upstream is to fix the probing of the driver. In the initial Raspberry Pi 5.10 release, the probe wasn't happening correctly on our board. This was because upstream added a check to see if the device was actually out there, where it never did before, but instead of deferring the probe, they permanently failed it. This meant that on the Pi, where the i2c driver is a module, if the sc16is7xx driver loaded before the i2c driver, you'd never be able to probe the serial port. My change fixed that, and it is present upstream after the initial 5.10 release.
I'm not sure about performance issues. We probably wouldn't notice them on our board because we primarily use it to talk to a GPS unit. It runs over I2C at 4800/9600 baud, not the megabits per second over SPI that other folks here are talking about.
That being said, I'm encountering an issue with the December release where it appears that my serial port isn't working at all. I haven't had a chance to narrow this down to software/hardware yet, but hope to do so in the next couple of weeks. |
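For anyone unfamiliar with probe deferral, here is a minimal sketch of the pattern described above; the function and register names (example_probe, EXAMPLE_LSR_REG) are placeholders and not taken from the actual patch:

static int example_probe(struct regmap *regmap)
{
	unsigned int val;
	int ret;

	/* Check that the chip actually answers on the bus. */
	ret = regmap_read(regmap, EXAMPLE_LSR_REG, &val);
	if (ret < 0)
		return -EPROBE_DEFER;	/* retry later instead of failing for good */

	/* ... normal initialisation continues here ... */
	return 0;
}

Returning -EPROBE_DEFER puts the device back on the deferred-probe list, so the probe is retried once the I2C (or SPI) controller driver has loaded, instead of failing permanently.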
@mcdermj: Thanks for the quick heads-up! Yes, I personally use it via SPI, and with at least 1 Mbps (hence the preference for SPI over I2C). 1 Mbps is only the current goal; we would go up to even 2 or 3, and we have 2 of these devices attached to the Pi, which would raise the bandwidth even higher. I definitely experienced performance issues during firmware flashing onto a 3rd-party device (which worked great on the RPi1). |
@pelwell: I absolutely love how great you are, not just with fixes, but also with information sharing. I would probably never have found that other thread, but now I have the info too. Thank you so much! The earliest I can verify this will be around the end of next week. I will report my results here. |
Hi folks! Well, I am sure glad to see I am not the only one experiencing this issue. I have done a fair bit of debugging and have some information to share. I have not solved it, but I do think I have located a break in SPI communications that may be causing the problem.
Using Debian 10/Buster on a CM4, I went back through commits to the kernel and, using rpi-update, found the point at which things broke on my system. It was the transition from 5.4 to 5.10. And the very next commit in the raspberrypi/rpi-firmware repo was: I may not have been reading the git history right, but it did not seem like there were any commits to the sc16is7xx driver in this time. But I cannot tell exactly, because the breaking commit has thousands of new changes since it is a new kernel. However, this makes me wonder if there is a broader problem with SPI and not just this driver. I set up my test station so I can rapidly swap between good and bad boot directories and confirmed that is where things break.
I have been using a logic analyzer to look at transfers. Today I noticed that the problems begin when there is an apparent gap in the SPI data transfer. I have arranged my bench test to always send the same data to the serial device, such that it should always send the same data back. You can see in the following screenshots of the capture how, at the beginning of the stream, things look well behaved. There is a pattern to the transfers as things are working properly. And then there is an error in the SPI data stream, an unexpected gap in SPI clocks, and when it resumes it does not pick up where it should, but picks up as if it had been clocking some data, and so the SPI transfer is garbled. In my case this seems to cause my sc16is752 to spew data continuously out the TX line for a while, and this ultimately breaks my communication protocol.
EDIT: Actually, my reading of this transfer may be incorrect, since it is MISO not MOSI that has a change in bytes, and this may be due to the fact that RX data came in during the pause. Also, the pause is not mid-transfer as I was thinking, but in between transfers. I am still thinking this over, but the original message continues below.
Here you can see a zoomed-out view, showing the regular pattern of data transfer until about 3792 ms, as indicated in the top middle of the screen. This close-up view shows the good block of eight SPI control bytes (in the middle) that would be expected just after the serial TX. This close-up view shows the corrupted SPI transfer and bad control bytes just after a good TX. Here is the full PulseView logic capture shown in the screenshots. PulseView is open source, so you can download it to view the capture:
We are using this chip for motor control on our open source farming robot and we have laid out a custom board with three of these chips controlled by SPI. It worked great before, so we are pretty motivated to help fix it. But I am not entirely sure where to look from here. I did notice that there are other pauses in SPI, but shorter pauses do not seem to cause a problem. Is it expected that SPI would have pauses in communication like this? If so, it may not be buffering the data properly. For some pauses, the correct bits are clocked after the pause, but for the corrupting pause the bits are incorrect. I originally ran it at a 15 MHz SPI clock speed, but I lowered it to 4 MHz, then 1 MHz, and finally 400 kHz to see if that helped and to make it easier to capture with the logic analyzer. The problems persisted at all clock speeds.
Running a CPU stress test in another terminal seems to have no effect on the problem. Any advice would be appreciated! |
I just noticed #4100, which affected SPI on 5.10. However, I tried the kernel with that fix (43998c82a7e88df284b7aa30221b2d0d21b2b86a) and I still had the same issues. Someone may want to look more into what was being fixed in that patch. I looked more into the 5.4 and 5.10 branches and discovered that the sc16is7xx driver in 5.10 was actually behind that of 5.4 until Jan 27 2021, the day before the above fix, so I believe I have now tested identical driver versions across the two different kernels.
But I did realize that the way I wrote my test program contributed to the irregular transfers. The "garbled" transfers in my above post were due to extra data building up in the serial buffer during the short SPI pauses. I have made some adjustments, and it seems that small pauses no longer mess it up. It still sometimes reads the wrong bytes, however, which is certainly a problem, so I will continue to debug and try to get a capture of that.
If anyone wants to write a protocol decoder for PulseView for this chip, it would be helpful, especially if you are already familiar with the SPI protocol, as I am not. Decoders are written in Python and seem relatively approachable. I was able to capture an error with my updated program, though the behavior looks similar to before my updates. I think I need to sleep on this. I will call it a night and see if anyone has any thoughts. |
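On the test-program side, the kind of adjustment described above can be sketched roughly as follows. This is not the author's actual program; the device path /dev/ttySC0 and the 16-byte reply length are assumptions. The idea is simply to accumulate partial reads, so a short pause that splits the reply across several read() calls is not misread as corruption:

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	unsigned char buf[64];
	size_t have = 0;
	const size_t expected = 16;		/* assumed fixed-length reply */
	int fd = open("/dev/ttySC0", O_RDONLY | O_NOCTTY);

	if (fd < 0)
		return 1;

	while (have < expected) {
		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		ssize_t n;

		if (poll(&pfd, 1, 100) <= 0)	/* 100 ms timeout: give up / resync */
			break;

		n = read(fd, buf + have, expected - have);
		if (n <= 0)
			break;
		have += (size_t)n;		/* accumulate partial reads */
	}

	printf("got %zu of %zu bytes\n", have, expected);
	close(fd);
	return 0;
}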
The latest rpi-update fixes it for me! I have a board with three sc16is752 chips and my code uses four of the six ports at once. With the new kernel it all works with no errors. Thanks to everyone who made this happen! |
I'm going to mark this as closed. For any further discussion, please use #4885. |
Describe the bug
The sc16is7xx driver in kernel 5.10 exhibits poor performance compared with the driver in 4.19.
To reproduce
A board with an SC16IS762 is connected over SPI0. The board uses a 32 MHz crystal and is configured in the overlay for a 15 Mbps SPI communication speed. See the Additional context section below for more details. The communication is generated by a ROS node that commands 10 Dynamixel servos on one bus and 12 on the other, at a 2 Mbps communication speed.
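As an aside on those numbers (this calculation is illustrative and not part of the original report): with a 32 MHz crystal and the usual 16x-oversampling UART divisor, 2 Mbps corresponds to a divisor of 1, i.e. the highest standard rate that crystal supports.

#include <stdio.h>

int main(void)
{
	const unsigned long xtal = 32000000UL;	/* crystal frequency in Hz */
	const unsigned long baud = 2000000UL;	/* per-port baud rate */
	unsigned long divisor = xtal / (16UL * baud);

	printf("divisor = %lu\n", divisor);	/* 32e6 / (16 * 2e6) = 1 */
	return 0;
}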
Expected behaviour
No significant difference in performance between 5.10 and 4.19 versions.
Actual behaviour
The 5.10 version of the driver exhibits almost 10x more communication errors, severely limiting the servo update frequency.
The comparison is performed by compiling the 4.19 driver on a Raspberry Pi running the 5.10 kernel and replacing the .ko.
System
Copy and paste the results of the raspinfo command into this section. Alternatively, copy and paste a pastebin link, or add answers to the following questions:
Which OS and version (cat /etc/rpi-issue)?
Which firmware version (vcgencmd version)?
Which kernel version (uname -a)?
Logs
There are messages issued in dmesg by both versions of the driver like this:
But this seems to happen significantly more often with the 5.10 kernel driver.
Additional context
The driver is used with a board that uses the sc16is762 chip and connects over SPI0 with CE1. On the same SPI0 there is a TFT screen using an ST7789 on CE0. Because the stock dtb overlay for sc16is7xx does not support overriding the CE pin and the SPI communication baud, I use a custom dts as follows:
The focus is to handle high-speed communication over the 2 UART ports. The experiments performed with the 2 drivers use a 2 Mbps communication speed on each port.
Also, both ports are configured in RS485 mode, as the board leverages the hardware flow control to switch the communication direction on a half-duplex TTL bus. The setup is done in software using ioctl:
Do let me know in case I need to post additional information.
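The ioctl snippet itself is not shown above, so here is a minimal sketch of the standard TIOCSRS485 setup; the device path and the zero delays are assumptions, not the reporter's exact values:

#include <fcntl.h>
#include <linux/serial.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
	struct serial_rs485 rs485;
	int fd = open("/dev/ttySC0", O_RDWR | O_NOCTTY);

	if (fd < 0)
		return 1;

	memset(&rs485, 0, sizeof(rs485));
	rs485.flags = SER_RS485_ENABLED | SER_RS485_RTS_ON_SEND;
	rs485.delay_rts_before_send = 0;	/* no extra turnaround delay */
	rs485.delay_rts_after_send = 0;

	if (ioctl(fd, TIOCSRS485, &rs485) < 0)
		perror("TIOCSRS485");

	close(fd);
	return 0;
}

With SER_RS485_ENABLED set, the RTS line is used to switch the direction of the half-duplex bus around each transmission, which corresponds to the hardware-controlled direction switching described above.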