sc16is7xx driver: performance degradation in 5.10 vs. 4.19 #4351
Can you confirm my understanding that a self-built rpi-5.10.y (not upstream 5.10), the only change being a forward-port of the 4.19 sc16is7xx driver, behaves much better than an equivalent "official" rpi-5.10.y build? |
And is that the sc16is7xx driver as found in rpi-4.19.y or the unmodified upstream version? |
A fork of rpi-5.10.y with your patches would be ideal for testing. |
Yes, it is the rpi-4.19.y version with only a few line updates to account for the related changes. The actual code is here.
If the patch is the diff between the forward-ported version (in the link above) and the current rpi-5.10.y, then all the updates made to the driver in 5.10.y over the last year will be gone. I can do that, but is that really useful? |
Thanks - there are 15 commits that separate rpi-5.10.y from rpi-4.19.y:
The commented-out lines cause merge conflicts when reverted and are unlikely to be the cause of the problem, so I propose we ignore them for now. If you don't mind doing some tests yourself - you are set up for it, whereas I'll struggle to scrape together the hardware - try this, after first getting the latest rpi-5.10.y - I'm on b657cd2:
|
In case you haven't started yet, you get a slightly more comprehensible git log if you change step 6 from:
to:
I've edited the instructions above to avoid confusion. |
Sorry for taking so long: I had trashed the kernel yesterday because of a mistake and had to rebuild from last week's backup, plus I had some issues with the SD card and needed to replace it. Anyway, I will do the analysis above. I just noticed that if I try to hot-swap the driver with:
I get this in the
It seems like the driver is not releasing the IRQ, and when reinserting the module it fails to reacquire it. Should I raise this as a separate issue? |
Yes please - just a brief issue. It will end up being a separate patch, ideally upstreamed, and having a separate issue makes it easier to track/harder to forget. |
This is what the bisect analysis is pointing to:
|
Thanks - that's a great help, although it suggests the problem is not going to be so easy to solve properly. Can you try just with that one commit reverted? It isn't a clean revert, so revert all those local patches first, then apply this patch:
From 069db9c083b5eeed1773ab83510158766af1c431 Mon Sep 17 00:00:00 2001
From: Phil Elwell <[email protected]>
Date: Wed, 19 May 2021 17:15:38 +0100
Subject: [PATCH] Revert "sc16is7xx: Use threaded IRQ"
This reverts commit 6393ff1c4435acc343b1481f5b834b918cb42b12.
---
drivers/tty/serial/sc16is7xx.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index 8e3b83fac33c..c5a65b89d26b 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -328,6 +328,7 @@ struct sc16is7xx_port {
unsigned char buf[SC16IS7XX_FIFO_SIZE];
struct kthread_worker kworker;
struct task_struct *kworker_task;
+ struct kthread_work irq_work;
struct mutex efr_lock;
struct sc16is7xx_one p[];
};
@@ -713,9 +714,9 @@ static bool sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno)
return true;
}
-static irqreturn_t sc16is7xx_irq(int irq, void *dev_id)
+static void sc16is7xx_ist(struct kthread_work *ws)
{
- struct sc16is7xx_port *s = (struct sc16is7xx_port *)dev_id;
+ struct sc16is7xx_port *s = to_sc16is7xx_port(ws, irq_work);
mutex_lock(&s->efr_lock);
@@ -730,6 +731,13 @@ static irqreturn_t sc16is7xx_irq(int irq, void *dev_id)
}
mutex_unlock(&s->efr_lock);
+}
+
+static irqreturn_t sc16is7xx_irq(int irq, void *dev_id)
+{
+ struct sc16is7xx_port *s = (struct sc16is7xx_port *)dev_id;
+
+ kthread_queue_work(&s->kworker, &s->irq_work);
return IRQ_HANDLED;
}
@@ -1240,6 +1248,7 @@ static int sc16is7xx_probe(struct device *dev,
mutex_init(&s->efr_lock);
kthread_init_worker(&s->kworker);
+ kthread_init_work(&s->irq_work, sc16is7xx_ist);
s->kworker_task = kthread_run(kthread_worker_fn, &s->kworker,
"sc16is7xx");
if (IS_ERR(s->kworker_task)) {
@@ -1338,16 +1347,14 @@ static int sc16is7xx_probe(struct device *dev,
* In case the interrupt controller doesn't support that, we fall
* back to a non-shared falling-edge trigger.
*/
- ret = devm_request_threaded_irq(dev, irq, NULL, sc16is7xx_irq,
- IRQF_TRIGGER_LOW | IRQF_SHARED |
- IRQF_ONESHOT,
- dev_name(dev), s);
+ ret = devm_request_irq(dev, irq, sc16is7xx_irq,
+ IRQF_TRIGGER_LOW | IRQF_SHARED | IRQF_ONESHOT,
+ dev_name(dev), s);
if (!ret)
return 0;
- ret = devm_request_threaded_irq(dev, irq, NULL, sc16is7xx_irq,
- IRQF_TRIGGER_FALLING | IRQF_ONESHOT,
- dev_name(dev), s);
+ ret = devm_request_irq(dev, irq, sc16is7xx_irq,
+ IRQF_TRIGGER_FALLING, dev_name(dev), s);
if (!ret)
return 0;
--
2.25.1
If you find that doesn't work, the next step is to delete/comment out the first call to |
It's interesting:
"lower_body" are connected to When the performance is bad (ex. rpi-5.10.y) it looks like this: After applying the patch and commenting that lines the stats are like this: Not sure if this is useful, but at the moment all communication is performed in a single thread that is sequentially executing first the actions on the "upper_body", then on "lower_body". |
Do we have any news on this topic? I'm having a very similar problem on an RPi3+ (aarch64), compared to an RPi1 (arm). |
Working: RPi1 (CM):
Not-working: RPi3 (CM): (missing bytes)
|
ping |
Investigating problems for random devices not used by Pi products is undertaken when time allows and the spirit moves us. The sc16is7xx driver in rpi-5.10.y is very close to the upstream driver, the only differences being:
You can try reverting them and see if the matter improves. If not, it's an upstream issue. |
P.S. The recent improvements in the upstream kernel have come from @mcdermj, who may have some insight on the problem. |
I've tried the driver approach without getting a solution; all my hints point to the tty layer and the kernel, which is somewhat above what I can dive into or try. |
The only change I made upstream is to fix the probing of the driver. In the initial Raspberry Pi 5.10 release, the probe wasn't happening correctly on our board. This was because upstream added a check to see if the device was actually out there, where it never did before, but instead of deferring the probe, they permanently failed it. This meant that on the Pi, where the i2c driver is a module, if the sc16is7xx driver loaded before the i2c driver, you'd never be able to probe the serial port. My change fixed that, and it is present upstream after the initial 5.10 release.
I'm not sure about performance issues. We probably wouldn't notice them on our board because we primarily use it to talk to a GPS unit. It runs over I2C at 4800/9600 baud, not the megabits per second over SPI that other folks here are talking about.
That being said, I'm encountering an issue with the December release where it appears that my serial port isn't working at all. I haven't had a chance to narrow this down to software/hardware yet, but hope to do so in the next couple of weeks. |
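For anyone unfamiliar with probe deferral, here is a minimal sketch of the pattern described above; the function and register names (example_probe, EXAMPLE_LSR_REG) are placeholders and not taken from the actual patch:

static int example_probe(struct regmap *regmap)
{
	unsigned int val;
	int ret;

	/* Check that the chip actually answers on the bus. */
	ret = regmap_read(regmap, EXAMPLE_LSR_REG, &val);
	if (ret < 0)
		return -EPROBE_DEFER;	/* retry later instead of failing for good */

	/* ... normal initialisation continues here ... */
	return 0;
}

Returning -EPROBE_DEFER puts the device back on the deferred-probe list, so the probe is retried once the I2C (or SPI) controller driver has loaded, instead of failing permanently.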
@mcdermj: Thanks for the quick heads-up! Yes, I personally use it via SPI, and with at least 1 Mbps (hence the preference for SPI over I2C). 1 Mbps is only the current goal; we would go up to even 2 or 3, and we have 2 of these devices attached to the Pi, which would raise the bandwidth even higher. I definitely experienced performance issues during firmware flashing onto a 3rd-party device (which worked great on the RPi1). |
@pelwell: I absolutely love how great you are, not just with fixes, but also with information sharing. I would probably never have found that other thread, but now I have the info too. Thank you so much! The earliest I can verify this will be around the end of next week. I will report my results here. |
Hi folks! Well, I am sure glad to see I am not the only one experiencing this issue. I have done a fair bit of debugging and have some information to share. I have not solved it, but I do think I have located a break in SPI communications that may be causing the problem.
Using Debian 10/Buster on a CM4, I went back through commits to the kernel and, using rpi-update, found the point at which things broke on my system. It was the transition from 5.4 to 5.10. And the very next commit in the raspberrypi/rpi-firmware repo was: I may not have been reading the git history right, but it did not seem like there were any commits to the sc16is7xx driver in this time. But I cannot tell exactly, because the breaking commit has thousands of new changes since it is a new kernel. However, this makes me wonder if there is a broader problem with SPI and not just this driver. I set up my test station so I can rapidly swap between good and bad boot directories and confirmed that is where things break.
I have been using a logic analyzer to look at transfers. Today I noticed that the problems begin when there is an apparent gap in the SPI data transfer. I have arranged my bench test to always send the same data to the serial device, such that it should always send the same data back. You can see in the following screenshots of the capture how, at the beginning of the stream, things look well behaved. There is a pattern to the transfers as things are working properly. And then there is an error in the SPI data stream, an unexpected gap in SPI clocks, and when it resumes it does not pick up where it should, but picks up as if it had been clocking some data, and so the SPI transfer is garbled. In my case this seems to cause my sc16is752 to spew data continuously out the TX line for a while, and this ultimately breaks my communication protocol.
EDIT: Actually, my reading of this transfer may be incorrect, since it is MISO not MOSI that has a change in bytes, and this may be due to the fact that RX data came in during the pause. Also, the pause is not mid-transfer as I was thinking, but in between transfers. I am still thinking this over, but the original message continues below.
Here you can see a zoomed-out view, showing the regular pattern of data transfer until about 3792 ms, as indicated in the top middle of the screen. This close-up view shows the good block of eight SPI control bytes (in the middle) that would be expected just after the serial TX. This close-up view shows the corrupted SPI transfer and bad control bytes just after a good TX. Here is the full PulseView logic capture shown in the screenshots. PulseView is open source, so you can download it to view the capture:
We are using this chip for motor control on our open source farming robot and we have laid out a custom board with three of these chips controlled by SPI. It worked great before, so we are pretty motivated to help fix it. But I am not entirely sure where to look from here. I did notice that there are other pauses in SPI, but shorter pauses do not seem to cause a problem. Is it expected that SPI would have pauses in communication like this? If so, it may not be buffering the data properly. For some pauses, the correct bits are clocked after the pause, but for the corrupting pause the bits are incorrect. I originally ran it at a 15 MHz SPI clock speed, but I lowered it to 4 MHz, then 1 MHz, and finally 400 kHz to see if that helped and to make it easier to capture with the logic analyzer. The problems persisted at all clock speeds.
Running a CPU stress test in another terminal seems to have no effect on the problem. Any advice would be appreciated! |
I just noticed #4100, which affected SPI on 5.10. However, I tried the kernel with that fix (43998c82a7e88df284b7aa30221b2d0d21b2b86a) and I still had the same issues. Someone may want to look more into what was being fixed in that patch. I looked more into the 5.4 and 5.10 branches and discovered that the sc16is7xx driver in 5.10 was actually behind that of 5.4 until Jan 27 2021, the day before the above fix, so I believe I have now tested identical driver versions across the two different kernels.
But I did realize that the way I wrote my test program contributed to the irregular transfers. The "garbled" transfers in my above post were due to extra data building up in the serial buffer during the short SPI pauses. I have made some adjustments, and it seems that small pauses no longer mess it up. It still sometimes reads the wrong bytes, however, which is certainly a problem, so I will continue to debug and try to get a capture of that.
If anyone wants to write a protocol decoder for PulseView for this chip, it would be helpful, especially if you are already familiar with the SPI protocol, as I am not. Decoders are written in Python and seem relatively approachable. I was able to capture an error with my updated program, though the behavior looks similar to before my updates. I think I need to sleep on this. I will call it a night and see if anyone has any thoughts. |
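On the test-program side, the kind of adjustment described above can be sketched roughly as follows. This is not the author's actual program; the device path /dev/ttySC0 and the 16-byte reply length are assumptions. The idea is simply to accumulate partial reads, so a short pause that splits the reply across several read() calls is not misread as corruption:

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	unsigned char buf[64];
	size_t have = 0;
	const size_t expected = 16;		/* assumed fixed-length reply */
	int fd = open("/dev/ttySC0", O_RDONLY | O_NOCTTY);

	if (fd < 0)
		return 1;

	while (have < expected) {
		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		ssize_t n;

		if (poll(&pfd, 1, 100) <= 0)	/* 100 ms timeout: give up / resync */
			break;

		n = read(fd, buf + have, expected - have);
		if (n <= 0)
			break;
		have += (size_t)n;		/* accumulate partial reads */
	}

	printf("got %zu of %zu bytes\n", have, expected);
	close(fd);
	return 0;
}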
The latest rpi-update fixes it for me! I have a board with three sc16is752 chips and my code uses four of the six ports at once. With the new kernel it all works with no errors. Thanks to everyone who made this happen! |
I'm going to mark this as closed. For any further discussion, please use #4885. |
Describe the bug
The sc16is7xx driver in kernel 5.10 exhibits poor performance compared with the driver in 4.19.
To reproduce
A board with an SC16IS762 is connected over SPI0. The board uses a 32 MHz crystal and is configured in the overlay for a 15 Mbps SPI communication speed. See the Additional context section below for more details. The communication is generated by a ROS node that commands 10 Dynamixel servos on one bus and 12 on the other, at a 2 Mbps communication speed.
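As an aside on those numbers (this calculation is illustrative and not part of the original report): with a 32 MHz crystal and the usual 16x-oversampling UART divisor, 2 Mbps corresponds to a divisor of 1, i.e. the highest standard rate that crystal supports.

#include <stdio.h>

int main(void)
{
	const unsigned long xtal = 32000000UL;	/* crystal frequency in Hz */
	const unsigned long baud = 2000000UL;	/* per-port baud rate */
	unsigned long divisor = xtal / (16UL * baud);

	printf("divisor = %lu\n", divisor);	/* 32e6 / (16 * 2e6) = 1 */
	return 0;
}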
Expected behaviour
No significant difference in performance between 5.10 and 4.19 versions.
Actual behaviour
The 5.10 version of the driver exhibits almost 10x more communication errors, severely limiting the servo update frequency.
The comparison is performed by compiling the 4.19 driver on a Raspberry Pi running the 5.10 kernel and replacing the .ko.
System
Copy and paste the results of the raspinfo command into this section. Alternatively, copy and paste a pastebin link, or add answers to the following questions:
Which OS and version (cat /etc/rpi-issue)?
Which firmware version (vcgencmd version)?
Which kernel version (uname -a)?
Logs
There are messages issued in dmesg by both versions of the driver like this:
But this seems to happen significantly more often with the 5.10 kernel driver.
Additional context
The driver is used with a board that uses the sc16is762 chip and connects over SPI0 with CE1. On the same SPI0 there is a TFT screen using an ST7789 on CE0. Because the stock dtb overlay for sc16is7xx does not support overriding the CE pin and the SPI communication baud, I use a custom dts as follows:
The focus is to handle high-speed communication over the 2 UART ports. The experiments performed with the 2 drivers use a 2 Mbps communication speed on each port.
Also, both ports are configured in RS485 mode, as the board leverages the hardware flow control to switch the communication direction on a half-duplex TTL bus. The setup is done in software using ioctl:
Do let me know in case I need to post additional information.
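The ioctl snippet itself is not shown above, so here is a minimal sketch of the standard TIOCSRS485 setup; the device path and the zero delays are assumptions, not the reporter's exact values:

#include <fcntl.h>
#include <linux/serial.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
	struct serial_rs485 rs485;
	int fd = open("/dev/ttySC0", O_RDWR | O_NOCTTY);

	if (fd < 0)
		return 1;

	memset(&rs485, 0, sizeof(rs485));
	rs485.flags = SER_RS485_ENABLED | SER_RS485_RTS_ON_SEND;
	rs485.delay_rts_before_send = 0;	/* no extra turnaround delay */
	rs485.delay_rts_after_send = 0;

	if (ioctl(fd, TIOCSRS485, &rs485) < 0)
		perror("TIOCSRS485");

	close(fd);
	return 0;
}

With SER_RS485_ENABLED set, the RTS line is used to switch the direction of the half-duplex bus around each transmission, which corresponds to the hardware-controlled direction switching described above.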