-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Aux SPI (SPI1) + Console on Aux UART (ttyS0) = Lockup #1484
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thanks for the thorough report. You are correct that the Aux devices share a clock, but that clock is the core clock and is only controlled by the VPU. I think it's more likely that there is a problem with the interrupt handling, since they also share an interrupt line and require an extra level of interrupt decoding. |
If i am reading the kernel source correctly, then the AUX Uart must be being processed by: Which seems to be a generic 8250 driver, and not specific to the quirks of the bcm aux serial port. The error "kernel: [ 49.972018] serial8250: too much work for irq59" happens when the Uart IRQ spins processing UART IRQ's , waiting for the IRQ to go away after servicing the uart. It doesnt have any knowledge in there that this is in fact an AUX serial port and so has no knowledge that the SPI can also cause interrupts. SO you could presumably get into a situation that its stuck in there waiting for Interrupts to go away, but they never do, because the SPI cant process them, until it times out with this error. Assuming this is the driver being used for the aux uart, then It would seem naive to use this driver for the aux uart, because its not an 8250, just smells a bit like one, and clearly doesnt behave like an 8250 with regard to its interrupts. I think pelwell is spot on with his belief that its interrupt handling causing the problem. Looks like the aux uart might need its own specific driver that know these things. Benefit, it would be simpler than the 8250 generic driver simply because of its reduced functionality. A second question this raises of course is why the SPI causes ANY interrupts to be generated at all, because it shouldn’t generate any in the absence of SPI transfers..... Just enabling the SPI port shouldn't do that. |
Agreed, my reading too suggests that the SPI fires off an interrupt. One option is that the SPI driver enables the interrupt, but before claiming the IRQ vector. IIRC, Linux will handle multiple devices on one interrupt vector correctly. Something that was left over from when X86 systems only had sixteen of them. The only way that sharing of IRQs can work is that the IRQ routine returns as fast as possible when it's device was not responsible for the interrupt. This is something that the 8250 driver MUST be capable of, because it dates from the time when COM1, COM4 were both IRQ3 and COM2, COM3 where both IRQ4. Or something like that. This would mean that the bug lies in the SPI driver that has not activated its interrupt routine yet, when the module fires off an interrupt. Checking out the SPI driver... This would happen when that first interrupt does not satisfy: |
If you look at the code for the standard UART driver, you will see this is precisely what it DOES NOT DO. It spin loops, while ever there is an IRQ present, eventually times out, and generates "kernel: [ 49.972018] serial8250: too much work for irq59" which is the UART driver saying, hey, I spun for a long time and the IRQ never went away. The reason why its spinning is conjecture at this point, and it seems to be related to the SPI, BUT that is the reported ERROR from the kernel side. I agree that the driver should be checking its IRQ status and returning as FAST as possible, for any device in the AUX device block, that is as easy as checking a single bit in AUXIRQ Register (0x7E21 5000), but i cant find anywhere that does that. I think the problem is that the UART drivers inside the Linux Kernel seem to believe that they are NOT SHARED with anything but a UART. And in this case thats not true. The UART IRQ handler spins, looking for all possible uarts on the same IRQ and process them until the IRQ goes away. This is because it assumes the UART is on a ISA bus which only has a edge triggered interrupt, and if you return without restoring the state of the IRQ to not asserted, you will not get any more IRQ's and your driver will stall. Again, this is a quirk of ISA bus and has nothing to do with the BCM8253 Uart implementation, and this is precisely where we seem to be getting hung up, based on the messages being reported by the Kernel. The belief the SPI might be causing an IRQ and not handling it properly is beside the point, the UART driver should not get hung up like this, regardless of what is going on in the SPI driver side. But i also read the SPI code, which specifically clears out all pending SPI interrupts before enabling the IRQ, so im not 100% convinced its that which is triggering this fault in the UART driver. But even if it is, the UART driver should not hang up like this, because even if we clear this potentially spurious IRQ at initialisation, any use of SPI1 in conjunction with UART1 will possibly trigger this fault in the UART driver. From the Broadcom BCM2835 Peripherals Manual:
And so, we see this in the code, which enables the module clocks, because the UART driver doesn’t know anything about that: a72021a
We also have this comment from the Broadcom BCM2835 Peripherals Manual:
Which kind of looks more like the problem being experienced. A UART generating millions of IRQ's which the system can not process. Would also explain the really big slow down also reported. Because, even if the SPI caused the interrupt, the UART would eventually time out, the SPI should then get a chance to service, it should clear the source of the interrupt and then we should be back to normal, but that does not happen. What we need is a dump of the full AUX device state (IRQ, CLOCK, UART1, SPI1 & 2) at the time the kernel prints "kernel: [ 49.972018] serial8250: too much work for irq59" and then we could see whats going on exactly. |
Can anyone confirm this on 4.4.13-v7+ ??? |
A recent update disabled the mini-uart by default on the PI3b so you need to explicitly enable it now. See Phil's post for details: https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=141195 prior to enabling it you'll probably see
to enable just add 'enable_uart=1' to the config.txt and you'll have slow boot/lockup on a PI3b |
Any update on this? I have a custom designed board based on compute module v1. UART1 is being used on pins 32 and 33 . (as the console) When SPI1 is not enabled, everything is fine. However, when sc16is752-spi1 overlay is enabled, UART1 hangs up immediately. Just as an information, |
I'm afraid this issue dropped off the radar. We're going to be short-handed until January, but if you ping us in the new year, and if it is as easy to reproduce as it sounds, then we ought to be able to make some progress. |
With reference to issue 1635 (effective duplicate of this) has there been any update? |
I've looked for an SC16IS752 module, but not found one yet - acquiring suitable test hardware is a hurdle to understanding and fixing this issue. Can anyone recommend something? |
Hi Forgive me if I'm barking up the wrong tree here, but I was able to lock the system by just enabling two of the aux peripherals (SPI and com port) via overlays at the same time and then trying to use the com port Your comment here : Suggested this is was shared interrupt problem (?) rather than an using a specific piece of hardware FWIW eBay (god bless them) have a SC16IS752 module here : http://www.ebay.co.uk/itm/121292250289 Hope this helps? |
Yes, it probably is a shared interrupt problem, but we still need some (ideally simple) SPI peripheral to be able to tickle it. That module is a 750 (not a 752), and the simplest test case I've come across involved two 752s wired back to back, but if that is sufficient then I'm sure our budget can stretch to buying one. |
What about a CAN-BUS module - that's pretty cheap and easy to obtain Again, I was able to trigger the problem without having an SPI device plugged in, just the CAN overlay enabled |
That's useful information, but can you show me where you previously mentioned not needing any hardware? |
OK, my bad (too many brain cells killed over christmas!) So assuming that's good enough for testing this is a cheap module from ebay UK : http://www.ebay.co.uk/itm/142161216681 |
Not in the short term, but I can provide you a custom SC16IS752 board if needed. @pelwell |
@mypiandrew Does that need a CAN device to exhibit the problem? I'd prefer a UART because at least I have suitable equipment to hook it to. @meerd Thank you for the offer. I'll buy some 750 modules and see how I get on. |
No didn't need to be hooked up to any CAN equipment, just had the HAT board fitted. |
This might be of interest Running the new CM3 I've had to update from 4.1.21 to a later version, however I forgot to make a change to the config.txt file when updating and got a surprise.. Running this : this cmdline.txt: and this config.txt : dtoverlay=mmc The board starts up fine and detects the CAN controller correctly resulting in a CAN0 interface (haven't done a comms test, but this is further than I've seen before) vcdbg log msg shows : 002584.883: dtparam: spi=on I can run both the below without problem stty -F /dev/ttyS0 -a But if I use the below overlay (which I assumed was the correct one for this kernel) : dtoverlay=spi1-1cs Then the board boots s-l-o-w-l-y , ethernet gets an IP but then you cannot SSH into it This effect happens with and without a CAN board plugged in |
related forum post ? |
Changing dtoverlay=spi1-1cs for dtoverlay=spi0-hw-cs Board boots OK and now GPIO7/8 show SPI names rather than just "Output" root@raspberrypi:~# raspi-gpio get I note that I am using SPI0 and not SPI1 or SPI2 here, if my issue has been unrelated my apologies please feel free to remove my posts for clarity. |
OK - I can reproduce this with an SC16IS750 module on SPI1. Watch this space. |
The 4.9 kernel has a dedicated driver for UART1 (bcm2835-aux-uart), and I've found that booting into 4.9 works fine where 4.4 would stall and generate kernel warnings about overloaded IRQs when SPI1 is enabled. We are due to switch to 4.9 very soon as our mainline kernel, but you can try it now by running:
Normally this isn't something I would advise - you get bleeding edge features and the possibility of bugs - but there are currently no known problems with the 4.9 kernel. If you have a spare card, I would recommend installing a fresh Raspbian, updating to the @mypiandrew The sdhost driver is now recommended for all Pis, including CMs, so there is no need for "dtoverlay=mmc" any more. Also, spi-bcm2835 is the default driver so "dtoverlay=spi-bcm2835-overlay" is pointless (and never include "-overlay" when you are referring to overlays). |
Nice work Phil! |
Note that there even in 4.9 there is a potential problem with the Aux peripherals under load - see #1573. I think the correct solution is to create a new Aux interrupt controller, which should be a case of copying some code from pinctrl-bcm2835.c and either adding it to the aux clock driver (@anholt?) or creating a dedicated driver, having first prised ownership of the IRQ register from the clock driver. The larger battle is likely to be getting it upstreamed. |
thanks Phil, |
We may be able to just put an irq controller driver in the existing aux driver. We'd have to try it to see how the subsystem maintainers reacted to it, though (different subsystems feel differently about whether you can use their subsystem from outside of their directory). |
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: #1484 #1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: #1484 #1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: #1484 #1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: #1484 #1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: #1484 #1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: #1484 #1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: #1484 #1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: #1484 #1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: #1484 #1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: #1484 #1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: #1484 #1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: #1484 #1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: #1484 #1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: #1484 #1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: #1484 #1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: raspberrypi/linux#1484 raspberrypi/linux#1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: #1484 #1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: #1484 #1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: #1484 #1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: #1484 #1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: raspberrypi/linux#1484 raspberrypi/linux#1573 Signed-off-by: Phil Elwell <[email protected]>
commit b3c6dd9 Author: Phil Elwell <[email protected]> Date: Thu Mar 23 16:34:46 2017 +0000 bcm2835-aux: Add aux interrupt controller The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: raspberrypi/linux#1484 raspberrypi/linux#1573 Signed-off-by: Phil Elwell <[email protected]> Gbp-Pq: Topic rpi Gbp-Pq: Name rpi_1211_b3c6dd9ed905a1558061cbc144cc29f8766e31cf.patch
commit 09ee6ad Author: Phil Elwell <[email protected]> Date: Thu Mar 23 17:08:44 2017 +0000 BCM270X_DT: Enable AUX interrupt controller in DT See: raspberrypi/linux#1484 raspberrypi/linux#1573 Signed-off-by: Phil Elwell <[email protected]> Gbp-Pq: Topic rpi Gbp-Pq: Name rpi_1212_09ee6ad78fe69c4196edd973e0bb4fb3a417dd85.patch
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: raspberrypi/linux#1484 raspberrypi/linux#1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: raspberrypi/linux#1484 raspberrypi/linux#1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: raspberrypi/linux#1484 raspberrypi/linux#1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: raspberrypi/linux#1484 raspberrypi/linux#1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: raspberrypi/linux#1484 raspberrypi/linux#1573 Signed-off-by: Phil Elwell <[email protected]>
The AUX block has a shared interrupt line with a register indicating which devices have active IRQs. Expose this as a nested interrupt controller to avoid sharing problems. See: raspberrypi/linux#1484 raspberrypi/linux#1573 Signed-off-by: Phil Elwell <[email protected]> Signed-off-by: Seth Forshee <[email protected]>
I've come across an issue that causes raspbian to boot up very slowly then lockup at the login prompt.
The issue is occurring with the latest kernel (4.4.9-v7+ #884)
How to reproduce:
The lockup occurs when the following 3 items are configured
dtoverlay=spi1-1cs
to /boot/config.txt)dtoverlay=uart1
to /boot/config.txt on a PI2 or earlier)console=ttyS0,115200
to /boot/cmdline.txt on a PI2 or earlier)As you can see the PI3 is particularly vulnerable as 2 of the 3 conditions are already set so all it takes is activating SPI1 and you get a lockup.
Workarounds:
On a PI3, stopping the console being redirected onto the Aux UART will solve it. There are a few ways to do this depending on what other hardware you need.
console=serial0,115200
from the /boot/cmdline.txtdtoverlay=pi3-disable-bt
to the /boot/config.txt (Disables the bluetooth module and reinstates the PL011 UART as the console)On a PI2 or earlier - don't direct the console to the Aux UART if you are using SPI1!
Symptoms and Errors:
When a PI is configured as per above and restarted It boots very slowly. This seems to start when the console is enabled on ttyS0, e.g.
kernel: [ 1.083247] console [ttyS0] enabled
Throughout the boot process you might see an error like
kernel: [ 49.972018] serial8250: too much work for irq59
The boot process is now a lot longer. e.g. 120secs vs 15 secs for a PI2, 90sec vs 12secs for a PI3.
The console speeds up briefly just before the login prompt is displayed but then locks up hard. Only recovery option is to mount the SD in another machine and reconfigure cmdline.txt / config.txt as per the workarounds.
Root Cause:
Unsure I this stage. The Aux peripherals share a clock so it seems probable that that is the root cause:
The slow boot could be caused by inadvertently configuring the aux clock incorrectly.
The lockup could be caused by the clock being reconfigured whilst enabled.
The uart baud rate seems to be set at least twice during the boot up (once by the kernel for the console and once by getty?) does this cause a clock reconfiguration? somehow having SPI1 active buggers this up (is it setting up the clock independently?)
Unsure of where to look - I'm not even sure which register is the aux clock even...
https://github.com/raspberrypi/linux/blob/rpi-4.4.y/drivers/clk/bcm/clk-bcm2835.c
The text was updated successfully, but these errors were encountered: