MCP2515 fails to initialize on software reboot #2767

Closed
robot-acladera opened this issue Nov 26, 2018 · 14 comments

Comments

@robot-acladera

I am having a problem that I think may be the same as the one described in #2175 (comment). I use a custom CAN board with an MCP2515 controller and an SMD crystal (Xtal 8MHz SMD 'ABM3-8.000MHZ-D2Y-T'). The symptoms are:

  • After a power-off/power-on cycle the MCP2515 initializes correctly. But after a software reboot the initialization fails with the following messages:
    mcp251x spi0.0: Cannot initialize MCP2515. Wrong wiring?
    mcp251x spi0.0: Probe failed, err=19
  • When the MCP2515 has been initialized correctly, the command "ifup can0" fails the first time it is executed but works when it is executed a second time.

I analysed the problem by monitoring the SPI bus signals and concluded that it is related to the fact that the mcp251x driver puts the controller into sleep mode after a successful initialization. The next operation then wakes the controller with a RESET SPI command, waits 5 ms, and reads CANSTAT to check that the controller is present. The problem is that my board takes more than 5 ms to wake up from sleep mode (it takes more than 25 ms).

I have applied the following patch to mcp251x.c to increase the wait time while trying to read CANSTAT.

@@ -629,6 +629,7 @@
    struct mcp251x_priv *priv = spi_get_drvdata(spi);
    u8 reg;
    int ret;
+  int retry_count;
 
    /* Wait for oscillator startup timer after power up */
    mdelay(MCP251X_OST_DELAY_MS);
@@ -641,7 +642,14 @@
    /* Wait for oscillator startup timer after reset */
    mdelay(MCP251X_OST_DELAY_MS);

-   reg = mcp251x_read_reg(spi, CANSTAT);
+   /* Keep retrying read to cope with high wake-up time crystals */
+   retry_count = 100;
+   do {
+       --retry_count;
+       mdelay(1);
+       reg = mcp251x_read_reg(spi, CANSTAT);
+   } while((reg == 0) && (retry_count > 0));
+
    if ((reg & CANCTRL_REQOP_MASK) != CANCTRL_REQOP_CONF)
        return -ENODEV;

This solves the problem for me.

@pelwell
Contributor

pelwell commented Nov 27, 2018

We prefer changes like this to go upstream, but there's already one downstream change on this file so I'm prepared to use this as a staging area.

A few questions and suggestions:

  1. Have you tried just changing the delay value? I understand this would slow down all resets, I just want to confirm that the problem is purely one of timing, as you say.
  2. 1 ms looks too short if your application needs > 25ms - 5 or 10 seem more reasonable.
  3. The driver should only need the extra delay if the register access fails, i.e.:
    for (retries = 0; retries < 100; retries++) {
        reg = mcp251x_read_reg(spi, CANSTAT);
        if (reg)
            break;
        mdelay(5);
    }

    if ((reg & CANCTRL_REQOP_MASK) != CANCTRL_REQOP_CONF)
        return -ENODEV;
  4. Is the non-zero reg test within the loop correct/sufficient? Are you guaranteed to get zeroes until the device is ready?

@robot-acladera
Author

I have tried with a fixed delay of 30 ms and it works with my board. As you say this has the problem of slowing down all resets and, also, it may not work on boards with an even longer wake-up time.

I agree with points 2 and 3. Maybe the number of retries can be reduced if the delay is increased, in order to shorten the initialization time when no controller is present.
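To make the trade-off between retry count and delay concrete, here is a minimal userspace sketch, not driver code. It runs a read-then-delay loop of the same shape as the one proposed above against a simulated chip. The names `mock_chip`, `mock_read_canstat`, `mock_delay`, and `poll_canstat` are hypothetical, and the model assumes MISO reads as 0x00 until the chip has woken up.

```c
#include <stdint.h>

/* Hypothetical stand-in for the MCP2515; not part of the real driver. */
struct mock_chip {
    int wake_ms;  /* time the chip needs to wake up; negative = no chip fitted */
    int now_ms;   /* simulated clock, advanced only by mock_delay() */
};

/* Assumption: the bus reads 0x00 until the chip is awake, then 0x80
 * (OPMOD bits reporting Configuration mode). */
static uint8_t mock_read_canstat(const struct mock_chip *chip)
{
    if (chip->wake_ms < 0 || chip->now_ms < chip->wake_ms)
        return 0x00;
    return 0x80;
}

static void mock_delay(struct mock_chip *chip, int ms)
{
    chip->now_ms += ms;
}

/*
 * Read first and delay only when the read returns zero.
 * Returns the simulated time spent in the loop and stores the
 * last value read in *reg.
 */
static int poll_canstat(struct mock_chip *chip, int retries, int delay_ms,
                        uint8_t *reg)
{
    int start = chip->now_ms;
    int i;

    *reg = 0;
    for (i = 0; i < retries; i++) {
        *reg = mock_read_canstat(chip);
        if (*reg)
            break;
        mock_delay(chip, delay_ms);
    }
    return chip->now_ms - start;
}
```

In this model, a chip that needs 25 ms is detected after 25 ms with a 5 ms step, a chip that is already awake costs no extra delay, and an absent chip costs `retries * delay_ms` in total (500 ms for 100 retries of 5 ms, 100 ms for 20 retries of 5 ms).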

The last point is more difficult for me to answer. The MCP2515 data-sheet states:

The MCP2515 utilizes an Oscillator Startup Timer (OST) that holds the MCP2515 in reset to ensure that the oscillator has stabilized before the internal state machine begins to operate. The OST maintains reset for the first 128 OSC1 clock cycles after power-up or a wake-up from Sleep mode occurs. It should be noted that no SPI protocol operations should be attempted until after the OST has expired.

I assume that the MCP2515 holds MISO in high impedance while the OST maintains reset (we have actually confirmed this by measurement). In that case we read zeroes as long as the SPI master has a pull-down on this line. I'm pretty sure that this is the case for the Raspberry Pi, but I don't know about other systems.
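One way to avoid depending on the pull direction is to test for the expected mode bits rather than for a non-zero value. The sketch below is a standalone illustration, not the committed fix. The two constants are copied from the mcp251x driver as I understand it, and `canstat_in_config_mode` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the definitions in drivers/net/can/spi/mcp251x.c. */
#define CANCTRL_REQOP_MASK 0xe0
#define CANCTRL_REQOP_CONF 0x80

/*
 * True only when the OPMOD bits of a CANSTAT read report Configuration
 * mode. A floating MISO line gives 0x00 with a pull-down and 0xff with
 * a pull-up; neither value passes this test.
 */
static bool canstat_in_config_mode(uint8_t reg)
{
    return (reg & CANCTRL_REQOP_MASK) == CANCTRL_REQOP_CONF;
}
```

Using this predicate as the loop exit condition would keep retrying on both 0x00 and 0xff. Note also that 0x00 is ambiguous on its own: under the same bit layout it is what a chip in Normal mode with no interrupt pending would report.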

@hilschernetpi

We also finished a CAN board design two weeks ago using the MCP2515 chip. We use a 20 MHz crystal instead and communicate with it over the spi1.0 interface.

We see the same effect that @robot-acladera described.

1.) After each power cycle everything works fine: the HAT's EEPROM is read, the overlay is loaded, and the CAN interface is initialized during kernel boot.

If we now do a soft restart we get
mcp251x spi1.0: Cannot initialize MCP2515. Wrong wiring?
mcp251x spi1.0: Probe failed, err=19
Another soft restart brings the interface back up.

So our summary so far is that the interface works properly only every second time.

2.) Calling ip link set can0 up type can bitrate 1000000 the first time returns RTNETLINK answers: No such device. Calling it a second time works fine.

Because we really had no clue what was going wrong with the communication (we didn't use an oscilloscope for analysis), we lowered the maximum SPI frequency step by step, with no effect. So we went back to 10 MHz and are happy to have found this discussion.

We'll see if we can patch the kernel source too and report back what happens.

@VoltAdrien

I have a custom design using an MCP2515 (16 MHz) + MCP2561 at 500 kbit/s.

I am having the same problem, with the response "RTNETLINK answers: No such device".
The second attempt works fine here too.
Reception of CAN frames seems to be OK, but I can't send any frames.

@hilschernetpi

Meanwhile we got past the problem. The ingenious idea was delivered by a Super Moderator of Microchip's Forum at https://www.microchip.com/forums/m875465.aspx.

He says "You have to WAIT for the chip to change modes. That means your code has to wait in a loop continuously reading the CANSTAT.OPMODE bits.". And indeed, nowhere in the MCP2515 manual is it explained how long the chip needs to return to "configuration mode" when it was in "normal operation mode".

The original source code just checks the CANSTAT.OPMODE only once after waiting MCP251X_OST_DELAY_MS time (5msec) when the chip got a software reset:

if ((reg & CANCTRL_REQOP_MASK) != CANCTRL_REQOP_CONF)
return -ENODEV;

This is definitely too little time for a chip that was already initialized and in normal mode to return to configuration mode (CANCTRL_REQOP_CONF).

So you can either increase the waiting time or wait in a loop (which might be difficult in a kernel-mode driver).

@VoltAdrien

I understand the problem, but when I run the same test with a PiCAN2 board (MCP2515 and MCP2551), it works.
Which source file should be modified to increase the delay?

@hilschernetpi

In Raspbian (latest version) it is in

https://github.com/raspberrypi/linux/blob/rpi-4.14.y/drivers/net/can/spi/mcp251x.c

in the function mcp251x_hw_reset()

Note that MCP251X_OST_DELAY_MS is defined as only 5. Set it to 40 and that will be enough time.

@JamesH65
Contributor

JamesH65 commented Jul 31, 2019

@pelwell The driver still only has a 5ms delay in the latest 4.19 (and 5.2) tree, should we think of merging in your change above?

pelwell pushed a commit that referenced this issue Aug 1, 2019
Some boards take longer than 5ms to power up after a reset, so allow
a few retry attempts before giving up.

See: #2767

Signed-off-by: Phil Elwell <[email protected]>
@pelwell
Contributor

pelwell commented Aug 1, 2019

Fair point - see 9543c9f. N.B. I've read the patch through several times and I think it's correct but I haven't tested it. I would appreciate it if somebody could give it a quick test (we'll probably release an rpi-update firmware in the next day or two).

popcornmix added a commit to raspberrypi/firmware that referenced this issue Aug 1, 2019
kernel: Correct the name of the Raspberry Pi video decoder
See: raspberrypi/linux#3111

kernel: Fixup FKMS interrupt handing for non-existent display
See: raspberrypi/linux#3110

kernel: katana volume minimum value correction
See: raspberrypi/linux#3109

kernel: can: mcp251x: Allow more time after a reset
See: raspberrypi/linux#2767

kernel: overlays: Update the upstream overlay

kernel: overlays: Add audio parameter to vc4-kms-v3d
See: raspberrypi/linux#2489

firmware: pwm_audio: Use the correct DREQs on Pi4
See: #1214

firmware: pixelvalve_2711: Alter back porch for widths of 1366
See: #1202

firmware: Clear the SMIDSW1 display interrupt flag on startup

firmware: dt-blob: Declare Pi 4B's SD_IO voltage selector
popcornmix added a commit to Hexxeh/rpi-firmware that referenced this issue Aug 1, 2019
@popcornmix
Collaborator

@pelwell's patch is in the latest rpi-update kernel. Please test and report whether it helps.

@magicsmoke

Hmm, it seems like this issue coincides with problems that we are having with our own custom hardware as well.

Valid SPI signals (see screenshot) but we get the same err:19. 5ms.

Is there a simple way that I could extend the delay, or does this require learning how to compile a new driver with the delay changed?

[Screenshot: SPI bus signals]

@pelwell
Contributor

pelwell commented Apr 8, 2024

Since the 5.10 kernel, the mcp251x driver has retried the read of the status register after power-up or reset for up to 1 second. Unfortunately your message doesn't include enough information to make any suggestions. You can start by explaining the hardware and configuration and by sharing some relevant extracts from the kernel log.
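For readers unfamiliar with the pattern, a time-budgeted poll can be sketched in plain userspace C as follows. This is an illustration of the idea only and is not the kernel's implementation. `poll_until`, `read_fn`, `sleep_fn`, and the `sim` device are hypothetical names, and the simulated device assumes the bus reads 0x00 until the chip is ready.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t (*read_fn)(void *ctx);
typedef void (*sleep_fn)(void *ctx, unsigned int us);

/*
 * Poll until (value & mask) == want, sleeping step_us between reads,
 * and give up once timeout_us has been spent sleeping. The last value
 * read is stored in *last either way.
 */
static bool poll_until(read_fn rd, sleep_fn slp, void *ctx,
                       uint8_t mask, uint8_t want,
                       unsigned int step_us, unsigned int timeout_us,
                       uint8_t *last)
{
    unsigned int waited = 0;

    for (;;) {
        *last = rd(ctx);
        if ((*last & mask) == want)
            return true;
        if (waited >= timeout_us)
            return false;
        slp(ctx, step_us);
        waited += step_us;
    }
}

/* Simulated device with its own clock, so the sketch runs instantly. */
struct sim {
    unsigned int now_us;
    unsigned int ready_at_us;
};

static uint8_t sim_read(void *ctx)
{
    struct sim *s = ctx;
    return s->now_us >= s->ready_at_us ? 0x80 : 0x00;
}

static void sim_sleep(void *ctx, unsigned int us)
{
    struct sim *s = ctx;
    s->now_us += us;
}
```

With a 5000 us step and a 1000000 us budget, a simulated device that becomes ready at 25000 us is found after 25000 us, while one that never becomes ready within the budget makes the call return false after 1000000 us of simulated sleeping.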

@magicsmoke

Hi Phil,

Thanks for taking the time to respond. I in fact managed to trace down this bug.

It turned out that another MCP2515 had its CS pulled down, which was causing havoc on the SPI bus.

This issue was the only one that shows the same symptoms. Thanks and sorry!

@pelwell
Contributor

pelwell commented Apr 9, 2024

No problem - it's a helpful reminder that this should have been closed long ago.

pelwell closed this as completed Apr 9, 2024