Skip to content

spi-bcm2708 may calculates wrong clock frequency if RPi2 is overclocked to 1Ghz. #974

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dgrat opened this issue May 19, 2015 · 21 comments
Closed

Comments

@dgrat
Copy link

dgrat commented May 19, 2015

Get a fail of the init procedure if I overclock and use this SPI based driver with Raspbian.
https://github.com/diydrones/ardupilot/blob/master/libraries/AP_Compass/AP_Compass_AK8963.h

@notro
Copy link
Contributor

notro commented May 19, 2015

If overclocking changes the core frequency, then you have to change the SPI clock in Device Tree.
It is 250MHz by default. arch/arm/boot/dts/bcm2708_common.dtsi

It should be possible to do this with a DT overlay.

There is ongoing work to make it automatic:
[PATCH 7/7] ARM: bcm2835: Tie SPI clock to the core clock rate

@pelwell
Copy link
Contributor

pelwell commented May 19, 2015

As I discovered with the sdhost driver, overclocking the VPU (core_freq) doesn't result in a fixed frequency unless you either use force_turbo or the performance governor; without these, the core frequency changes with ARM load.

I'd be happy to tweak the firmware to patch the SPI clock frequency in DT, as it does for sdhost, but we may need to place some heavy caveats on using SPI while overclocking.

@pelwell
Copy link
Contributor

pelwell commented May 19, 2015

Reading the firmware code, there is a core_freq_min parameter, which if kept equal to core_freq should allow the ARM frequency to change without affecting the SPI (and sdhost) clock.

@dgrat
Copy link
Author

dgrat commented May 19, 2015

What is the exact problem? Is it the switching of the arm or core frequency when going from higher to lower frequency? Where is the critical code in spi-bcm2708. I have no experience with the kernel, but it would be interesting for me. Thanks

@popcornmix
Copy link
Collaborator

When arm is busy it requests the arm clock is increased.
If you are overclocking other clocks (e.g. core_freq, sdram_freq) then they will increase when the arm is increased (consider this a single "turbo" mode).
It makes sense to increase the core_freq when the arm is busy as that speeds up L2 cache (on Pi1) and memory busy (on Pi1 and Pi2).

If you leave core_freq at the default you shouldn't have any issues with spi. I believe we are saying that use of SPI is not compatible with overclocking of core_freq, because its clock is directly divided down from core_freq. If it turns out that spi is happy with clock changing (as long as it doesn't exceed some level), then that would be possible, but you would obviously get lower throughput when the arm (and hence core_freq) is not in turbo mode.

core_freq_min can't exceed stock (250MHz), so probably is not useful.

@Clouded
Copy link

Clouded commented May 21, 2015

Weird thing in my case - RPi2 is overclocked using raspi-config RPi2 preset (including core_freq), force_turbo=1 is enabled. SPI works correctly - SCLK has the same clock frequency on the oscillator as specified in a transfer call (but I did not change the value in dtsi).
BTW, is there a way to check core_freq of a running system? Don't see anything in sys and can't google it either...

@popcornmix
Copy link
Collaborator

vcgencmd measure_clock core

@dgrat
Copy link
Author

dgrat commented May 22, 2015

http://community.emlid.com/t/ak8963-bad-device-id-error-during-ardupilot-start/267/46
Someone else seems to notice this also just on some RPi2s. Problem is that I don't have an oscilloscope.

@Clouded
Copy link

Clouded commented May 22, 2015

@popcornmix thanks!

I don't really understand what's going on, info in cpufreq differs from vngencmd.

Here's what I have in config.txt:

force_turbo=1
arm_freq=1000
core_freq=500
sdram_freq=500
over_voltage=2

And here is what cpufreq and vcgencmd output:

pi@raspberrypi~ $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 
performance
pi@raspberrypi ~ $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq 
1000000
pi@raspberrypi ~ $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq 
1000000
pi@navio-rpi ~ $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq 
1000000
pi@raspberrypi ~ $ vcgencmd measure_clock arm 
frequency(45)=600000000
pi@raspberrypi ~ $ vcgencmd measure_clock core
frequency(1)=250000000

So if vngencmd is to be trusted, then my RPi2 isn't overclocked and that explains why SPI works fine in my case. But why isn't it overclocked and info in cpufreq differs?

@popcornmix
Copy link
Collaborator

"vcgencmd measure_clock arm" is the definitive answer. It connects the clock to a counter and reports the count. It will always be correct.

Seems force_turbo isn't enabled. Can you report output of vcgencmd get_config int and vcgencmd measure_clock core

@Clouded
Copy link

Clouded commented May 22, 2015

Here it is:

pi@raspberrypi ~ $ vcgencmd get_config int
arm_control=0xa5800010
arm_freq=1000
avoid_fix_ts=1
config_hdmi_boost=2
core_freq=500
disable_commandline_tags=2
disable_l2cache=1
emmc_pll_core=1
force_eeprom_read=1
force_pwm_open=1
force_turbo=1
framebuffer_ignore_alpha=1
framebuffer_swap=1
hdmi_force_cec_address=65535
over_voltage=2
over_voltage_avs=0x1b774
pause_burst_frames=1
program_serial_random=1
sdram_freq=500
temp_limit=85
pi@raspberrypi ~ $ vcgencmd measure_clock core
frequency(1)=250000000

@popcornmix
Copy link
Collaborator

That look okay. Are you sure you don't have an under voltage or over temperature warning (https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=82373) which would cause the overclock to be disabled?

@Clouded
Copy link

Clouded commented May 22, 2015

Ah, that's what was causing this.
I was using the RPi2 headless, so never noticed that.
After removing USB WiFi dongle arm and core values pumped up to the max.
@popcornmix thanks for your help!

But in that case even if core_clock is specified in the dts SPI still may break when RPi decides to disable force_turbo. Are there any problems checking the core clock right before starting the transfer (would that be a big overhead?) ?

@popcornmix
Copy link
Collaborator

Checking core_freq before starting the transfer doesn't really fix the issue. The clock can still change during the transfer.

popcornmix added a commit to Hexxeh/rpi-firmware that referenced this issue May 23, 2015
kernel: vcsm: Add ioctl for custom cache flushing

firmware: arm_loader: Add new vpu+qpu multi execute api

firmware: arm_loader: Set the SPI clock in DT to core_freq
See: raspberrypi/linux#974
popcornmix added a commit to raspberrypi/firmware that referenced this issue May 23, 2015
kernel: vcsm: Add ioctl for custom cache flushing

firmware: arm_loader: Add new vpu+qpu multi execute api

firmware: arm_loader: Set the SPI clock in DT to core_freq
See: raspberrypi/linux#974
@hkallweit
Copy link
Contributor

"The clock can still change during the transfer."
It should be possible to address this using a mutex (at least keep the cpufreq driver from changing the freq during a transfer).
It also protects against the case that a spi transfer is started whilst the cpufreq driver asks the firmware to change the freq (not sure whether such firmware calls are asynchronous or not).

--- bcm2835-cpufreq.c.old 2015-06-29 23:58:54.785712448 +0200
+++ bcm2835-cpufreq.c 2015-06-30 21:32:51.631212531 +0200
@@ -69,6 +69,9 @@
{0, 0, CPUFREQ_TABLE_END},
};

+DEFINE_MUTEX(bcm2835_cpufreq_chgclk);
+EXPORT_SYMBOL(bcm2835_cpufreq_chgclk);
+
/*

clk_rate either gets or sets the clock rates.
@@ -91,7 +94,9 @@
msg.tag.val = arm_rate * 1000;

' /* send the message */
'+ mutex_lock(&bcm2835_cpufreq_chgclk);
s = bcm_mailbox_property(&msg, sizeof msg);
'+ mutex_unlock(&bcm2835_cpufreq_chgclk);

' /* check if it was all ok and return the rate in KHz */
if (s == 0 && (msg.request_code & 0x80000000))

The exported mutex is then used in the spi driver to protect the transfer in the transfer_one function.
Possible side effect: A cpufreq call to change the frequency might have to wait few msecs if it happens during a longer spi transfer. But this seems to be neglectable to me.

In addition I changed the spi driver to get the core freq directly via mailbox from the firmware for each transfer. (basically just copied the bcm2835_get_clock function from the cpufreq driver)
By the way: IMHO the generic clock getter/setter in the cpufreq driver could be candidates to be moved to the mailbox driver as they might be relevant also for other modules.

Works ok here so far but I'm not sure whether this is a reasonable approach to tackle the issue.

@notro
Copy link
Contributor

notro commented Jul 1, 2015

Eric Arnholt has a proposal for a proper clock driver that controls the firmware clocks:
http://lists.infradead.org/pipermail/linux-rpi-kernel/2015-May/001786.html

With that in place, the SPI driver can clk_notifier_register() a callback to detect clock rate changes. This allows it to accept or reject the change.

Example: cadence i2c driver clk callback: http://lxr.free-electrons.com/ident?i=cdns_i2c_clk_notifier_cb:

Perhaps we could start to use his clock driver in rpi-4.1.y, even if it's not accepted upstream yet.

@hkallweit
Copy link
Contributor

Thanks for the info, sounds very promising ..

@popcornmix
Copy link
Collaborator

I don't believe it's as simple as a mutex on the cpufreq driver.

core freq can still change asynchronously to this. e.g. overclock is disabled on firmware side when over-temperature or under-voltage. We also have plans to trigger turbo mode form the gpu side when we know it is useful (e.g. MVC, 1080p60, 1080i60 with deinterlace).

I think it's safest to assume that dynamic overclock is not supported when using hardware that is clocked from the core clock.

@hkallweit
Copy link
Contributor

Sure, I'm aware that this simple mutex protects against cpufreq-triggered core freq changes only.

Then the interesting question is whether there's a way for the firmware to inform the new clock driver about such asynchronous clock changes.
However one difference to what was written by notro about the new clock driver would be that most likely subscribers of the clock change notifications wouldn't be allowed to reject such asynchronous clock changes.
At a first glance also the new clock driver can't fully solve the issue.

@Ruffio
Copy link

Ruffio commented Aug 16, 2016

@dgrat has your issue been resolved? If so, please close this issue. Thanks.

neuschaefer pushed a commit to neuschaefer/raspi-binary-firmware that referenced this issue Feb 27, 2017
kernel: vcsm: Add ioctl for custom cache flushing

firmware: arm_loader: Add new vpu+qpu multi execute api

firmware: arm_loader: Set the SPI clock in DT to core_freq
See: raspberrypi/linux#974
@JamesH65
Copy link
Contributor

Closing due to lack of activity. Reopen if you feel this issue is still relevant.

popcornmix pushed a commit that referenced this issue Jan 27, 2022
…h panic

[ Upstream commit 06e629c ]

In panic path, fadump is triggered via a panic notifier function.
Before calling panic notifier functions, smp_send_stop() gets called,
which stops all CPUs except the panic'ing CPU. Commit 8389b37
("powerpc: stop_this_cpu: remove the cpu from the online map.") and
again commit bab2623 ("powerpc: Offline CPU in stop_this_cpu()")
started marking CPUs as offline while stopping them. So, if a kernel
has either of the above commits, vmcore captured with fadump via panic
path would not process register data for all CPUs except the panic'ing
CPU. Sample output of crash-utility with such vmcore:

  # crash vmlinux vmcore
  ...
        KERNEL: vmlinux
      DUMPFILE: vmcore  [PARTIAL DUMP]
          CPUS: 1
          DATE: Wed Nov 10 09:56:34 EST 2021
        UPTIME: 00:00:42
  LOAD AVERAGE: 2.27, 0.69, 0.24
         TASKS: 183
      NODENAME: XXXXXXXXX
       RELEASE: 5.15.0+
       VERSION: #974 SMP Wed Nov 10 04:18:19 CST 2021
       MACHINE: ppc64le  (2500 Mhz)
        MEMORY: 8 GB
         PANIC: "Kernel panic - not syncing: sysrq triggered crash"
           PID: 3394
       COMMAND: "bash"
          TASK: c0000000150a5f80  [THREAD_INFO: c0000000150a5f80]
           CPU: 1
         STATE: TASK_RUNNING (PANIC)

  crash> p -x __cpu_online_mask
  __cpu_online_mask = $1 = {
    bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>
  crash>
  crash> p -x __cpu_active_mask
  __cpu_active_mask = $2 = {
    bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>

While this has been the case since fadump was introduced, the issue
was not identified for two probable reasons:

  - In general, the bulk of the vmcores analyzed were from crash
    due to exception.

  - The above did change since commit 8341f2f ("sysrq: Use
    panic() to force a crash") started using panic() instead of
    deferencing NULL pointer to force a kernel crash. But then
    commit de6e5d3 ("powerpc: smp_send_stop do not offline
    stopped CPUs") stopped marking CPUs as offline till kernel
    commit bab2623 ("powerpc: Offline CPU in stop_this_cpu()")
    reverted that change.

To ensure post processing register data of all other CPUs happens
as intended, let panic() function take the crash friendly path (read
crash_smp_send_stop()) with the help of crash_kexec_post_notifiers
option. Also, as register data for all CPUs is captured by f/w, skip
IPI callbacks here for fadump, to avoid any complications in finding
the right backtraces.

Signed-off-by: Hari Bathini <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
popcornmix pushed a commit that referenced this issue Jan 28, 2022
…h panic

[ Upstream commit 06e629c ]

In panic path, fadump is triggered via a panic notifier function.
Before calling panic notifier functions, smp_send_stop() gets called,
which stops all CPUs except the panic'ing CPU. Commit 8389b37
("powerpc: stop_this_cpu: remove the cpu from the online map.") and
again commit bab2623 ("powerpc: Offline CPU in stop_this_cpu()")
started marking CPUs as offline while stopping them. So, if a kernel
has either of the above commits, vmcore captured with fadump via panic
path would not process register data for all CPUs except the panic'ing
CPU. Sample output of crash-utility with such vmcore:

  # crash vmlinux vmcore
  ...
        KERNEL: vmlinux
      DUMPFILE: vmcore  [PARTIAL DUMP]
          CPUS: 1
          DATE: Wed Nov 10 09:56:34 EST 2021
        UPTIME: 00:00:42
  LOAD AVERAGE: 2.27, 0.69, 0.24
         TASKS: 183
      NODENAME: XXXXXXXXX
       RELEASE: 5.15.0+
       VERSION: #974 SMP Wed Nov 10 04:18:19 CST 2021
       MACHINE: ppc64le  (2500 Mhz)
        MEMORY: 8 GB
         PANIC: "Kernel panic - not syncing: sysrq triggered crash"
           PID: 3394
       COMMAND: "bash"
          TASK: c0000000150a5f80  [THREAD_INFO: c0000000150a5f80]
           CPU: 1
         STATE: TASK_RUNNING (PANIC)

  crash> p -x __cpu_online_mask
  __cpu_online_mask = $1 = {
    bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>
  crash>
  crash> p -x __cpu_active_mask
  __cpu_active_mask = $2 = {
    bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>

While this has been the case since fadump was introduced, the issue
was not identified for two probable reasons:

  - In general, the bulk of the vmcores analyzed were from crash
    due to exception.

  - The above did change since commit 8341f2f ("sysrq: Use
    panic() to force a crash") started using panic() instead of
    deferencing NULL pointer to force a kernel crash. But then
    commit de6e5d3 ("powerpc: smp_send_stop do not offline
    stopped CPUs") stopped marking CPUs as offline till kernel
    commit bab2623 ("powerpc: Offline CPU in stop_this_cpu()")
    reverted that change.

To ensure post processing register data of all other CPUs happens
as intended, let panic() function take the crash friendly path (read
crash_smp_send_stop()) with the help of crash_kexec_post_notifiers
option. Also, as register data for all CPUs is captured by f/w, skip
IPI callbacks here for fadump, to avoid any complications in finding
the right backtraces.

Signed-off-by: Hari Bathini <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
popcornmix pushed a commit that referenced this issue Jan 28, 2022
…h panic

[ Upstream commit 06e629c ]

In panic path, fadump is triggered via a panic notifier function.
Before calling panic notifier functions, smp_send_stop() gets called,
which stops all CPUs except the panic'ing CPU. Commit 8389b37
("powerpc: stop_this_cpu: remove the cpu from the online map.") and
again commit bab2623 ("powerpc: Offline CPU in stop_this_cpu()")
started marking CPUs as offline while stopping them. So, if a kernel
has either of the above commits, vmcore captured with fadump via panic
path would not process register data for all CPUs except the panic'ing
CPU. Sample output of crash-utility with such vmcore:

  # crash vmlinux vmcore
  ...
        KERNEL: vmlinux
      DUMPFILE: vmcore  [PARTIAL DUMP]
          CPUS: 1
          DATE: Wed Nov 10 09:56:34 EST 2021
        UPTIME: 00:00:42
  LOAD AVERAGE: 2.27, 0.69, 0.24
         TASKS: 183
      NODENAME: XXXXXXXXX
       RELEASE: 5.15.0+
       VERSION: #974 SMP Wed Nov 10 04:18:19 CST 2021
       MACHINE: ppc64le  (2500 Mhz)
        MEMORY: 8 GB
         PANIC: "Kernel panic - not syncing: sysrq triggered crash"
           PID: 3394
       COMMAND: "bash"
          TASK: c0000000150a5f80  [THREAD_INFO: c0000000150a5f80]
           CPU: 1
         STATE: TASK_RUNNING (PANIC)

  crash> p -x __cpu_online_mask
  __cpu_online_mask = $1 = {
    bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>
  crash>
  crash> p -x __cpu_active_mask
  __cpu_active_mask = $2 = {
    bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>

While this has been the case since fadump was introduced, the issue
was not identified for two probable reasons:

  - In general, the bulk of the vmcores analyzed were from crash
    due to exception.

  - The above did change since commit 8341f2f ("sysrq: Use
    panic() to force a crash") started using panic() instead of
    deferencing NULL pointer to force a kernel crash. But then
    commit de6e5d3 ("powerpc: smp_send_stop do not offline
    stopped CPUs") stopped marking CPUs as offline till kernel
    commit bab2623 ("powerpc: Offline CPU in stop_this_cpu()")
    reverted that change.

To ensure post processing register data of all other CPUs happens
as intended, let panic() function take the crash friendly path (read
crash_smp_send_stop()) with the help of crash_kexec_post_notifiers
option. Also, as register data for all CPUs is captured by f/w, skip
IPI callbacks here for fadump, to avoid any complications in finding
the right backtraces.

Signed-off-by: Hari Bathini <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
whdgmawkd pushed a commit to whdgmawkd/linux that referenced this issue Feb 11, 2022
…h panic

[ Upstream commit 06e629c ]

In panic path, fadump is triggered via a panic notifier function.
Before calling panic notifier functions, smp_send_stop() gets called,
which stops all CPUs except the panic'ing CPU. Commit 8389b37
("powerpc: stop_this_cpu: remove the cpu from the online map.") and
again commit bab2623 ("powerpc: Offline CPU in stop_this_cpu()")
started marking CPUs as offline while stopping them. So, if a kernel
has either of the above commits, vmcore captured with fadump via panic
path would not process register data for all CPUs except the panic'ing
CPU. Sample output of crash-utility with such vmcore:

  # crash vmlinux vmcore
  ...
        KERNEL: vmlinux
      DUMPFILE: vmcore  [PARTIAL DUMP]
          CPUS: 1
          DATE: Wed Nov 10 09:56:34 EST 2021
        UPTIME: 00:00:42
  LOAD AVERAGE: 2.27, 0.69, 0.24
         TASKS: 183
      NODENAME: XXXXXXXXX
       RELEASE: 5.15.0+
       VERSION: raspberrypi#974 SMP Wed Nov 10 04:18:19 CST 2021
       MACHINE: ppc64le  (2500 Mhz)
        MEMORY: 8 GB
         PANIC: "Kernel panic - not syncing: sysrq triggered crash"
           PID: 3394
       COMMAND: "bash"
          TASK: c0000000150a5f80  [THREAD_INFO: c0000000150a5f80]
           CPU: 1
         STATE: TASK_RUNNING (PANIC)

  crash> p -x __cpu_online_mask
  __cpu_online_mask = $1 = {
    bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>
  crash>
  crash> p -x __cpu_active_mask
  __cpu_active_mask = $2 = {
    bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>

While this has been the case since fadump was introduced, the issue
was not identified for two probable reasons:

  - In general, the bulk of the vmcores analyzed were from crash
    due to exception.

  - The above did change since commit 8341f2f ("sysrq: Use
    panic() to force a crash") started using panic() instead of
    deferencing NULL pointer to force a kernel crash. But then
    commit de6e5d3 ("powerpc: smp_send_stop do not offline
    stopped CPUs") stopped marking CPUs as offline till kernel
    commit bab2623 ("powerpc: Offline CPU in stop_this_cpu()")
    reverted that change.

To ensure post processing register data of all other CPUs happens
as intended, let panic() function take the crash friendly path (read
crash_smp_send_stop()) with the help of crash_kexec_post_notifiers
option. Also, as register data for all CPUs is captured by f/w, skip
IPI callbacks here for fadump, to avoid any complications in finding
the right backtraces.

Signed-off-by: Hari Bathini <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants