Skip to content

firmware in Bookworm repo results in degraded performance #1705

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
HankB opened this issue Mar 15, 2022 · 47 comments
Closed

firmware in Bookworm repo results in degraded performance #1705

HankB opened this issue Mar 15, 2022 · 47 comments

Comments

@HankB
Copy link

HankB commented Mar 15, 2022

Describe the bug

Upgrading raspi-firmware on Debian Bookworm on a Pi 4B results in significant performance degradation.

To reproduce

  1. Install 20220121_raspi_4_bookworm.img.xz to an SD card or SSD.
  2. Boot and evaluate performance by any convenient means.
  3. Determine available processor clocks.
  4. apt update
  5. apt install raspi-firmware
  6. Reboot and evaluate performance by any convenient means. It will be significantly degraded.
  7. (Try to) determine available processor clocks. I could find none.

The firmware updates:

raspi-firmware/testing 1.20220120+ds-1 arm64 [upgradable from: 1.20210805+ds-1]

Expected behaviour

Normal performance and behavior following upgrade of raspi-firmware

Actual behaviour

Severely degraded performance and seeming inability for S/W to determine and control available processor clocks.

System

raspios not available on Debian.

  • Which model of Raspberry Pi?

Pi 4B/8GB and Pi 4B/2GB

  • Which OS and version (cat /etc/rpi-issue)?
Debian GNU/Linux bookworm/sid \n \l
  • Which firmware version (vcgencmd version)?

vcgencmd is not available on Debian. Does the version information listed above answer this?

  • Which kernel version (uname -a)?
Linux down 5.15.0-2-arm64 #1 SMP Debian 5.15.5-2 (2021-12-18) aarch64 GNU/Linux

Logs

/var/log/kern.log at https://paste.debian.net/1234272/

Additional context

I have posted my notes with further detail at https://github.com/HankB/Pi-4B-bookworm-performance

@XECDesign
Copy link
Contributor

That's not our image, kernel or firmware package.

@pelwell pelwell closed this as completed Mar 15, 2022
@gwolf
Copy link

gwolf commented Mar 15, 2022

It is the firmware distributed by this site; I am the Debian maintainer for it. The kernel is, yes, an upstream kernel.
Would you know where this should be reported to, or how can we help solve this regression?

@XECDesign
Copy link
Contributor

Ah alright, I'm not seeing anything obvious in the packaging that may have had an effect, so if the only difference is the firmware change, then it sounds like it could be a legitimate issue.

https://salsa.debian.org/debian/raspi-firmware/-/compare/debian%2F1.20210805+ds-1...debian%2F1.20220120+ds-1

At first I thought maybe this change may have generated a significantly different config.txt, but no, that looks okay. https://salsa.debian.org/debian/raspi-firmware/-/commit/6ee3ca756fe369fdb961b04de900fe0ba12afb29#94dc7bd91c09e1a4aaa1a84a29e4657f3aefb44a_110_110

@XECDesign
Copy link
Contributor

I'm not seeing a difference in Raspberry Pi OS with just a firmware change:

sudo SKIP_KERNEL=1 SKIP_BACKUP=1 SKIP_SDK=1 rpi-update ea9e10e531a301b3df568dccb3c931d52a469106
sudo reboot
$ vcgencmd version
Jan 20 2022 13:56:48 
Copyright (c) 2012 Broadcom
version bd88f66f8952d34e4e0613a85c7a6d3da49e13e2 (clean) (release) (start)
$ time sysbench --test=cpu --cpu-max-prime=20000 run

WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 20000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:   700.76

General statistics:
    total time:                          10.0012s
    total number of events:              7011

Latency (ms):
         min:                                    1.42
         avg:                                    1.43
         max:                                    3.23
         95th percentile:                        1.42
         sum:                                 9998.01

Threads fairness:
    events (avg/stddev):           7011.0000/0.00
    execution time (avg/stddev):   9.9980/0.00


real    0m10.041s
user    0m10.035s
sys     0m0.006s

sudo SKIP_KERNEL=1 SKIP_BACKUP=1 SKIP_SDK=1 rpi-update b63a7cb64dfe45ad1d0b3921529f7cbe6203ecdb
sudo reboot
$ vcgencmd version
Aug  3 2021 18:14:56 
Copyright (c) 2012 Broadcom
version 40787ee5905644f639a2a0f6e00ae12e517a2211 (clean) (release) (start)

$ time sysbench --test=cpu --cpu-max-prime=20000 run

WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 20000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:   700.61

General statistics:
    total time:                          10.0004s
    total number of events:              7009

Latency (ms):
         min:                                    1.41
         avg:                                    1.43
         max:                                    4.28
         95th percentile:                        1.42
         sum:                                 9997.11

Threads fairness:
    events (avg/stddev):           7009.0000/0.00
    execution time (avg/stddev):   9.9971/0.00


real    0m10.041s
user    0m10.020s
sys     0m0.017s

@diederikdehaas
Copy link

@XECDesign what happens if you run cpupower frequency-info?

@HankB reported on Debian channels earlier this:

root@up:~# cpupower frequency-info
analyzing CPU 0:
  no or unknown cpufreq driver is active on this CPU
  CPUs which run at the same hardware frequency: Not Available
  CPUs which need to have their frequency coordinated by software: Not
Available
  maximum transition latency:  Cannot determine or is not supported.
Not Available
  available cpufreq governors: Not Available
  Unable to determine current policy
  current CPU frequency: Unable to call hardware
  current CPU frequency:  Unable to call to kernel

That's not good and seems to be the reason that the CPU operated at the lowest frequencies ... thus resulting in lower scores then one would otherwise get.

There was also a mention of #1619 which could be related (even though the last message indicates the problem has been fixed)?

@XECDesign
Copy link
Contributor

$ cpupower frequency-info
analyzing CPU 0:
  driver: cpufreq-dt
  CPUs which run at the same hardware frequency: 0 1 2 3
  CPUs which need to have their frequency coordinated by software: 0 1 2 3
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 600 MHz - 1.80 GHz
  available frequency steps:  600 MHz, 700 MHz, 800 MHz, 900 MHz, 1000 MHz, 1.10 GHz, 1.20 GHz, 1.30 GHz, 1.40 GHz, 1.50 GHz, 1.60 GHz, 1.70 GHz, 1.80 GHz
  available cpufreq governors: conservative ondemand userspace powersave performance schedutil
  current policy: frequency should be within 600 MHz and 1.80 GHz.
                  The governor "ondemand" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 1.80 GHz (asserted by call to kernel)

@HankB
Copy link
Author

HankB commented Mar 15, 2022

I really appreciate the attention this has gotten. I don't expect the Raspberry Pi devs to go beyond confirming that the problem does not manifest with the Raspberry Pi OS. Nevertheless, I appreciate any further suggestions on how I can help to track this down. I have installations of R Pi OS (both 32 and 64 bit) that I can compare to as well as a Debian Bullseye install that does not demonstrate this problem. I have an extra SSD (or two) that I can use for testing so as not to risk the systems I use on a regular basis so nothing is off the table as far as experimentation goes. I have gotten further direction from Diederik de Haas and welcome any other thoughts.
Thanks again.

@pelwell pelwell reopened this Mar 15, 2022
@pelwell
Copy link
Contributor

pelwell commented Mar 15, 2022

We can leave this open for now.

@HankB
Copy link
Author

HankB commented Mar 15, 2022

I have determined that 1.20211118 is the last tag that does not exhibit the degraded performance. 1.20220118 (and 1.20220308) do,

@timg236
Copy link

timg236 commented Mar 16, 2022

It's probably worth disabling any over-clocking options and running using "vcgencmd get_throttled" to see if it's hitting thermal issues https://www.raspberrypi.com/documentation/computers/os.html#get_throttled

Running the bcmstat tool https://github.com/MilhouseVH/bcmstat in the background might highlight some differences in system behaviour CPU / temp.

@diederikdehaas
Copy link

The RPi userland programs like vcgencmd are not available on Debian, so the first command will not work. A quick scan of bcmstat shows it's querying /proc/ and /sys/, which probably will work, but it's also using vcgencmd and I'm afraid that it won't cope with that very well.

It's worth noting that 'degraded performance' is an understatement. Before I knew what it entailed, my first guess was something like throttling too. But the (CPU) performance is WAY less then HALF of what it should be.
[I just saw Hank did other tests (too?) except only a CPU test which is what he did when he reported the issue]

What I asked @HankB to do is a procedure similar to git bisect, but then on the tagged releases between 1.20210805 (good) and 1.20220120 (bad). Then from the DL-ed archive belonging to that tag extract bootcode.bin, *.dat and *.elf and place those in /boot/firmware/ (Debian mounts the FAT32 partition there, not on /boot/), reboot and run cpupower frequency-info. If the output was as @XECDesign posted, then it would be good and it would be bad if it was similar to what I posted earlier (where it can't determine the available frequencies).

The reason I think/though #1619 could be related is because it appears to me that clock values are often used to multiply of divide with to set/determine speeds and if those values are 0 you get either 0 as result or an illegal operation which should get caught and could then also result in a 0 return value. But how clocks really work is 'above my pay grade', so it could be entirely unrelated.

@HankB
Copy link
Author

HankB commented Mar 16, 2022

Hi Tim, thanks for the suggestions. I discovered the situation when performing some benchmarks to establish baseline numbers before attempting overclocking. And in this case Debian Liunux still clocks the processor at 1.5GHz (when things are operating as intended.) When the problem manifests, the processor clock seems not to be available by any normal means and I suspect the processor is running at a default power up frequency.

I have the two Pi 4Bs used for testing installed in Flirc cases which cool the processor extremely well. It takes about 15 minutes of run time with a long running stress test just to get the processor to about 60°C, no where near the temperature that would result in throttling.

@timg236
Copy link

timg236 commented Mar 16, 2022

If the vcmailbox interface has stopped then it's possible that it's stuck at some frequency, although I'd expect errors in dmesg about rpi_firmware_transaction failures.

The vcmailbox rpi-userland program doesn't depend on VCHIQ so might be able to download just that binary and run that
https://github.com/raspberrypi/firmware/tree/master/hardfp/opt/vc/bin

This allows you to poke the MEASURE_CLOCK mailbox directly e.g.

# Measure arm freq - clock 3
root@raspberrypi:~# vcmailbox 0x00030047 8 8 3 0
0x00000020 0x80000000 0x00030047 0x00000008 0x80000008 0x00000003 0x23c50fc0 0x00000000 
# Measure core freq - clock 4
root@raspberrypi:~# vcmailbox 0x00030047 8 8 4 0
0x00000020 0x80000000 0x00030047 0x00000008 0x80000008 0x00000004 0x0fed6d50 0x00000000

The other thing to check is /proc/meminfo in-case the size of CMA carveouts has changed. On Raspberry Pi OS Bullseye the 2GB+ models have a larger CMA region in order to support mutter.

@HankB
Copy link
Author

HankB commented Mar 17, 2022

vcmailbox seems not to work on Debian. (I used the 64 bit version that ships with R-Pi OS as the one you linked was 32 bit and would not even run.)

root@kweli:~# ./vcmailbox 0x00030047 8 8 3
Can't open device file: /dev/vcio
Try creating a device file with: sudo mknod /dev/vcio c 100 0
root@kweli:~# mknod /dev/vcio c 100 0
root@kweli:~# ./vcmailbox 0x00030047 8 8 3
Can't open device file: /dev/vcio
Try creating a device file with: sudo mknod /dev/vcio c 100 0
root@kweli:~# 

64 bit R-Pi OS

hbarta@polana:~ $ cat /proc/meminfo|grep -i cma
CmaTotal:         262144 kB
CmaFree:             972 kB
hbarta@polana:~ $ uname -a
Linux polana 5.10.92-v8+ #1514 SMP PREEMPT Mon Jan 17 17:39:38 GMT 2022 aarch64 GNU/Linux
hbarta@polana:~ $ 

64 bit Debian Buster

hbarta@kweli:~$ cat /proc/meminfo|grep -i cma
CmaTotal:          65536 kB
CmaFree:           41568 kB
hbarta@kweli:~$ uname -a
Linux kweli 5.10.0-12-arm64 #1 SMP Debian 5.10.103-1 (2022-03-07) aarch64 GNU/Linux
hbarta@kweli:~$ 

64 bit Debian Bookworm (with working raspi-firmware)

hbarta@kweli-t:~$ cat /proc/meminfo |grep -i cma
CmaTotal:          65536 kB
CmaFree:           41568 kB
hbarta@kweli-t:~$

64 bit Debian Bookworm (with trouble raspi-firmware)

hbarta@kweli-t:~$ cat /proc/meminfo |grep -i cma
CmaTotal:          65536 kB
CmaFree:           61560 kB
hbarta@kweli-t:~$ 

I'm not familiar with the CMA region and have no idea if these numbers mean anything.

Thank you,

@popcornmix
Copy link
Contributor

Does running vcmailbox with sudo help?

@HankB
Copy link
Author

HankB commented Mar 21, 2022

In the example above I was running vcmailbox as root and it did not help.

@popcornmix
Copy link
Contributor

I'd guess that kernel wasn't built with CONFIG_BCM_VCIO=y.

@diederikdehaas
Copy link

No. I found config BCM_VCIO in drivers/char/broadcom/, but the whole broadcom directory doesn't seem to be available in the upstream kernel source. When it is, I can make a MR to enable it (as module) in the Debian kernel(s).
But first it must be present in Linus' upstream kernel source.

@popcornmix
Copy link
Contributor

popcornmix commented Mar 21, 2022

I've installed debian and manually updated firmware to latest and after a reboot I see:

root@rpi4-20220121:~# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
600000 700000 800000 900000 1000000 1100000 1200000 1300000 1400000 1500000

I was expecting that to fail (i.e. return nothing or a single frequency)

You can download latest firmware here:
start4.elf
fixup4.dat

replace the ones in /boot/firmware (or first partition if updating on another machine).
Does that work for you?

Note: most debugging tools don't work with the upstream kernel. The vcdbg from here does.
vcdbg version can be used to confirm the build date (and other info) of the running firmware.

@pelwell
Copy link
Contributor

pelwell commented Mar 21, 2022

To explain your vcmailbox failure, the upstream kernels are lacking vcio, which is the userspace ioctl shim to the mailbox interface. That shouldn't affect the use by kernel drivers such as the clock driver, but it does prevent you from prodding the VPU for diagnostic purposes.

@popcornmix
Copy link
Contributor

Ah - I just tried this version:
vcos_build_date = Jan 20 2022

which looks to have the failure reported:

root@rpi4-20220121:~# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
cat: /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies: No such file or directory

Possibly it is an issue with Jan 20 firmware that is not present with latest version.

@diederikdehaas
Copy link

diederikdehaas commented Mar 21, 2022

What is confusing to me is that @HankB tested the 1.20220308 release and concluded it to be bad.

Hank, could you test every tagged version between 1.20211118 and 1.20220308 and post the results in a table as a reply to this issue? Like you did earlier, but do make the dates sequential now. And then also try it from master.

@popcornmix
Copy link
Contributor

I've bisected on our internal firmware tree and identified the commit where it was fixed:
platform: Limit max clock-id to CLOCK_VEC for now by @timg236

That entered the firmware source tree on Jan 24 2022 and a firmware build was pushed to master branch of raspberrypi/firmware on the same day. It's not currently on the stable tree.

@pelwell
Copy link
Contributor

pelwell commented Mar 22, 2022

This is effectively a duplicate of #1688.

@HankB
Copy link
Author

HankB commented Mar 22, 2022

I've repeated testing firmware including some not previously tested and also the "latest" start4.elf,fixup4.dat which was installed on top of the firmware manually copied from the test procedure (1.20220308) on top of the firmware that installs with the latest bookworm upgrade (.

version result
1.20211118 good
1.20220118 bad
1.20220120 bad
1.20220308_buster good
1.20220308 bad
latest start4.elf,fixup4.dat boot failure (*)

(*) Starts to boot, screen goes blank (which is normal I think) and then looks like like random data written to screen buffer.

I repeated the test with the "latest" on top of the 1.20211118 with the same result. (I'm not sure at this point what firmware is original on the bookworm image as the install which I did not upgrade no longer boots.)

(I made a point of not perusing my previous results to avoid influencing this test.)

@popcornmix
Copy link
Contributor

What is the difference between "1.20220308_buster" and "1.20220308" in terms of version of firmware used?

The latest was working for me. Is the failure repeatable? Can you still ssh in?
What was the configured resolution/framerate of display connected?

@HankB
Copy link
Author

HankB commented Mar 22, 2022

What is the difference between "1.20220308_buster" and "1.20220308" in terms of version of firmware used?

I do not know.

Is the failure repeatable? Can you still ssh in?

100% repeatable for any of the tests I repeated. I usually SSH in for these tests.

What was the configured resolution/framerate of display connected?

Some of these tests were performed in a system w/out a GUI. (That system no longer boots.) Some were performed with Gnome desktop installed, Resolution is 1600x1200. Gnome display settings do not show the refresh rate. The specs I found were for vertical and horizontal refresh rate that were useful when we needed to configure modelines. (It is an older LCD monitor.) It is a DVI display connected via DVI <-> HDMI adapter.

@popcornmix
Copy link
Contributor

What is the difference between "1.20220308_buster" and "1.20220308" in terms of version of firmware used?
I do not know.

You can run ./vcdbg version (vcdbg link provided earlier) to see the version of the firmware. That will confirm if they are the same or not.

Is the failure repeatable? Can you still ssh in?

Check dmesg output for working vs non-working. They may be errors that give a clue.

@HankB
Copy link
Author

HankB commented Mar 23, 2022

Here is what I captured for several versions (including as installed) along with captures of dmesg output following boot.

./vcdbg version for system as installed (fresh install.) dmesg

root@down:~# ./vcdbg version
vcos_build_version     = 40787ee5905644f639a2a0f6e00ae12e517a2211 (clean)
vcos_build_branch      = bcm2711_2
debug_sym: ReadVideoCoreMemoryBySymbol: Symbol not found: 'vcos_build_application'
vcos_build_application = ?
vcos_build_date        = Aug  3 2021
vcos_build_time        = 18:14:56
vcos_build_user        = dom
vcos_build_hostname    = buildbot
vcos_build_platform    = raspberrypi_linux
debug_sym: ReadVideoCoreMemoryBySymbol: Symbol not found: 'vcos_build_type'
vcos_build_type        = ?
root@down:~#

firmware version 1.20220120 (earliest trouble firmware) dmesg

root@down:~# ./vcdbg version
vcos_build_version     = bd88f66f8952d34e4e0613a85c7a6d3da49e13e2 (clean)
vcos_build_branch      = bcm2711_2
debug_sym: ReadVideoCoreMemoryBySymbol: Symbol not found: 'vcos_build_application'
vcos_build_application = ?
vcos_build_date        = Jan 20 2022
vcos_build_time        = 13:56:48
vcos_build_user        = dom
vcos_build_hostname    = buildbot
vcos_build_platform    = raspberrypi_linux
debug_sym: ReadVideoCoreMemoryBySymbol: Symbol not found: 'vcos_build_type'
vcos_build_type        = ?
root@down:~# 

firmware version 1.20220308_buster dmesg

root@down:~# ./vcdbg version
vcos_build_version     = 71bd3109023a0c8575585ba87cbb374d2eeb038f (clean)
vcos_build_branch      = stable_20211201
debug_sym: ReadVideoCoreMemoryBySymbol: Symbol not found: 'vcos_build_application'
vcos_build_application = ?
vcos_build_date        = Dec  1 2021
vcos_build_time        = 15:01:54
vcos_build_user        = dom
vcos_build_hostname    = buildbot
vcos_build_platform    = raspberrypi_linux
debug_sym: ReadVideoCoreMemoryBySymbol: Symbol not found: 'vcos_build_type'
vcos_build_type        = ?
root@down:~#

firmware version 1.20220308 dmesg

root@down:~# ./vcdbg version
vcos_build_version     = bd88f66f8952d34e4e0613a85c7a6d3da49e13e2 (clean)
vcos_build_branch      = bcm2711_2
debug_sym: ReadVideoCoreMemoryBySymbol: Symbol not found: 'vcos_build_application'
vcos_build_application = ?
vcos_build_date        = Jan 20 2022
vcos_build_time        = 13:56:48
vcos_build_user        = dom
vcos_build_hostname    = buildbot
vcos_build_platform    = raspberrypi_linux
debug_sym: ReadVideoCoreMemoryBySymbol: Symbol not found: 'vcos_build_type'
vcos_build_type        = ?
root@down:~# 

In the last dmesg output (1.20220308) I find the following which is not visible in the version 1.20220308_buster.

[   13.326763] brcmfmac mmc0:0001:1: firmware: direct-loading firmware brcm/brcmfmac43455-sdio.raspberrypi,4-model-b.txt
[   13.435968] cpu cpu0: Cannot get clock for CPU0
[   13.446770] raspberrypi-cpufreq: probe of raspberrypi-cpufreq failed with error -2
[   13.455099] random: crng init done

@HankB
Copy link
Author

HankB commented Mar 23, 2022

In additio0n, I have been using lscpu to determine if the problem exists. Output from that is

1.20220308_buster

root@down:~# lscpu
Architecture:            aarch64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               ARM
  Model name:            Cortex-A72
    Model:               3
    Thread(s) per core:  1
    Core(s) per cluster: 4
    Socket(s):           -
    Cluster(s):          1
    Stepping:            r0p3
    CPU max MHz:         1500.0000
    CPU min MHz:         600.0000
    BogoMIPS:            108.00
    Flags:               fp asimd evtstrm crc32 cpuid
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-3
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; __user pointer sanitization
  Spectre v2:            Vulnerable
  Srbds:                 Not affected
  Tsx async abort:       Not affected
root@down:~# 

1.20220308

root@down:~# lscpu
Architecture:            aarch64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               ARM
  Model name:            Cortex-A72
    Model:               3
    Thread(s) per core:  1
    Core(s) per cluster: 4
    Socket(s):           -
    Cluster(s):          1
    Stepping:            r0p3
    BogoMIPS:            108.00
    Flags:               fp asimd evtstrm crc32 cpuid
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-3
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; __user pointer sanitization
  Spectre v2:            Vulnerable
  Srbds:                 Not affected
  Tsx async abort:       Not affected
root@down:~# 

The only difference is

hbarta@olive:~/Downloads/dmesg$ diff lscpu*
14a15,16
>     CPU max MHz:         1500.0000
>     CPU min MHz:         600.0000
hbarta@olive:~/Downloads/dmesg$

@popcornmix
Copy link
Contributor

I've said the bug was fixed on Jan 24 2022. All the firmware's you reported were from before the date of the fix.
Can you test with a firmware after that date (e.g. the link to latest I provided) and confirm the date of ./vcdbg version is as expected.

@HankB
Copy link
Author

HankB commented Mar 23, 2022

With the latest you provided (#1705 (comment)) The system does not complete booting so no further information can be collected. Is there a particular version of the firmware I should install this on top of? Should I be grabbing something from the repo that is newer than the tagged versions?

Edit: I downloaded the latest from master and installed the files from .../boot with the following result

root@down:~# lscpu
Architecture:            aarch64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               ARM
  Model name:            Cortex-A72
    Model:               3
    Thread(s) per core:  1
    Core(s) per cluster: 4
    Socket(s):           -
    Cluster(s):          1
    Stepping:            r0p3
    CPU max MHz:         1500.0000
    CPU min MHz:         600.0000
    BogoMIPS:            108.00
    Flags:               fp asimd evtstrm crc32 cpuid
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-3
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; __user pointer sanitization
  Spectre v2:            Vulnerable
  Srbds:                 Not affected
  Tsx async abort:       Not affected
root@down:~# ./vcdbg version
vcos_build_version     = 4f30b902e2ef1b58069a48c2351169491e66ac7a (clean)
vcos_build_branch      = bcm2711_2
debug_sym: ReadVideoCoreMemoryBySymbol: Symbol not found: 'vcos_build_application'
vcos_build_application = ?
vcos_build_date        = Mar 22 2022
vcos_build_time        = 16:57:42
vcos_build_user        = dom
vcos_build_hostname    = buildbot
vcos_build_platform    = raspberrypi_linux
debug_sym: ReadVideoCoreMemoryBySymbol: Symbol not found: 'vcos_build_type'
vcos_build_type        = ?
root@down:~# 

This looks encouraging.

BTW, this most recent testing (since the collection of vcdbg and dmesg output) is with the kernel

Linux down 5.15.0-2-arm64 #1 SMP Debian 5.15.5-2 (2021-12-18) aarch64 GNU/Linux

Do you still want me to test with start4.elf and fixup4.dat that you linked?

@popcornmix
Copy link
Contributor

Just tested again with my linked version:

root@rpi4-20220121:~# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
600000 700000 800000 900000 1000000 1100000 1200000 1300000 1400000 1500000 
root@rpi4-20220121:~# ./vcdbg version
vcos_build_version     = 4f30b902e2ef1b58069a48c2351169491e66ac7a (clean)
vcos_build_branch      = bcm2711_2
debug_sym: ReadVideoCoreMemoryBySymbol: Symbol not found: 'vcos_build_application'
vcos_build_application = ?
vcos_build_date        = Mar 22 2022
vcos_build_time        = 16:57:42
vcos_build_user        = dom
vcos_build_hostname    = buildbot
vcos_build_platform    = raspberrypi_linux
debug_sym: ReadVideoCoreMemoryBySymbol: Symbol not found: 'vcos_build_type'
vcos_build_type        = ?

All seems happy for me, and looks identical to the version you are running.
I'm currently also on:

root@rpi4-20220121:~# uname -a
Linux rpi4-20220121 5.15.0-2-arm64 #1 SMP Debian 5.15.5-2 (2021-12-18) aarch64 GNU/Linux

@diederikdehaas
Copy link

This part of Hank's output stood out for me:

firmware version 1.20220308

root@down:~# ./vcdbg version
vcos_build_version     = 71bd3109023a0c8575585ba87cbb374d2eeb038f (clean)
vcos_build_branch      = stable_20211201
debug_sym: ReadVideoCoreMemoryBySymbol: Symbol not found: 'vcos_build_application'
vcos_build_application = ?
vcos_build_date        = Dec  1 2021
vcos_build_time        = 15:01:54
vcos_build_user        = dom
vcos_build_hostname    = buildbot
vcos_build_platform    = raspberrypi_linux
debug_sym: ReadVideoCoreMemoryBySymbol: Symbol not found: 'vcos_build_type'
vcos_build_type        = ?
root@down:~#

The firmware version tag would suggest it's based on the code from 2022-03-08, while the vcdbg output (strongly) hints at 2021-12-01.
OTOH, 1.20220120 does seem to be firmware based on code from 2022-01-20 ... so it's actually newer then 1.20220308 (but still problematic).

@HankB
Copy link
Author

HankB commented Mar 29, 2022

What's next for this? The end goal on my part is to see working firmware in the Debian repos for Bookworm on the 4B. Is there anything I can do to help this along? Should I continue to test the master branch? Wait for a tagged release?

Anything else I can do to help?

Thanks!

@popcornmix
Copy link
Contributor

The stable branch of raspberrypi/firmware was updated yesterday (in preparation for an update to RPiOS).
I suspect it will get a tag when apt is updated.

Assuming you are happy with that firmware then I'd suggest updating your package to that.

@gwolf
Copy link

gwolf commented Mar 29, 2022 via email

@gwolf
Copy link

gwolf commented Mar 29, 2022 via email

@HankB
Copy link
Author

HankB commented Mar 30, 2022

I've tested the latest code just downloaded from master (995e9f0) and it looks good as far as this issue goes.

@gwolf
Copy link

gwolf commented Mar 30, 2022

HankB dijo [Tue, Mar 29, 2022 at 05:54:16PM -0700]:

I've tested the latest code just downloaded from master
(995e9f0) and it looks good as far
as this issue goes.

Hmmm... Odd -- I tested the same files in my RPi p400, and I still see
the buggy behavior :-(

In any case, those are the files I downloaded, please tell me if
something does not match your end:

    $ for i in fixup{4cd,4,4db,4x,_cd,_db,_x,}.dat bootcode.bin ; do wget                                                                                   
+https://github.com/raspberrypi/firmware/blob/995e9f03d17ab09a8821feb85cee368f29d80f89/boot/$i?raw=true     -O $i;done                                      
    (...)                                                                                                                                                   
    $ sha256sum fixup* bootcode.bin                                                                                                                         
    22db24c621c326d907c7b8c5975b1730e6ce78dea680bec3958d907093031638  fixup4cd.dat                                                                          
    7d28775bff4781bda065f37fbb64c88ed1b56d1f4af79d85f792032f55bb6de7  fixup4.dat                                                                            
    b0d299dd46ddecd2c4eeeca61f42d114a0c464dcd6165861a662d118daa3afc0  fixup4db.dat                                                                          
    495076ed0488703ba59bd0d43bc577ad1695470379263854197a752cd1989330  fixup4x.dat                                                                           
    22db24c621c326d907c7b8c5975b1730e6ce78dea680bec3958d907093031638  fixup_cd.dat                                                                          
    8e90c8a379f3f99ee59370c50e48853db82669dbd6515d4ce08a24307d3dc4c5  fixup.dat                                                                             
    4f5d2433956f64cec640ae91dc47bdea3aa2c6c7cc579faafb779826bd7e554c  fixup_db.dat                                                                          
    82bbf5a3f86f73a41e9f8975deb75e50ab4d7581500a43ec7f0fcc42214948dd  fixup_x.dat                                                                           
    69309823da13dc96b89e3d82b44f820e4f84efa79d207adad2c8784559794f03                                                                                        
    bootcode.bin                                                                                                                                            

I guess this is related, and I have not seen it commented in this
issue: when I try to start up sway (Wayland compositor), I get:

00:00:10.038 [wlr] [backend/backend.c:371] Failed to open any DRM device                                                                                
00:00:10.056 [sway/server.c:53] Unable to create backend                                                                                                

@popcornmix
Copy link
Contributor

@gwolf the start* files are more significant than the fixup* files.
Report output of ./vcdbg version to confirm you are running updated firmware.

@HankB
Copy link
Author

HankB commented Mar 30, 2022 via email

@gwolf
Copy link

gwolf commented Mar 30, 2022

Right! I was missing *.elf. Having downloaded those as well, I can confirm everything works fine again.
Thanks!

@popcornmix
Copy link
Contributor

Stable firmware is now in apt for rpios (and firmware repo has 1.20220328 tag).

Once the debian bookworm firmware package is updated this issue should be resolved.

@HankB
Copy link
Author

HankB commented Apr 3, 2022

I have tested the tagged firmware 1.20220328 and the raspi-firmware package in Debian unstable and found it to have resolved the reported problem.
My testing details at https://hankb.github.io/MkDocs-blog/test-debug/Pi-4B-firmware/

@pelwell pelwell closed this as completed Apr 4, 2022
@gwolf
Copy link

gwolf commented Oct 11, 2022 via email

@HankB
Copy link
Author

HankB commented Oct 11, 2022

Hi gwolf,
Short answer - I could not duplicate.

My test procedure (this morning.)

  • Pi 4B/4GB (I do not have access to a Pi 400)
  • Install latest image to SD card and boot. (20220808_raspi_4_bookworm.img.xz from https://raspi.debian.net/tested-images/
  • Check available frequencies - All seem to be there.
  • update/upgrade 114 packages which seem not to include raspi-firmware and reboot.
  • Recheck frequencies
root@rpi4-20220808:~# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
600000 700000 800000 900000 1000000 1100000 1200000 1300000 1400000 1500000 
root@rpi4-20220808:~# lscpu
Architecture:            aarch64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               ARM
  Model name:            Cortex-A72
    Model:               3
    Thread(s) per core:  1
    Core(s) per cluster: 4
    Socket(s):           -
    Cluster(s):          1
    Stepping:            r0p3
    CPU(s) scaling MHz:  67%
    CPU max MHz:         1500.0000
    CPU min MHz:         600.0000
    BogoMIPS:            108.00
    Flags:               fp asimd evtstrm crc32 cpuid
Caches (sum of all):     
  L1d:                   128 KiB (4 instances)
  L1i:                   192 KiB (4 instances)
  L2:                    1 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-3
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; __user pointer sanitization
  Spectre v2:            Vulnerable
  Srbds:                 Not affected
  Tsx async abort:       Not affected
root@rpi4-20220808:~#
  • apt install sway (93 packages) and reboot.
  • Add regular user and login.
  • Run sway from CLI. It appears to start but I am unfamiliar with it.
  • Log in to text console and verify that sway is running and it is. Not sure what else to check for that. I was unable to find similar errors in dmesg output.

Checking the files I get

root@rpi4-20220808:~# cd /boot/firmware
root@rpi4-20220808:/boot/firmware# sha256sum fixup* bootcode.bin
8e90c8a379f3f99ee59370c50e48853db82669dbd6515d4ce08a24307d3dc4c5  fixup.dat
7d28775bff4781bda065f37fbb64c88ed1b56d1f4af79d85f792032f55bb6de7  fixup4.dat
22db24c621c326d907c7b8c5975b1730e6ce78dea680bec3958d907093031638  fixup4cd.dat
b0d299dd46ddecd2c4eeeca61f42d114a0c464dcd6165861a662d118daa3afc0  fixup4db.dat
495076ed0488703ba59bd0d43bc577ad1695470379263854197a752cd1989330  fixup4x.dat
22db24c621c326d907c7b8c5975b1730e6ce78dea680bec3958d907093031638  fixup_cd.dat
4f5d2433956f64cec640ae91dc47bdea3aa2c6c7cc579faafb779826bd7e554c  fixup_db.dat
82bbf5a3f86f73a41e9f8975deb75e50ab4d7581500a43ec7f0fcc42214948dd  fixup_x.dat
69309823da13dc96b89e3d82b44f820e4f84efa79d207adad2c8784559794f03  bootcode.bin
root@rpi4-20220808:/boot/firmware#

I copied the files from /boot/firmware/ to /tmp/firmware and repeated your wget and compared the sha256sum values and got results matching yours and what is installed. In other words, the firmware you are running seems to match what is in the Pi 4 bookworm image.

Perhaps related: I see that the Pi 400 image on the download page looks like the same image that was previously there for the Pi 4B (same filename: 20220121_raspi_4_bookworm.img.xz.)

Is there anything else I can check?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants