Chromium (hardware accelerated) video playing performance improvement and regression #5475

Closed
qtmrcdmc opened this issue May 19, 2023 · 9 comments

@qtmrcdmc

Describe the bug

I noticed that the work on mapping between CPU physical addresses and DMA addresses (included in rpi-update e5f7c2648572b7acbc4fbc0e654281ea2d2e94bb) has improved H.264 video playback performance (fps) in the Chromium browser with hardware acceleration. Thank you for that!

This is the setup:

Chromium-browser 113.0.5672.95-rpt1 on Raspberry Pi OS Bullseye 64-bit, kernel 6.1.28.
Raspberry Pi CM4 (4GB RAM)

However, from rpi-update 38d69e35292e129700ef50443c3ecc37e4124d91, which sets a 1GB ZONE_DMA limit (raspberrypi/linux commit e158dcb), video playback performance (fps) regresses with the same setup.

Steps to reproduce the behaviour

Play a local H.264 1080p 60fps video file in fullscreen Chromium (on HDMI1 only, FHD 1920x1080@60Hz, with HDMI2 off).

This is the command used to launch chromium-browser:

chromium-browser --ignore-gpu-blocklist --use-gl=egl --enable-gpu-rasterization --enable-accelerated-video-decode --enable-features=VaapiVideoDecoder --enable-zero-copy --start-fullscreen

Device(s)

Raspberry Pi CM4

System

IMPROVEMENT:

pi@raspberrypi:~ $ cat /etc/rpi-issue
Raspberry Pi reference 2023-05-03
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 7c750947a959fb626a70c09fd17c65815df192ac, stage4

pi@raspberrypi:~ $ vcgencmd version
Apr 25 2023 18:26:03
Copyright (c) 2012 Broadcom
version d7f9c2b4ef7e4a8c0b04374a879ce89d7a948453 (clean) (release) (start)

pi@raspberrypi:~ $ uname -a
Linux raspberrypi 6.1.28-v8+ #1649 SMP PREEMPT Fri May 12 14:25:37 BST 2023 aarch64 GNU/Linux

REGRESSION:

pi@raspberrypi:~ $ cat /etc/rpi-issue
Raspberry Pi reference 2023-05-03
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 7c750947a959fb626a70c09fd17c65815df192ac, stage4

pi@raspberrypi:~ $ vcgencmd version
Apr 25 2023 18:26:03
Copyright (c) 2012 Broadcom
version d7f9c2b4ef7e4a8c0b04374a879ce89d7a948453 (clean) (release) (start)

pi@raspberrypi:~ $ uname -a
Linux raspberrypi 6.1.28-v8+ #1651 SMP PREEMPT Wed May 17 14:34:39 BST 2023 aarch64 GNU/Linux

Logs

No response

Additional context

No response

@popcornmix
Collaborator

Can you describe how you measured the improvement, and then the degradation?
Is it just visual or are you looking at stats reported by the browser?
If there are stats, what are the numbers in each case?

@pelwell
Contributor

pelwell commented May 19, 2023

With your command line on a Pi 400 running the latest kernel, I'm seeing no unexpected swiotlb activity (the obvious cause of slowdown when changing DMA masks).

  1. Which vc4-xxx-v3d overlay are you using?
  2. Do you get any diagnostic messages on stdout? I see:
    pi@raspberrypi:~$ DISPLAY=:0.0 chromium-browser --ignore-gpu-blocklist --use-gl=egl --enable-gpu-rasterization --enable-accelerated-video-decode --enable-features=VaapiVideoDecoder --enable-zero-copy --start-fullscreen Downloads/legend.mp4
    [3638:3638:0519/152327.181500:ERROR:gpu_init.cc(525)] Passthrough is not supported, GL is egl, ANGLE is
    

@qtmrcdmc
Author

Can you describe how you measured the improvement, and then the degradation? Is it just visual or are you looking at stats reported by the browser? If there are stats, what are the numbers in each case?

I evaluated it visually, using a stopwatch to time how long Chromium takes to play, in fullscreen (Full HD), the first 16 seconds of the H.264 1080p 60fps big_buck_bunny video file, stored locally on eMMC. The mouse must be kept disconnected during the measurement so that it does not interfere with normal playback.

In the first case (improvement) chromium takes around 18 seconds to complete the first 16 seconds.

In the second case (degradation) chromium takes around 21 seconds, so 3 seconds more.

With stable kernel 6.1.19 (rpi-update fa51258e0239eaf68d9dff9c156cec3a622fbacc) it also takes around 21 seconds.
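
The stopwatch timings above can be converted into an approximate effective frame rate. This is only a rough sketch: it assumes the clip is a constant 60 fps and that playback slows down rather than dropping frames, and the function name is illustrative.

```python
def effective_fps(clip_seconds: float, clip_fps: float, wall_seconds: float) -> float:
    """Average frames shown per wall-clock second, assuming no frames are dropped."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    total_frames = clip_seconds * clip_fps  # 16 s at 60 fps is 960 frames
    return total_frames / wall_seconds


# Stopwatch timings reported in this thread for the first 16 s of the clip.
for label, wall in (("first case (18 s)", 18.0), ("second case (21 s)", 21.0)):
    print(f"{label}: {effective_fps(16.0, 60.0, wall):.1f} fps")
```

Under those assumptions the two cases work out to roughly 53 fps and 46 fps, so neither reaches the nominal 60 fps.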

@qtmrcdmc
Author

Here is the video file I used: https://we.tl/t-0GTGqQ4yXr

@qtmrcdmc
Author

With your command line on a Pi 400 running the latest kernel, I'm seeing no unexpected swiotlb activity (the obvious cause of slowdown when changing DMA masks).

  1. Which vc4-xxx-v3d overlay are you using?
  2. Do you get any diagnostic messages on stdout? I see:
    pi@raspberrypi:~$ DISPLAY=:0.0 chromium-browser --ignore-gpu-blocklist --use-gl=egl --enable-gpu-rasterization --enable-accelerated-video-decode --enable-features=VaapiVideoDecoder --enable-zero-copy --start-fullscreen Downloads/legend.mp4
    [3638:3638:0519/152327.181500:ERROR:gpu_init.cc(525)] Passthrough is not supported, GL is egl, ANGLE is
    

I'm using the vc4-kms-v3d overlay.

These are my Chromium messages (with rpi-update e5f7c2648572b7acbc4fbc0e654281ea2d2e94bb loaded):

[2830:2830:0519/164347.591162:ERROR:chrome_browser_cloud_management_controller.cc(162)] Cloud management controller initialization aborted as CBCM is not enabled.
MESA-LOADER: failed to retrieve device information
MESA-LOADER: failed to retrieve device information
MESA-LOADER: failed to retrieve device information
[2897:3068:0519/164349.524690:ERROR:v4l2_video_decode_accelerator.cc(422)] Failed to request buffers!

@pelwell
Contributor

pelwell commented May 19, 2023

Without the "Set a 1GB ZONE_DMA limit" commit, i.e. what you get after an rpi-update to raspberrypi/rpi-firmware@8cad204, I get a kernel warning and an error from the codec:

[   84.978658] ------------[ cut here ]------------
[   84.978687] bcm2835-dma fe007000.dma: DMA addr 0xffffffffffffffff+968 overflow (mask ffffffff, bus limit ffffffff).
[   84.978734] WARNING: CPU: 0 PID: 595 at kernel/dma/direct.h:103 dma_direct_map_sg+0x29c/0x2b8
...
[  165.214471] bcm2835-codec bcm2835-codec: dma alloc of size 4096 failed

@qtmrcdmc
Author

qtmrcdmc commented May 19, 2023

Without the "Set a 1GB ZONE_DMA limit" commit, i.e. what you get after an rpi-update to raspberrypi/rpi-firmware@8cad204, I get a kernel warning and an error from the codec:

[   84.978658] ------------[ cut here ]------------
[   84.978687] bcm2835-dma fe007000.dma: DMA addr 0xffffffffffffffff+968 overflow (mask ffffffff, bus limit ffffffff).
[   84.978734] WARNING: CPU: 0 PID: 595 at kernel/dma/direct.h:103 dma_direct_map_sg+0x29c/0x2b8
...
[  165.214471] bcm2835-codec bcm2835-codec: dma alloc of size 4096 failed

Yes, I get the same message when playing video in Chromium; anyway, after that commit the video performance doesn't degrade.

[ 105.688083] bcm2835-codec bcm2835-codec: dma alloc of size 4096 failed

These are all the DMA messages:

[ 0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[ 0.000000] DMA [mem 0x0000000000000000-0x00000000fbffffff]
[ 0.000000] DMA32 empty
[ 0.000000] On node 0, zone DMA: 16384 pages in unavailable ranges
[ 0.024565] DMA: preallocated 1024 KiB GFP_KERNEL pool for atomic allocations
[ 0.024706] DMA: preallocated 1024 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[ 0.025119] DMA: preallocated 1024 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[ 0.074604] bcm2835-dma fe007000.dma: DMA legacy API manager, dmachans=0x1
[ 1.017266] mmc-bcm2835 fe300000.mmcnr: DMA channel allocated
[ 1.083066] mmc0: SDHCI controller on fe340000.mmc [fe340000.mmc] using ADMA
[ 3.347123] uart-pl011 fe201000.serial: no DMA platform data
[ 105.688083] bcm2835-codec bcm2835-codec: dma alloc of size 4096 failed
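
As a cross-check on the zone layout printed above, the size of a zone can be computed from its `[mem start-end]` range. A minimal sketch, assuming the kernel prints zone ranges in the format shown in this log; the regular expression and function name are illustrative and not part of any existing tool.

```python
import re

# Matches lines such as "[ 0.000000] DMA [mem 0x...-0x...]" (timestamp optional).
ZONE_RE = re.compile(
    r"^\s*(?:\[\s*[\d.]+\]\s*)?(DMA32|DMA|Normal)\s+"
    r"\[mem 0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+)\]"
)


def zone_size_mib(line):
    """Return (zone_name, size_in_MiB) for a zone range line, or None if it does not match."""
    m = ZONE_RE.match(line)
    if m is None:
        return None
    start, end = int(m.group(2), 16), int(m.group(3), 16)
    # The printed range is inclusive, hence the +1.
    return m.group(1), (end - start + 1) / (1024 * 1024)


print(zone_size_mib("[ 0.000000] DMA [mem 0x0000000000000000-0x00000000fbffffff]"))
# -> ('DMA', 4032.0)
print(zone_size_mib("[ 0.000000] DMA32 empty"))
# -> None
```

For the `DMA` line in this log the result is 4032 MiB, well above 1 GiB, which would be consistent with a kernel that does not carry the 1GB ZONE_DMA limit.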

@popcornmix
Collaborator

I suspect that
[ 105.688083] bcm2835-codec bcm2835-codec: dma alloc of size 4096 failed
and the related:
[2897:3068:0519/164349.524690:ERROR:v4l2_video_decode_accelerator.cc(422)] Failed to request buffers!

results in the hardware codec not working and falling back to software decode.

I think you are saying that performance is better with software than hardware.
I suspect you can prove this by using latest kernel and disabling software decode in chromium.

Assuming this is the case then this is better reported here.
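
The two messages quoted above can be searched for in saved log output when checking whether a run may have fallen back to software decode. A small sketch, assuming the kernel log and Chromium's stderr have been captured as text; the signature strings are copied from the messages in this thread, and the function name is hypothetical.

```python
# Substrings taken from the two messages quoted in this thread.
FALLBACK_SIGNATURES = (
    "bcm2835-codec: dma alloc of size",  # kernel log (dmesg)
    "Failed to request buffers!",        # Chromium stderr
)


def find_fallback_hints(log_text):
    """Return the log lines that contain one of the known failure signatures."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(sig in line for sig in FALLBACK_SIGNATURES)
    ]


sample = """\
[ 3.347123] uart-pl011 fe201000.serial: no DMA platform data
[ 105.688083] bcm2835-codec bcm2835-codec: dma alloc of size 4096 failed
[2897:3068:0519/164349.524690:ERROR:v4l2_video_decode_accelerator.cc(422)] Failed to request buffers!
"""
for hint in find_fallback_hints(sample):
    print(hint)
```

A match only suggests that the hardware codec failed to allocate buffers; it does not by itself prove that software decode was used.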

@qtmrcdmc
Author

Yes, you're right: it was falling back to software decode, as I can see from the CPU usage. So there is no issue here, nor for the Chromium browser, because with the latest rpi firmware update (38d69e35292e129700ef50443c3ecc37e4124d91) hardware decoding in Chromium works again.
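
One way to put a number on the CPU usage mentioned above is to compare two samples of the aggregate `cpu` line from `/proc/stat`, taken a few seconds apart during playback. This is a sketch rather than the method used in this thread; it assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal, ...), and the function names are illustrative.

```python
def parse_cpu_line(line):
    """Return (total_ticks, idle_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:9]]  # user .. steal; guest time is already in user
    idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
    return sum(values), idle


def busy_fraction(before, after):
    """Fraction of CPU time spent non-idle between two samples of the 'cpu' line."""
    total0, idle0 = parse_cpu_line(before)
    total1, idle1 = parse_cpu_line(after)
    delta_total = total1 - total0
    if delta_total <= 0:
        raise ValueError("samples must be taken in order, some time apart")
    return 1.0 - (idle1 - idle0) / delta_total


# Synthetic samples for illustration only, not measurements from this issue.
first = "cpu  100 0 50 800 50 0 0 0 0 0"
second = "cpu  200 0 100 1100 100 0 0 0 0 0"
print(f"busy: {busy_fraction(first, second):.0%}")
```

On a real system the two lines would be read from the first line of `/proc/stat`; a clearly higher busy fraction during playback is a hint that decoding is running on the CPU.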
