Skip to content

Fix CPU hang when using CEC while HDMI output is disabled #4418

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

mripard
Copy link
Contributor

@mripard mripard commented Jun 29, 2021

Hi,

Here's a PR to fix an issue similar to 755b2c8 but with CEC. It's doing further cleanups and drive-by patches and finally using the same solution we used.

Fixes #4319

In order to access the HDMI controller, we need to make sure the HSM
clock is enabled. If we were to access it with the clock disabled, the
CPU would completely hang, resulting in an hard crash.

Since we have different code path that would require it, let's move that
clock enable / disable to runtime_pm that will take care of the
reference counting for us.

Fixes: 4f6e3d6 ("drm/vc4: Add runtime PM support to the HDMI encoder driver")
Signed-off-by: Maxime Ripard <[email protected]>
Reviewed-by: Dave Stevenson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
@pelwell
Copy link
Contributor

pelwell commented Jun 30, 2021

That all looks sensible. Can anyone confirm that it fixes the problem, or at least that it doesn't make anything worse?

@popcornmix
Copy link
Collaborator

I can confirm what @henkv1 reported. This PR breaks boot with kms on Pi3+.
Unfortunately it's a hard lock up, so no dmesg log (even with uart).
Screen just goes blank and uart output stops when we reach the point to switch to kms.
It breaks with "drm/vc4: hdmi: Move the HSM clock enable to runtime_pm" commit.

@pelwell
Copy link
Contributor

pelwell commented Jul 1, 2021

Lazy question (I could probably find the answer if I looked hard enough), but is it guaranteed that the resume method is called at the start of day? In other words, is the clock starting out off when it should start out on?

@mripard
Copy link
Contributor Author

mripard commented Jul 1, 2021

Is it always occurring for you? I had it a couple of times when I started to debug it earlier, but it never occurred if the driver was built-in, and as soon as I started adding a bit of tracing it went away.

My first guess was that, if it's compiled as a module, we would have the HSM clock being disabled by the clock framework at the end of the kernel boot, and then we would access a register without the clock enabled. However, since it seems to be timing sensitive, I'm wondering if it's not a race somewhere ?

Lazy question (I could probably find the answer if I looked hard enough), but is it guaranteed that the resume method is called at the start of day? In other words, is the clock starting out off when it should start out on?

it will be called when we call pm_runtime_get or pm_runtime_resume_and_get if the refcount was 0. If we're missing one of these calls, then yeah, the clock will be off (but the power domain will also be for example)

@mripard
Copy link
Contributor Author

mripard commented Jul 1, 2021

As soon as I started testing with RaspiOS (as opposed to the buildroot system I was using earlier), I indeed get the issue all the time.

mripard added 2 commits July 2, 2021 11:54
The HPD GPIO retrieval code on failure will jump to the
err_unprepare_hsm label that calls pm_runtime_disable.

However at that point we haven't called pm_runtime_enable, so we end up
with an unbalanced call.

The next error than can occur (and therefore the next label) needs both
pm_runtime_disable and drm_encoder_cleanup though, so let's rearrange
the labels to match what we expect.

Fixes: cd4cb49 ("drm/vc4: hdmi: Adjust HSM clock rate depending on pixel rate")
Signed-off-by: Maxime Ripard <[email protected]>
In the bind hook, we actually need the device to have the HSM clock
running during the final part of the display initialisation where we
reset the controller and initialise the CEC component.

Failing to do so will result in a complete, silent, hang of the CPU.

Fixes: 411efa1 ("drm/vc4: hdmi: Move the HSM clock enable to runtime_pm")
Signed-off-by: Maxime Ripard <[email protected]>
@mripard mripard force-pushed the rpi/5.10-fix-cec-while-disabled branch from 126af69 to 929b378 Compare July 2, 2021 10:11
@mripard
Copy link
Contributor Author

mripard commented Jul 2, 2021

So, I'm not really sure why that particular commit made it bad because the issue has been there for a while, but we were having some registers access at bind time without the HSM clock running. I've fixed it up, it should fix the initial CEC issue now and I'm not having any hang anymore while building as a module

@popcornmix
Copy link
Collaborator

So, I'm not really sure why that particular commit made it bad because the issue has been there for a while

That's interesting as when I first tested this PR and found the hang I started bisecting, and spent a couple of hours going down false paths. Eventually I restarted that testing and found "drm/vc4: hdmi: Move the HSM clock enable to runtime_pm" commit seemed bad and the preceding commit was at least sometimes good.

But it doesn't surprise me that there's something racy there.

mripard added 3 commits July 2, 2021 14:00
In the vc4_hdmi_encoder_pre_crtc_configure() function error path we
never actually call pm_runtime_put() even though
pm_runtime_resume_and_get() is our very first call.

Fixes: 4f6e3d6 ("drm/vc4: Add runtime PM support to the HDMI encoder driver")
Signed-off-by: Maxime Ripard <[email protected]>
In order to ease further additions to the CEC enable and disable, let's
split the function into two functions, one to enable and the other to
disable.

Signed-off-by: Maxime Ripard <[email protected]>
Similarly to what we encountered with the detect hook with DRM, nothing
actually prevents any of the CEC callback from being run while the HDMI
output is disabled.

However, this is an issue since any register access to the controller
when it's powered down will result in a silent hang.

Let's make sure we run the runtime_pm hooks when the CEC adapter is
opened and closed by the userspace to avoid that issue.

Fixes: 15b4511 ("drm/vc4: add HDMI CEC support")
Signed-off-by: Maxime Ripard <[email protected]>
@mripard mripard force-pushed the rpi/5.10-fix-cec-while-disabled branch from 929b378 to f4ce2af Compare July 2, 2021 12:00
@mripard
Copy link
Contributor Author

mripard commented Jul 2, 2021

I've updated the branch according to @pelwell reviews

return;
}

ret = clk_prepare_enable(vc4_hdmi->pixel_clock);
if (ret) {
DRM_ERROR("Failed to turn on pixel clock: %d\n", ret);
pm_runtime_put(&vc4_hdmi->pdev->dev);
return;
}

hsm_rate = vc4_hdmi->variant->calc_hsm_clock(vc4_hdmi, pixel_rate);
vc4_hdmi->hsm_req = clk_request_start(vc4_hdmi->hsm_clock, hsm_rate);
if (IS_ERR(vc4_hdmi->hsm_req)) {
DRM_ERROR("Failed to set HSM clock rate: %ld\n", PTR_ERR(vc4_hdmi->hsm_req));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you not also missing a clk_disable_unprepare(vc4_hdmi->pixel_clock); here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with merging this now as a fix, and the clk_disable_unprepare can be added as a later improvement.

@popcornmix
Copy link
Collaborator

A quick test did work.
Booted okay on Pi3+. cec worked. Disabled /sys/class/drm/card0-HDMI-A-1/status and cec still worked. Enabled and still working.

@pelwell pelwell merged commit ecdd08f into raspberrypi:rpi-5.10.y Jul 2, 2021
popcornmix added a commit to raspberrypi/firmware that referenced this pull request Jul 2, 2021
…mplete

See: raspberrypi/linux#4421

kernel: Fix CPU hang when using CEC while HDMI output is disabled
See: raspberrypi/linux#4418

kernel: drm/vc4: hdmi: Add missing clk_disable_unprepare on error path
See: raspberrypi/linux#4426

kernel: staging: vc04_services: isp: Set the YUV420/YVU420 format stride to 64 bytes
See: raspberrypi/linux#4419

kernel: media: i2c: imx477: Add support for imx378 as a compatible sensor
See: raspberrypi/linux#4420
popcornmix added a commit to Hexxeh/rpi-firmware that referenced this pull request Jul 2, 2021
…mplete

See: raspberrypi/linux#4421

kernel: Fix CPU hang when using CEC while HDMI output is disabled
See: raspberrypi/linux#4418

kernel: drm/vc4: hdmi: Add missing clk_disable_unprepare on error path
See: raspberrypi/linux#4426

kernel: staging: vc04_services: isp: Set the YUV420/YVU420 format stride to 64 bytes
See: raspberrypi/linux#4419

kernel: media: i2c: imx477: Add support for imx378 as a compatible sensor
See: raspberrypi/linux#4420
@henkv1
Copy link

henkv1 commented Jul 3, 2021

It did not work for me. I compiled ecdd08f. But as soon as I modprobe vc4, the system crashes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RPi 2b & RPi 3b crash after a while with kernel 5.10.31+
4 participants