-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Fix CPU hang when using CEC while HDMI output is disabled #4418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix CPU hang when using CEC while HDMI output is disabled #4418
Conversation
In order to access the HDMI controller, we need to make sure the HSM clock is enabled. If we were to access it with the clock disabled, the CPU would completely hang, resulting in an hard crash. Since we have different code path that would require it, let's move that clock enable / disable to runtime_pm that will take care of the reference counting for us. Fixes: 4f6e3d6 ("drm/vc4: Add runtime PM support to the HDMI encoder driver") Signed-off-by: Maxime Ripard <[email protected]> Reviewed-by: Dave Stevenson <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
That all looks sensible. Can anyone confirm that it fixes the problem, or at least that it doesn't make anything worse? |
I can confirm what @henkv1 reported. This PR breaks boot with kms on Pi3+. |
Lazy question (I could probably find the answer if I looked hard enough), but is it guaranteed that the resume method is called at the start of day? In other words, is the clock starting out off when it should start out on? |
Is it always occurring for you? I had it a couple of times when I started to debug it earlier, but it never occurred if the driver was built-in, and as soon as I started adding a bit of tracing it went away. My first guess was that, if it's compiled as a module, we would have the HSM clock being disabled by the clock framework at the end of the kernel boot, and then we would access a register without the clock enabled. However, since it seems to be timing sensitive, I'm wondering if it's not a race somewhere ?
it will be called when we call pm_runtime_get or pm_runtime_resume_and_get if the refcount was 0. If we're missing one of these calls, then yeah, the clock will be off (but the power domain will also be for example) |
As soon as I started testing with RaspiOS (as opposed to the buildroot system I was using earlier), I indeed get the issue all the time. |
The HPD GPIO retrieval code on failure will jump to the err_unprepare_hsm label that calls pm_runtime_disable. However at that point we haven't called pm_runtime_enable, so we end up with an unbalanced call. The next error than can occur (and therefore the next label) needs both pm_runtime_disable and drm_encoder_cleanup though, so let's rearrange the labels to match what we expect. Fixes: cd4cb49 ("drm/vc4: hdmi: Adjust HSM clock rate depending on pixel rate") Signed-off-by: Maxime Ripard <[email protected]>
In the bind hook, we actually need the device to have the HSM clock running during the final part of the display initialisation where we reset the controller and initialise the CEC component. Failing to do so will result in a complete, silent, hang of the CPU. Fixes: 411efa1 ("drm/vc4: hdmi: Move the HSM clock enable to runtime_pm") Signed-off-by: Maxime Ripard <[email protected]>
126af69
to
929b378
Compare
So, I'm not really sure why that particular commit made it bad because the issue has been there for a while, but we were having some registers access at bind time without the HSM clock running. I've fixed it up, it should fix the initial CEC issue now and I'm not having any hang anymore while building as a module |
That's interesting as when I first tested this PR and found the hang I started bisecting, and spent a couple of hours going down false paths. Eventually I restarted that testing and found "drm/vc4: hdmi: Move the HSM clock enable to runtime_pm" commit seemed bad and the preceding commit was at least sometimes good. But it doesn't surprise me that there's something racy there. |
In the vc4_hdmi_encoder_pre_crtc_configure() function error path we never actually call pm_runtime_put() even though pm_runtime_resume_and_get() is our very first call. Fixes: 4f6e3d6 ("drm/vc4: Add runtime PM support to the HDMI encoder driver") Signed-off-by: Maxime Ripard <[email protected]>
In order to ease further additions to the CEC enable and disable, let's split the function into two functions, one to enable and the other to disable. Signed-off-by: Maxime Ripard <[email protected]>
Similarly to what we encountered with the detect hook with DRM, nothing actually prevents any of the CEC callback from being run while the HDMI output is disabled. However, this is an issue since any register access to the controller when it's powered down will result in a silent hang. Let's make sure we run the runtime_pm hooks when the CEC adapter is opened and closed by the userspace to avoid that issue. Fixes: 15b4511 ("drm/vc4: add HDMI CEC support") Signed-off-by: Maxime Ripard <[email protected]>
929b378
to
f4ce2af
Compare
I've updated the branch according to @pelwell reviews |
return; | ||
} | ||
|
||
ret = clk_prepare_enable(vc4_hdmi->pixel_clock); | ||
if (ret) { | ||
DRM_ERROR("Failed to turn on pixel clock: %d\n", ret); | ||
pm_runtime_put(&vc4_hdmi->pdev->dev); | ||
return; | ||
} | ||
|
||
hsm_rate = vc4_hdmi->variant->calc_hsm_clock(vc4_hdmi, pixel_rate); | ||
vc4_hdmi->hsm_req = clk_request_start(vc4_hdmi->hsm_clock, hsm_rate); | ||
if (IS_ERR(vc4_hdmi->hsm_req)) { | ||
DRM_ERROR("Failed to set HSM clock rate: %ld\n", PTR_ERR(vc4_hdmi->hsm_req)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you not also missing a clk_disable_unprepare(vc4_hdmi->pixel_clock);
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with merging this now as a fix, and the clk_disable_unprepare can be added as a later improvement.
A quick test did work. |
…mplete See: raspberrypi/linux#4421 kernel: Fix CPU hang when using CEC while HDMI output is disabled See: raspberrypi/linux#4418 kernel: drm/vc4: hdmi: Add missing clk_disable_unprepare on error path See: raspberrypi/linux#4426 kernel: staging: vc04_services: isp: Set the YUV420/YVU420 format stride to 64 bytes See: raspberrypi/linux#4419 kernel: media: i2c: imx477: Add support for imx378 as a compatible sensor See: raspberrypi/linux#4420
…mplete See: raspberrypi/linux#4421 kernel: Fix CPU hang when using CEC while HDMI output is disabled See: raspberrypi/linux#4418 kernel: drm/vc4: hdmi: Add missing clk_disable_unprepare on error path See: raspberrypi/linux#4426 kernel: staging: vc04_services: isp: Set the YUV420/YVU420 format stride to 64 bytes See: raspberrypi/linux#4419 kernel: media: i2c: imx477: Add support for imx378 as a compatible sensor See: raspberrypi/linux#4420
It did not work for me. I compiled ecdd08f. But as soon as I modprobe vc4, the system crashes. |
Hi,
Here's a PR to fix an issue similar to 755b2c8 but with CEC. It's doing further cleanups and drive-by patches and finally using the same solution we used.
Fixes #4319