Skip to content

drm/v3d: Fix GPU reset issues on the Raspberry Pi 5 for 6.12 #6716

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Mar 12, 2025

Conversation

popcornmix
Copy link
Collaborator

Rebase of #6715

Tested with #6660 (which it resolves).

lategoodbye and others added 9 commits March 12, 2025 16:42
Since the initial commit 57692c9 ("drm/v3d: Introduce a new DRM driver
for Broadcom V3D V3.x+") the struct v3d_dev reserved a pointer for
an optional V3D clock. But there wasn't any code, which fetched it.
So add the missing clock handling before accessing any V3D registers.

Signed-off-by: Stefan Wahren <[email protected]>
Reviewed-by: Maíra Canal <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
The V3D driver still relies on `drm_sched_increase_karma()` and
`drm_sched_resubmit_jobs()` for resubmissions when a timeout occurs.
The function `drm_sched_increase_karma()` marks the job as guilty, while
`drm_sched_resubmit_jobs()` sets an error (-ECANCELED) in the DMA fence of
that guilty job.

Because of this, we must check whether the job’s DMA fence has been
flagged with an error before executing the job. Otherwise, the same guilty
job may be resubmitted indefinitely, causing repeated GPU resets.

This patch adds a check for an error on the job's fence to prevent running
a guilty job that was previously flagged when the GPU timed out.

Note that the CPU and CACHE_CLEAN queues do not require this check, as
their jobs are executed synchronously once the DRM scheduler starts them.

Cc: [email protected]
Fixes: d223f98 ("drm/v3d: Add support for compute shader dispatch.")
Fixes: 1584f16 ("drm/v3d: Add support for submitting jobs to the TFU.")
Reviewed-by: Iago Toral Quiroga <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Similar to commit e4b5ccd ("drm/v3d: Ensure job pointer is set to
NULL after job completion"), ensure the job pointer is set to `NULL` when
a job's fence has an error. Failing to do so can trigger kernel warnings
in specific scenarios, such as:

1. v3d_csd_job_run() assigns `v3d->csd_job = job`
2. CSD job exceeds hang limit, causing a timeout → v3d_gpu_reset_for_timeout()
3. GPU reset
4. drm_sched_resubmit_jobs() sets the job's fence to `-ECANCELED`.
5. v3d_csd_job_run() detects the fence error and returns NULL, not
   submitting the job to the GPU
6. User-space runs `modprobe -r v3d`
7. v3d_gem_destroy()

v3d_gem_destroy() triggers a warning indicating that the CSD job never
ended, as we didn't set `v3d->csd_job` to NULL after the timeout. The same
can also happen to BIN, RENDER, and TFU jobs.

Reviewed-by: Iago Toral Quiroga <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
The V3D driver currently determines the GPU tech version (33, 41...)
by reading a register. This approach has worked so far since this
information wasn’t needed before powering on the GPU.

V3D 7.1 introduces new registers that must be written to power on the
GPU, requiring us to know the V3D version beforehand. To address this,
associate each supported SoC with the corresponding VideoCore GPU version
as part of the device data.

To prevent possible mistakes, add an assertion to verify that the version
specified in the device data matches the one reported by the hardware.
If there is a mismatch, the kernel will trigger a warning.

Reviewed-by: Iago Toral Quiroga <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
In order to enforce per-SoC register rules, add per-compatible
restrictions. V3D 3.3 (represented by brcm,7268-v3d) has a cache
controller (GCA), which is not present in other V3D generations.
Declaring these differences helps ensure the DTB accurately reflect
the hardware design.

While not ideal, this commit keeps the register order flexible for
brcm,7268-v3d with the goal to keep the ABI backwards compatible.

Signed-off-by: Maíra Canal <[email protected]>
V3D 7.1 exposes a new register block, called V3D_SMS. As BCM2712 has a
V3D 7.1 core, add a new register item to its compatible. Similar to the
GCA, which is specific for V3D 3.3, SMS is optional and should only be
added for V3D 7.1 variants (such as brcm,2712-v3d).

Signed-off-by: Maíra Canal <[email protected]>
In addition to the standard reset controller, V3D 7.x requires configuring
the V3D_SMS registers for proper power on/off and reset. Add the new
registers to `v3d_regs.h` and ensure they are properly configured during
device probing, removal, and reset.

This change fixes GPU reset issues on the Raspberry Pi 5 (BCM2712).
Without exposing these registers, a GPU reset causes the GPU to hang,
stopping any further job execution and freezing the desktop GUI. The same
issue occurs when unloading and loading the v3d driver.

Link: raspberrypi#6660
Reviewed-by: Iago Toral Quiroga <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
As established in commit 89d0499 ("MAINTAINERS: Drop Emma Anholt
from all M lines."), Emma is no longer active in the Linux kernel and
dropped the V3D maintainership. Therefore, remove Emma as one of the DT
maintainers and add the current V3D driver maintainer.

Acked-by: Emma Anholt <[email protected]>
Acked-by: Rob Herring (Arm) <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
@pelwell pelwell merged commit 28d1e43 into raspberrypi:rpi-6.12.y Mar 12, 2025
11 of 12 checks passed
@popcornmix popcornmix deleted the v3d_hang12 branch March 12, 2025 17:32
popcornmix added a commit to raspberrypi/firmware that referenced this pull request Mar 16, 2025
kernel: drm/v3d: Fix GPU reset issues on the Raspberry Pi 5 for 6.12
See: raspberrypi/linux#6716
popcornmix added a commit to raspberrypi/rpi-firmware that referenced this pull request Mar 16, 2025
kernel: drm/v3d: Fix GPU reset issues on the Raspberry Pi 5 for 6.12
See: raspberrypi/linux#6716
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants