-
Notifications
You must be signed in to change notification settings - Fork 5.2k
drm/v3d: Fix GPU reset issues on the Raspberry Pi 5 for 6.12 #6716
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
+303
−106
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Since the initial commit 57692c9 ("drm/v3d: Introduce a new DRM driver for Broadcom V3D V3.x+") the struct v3d_dev reserved a pointer for an optional V3D clock. But there wasn't any code, which fetched it. So add the missing clock handling before accessing any V3D registers. Signed-off-by: Stefan Wahren <[email protected]> Reviewed-by: Maíra Canal <[email protected]> Signed-off-by: Maíra Canal <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
The V3D driver still relies on `drm_sched_increase_karma()` and `drm_sched_resubmit_jobs()` for resubmissions when a timeout occurs. The function `drm_sched_increase_karma()` marks the job as guilty, while `drm_sched_resubmit_jobs()` sets an error (-ECANCELED) in the DMA fence of that guilty job. Because of this, we must check whether the job’s DMA fence has been flagged with an error before executing the job. Otherwise, the same guilty job may be resubmitted indefinitely, causing repeated GPU resets. This patch adds a check for an error on the job's fence to prevent running a guilty job that was previously flagged when the GPU timed out. Note that the CPU and CACHE_CLEAN queues do not require this check, as their jobs are executed synchronously once the DRM scheduler starts them. Cc: [email protected] Fixes: d223f98 ("drm/v3d: Add support for compute shader dispatch.") Fixes: 1584f16 ("drm/v3d: Add support for submitting jobs to the TFU.") Reviewed-by: Iago Toral Quiroga <[email protected]> Signed-off-by: Maíra Canal <[email protected]>
Similar to commit e4b5ccd ("drm/v3d: Ensure job pointer is set to NULL after job completion"), ensure the job pointer is set to `NULL` when a job's fence has an error. Failing to do so can trigger kernel warnings in specific scenarios, such as: 1. v3d_csd_job_run() assigns `v3d->csd_job = job` 2. CSD job exceeds hang limit, causing a timeout → v3d_gpu_reset_for_timeout() 3. GPU reset 4. drm_sched_resubmit_jobs() sets the job's fence to `-ECANCELED`. 5. v3d_csd_job_run() detects the fence error and returns NULL, not submitting the job to the GPU 6. User-space runs `modprobe -r v3d` 7. v3d_gem_destroy() v3d_gem_destroy() triggers a warning indicating that the CSD job never ended, as we didn't set `v3d->csd_job` to NULL after the timeout. The same can also happen to BIN, RENDER, and TFU jobs. Reviewed-by: Iago Toral Quiroga <[email protected]> Signed-off-by: Maíra Canal <[email protected]>
The V3D driver currently determines the GPU tech version (33, 41...) by reading a register. This approach has worked so far since this information wasn’t needed before powering on the GPU. V3D 7.1 introduces new registers that must be written to power on the GPU, requiring us to know the V3D version beforehand. To address this, associate each supported SoC with the corresponding VideoCore GPU version as part of the device data. To prevent possible mistakes, add an assertion to verify that the version specified in the device data matches the one reported by the hardware. If there is a mismatch, the kernel will trigger a warning. Reviewed-by: Iago Toral Quiroga <[email protected]> Signed-off-by: Maíra Canal <[email protected]>
In order to enforce per-SoC register rules, add per-compatible restrictions. V3D 3.3 (represented by brcm,7268-v3d) has a cache controller (GCA), which is not present in other V3D generations. Declaring these differences helps ensure the DTB accurately reflect the hardware design. While not ideal, this commit keeps the register order flexible for brcm,7268-v3d with the goal to keep the ABI backwards compatible. Signed-off-by: Maíra Canal <[email protected]>
V3D 7.1 exposes a new register block, called V3D_SMS. As BCM2712 has a V3D 7.1 core, add a new register item to its compatible. Similar to the GCA, which is specific for V3D 3.3, SMS is optional and should only be added for V3D 7.1 variants (such as brcm,2712-v3d). Signed-off-by: Maíra Canal <[email protected]>
In addition to the standard reset controller, V3D 7.x requires configuring the V3D_SMS registers for proper power on/off and reset. Add the new registers to `v3d_regs.h` and ensure they are properly configured during device probing, removal, and reset. This change fixes GPU reset issues on the Raspberry Pi 5 (BCM2712). Without exposing these registers, a GPU reset causes the GPU to hang, stopping any further job execution and freezing the desktop GUI. The same issue occurs when unloading and loading the v3d driver. Link: raspberrypi#6660 Reviewed-by: Iago Toral Quiroga <[email protected]> Signed-off-by: Maíra Canal <[email protected]>
As established in commit 89d0499 ("MAINTAINERS: Drop Emma Anholt from all M lines."), Emma is no longer active in the Linux kernel and dropped the V3D maintainership. Therefore, remove Emma as one of the DT maintainers and add the current V3D driver maintainer. Acked-by: Emma Anholt <[email protected]> Acked-by: Rob Herring (Arm) <[email protected]> Signed-off-by: Maíra Canal <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
popcornmix
added a commit
to raspberrypi/firmware
that referenced
this pull request
Mar 16, 2025
kernel: drm/v3d: Fix GPU reset issues on the Raspberry Pi 5 for 6.12 See: raspberrypi/linux#6716
popcornmix
added a commit
to raspberrypi/rpi-firmware
that referenced
this pull request
Mar 16, 2025
kernel: drm/v3d: Fix GPU reset issues on the Raspberry Pi 5 for 6.12 See: raspberrypi/linux#6716
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Rebase of #6715
Tested with #6660 (which it resolves).