Skip to content

drm/vc4: hvs: Defer dlist slots deallocation #5327

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 23, 2023

Conversation

popcornmix
Copy link
Collaborator

During normal operations, the cursor position update is done through an asynchronous plane update, which on the vc4 driver basically just modifies the right dlist word to move the plane to the new coordinates.

However, when we have the overscan margins setup, we fall back to a regular commit when we are next to the edges. And since that commit happens to be on a cursor plane, it's considered a legacy cursor update by KMS.

The main difference it makes is that it won't wait for its completion (ie, next vblank) before returning. This means if we have multiple commits happening in rapid succession, we can have several of them happening before the next vblank.

In parallel, our dlist allocation is tied to a CRTC state, and each time we do a commit we end up with a new CRTC state, with the previous one being freed. This means that we free our previous dlist entry (but don't clear it though) every time a new one is being committed.

Now, if we were to have two commits happening before the next vblank, we could end up freeing reusing the same dlist entries before the next vblank.

Indeed, we would start from an initial state taking, for example, the dlist entries 10 to 20, then start a commit taking the entries 20 to 30 and setting the dlist pointer to 20, and freeing the dlist entries 10 to
20. However, since we haven't reach vblank yet, the HVS is still using the entries 10 to 20.

If we were to make a new commit now, chances are the allocator are going to give the 10 to 20 entries back, and we would change their content to match the new state. If vblank hasn't happened yet, we just corrupted the active dlist entries.

A first attempt to solve this was made by creating an intermediate dlist buffer to store the current (ie, as of the last commit) dlist content, that we would update each time the HVS is done with a frame. However, if the interrupt handler missed the vblank window, we would end up copying our intermediate dlist to the hardware one during the composition, essentially creating the same issue.

Since making sure that our interrupt handler runs within a fixed, constrained, time window would require to make Linux a real-time kernel, this seems a bit out of scope.

Instead, we can work around our original issue by keeping the dlist slots allocation longer. That way, we won't reuse a dlist slot while it's still in flight. In order to achieve this, instead of freeing the dlist slot when its associated CRTC state is destroyed, we'll queue it in a list.

A naive implementation would free the buffers in that queue when we get our end of frame interrupt. However, there's still a race since, just like in the shadow dlist case, we don't control when the handler for that interrupt is going to run. Thus, we can end up with a commit adding an old dlist allocation to our queue during the window between our actual interrupt and when our handler will run. And since that buffer is still being used for the composition of the current frame, we can't free it right away, exposing us to the original bug.

Fortunately for us, the hardware provides a frame counter that is increased each time the first line of a frame is being generated. Associating the frame counter the image is supposed to go away to the allocation, and then only deallocate buffers that have a counter below or equal to the one we see when the deallocation code should prevent the above race from occuring.

Signed-off-by: Maxime Ripard [email protected]

During normal operations, the cursor position update is done through an
asynchronous plane update, which on the vc4 driver basically just
modifies the right dlist word to move the plane to the new coordinates.

However, when we have the overscan margins setup, we fall back to a
regular commit when we are next to the edges. And since that commit
happens to be on a cursor plane, it's considered a legacy cursor update
by KMS.

The main difference it makes is that it won't wait for its completion
(ie, next vblank) before returning. This means if we have multiple
commits happening in rapid succession, we can have several of them
happening before the next vblank.

In parallel, our dlist allocation is tied to a CRTC state, and each time
we do a commit we end up with a new CRTC state, with the previous one
being freed. This means that we free our previous dlist entry (but don't
clear it though) every time a new one is being committed.

Now, if we were to have two commits happening before the next vblank, we
could end up freeing reusing the same dlist entries before the next
vblank.

Indeed, we would start from an initial state taking, for example, the
dlist entries 10 to 20, then start a commit taking the entries 20 to 30
and setting the dlist pointer to 20, and freeing the dlist entries 10 to
20. However, since we haven't reach vblank yet, the HVS is still using
the entries 10 to 20.

If we were to make a new commit now, chances are the allocator are going
to give the 10 to 20 entries back, and we would change their content to
match the new state. If vblank hasn't happened yet, we just corrupted
the active dlist entries.

A first attempt to solve this was made by creating an intermediate dlist
buffer to store the current (ie, as of the last commit) dlist content,
that we would update each time the HVS is done with a frame. However, if
the interrupt handler missed the vblank window, we would end up copying
our intermediate dlist to the hardware one during the composition,
essentially creating the same issue.

Since making sure that our interrupt handler runs within a fixed,
constrained, time window would require to make Linux a real-time kernel,
this seems a bit out of scope.

Instead, we can work around our original issue by keeping the dlist
slots allocation longer. That way, we won't reuse a dlist slot while
it's still in flight. In order to achieve this, instead of freeing the
dlist slot when its associated CRTC state is destroyed, we'll queue it
in a list.

A naive implementation would free the buffers in that queue when we get
our end of frame interrupt. However, there's still a race since, just
like in the shadow dlist case, we don't control when the handler for
that interrupt is going to run. Thus, we can end up with a commit adding
an old dlist allocation to our queue during the window between our
actual interrupt and when our handler will run. And since that buffer is
still being used for the composition of the current frame, we can't free
it right away, exposing us to the original bug.

Fortunately for us, the hardware provides a frame counter that is
increased each time the first line of a frame is being generated.
Associating the frame counter the image is supposed to go away to the
allocation, and then only deallocate buffers that have a counter below
or equal to the one we see when the deallocation code should prevent the
above race from occuring.

Signed-off-by: Maxime Ripard <[email protected]>
@popcornmix
Copy link
Collaborator Author

A while back we had an easily reproducible graphics glitch with the cursor moving between the edges of two displays.
#4895 was the fix.

Upstream suggested removing the legacy cursor support, and we followed that including a revert here: #4971

That avoided the easily reproducible case.

But there's been a number of glitches seen since then that don't involve the cursor.
Typically they appear as multicoloured pixels that occupy the bottom part (up to 20%) of screen.

I've been seeing it in kodi when playing video. It affects both hw and sw decoded video.
It can also occur in kodi's gui (when library scan busy spinner is updating is one case).
It also occurs in RpiOS when running mutter (typically with something animating like a youtube video, windowed in browser)

Some reports:
https://forum.kodi.tv/showthread.php?tid=369090
https://forums.raspberrypi.com/viewtopic.php?p=2068441#p2068441
https://forums.raspberrypi.com/viewtopic.php?t=340078

Here's one I caught on kodi's gui.
kodi_glitch

I've resurrected Maxime's cursor fix and I've been unable to reproduce any glitches since. I think it is still wanted.
I believe we are freeing dlist allocation while it is still on screen. If it gets reused, the hvs processes a corrupt dlist
(typically towards the end of a display).

@popcornmix popcornmix marked this pull request as ready for review January 23, 2023 11:25
@popcornmix
Copy link
Collaborator Author

@mripard do you think reintroducing this is correct?

I've had a couple of positive reports from LibreELEC slack.

@pelwell
Copy link
Contributor

pelwell commented Jan 23, 2023

I'm happy to merge on the basis of your testing alone, whether or not it is ultimately the best solution.

@popcornmix
Copy link
Collaborator Author

Lets get it in for wider testing.

@pelwell pelwell merged commit 7ed8668 into raspberrypi:rpi-5.15.y Jan 23, 2023
@popcornmix popcornmix deleted the defer_dlist_515 branch January 23, 2023 17:12
@popcornmix
Copy link
Collaborator Author

I've picked it to 6.0 and 6.1 trees (it needed slightly different fixups).

popcornmix added a commit to raspberrypi/firmware that referenced this pull request Jan 24, 2023
See: raspberrypi/linux#5327

kernel: vc4/hdmi: Always enable GCP with AVMUTE cleared
See: raspberrypi/linux#5317

kernel: media: bcm2835-v4l2-codec: Enable selection ioctl for ISP
See: raspberrypi/linux#5328

userland: tvservice: Update unsupported message to recommend kmsprint
See: https://forums.raspberrypi.com/viewtopic.php?t=346202
popcornmix added a commit to raspberrypi/rpi-firmware that referenced this pull request Jan 24, 2023
See: raspberrypi/linux#5327

kernel: vc4/hdmi: Always enable GCP with AVMUTE cleared
See: raspberrypi/linux#5317

kernel: media: bcm2835-v4l2-codec: Enable selection ioctl for ISP
See: raspberrypi/linux#5328

userland: tvservice: Update unsupported message to recommend kmsprint
See: https://forums.raspberrypi.com/viewtopic.php?t=346202
@pelwell
Copy link
Contributor

pelwell commented Jan 24, 2023

Did you notice the KUnit test failures? https://github.com/raspberrypi/linux/actions/runs/3988799056
They seem to start with this commit.

@popcornmix
Copy link
Collaborator Author

@mripard can you see why kunit test fails with this commit (on 6.1 and 6.2).
It seems to boot and work fine on my Pi4.

@mripard
Copy link
Contributor

mripard commented Jan 25, 2023

Sorry for the delay

@mripard do you think reintroducing this is correct?

For the record, yes, if it still fixes some issue then it makes sense to merge it.

I've had a look at the kunit crash and we now access a register (the frame count) as part of the CRTC state destruction. Since kunit runs with a mock device, it's a immediate failure condition. It's not clear to me what the best solution is at the moment, but I'm on it.

@pelwell
Copy link
Contributor

pelwell commented Jan 25, 2023

Many thanks, Maxime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants