Skip to content

Clock Framework Improvements, and removal of clock requests API #4940

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 46 commits into from
Apr 22, 2022

Conversation

mripard
Copy link
Contributor

@mripard mripard commented Mar 16, 2022

Hi,

This is a PR that brings the changes that have been accepted into upstream to bring the same feature provided by the clock request API we have in the downstream tree.

The approach taken has been different though and has instead relied on fixing the clk_set_rate_range / clk_set_min_rate functions instead.

So this PR contains both the new changes aimed for 5.18, and reverts the API we had before to sync the tree with upstream.

This also fixes the cursorA-vs-flipA-toggle subtest of the kms_cursor_legacy test suite in IGT that was producing a null pointer dereference on the clock request pointer.

Let me know what you think,
Maxime

* Request a clock rate based on the current HVS
* requirements.
*/
hvs->core_req = clk_request_start(hvs->core_clk,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be replaced by a call to clk_set_min_rate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was added in the next patch but you're right that it makes more sense to have it there. I've reworked the branch a bit and pushed a new version.

@popcornmix
Copy link
Collaborator

I think this PR is okay. In testing I found a case where pixel-bvb clock wasn't being set.
Manually setting that through mailbox brought screen back.
But I think the fix for this is probably firmware side and isn't introduced by this PR.

@mripard mripard force-pushed the rpi/5.15-clk-improvements branch from ccce78f to 2f45eaa Compare March 21, 2022 09:18
@popcornmix popcornmix force-pushed the rpi-5.15.y branch 2 times, most recently from 20f34a0 to 87ed617 Compare March 21, 2022 16:05
@popcornmix
Copy link
Collaborator

popcornmix commented Mar 23, 2022

@mripard after this PR I can't use a 4kp60 display.

A bit of debugging shows we set vc4_hdmi->disable_4kp60 = true as clk_round_rate(vc4->hvs->core_clk, 550000000) returns 220000000. I have hdmi_enable_4kp60=1 in config.txt.

Not quite sure why that doesn't give a working non-4kp60 mode (no mode is set at all).

Without this PR, clk_round_rate(vc4->hvs->core_clk, 550000000) returns 55000000 and 4kp60 works.

@mripard
Copy link
Contributor Author

mripard commented Mar 24, 2022

@mripard after this PR I can't use a 4kp60 display.

A bit of debugging shows we set vc4_hdmi->disable_4kp60 = true as clk_round_rate(vc4->hvs->core_clk, 550000000) returns 220000000. I have hdmi_enable_4kp60=1 in config.txt.

I found what's going on, and I'm not sure how to fix it properly. We were using clk_round_rate to infer whether hdmi_enable_4kp60 was enabled or not since the maximum was higher when enabled.

However, the clock changes we did will now try to always run at the minimum frequency, so we'll end up here with 220MHz, which is the minimum allowed by the firmware at that point in time.

Not quite sure why that doesn't give a working non-4kp60 mode (no mode is set at all).

Even we couldn't detect the parameter properly, this should work though.

@mripard
Copy link
Contributor Author

mripard commented Mar 24, 2022

Not quite sure why that doesn't give a working non-4kp60 mode (no mode is set at all).

Actually, it makes sense: the firmware will have used 4kp60 and will pass that on the kernel command line. However, the kernel wrongly assumes it can't do that, so the framebuffer emulation will still try to set that mode and the kernel will reject it. We then end up with a black screen.

@mripard mripard force-pushed the rpi/5.15-clk-improvements branch from 2f45eaa to b7da038 Compare March 24, 2022 11:03
@popcornmix
Copy link
Collaborator

Actually, it makes sense: the firmware will have used 4kp60 and will pass that on the kernel command line. However, the kernel wrongly assumes it can't do that, so the framebuffer emulation will still try to set that mode and the kernel will reject it. We then end up with a black screen.

Ah yes. disable_fw_kms_setup=1 in config.txt avoids that (and we end up with 4kp30).

I found what's going on, and I'm not sure how to fix it properly. We were using clk_round_rate to infer whether hdmi_enable_4kp60 was enabled or not since the maximum was higher when enabled.

We still have the number wanted with this PR

pi@pi4:~ $ sudo cat /sys/kernel/debug/clk/fw-clk-core/clk_max_rate
550000000

is there not a way of getting hold of that from the kms driver?

@mripard
Copy link
Contributor Author

mripard commented Mar 24, 2022

pi@pi4:~ $ sudo cat /sys/kernel/debug/clk/fw-clk-core/clk_max_rate
550000000

is there not a way of getting hold of that from the kms driver?

Not really, there's no consumer API to retrieve the clock boundaries. I've pushed a new version that will try to raise the minimum to 550MHz. If we don't have enable_hdmi_4kp60, the range reported by the firmware (120-500) will be disjoint with the range we're trying to set and it will error out. If it's enable, the range will become 550-550 and will thus work.

@popcornmix
Copy link
Collaborator

@mripard updated version of this patch is not working for me.
With 4kp60 + 1920x1200@60 display I get http://sprunge.us/DRnXm1
It looks like new code gets a valid core_clk and clk_set_min_rate returns 0 (and the bind function is called 3 times), but no displays and flip done timeout.
pi@pi4:~ $ clk | grep fw
Looking at the fw clocks:

 fw-clk-vec                           0        0        0           0          0     0  50000         Y
 fw-clk-pixel-bvb                     2        2        0   300000000          0     0  50000         Y
 fw-clk-m2mc                          2        2        0   599940000          0     0  50000         Y
 fw-clk-hevc                          0        0        0   250000000          0     0  50000         Y
 fw-clk-pixel                         0        0        0           0          0     0  50000         Y
 fw-clk-v3d                           0        0        0   250000000          0     0  50000         Y
 fw-clk-core                          1        1        0   220000000          0     0  50000         Y
 fw-clk-arm                           0        0        0   600000000          0     0  50000         Y

The core is being set to 220MHz which is too low for 4kp60 (+1920x1080p).

Interestingly when using just a 4kp60 display I get:

fw-clk-vec                           0        0        0           0          0     0  50000         Y
 fw-clk-pixel-bvb                     1        1        0   300000000          0     0  50000         Y
 fw-clk-m2mc                          1        1        0   599940000          0     0  50000         Y
 fw-clk-hevc                          0        0        0   250000000          0     0  50000         Y
 fw-clk-pixel                         0        0        0           0          0     0  50000         Y
 fw-clk-v3d                           0        0        0   250000000          0     0  50000         Y
 fw-clk-core                          1        1        0   534600000          0     0  50000         Y
 fw-clk-arm                           0        0        0   600000000          0     0  50000         Y

so, core is 534MHz. It feels like the second display is replacing the core clock requirement (rather than adding to it).

@mripard
Copy link
Contributor Author

mripard commented Mar 24, 2022

Actually, the issue is that the clk_set_min_rate calls fail in vc4_atomic_commit_tail.

Indeed, we try to set a minimum of 683 MHz, which is our estimate of what the core clock should be. Since it's way higher than the maximum that the firmware report, clk_set_min_rate will fail, and thus we never have our minimum enforced.

Now the interesting part is that if I cap the HVS rate we try to enforce to 550MHz, it seems to work just fine. So it looks like our calculation of the core clock is completely off.

For reference, this is the calculation we have at the moment:
https://github.com/raspberrypi/linux/blob/rpi-5.15.y/drivers/gpu/drm/vc4/vc4_crtc.c#L712

In our case, HDMI0 runs at 3840x2160@60Hz, so it gives a load of 594000000 * 9 / 10 = 534600000.0

HDMI1 will run at 1920x1080@60Hz, so 148500000.

We'll do the sum here:
https://github.com/raspberrypi/linux/blob/rpi-5.15.y/drivers/gpu/drm/vc4/vc4_kms.c#L620
and here:
https://github.com/raspberrypi/linux/blob/rpi-5.15.y/drivers/gpu/drm/vc4/vc4_kms.c#L901

and thus, we end up with 683100000.0 Hz.

@popcornmix
Copy link
Collaborator

I think the firmware effectively does max(534600000, 148500000) rather than sum for the part related to display timings. The part related to input planes is summed across the displays.

Probably due to HVS processing planes sequentially, compared to Pixel Valve driving displays in parallel.
@timg236 may be able to explain this better.

The numbers are somewhat heuristic, and won't be exact, so it would be better to choose the closest frequency (i.e. max) we can, rather than dropping a too high request.

@mripard
Copy link
Contributor Author

mripard commented Mar 25, 2022

Yeah, this would make sense that it would be the max. I'll let @timg236 confirm.

I've also added a warning when clk_set_min_rate fails to at least complain instead of ignoring it. And I'm currently reworking it to max it to what the firmware will allow us to.

@jc-kynesim
Copy link
Contributor

Just to check - this new API has no way of asking for whatever the maximum supported clock is? The HEVC clock really wants that. There isn't a "good" rate I can calculate beforehand, and in nearly all cases faster is better.

@mripard
Copy link
Contributor Author

mripard commented Apr 5, 2022

Just to check - this new API has no way of asking for whatever the maximum supported clock is?

It's not a new API, it was already there in the kernel. We changed its behaviour for some of our clocks though

And no, mostly because it would be racy and thus wouldn't be very useful. Even if you were to retrieve and ask for the maximum, there's nothing that would prevent another user from changing that maximum before you get the chance to ask for that maximum.

The HEVC clock really wants that. There isn't a "good" rate I can calculate beforehand, and in nearly all cases faster is better.

So you just want to run the clock at the highest frequency all the time? That doesn't sound very power efficient, but we can do that fairly easily by doing something like 1c19533

@jc-kynesim
Copy link
Contributor

So you just want to run the clock at the highest frequency all the time? That doesn't sound very power efficient, but we can do that fairly easily by doing something like 1c19533

I want to run the clock at max while I am decoding - i.e. when I get stream start start it at max, when stop reset to 0 - short of exciting (prone to error) code that measures Q lengths and adjusts the clock dynamically I can't think of a better idea - can you?

@mripard
Copy link
Contributor Author

mripard commented Apr 5, 2022

Like I said this would be possible by implementing something like 1c19533, but I'm not sure it needs to be part of this PR. It wasn't possible before that PR either.

@popcornmix
Copy link
Collaborator

@mripard it is currently possible (prior to this PR).

max_hevc_clock = clk_round_rate(dev->clock, ULONG_MAX);

gets the max hevc clock frequency (which is 500MHz by default, 550MHz when using hdmi_enable_4kp60 and may be higher still with a custom config.txt setting).

@mripard
Copy link
Contributor Author

mripard commented Apr 7, 2022

Right, but that still works?

The HEVC clock will just take the boundaries reported by the firmware and clamp ULONG_MAX to those boundaries.

@popcornmix
Copy link
Collaborator

Ah - I assumed because we couldn't determine the max core clock, we wouldn't be able to determine the max hevc clock.

@mripard
Copy link
Contributor Author

mripard commented Apr 7, 2022

We don't have that option for KMS because the core clock will always return the minimum available so it's not just a clamp.

It's the only clock in that configuration at the moment, so all the others will just return the rate asked for, assuming it's within the boundaries reported by the firmware.

@pelwell
Copy link
Contributor

pelwell commented Apr 11, 2022

What's the status on this PR? As I understand things, it (or something like it) is needed, but as it stands dual monitor support (at least in some configurations) is broken.

@timg236
Copy link
Contributor

timg236 commented Apr 11, 2022

Yeah, this would make sense that it would be the max. I'll let @timg236 confirm.

I've also added a warning when clk_set_min_rate fails to at least complain instead of ignoring it. And I'm currently reworking it to max it to what the firmware will allow us to.

On 2711 HVS runs off the the core-clk and the minimum core clock frequency must be the maximum of pixel reading input stage and the composition output stage. The composition stage happens in parallel and reads from the Composition Output Buffer. This must be at least as fast as the highest active pixel rate so 550 MHz on HDMI0 for 4Kp60. However, the PV for HDMI1 has a much smaller FIFO so it would need to be at least as fast as the display pixel-clock on HDMI1.

The input stage is roughly the sum of all the input planes but AFAIK there's no scheduling and it's hard to fully utilise the bus for two displays so you have to add some overhead percentage. You also have to take scaling into account. The firmware numbers for the input stage were generated by experimentally finding a stable clock configuration for common use-cases e.g. dual display + camera + OMX video.

@jc-kynesim
Copy link
Contributor

What needs to be done to progress this branch? We've got a bug in the existing code that can cause the linked list of clock requests to grow indefinitely that ends up using enough CPU to break timely video decode at high bitrates. This patch fixes the issue by removing that logic. I could search for the previous bug but there doesn't seem a lot of point if the fix is almost certainly in code that is going to vanish.

@mripard
Copy link
Contributor Author

mripard commented Apr 13, 2022

What's the status on this PR? As I understand things, it (or something like it) is needed, but as it stands dual monitor support (at least in some configurations) is broken.

Yes, dual output is broken with that PR and we needed some input from @timg236 to get going. We shouldn't merge this version of the PR.

Yeah, this would make sense that it would be the max. I'll let @timg236 confirm.
I've also added a warning when clk_set_min_rate fails to at least complain instead of ignoring it. And I'm currently reworking it to max it to what the firmware will allow us to.

On 2711 HVS runs off the the core-clk and the minimum core clock frequency must be the maximum of pixel reading input stage and the composition output stage. The composition stage happens in parallel and reads from the Composition Output Buffer. This must be at least as fast as the highest active pixel rate so 550 MHz on HDMI0 for 4Kp60. However, the PV for HDMI1 has a much smaller FIFO so it would need to be at least as fast as the display pixel-clock on HDMI1.

I think we cover the difference between HDMI0 and HDMI1 by using:
https://github.com/raspberrypi/linux/blob/rpi-5.15.y/drivers/gpu/drm/vc4/vc4_crtc.c#L712

And we then associate to each FIFO the frequency we generated before:
https://github.com/raspberrypi/linux/blob/rpi-5.15.y/drivers/gpu/drm/vc4/vc4_kms.c#L925

And finally we sum everything here:
https://github.com/raspberrypi/linux/blob/rpi-5.15.y/drivers/gpu/drm/vc4/vc4_kms.c#L954

If the output part all happens in parallel, then we should move that to take the max, not the sum, right?

The input stage is roughly the sum of all the input planes but AFAIK there's no scheduling and it's hard to fully utilise the bus for two displays so you have to add some overhead percentage. You also have to take scaling into account. The firmware numbers for the input stage were generated by experimentally finding a stable clock configuration for common use-cases e.g. dual display + camera + OMX video.

Ok, so it's sum of all input, but max of all the output

@mripard
Copy link
Contributor Author

mripard commented Apr 13, 2022

What needs to be done to progress this branch? We've got a bug in the existing code that can cause the linked list of clock requests to grow indefinitely that ends up using enough CPU to break timely video decode at high bitrates. This patch fixes the issue by removing that logic. I could search for the previous bug but there doesn't seem a lot of point if the fix is almost certainly in code that is going to vanish.

We were waiting for a clarification, and I'm going to push a new version some time this week

@timg236
Copy link
Contributor

timg236 commented Apr 13, 2022

"If the output part all happens in parallel, then we should move that to take the max, not the sum, right?"

The maximum would be better. It's not really in parallel, I think it interleaves the scaling / composition of the display FIFOs but runs with an internal data rate of up to 4 pixels per-clock unscaled or 2 pixels per clock scaled. The scheduling and memory fetching won't be perfect so you have to allow some overhead for that.

It might be best to allow the display to slightly overshoot the max-clock rate and clamp that before rejecting elements because it's just too complicated to get a perfect model of the HVS clocks on an unconstrained system (i.e. desktop Linux).

@jc-kynesim
Copy link
Contributor

Yeah - the bulk of that commit eba927c is sound enough, in that it removes stuff that is about to vanish but the 2 extra lines in rpivid_stop_streaming do seem utterly spurious.

@mripard
Copy link
Contributor Author

mripard commented Apr 22, 2022

Yeah, I wasn't entirely sure how to revert it, but it's revert of both ec7556e, while taking 50d2e7e into account. While we needed to remove the old clock API, the new layout of the code seemed sound to me so I wanted to keep it that way.

The two lines in rpivid_stop_streaming are indeed spurious, I'll remove them.

mripard added 12 commits April 22, 2022 10:57
With the recent introduction of clock drivers that will force their
clock rate to either the minimum or maximum boundaries, it becomes
harder for clock users to discover either boundary of their clock.

Indeed, the best way to do that previously was to call clk_round_rate()
on either 0 or ULONG_MAX and count on the driver to clamp the rate to
the current boundary, but that won't work anymore.

Since any other alternative (calling clk_set_rate_range() and looking at
the returned value, calling clk_round_rate() still, or just doing
nothing) depends on how the driver will behaves, we actually are
punching a hole through the abstraction provided by the clock framework.

In order to avoid any abstraction violation, let's create a bunch of
accessors that will return the current minimum and maximum for a given
clock.

Signed-off-by: Maxime Ripard <[email protected]>
Let's introduce a bunch of unit tests to make sure the values returned
by clk_get_rate_range() are sane.

Signed-off-by: Maxime Ripard <[email protected]>
Let's add a test on the rate range after a reparenting. This fails for
now, but it's worth having it to document the corner cases we don't
support yet.

Signed-off-by: Maxime Ripard <[email protected]>
In order to support higher HDMI frequencies, users have to set the
hdmi_enable_4kp60 parameter in their config.txt file.

We were detecting this so far by calling clk_round_rate on the core
clock with the frequency we're supposed to run at when one of those
modes is enabled. Whether or not the parameter was enabled could then be
inferred by the returned rate since the maximum clock rate reported by
the firmware was one of the side effect of setting that parameter.

However, the recent clock rework we did changed what clk_round_rate was
returning to always return the minimum allowed, and thus this test
wasn't reliable anymore.

Let's instead try to set a minimum on that clock for the rate we'd like
to reach. If the maximum reported by the firmware is below the minimum
we're trying to set, the clock framework will return an error which we
then can use to infer whether the parameter is set or not.

Signed-off-by: Maxime Ripard <[email protected]>
We currently ignore the clk_set_min_rate return code assuming it would
succeed. However, it can fail if we ask for a rate higher than the
current maximum for example.

Since we can't fail in atomic_commit, at least warn on failure.

Signed-off-by: Maxime Ripard <[email protected]>
Following the clock rate range improvements to the clock framework,
trying to set a disjoint range on a clock will now result in an error.

Thus, we can't set a minimum rate higher than the maximum reported by
the firmware, or clk_set_min_rate() will fail.

Thus we need to clamp the rate we are about to ask for to the maximum
rate possible on that clock.

Signed-off-by: Maxime Ripard <[email protected]>
The core clock computation takes into account both the load due to the
input (ie, planes) and its output (ie, encoders).

However, while the input load needs to consider all the planes, and thus
sum all of their associated loads, the output happens mostly in
parallel.

Therefore, we need to consider only the maximum of all the output loads,
and not the sum like we were doing. This resulted in a clock rate way
too high which could be discarded for being too high by the clock
framework.

Since recent changes, the clock framework will even downright reject it,
leading to a core clock being too low for its current needs.

Fixes: 16e1010 ("drm/vc4: Increase the core clock based on HVS load")
Signed-off-by: Maxime Ripard <[email protected]>
The driver was using clk_round_rate() to figure out the maximum clock
rate that was allowed for the HEVC clock.

Since we have a function to return it directly now, let's use it.

Signed-off-by: Maxime Ripard <[email protected]>
@mripard mripard force-pushed the rpi/5.15-clk-improvements branch from 0377905 to ec18e22 Compare April 22, 2022 09:07
@pelwell pelwell merged commit ba69dce into raspberrypi:rpi-5.15.y Apr 22, 2022
@pelwell
Copy link
Contributor

pelwell commented Apr 22, 2022

Thanks - that will do.

popcornmix added a commit to raspberrypi/firmware that referenced this pull request Apr 22, 2022
See: raspberrypi/linux#4940

kernel: config: Enable the NFT_SYNPROXY module
See: raspberrypi/linux#4993

kernel: configs: (Re)Enable CONFIG_IR_TOY
See: raspberrypi/linux#4997
popcornmix added a commit to raspberrypi/rpi-firmware that referenced this pull request Apr 22, 2022
See: raspberrypi/linux#4940

kernel: config: Enable the NFT_SYNPROXY module
See: raspberrypi/linux#4993

kernel: configs: (Re)Enable CONFIG_IR_TOY
See: raspberrypi/linux#4997
@popcornmix
Copy link
Collaborator

I'm getting a hang on boot following this PR on Pi3+ with hdmi disconnected (i.e. composite output).
Okay with this PR reverted.

@mripard
Copy link
Contributor Author

mripard commented Apr 25, 2022

I have the same issue, I'm looking into it.

@mripard
Copy link
Contributor Author

mripard commented Apr 25, 2022

I found what's going on, but I'm not sure how to address or workaround the issue.

The offending commit is 2a52a4b. The HSM clock seems to be uninitialized by the firmware when no monitor is plugged in, which was pointed out in #4457.

So far, we had a call to clk_set_rate introduced by #4547, later changed to clk_set_min_rate upstream (2d717d8).

On RaspberryPi3, the HSM clock is handled by the clk-bcm2835 driver:
https://github.com/raspberrypi/linux/blob/rpi-5.15.y/arch/arm/boot/dts/bcm2835-common.dtsi#L132

For the HSM clock, the driver uses the REGISTER_PER_CLK macro
https://github.com/raspberrypi/linux/blob/rpi-5.15.y/drivers/clk/bcm/clk-bcm2835.c#L2158

Which then set the parents to the bcm2835_clock_per_parents list
https://github.com/raspberrypi/linux/blob/rpi-5.15.y/drivers/clk/bcm/clk-bcm2835.c#L1604

When the monitor isn't attached, it looks like the parent is the "gnd" clock, which is never registered. The HSM clock is thus always an orphan, and the clk_set_min_rate call skips the rate change due to the commit above. Since the rate is never changed, the rate is still at 0, and we have a lockup.

I think we could just drop the commit 2a52a4b. Whether or not to skip the rate change could be argued either way, and it's not critical anyway.

However, the fact that the GND clock is a (default?) parent for most peripheral clocks but is never registered seems like a bug to me. I tried to google for some information about what that clock could be but came short, so I'm not quite sure how to address that.

@jc-kynesim
Copy link
Contributor

A naive guess would be that GND is just "ground" i.e. a null 0 Hz clk

@popcornmix
Copy link
Collaborator

Yes, GND is just a null clock.

On RaspberryPi3, the HSM clock is handled by the clk-bcm2835 driver:

There's no reason for this. I think moving this to fw driver makes sense (especially if it avoids this issue).

@mripard
Copy link
Contributor Author

mripard commented Apr 25, 2022

It works fine if we use the firmware clock, I've pushed a PR to address this.

@alanbork
Copy link

Does this mean that 4kp60 is supposed to work out of the box now? Because as of 1c56b3a58974f725e1ac63214eeaf914c5908302, I still get hangs/blank screens if I try to use 4096x2160p60 unless I overclock the cpu via config.txt (core_freq=600 core_freq_min=600)

dmesg reports:

vc4-drm gpu: [drm] ERROR Timed out waiting for commit
[ 273.107442] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] ERROR [CRTC:94:crtc-3] flip_done timed out

@mripard
Copy link
Contributor Author

mripard commented May 11, 2022

It was supposed to work out of the box before this PR, and is still supposed to work out of the box after.

Could you file a new issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants