-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Clock Framework Improvements, and removal of clock requests API #4940
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clock Framework Improvements, and removal of clock requests API #4940
Conversation
drivers/gpu/drm/vc4/vc4_kms.c
Outdated
* Request a clock rate based on the current HVS | ||
* requirements. | ||
*/ | ||
hvs->core_req = clk_request_start(hvs->core_clk, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be replaced by a call to clk_set_min_rate
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was added in the next patch but you're right that it makes more sense to have it there. I've reworked the branch a bit and pushed a new version.
I think this PR is okay. In testing I found a case where pixel-bvb clock wasn't being set. |
ccce78f
to
2f45eaa
Compare
20f34a0
to
87ed617
Compare
@mripard after this PR I can't use a 4kp60 display. A bit of debugging shows we set Not quite sure why that doesn't give a working non-4kp60 mode (no mode is set at all). Without this PR, |
I found what's going on, and I'm not sure how to fix it properly. We were using clk_round_rate to infer whether hdmi_enable_4kp60 was enabled or not since the maximum was higher when enabled. However, the clock changes we did will now try to always run at the minimum frequency, so we'll end up here with 220MHz, which is the minimum allowed by the firmware at that point in time.
Even we couldn't detect the parameter properly, this should work though. |
Actually, it makes sense: the firmware will have used 4kp60 and will pass that on the kernel command line. However, the kernel wrongly assumes it can't do that, so the framebuffer emulation will still try to set that mode and the kernel will reject it. We then end up with a black screen. |
2f45eaa
to
b7da038
Compare
Ah yes.
We still have the number wanted with this PR
is there not a way of getting hold of that from the kms driver? |
Not really, there's no consumer API to retrieve the clock boundaries. I've pushed a new version that will try to raise the minimum to 550MHz. If we don't have enable_hdmi_4kp60, the range reported by the firmware (120-500) will be disjoint with the range we're trying to set and it will error out. If it's enable, the range will become 550-550 and will thus work. |
@mripard updated version of this patch is not working for me.
The core is being set to 220MHz which is too low for 4kp60 (+1920x1080p). Interestingly when using just a 4kp60 display I get:
so, core is 534MHz. It feels like the second display is replacing the core clock requirement (rather than adding to it). |
Actually, the issue is that the clk_set_min_rate calls fail in vc4_atomic_commit_tail. Indeed, we try to set a minimum of 683 MHz, which is our estimate of what the core clock should be. Since it's way higher than the maximum that the firmware report, clk_set_min_rate will fail, and thus we never have our minimum enforced. Now the interesting part is that if I cap the HVS rate we try to enforce to 550MHz, it seems to work just fine. So it looks like our calculation of the core clock is completely off. For reference, this is the calculation we have at the moment: In our case, HDMI0 runs at 3840x2160@60Hz, so it gives a load of 594000000 * 9 / 10 = 534600000.0 HDMI1 will run at 1920x1080@60Hz, so 148500000. We'll do the sum here: and thus, we end up with 683100000.0 Hz. |
I think the firmware effectively does Probably due to HVS processing planes sequentially, compared to Pixel Valve driving displays in parallel. The numbers are somewhat heuristic, and won't be exact, so it would be better to choose the closest frequency (i.e. max) we can, rather than dropping a too high request. |
Yeah, this would make sense that it would be the max. I'll let @timg236 confirm. I've also added a warning when clk_set_min_rate fails to at least complain instead of ignoring it. And I'm currently reworking it to max it to what the firmware will allow us to. |
Just to check - this new API has no way of asking for whatever the maximum supported clock is? The HEVC clock really wants that. There isn't a "good" rate I can calculate beforehand, and in nearly all cases faster is better. |
It's not a new API, it was already there in the kernel. We changed its behaviour for some of our clocks though And no, mostly because it would be racy and thus wouldn't be very useful. Even if you were to retrieve and ask for the maximum, there's nothing that would prevent another user from changing that maximum before you get the chance to ask for that maximum.
So you just want to run the clock at the highest frequency all the time? That doesn't sound very power efficient, but we can do that fairly easily by doing something like 1c19533 |
I want to run the clock at max while I am decoding - i.e. when I get stream start start it at max, when stop reset to 0 - short of exciting (prone to error) code that measures Q lengths and adjusts the clock dynamically I can't think of a better idea - can you? |
Like I said this would be possible by implementing something like 1c19533, but I'm not sure it needs to be part of this PR. It wasn't possible before that PR either. |
@mripard it is currently possible (prior to this PR).
gets the max hevc clock frequency (which is 500MHz by default, 550MHz when using hdmi_enable_4kp60 and may be higher still with a custom config.txt setting).
|
Right, but that still works? The HEVC clock will just take the boundaries reported by the firmware and clamp ULONG_MAX to those boundaries. |
Ah - I assumed because we couldn't determine the max core clock, we wouldn't be able to determine the max hevc clock. |
We don't have that option for KMS because the core clock will always return the minimum available so it's not just a clamp. It's the only clock in that configuration at the moment, so all the others will just return the rate asked for, assuming it's within the boundaries reported by the firmware. |
What's the status on this PR? As I understand things, it (or something like it) is needed, but as it stands dual monitor support (at least in some configurations) is broken. |
On 2711 HVS runs off the the core-clk and the minimum core clock frequency must be the maximum of pixel reading input stage and the composition output stage. The composition stage happens in parallel and reads from the Composition Output Buffer. This must be at least as fast as the highest active pixel rate so 550 MHz on HDMI0 for 4Kp60. However, the PV for HDMI1 has a much smaller FIFO so it would need to be at least as fast as the display pixel-clock on HDMI1. The input stage is roughly the sum of all the input planes but AFAIK there's no scheduling and it's hard to fully utilise the bus for two displays so you have to add some overhead percentage. You also have to take scaling into account. The firmware numbers for the input stage were generated by experimentally finding a stable clock configuration for common use-cases e.g. dual display + camera + OMX video. |
What needs to be done to progress this branch? We've got a bug in the existing code that can cause the linked list of clock requests to grow indefinitely that ends up using enough CPU to break timely video decode at high bitrates. This patch fixes the issue by removing that logic. I could search for the previous bug but there doesn't seem a lot of point if the fix is almost certainly in code that is going to vanish. |
Yes, dual output is broken with that PR and we needed some input from @timg236 to get going. We shouldn't merge this version of the PR.
I think we cover the difference between HDMI0 and HDMI1 by using: And we then associate to each FIFO the frequency we generated before: And finally we sum everything here: If the output part all happens in parallel, then we should move that to take the max, not the sum, right?
Ok, so it's sum of all input, but max of all the output |
We were waiting for a clarification, and I'm going to push a new version some time this week |
"If the output part all happens in parallel, then we should move that to take the max, not the sum, right?" The maximum would be better. It's not really in parallel, I think it interleaves the scaling / composition of the display FIFOs but runs with an internal data rate of up to 4 pixels per-clock unscaled or 2 pixels per clock scaled. The scheduling and memory fetching won't be perfect so you have to allow some overhead for that. It might be best to allow the display to slightly overshoot the max-clock rate and clamp that before rejecting elements because it's just too complicated to get a perfect model of the HVS clocks on an unconstrained system (i.e. desktop Linux). |
Yeah - the bulk of that commit eba927c is sound enough, in that it removes stuff that is about to vanish but the 2 extra lines in rpivid_stop_streaming do seem utterly spurious. |
Yeah, I wasn't entirely sure how to revert it, but it's revert of both ec7556e, while taking 50d2e7e into account. While we needed to remove the old clock API, the new layout of the code seemed sound to me so I wanted to keep it that way. The two lines in rpivid_stop_streaming are indeed spurious, I'll remove them. |
This reverts commit ec7556e.
This reverts commit 12917ad.
This reverts commit f078b2c.
This reverts commit 23fbabe.
With the recent introduction of clock drivers that will force their clock rate to either the minimum or maximum boundaries, it becomes harder for clock users to discover either boundary of their clock. Indeed, the best way to do that previously was to call clk_round_rate() on either 0 or ULONG_MAX and count on the driver to clamp the rate to the current boundary, but that won't work anymore. Since any other alternative (calling clk_set_rate_range() and looking at the returned value, calling clk_round_rate() still, or just doing nothing) depends on how the driver will behaves, we actually are punching a hole through the abstraction provided by the clock framework. In order to avoid any abstraction violation, let's create a bunch of accessors that will return the current minimum and maximum for a given clock. Signed-off-by: Maxime Ripard <[email protected]>
Let's introduce a bunch of unit tests to make sure the values returned by clk_get_rate_range() are sane. Signed-off-by: Maxime Ripard <[email protected]>
Let's add a test on the rate range after a reparenting. This fails for now, but it's worth having it to document the corner cases we don't support yet. Signed-off-by: Maxime Ripard <[email protected]>
In order to support higher HDMI frequencies, users have to set the hdmi_enable_4kp60 parameter in their config.txt file. We were detecting this so far by calling clk_round_rate on the core clock with the frequency we're supposed to run at when one of those modes is enabled. Whether or not the parameter was enabled could then be inferred by the returned rate since the maximum clock rate reported by the firmware was one of the side effect of setting that parameter. However, the recent clock rework we did changed what clk_round_rate was returning to always return the minimum allowed, and thus this test wasn't reliable anymore. Let's instead try to set a minimum on that clock for the rate we'd like to reach. If the maximum reported by the firmware is below the minimum we're trying to set, the clock framework will return an error which we then can use to infer whether the parameter is set or not. Signed-off-by: Maxime Ripard <[email protected]>
We currently ignore the clk_set_min_rate return code assuming it would succeed. However, it can fail if we ask for a rate higher than the current maximum for example. Since we can't fail in atomic_commit, at least warn on failure. Signed-off-by: Maxime Ripard <[email protected]>
Following the clock rate range improvements to the clock framework, trying to set a disjoint range on a clock will now result in an error. Thus, we can't set a minimum rate higher than the maximum reported by the firmware, or clk_set_min_rate() will fail. Thus we need to clamp the rate we are about to ask for to the maximum rate possible on that clock. Signed-off-by: Maxime Ripard <[email protected]>
The core clock computation takes into account both the load due to the input (ie, planes) and its output (ie, encoders). However, while the input load needs to consider all the planes, and thus sum all of their associated loads, the output happens mostly in parallel. Therefore, we need to consider only the maximum of all the output loads, and not the sum like we were doing. This resulted in a clock rate way too high which could be discarded for being too high by the clock framework. Since recent changes, the clock framework will even downright reject it, leading to a core clock being too low for its current needs. Fixes: 16e1010 ("drm/vc4: Increase the core clock based on HVS load") Signed-off-by: Maxime Ripard <[email protected]>
The driver was using clk_round_rate() to figure out the maximum clock rate that was allowed for the HEVC clock. Since we have a function to return it directly now, let's use it. Signed-off-by: Maxime Ripard <[email protected]>
0377905
to
ec18e22
Compare
Thanks - that will do. |
See: raspberrypi/linux#4940 kernel: config: Enable the NFT_SYNPROXY module See: raspberrypi/linux#4993 kernel: configs: (Re)Enable CONFIG_IR_TOY See: raspberrypi/linux#4997
See: raspberrypi/linux#4940 kernel: config: Enable the NFT_SYNPROXY module See: raspberrypi/linux#4993 kernel: configs: (Re)Enable CONFIG_IR_TOY See: raspberrypi/linux#4997
I'm getting a hang on boot following this PR on Pi3+ with hdmi disconnected (i.e. composite output). |
I have the same issue, I'm looking into it. |
I found what's going on, but I'm not sure how to address or workaround the issue. The offending commit is 2a52a4b. The HSM clock seems to be uninitialized by the firmware when no monitor is plugged in, which was pointed out in #4457. So far, we had a call to clk_set_rate introduced by #4547, later changed to clk_set_min_rate upstream (2d717d8). On RaspberryPi3, the HSM clock is handled by the clk-bcm2835 driver: For the HSM clock, the driver uses the REGISTER_PER_CLK macro Which then set the parents to the bcm2835_clock_per_parents list When the monitor isn't attached, it looks like the parent is the "gnd" clock, which is never registered. The HSM clock is thus always an orphan, and the clk_set_min_rate call skips the rate change due to the commit above. Since the rate is never changed, the rate is still at 0, and we have a lockup. I think we could just drop the commit 2a52a4b. Whether or not to skip the rate change could be argued either way, and it's not critical anyway. However, the fact that the GND clock is a (default?) parent for most peripheral clocks but is never registered seems like a bug to me. I tried to google for some information about what that clock could be but came short, so I'm not quite sure how to address that. |
A naive guess would be that GND is just "ground" i.e. a null 0 Hz clk |
Yes, GND is just a null clock.
There's no reason for this. I think moving this to fw driver makes sense (especially if it avoids this issue). |
It works fine if we use the firmware clock, I've pushed a PR to address this. |
Does this mean that 4kp60 is supposed to work out of the box now? Because as of 1c56b3a58974f725e1ac63214eeaf914c5908302, I still get hangs/blank screens if I try to use 4096x2160p60 unless I overclock the cpu via config.txt (core_freq=600 core_freq_min=600) dmesg reports: vc4-drm gpu: [drm] ERROR Timed out waiting for commit |
It was supposed to work out of the box before this PR, and is still supposed to work out of the box after. Could you file a new issue? |
Hi,
This is a PR that brings the changes that have been accepted into upstream to bring the same feature provided by the clock request API we have in the downstream tree.
The approach taken has been different though and has instead relied on fixing the clk_set_rate_range / clk_set_min_rate functions instead.
So this PR contains both the new changes aimed for 5.18, and reverts the API we had before to sync the tree with upstream.
This also fixes the cursorA-vs-flipA-toggle subtest of the kms_cursor_legacy test suite in IGT that was producing a null pointer dereference on the clock request pointer.
Let me know what you think,
Maxime