
VC4 firmware display to KMS transition regressions


If we want to switch Raspbian from using the firmware display through bcm2708_fb to using KMS, we probably need to solve the following regressions:

Absolute stability on HDMI

We've long had reports of timeouts during modesetting on particular monitors. I think Paul Kocialkowski's debugging of underflows for Chamelium testing will be relevant to this.

One issue that shows up at mode configuration is that the HVS keeps using its previous display list for the next frame (instead of the new one configured for the new mode), while the CRTC and HDMI encoder are reconfigured immediately. This inconsistency in the pipeline's configuration causes FIFO underruns/overruns. The root cause is still unclear: early investigation of the HVS disable/enable sequences did not provide any clue.

Automatic 7" touchscreen DSI support.

Trying to upstream automatic DSI support ("probe the driver with DSI when the panel is attached, probe the driver minus DSI when the panel isn't attached") has been utterly stalled. I think at this point we should work on getting a downstream solution, with the firmware detecting the panel and automatically applying an additional overlay.

The only reason I haven't proposed that solution for upstream is that upstream doesn't ship any overlays in tree yet, after all these years. I think Alexander Graf's solution of downstream DTs (so you get overlays) and upstream kernel is probably the way to go for otherwise-upstream distros.

config.txt equivalents

There are config.txt options that are really important to people, and we should make sure they have equivalents in vc4. Optionally, once we do, the firmware should pick up these options and pass them to vc4 (see the sketch after this list for what one such equivalent might look like). Some that are on my mind:

  • underscan
  • hdmi_force_hotplug
  • hdmi_ignore_hotplug
  • display_hdmi_rotate
  • hdmi_drive
  • hdmi_boost
  • cec_osd_name
  • hdmi_ignore_cec_init
  • hdmi_ignore_edid_audio
  • hdmi_edid_file
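To make "equivalents in vc4" concrete for at least one of these: underscan maps fairly naturally onto the standard DRM connector margin properties ("left margin", "right margin", "top margin", "bottom margin"). Below is a minimal userspace sketch using libdrm, assuming the vc4 HDMI connector exposes those properties and taking the connector ID on the command line; it is a hypothetical illustration, not an existing tool.

```c
/*
 * Hypothetical sketch: apply a uniform "underscan" margin to a connector by
 * writing the standard DRM margin connector properties ("left margin",
 * "right margin", "top margin", "bottom margin"), assuming the vc4 HDMI
 * connector exposes them.  Only illustrates what a vc4-side equivalent of
 * config.txt underscan could look like.
 *
 * Build: gcc set-margins.c -o set-margins $(pkg-config --cflags --libs libdrm)
 * Usage: ./set-margins <connector-id> <pixels>
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

int main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s <connector-id> <pixels>\n", argv[0]);
		return 1;
	}

	uint32_t connector_id = strtoul(argv[1], NULL, 0);
	uint64_t margin = strtoull(argv[2], NULL, 0);

	int fd = open("/dev/dri/card0", O_RDWR);
	if (fd < 0) {
		perror("open /dev/dri/card0");
		return 1;
	}

	drmModeObjectProperties *props =
		drmModeObjectGetProperties(fd, connector_id,
					   DRM_MODE_OBJECT_CONNECTOR);
	if (!props) {
		fprintf(stderr, "no properties on connector %u\n", connector_id);
		return 1;
	}

	for (uint32_t i = 0; i < props->count_props; i++) {
		drmModePropertyRes *prop = drmModeGetProperty(fd, props->props[i]);
		if (!prop)
			continue;
		/* Matches "left margin", "right margin", "top margin", "bottom margin". */
		if (strstr(prop->name, "margin"))
			drmModeObjectSetProperty(fd, connector_id,
						 DRM_MODE_OBJECT_CONNECTOR,
						 prop->prop_id, margin);
		drmModeFreeProperty(prop);
	}

	drmModeFreeObjectProperties(props);
	return 0;
}
```

This uses the legacy (non-atomic) property path, which is enough for a one-shot configuration step; the other options in the list would each need their own mapping, but this is the general shape of it.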

DPI panel support

Right now people can use config.txt to smash the DPI register to configure arbitrary panels. We have one DPI panel upstream, maybe we should find the other commonly-used panels (probably just the other Adafruit one I'd seen) and add upstream panel-simple drivers for them and downstream overlays to apply them. Or, we could take the panel-dpi binding/driver from omapdrm and port it over to panel-simple so that people can define arbitrary panels in DT (this will not be accepted upstream, though).
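For reference, adding one of those panels upstream is mostly a matter of a panel-simple entry plus a compatible string. A rough sketch follows; the mode timings, physical size and compatible string are placeholders, not values from any real datasheet.

```c
/*
 * Sketch of a panel-simple entry for a fixed-mode DPI panel.  The timings,
 * physical size and compatible string below are placeholders, not values
 * taken from any real datasheet.
 */
static const struct drm_display_mode example_dpi_mode = {
	.clock = 29500,			/* kHz */
	.hdisplay = 800,
	.hsync_start = 800 + 40,
	.hsync_end = 800 + 40 + 48,
	.htotal = 800 + 40 + 48 + 40,
	.vdisplay = 480,
	.vsync_start = 480 + 13,
	.vsync_end = 480 + 13 + 3,
	.vtotal = 480 + 13 + 3 + 29,
};

static const struct panel_desc example_dpi = {
	.modes = &example_dpi_mode,
	.num_modes = 1,
	.bpc = 8,
	.size = {
		.width = 105,		/* mm */
		.height = 67,
	},
	.bus_format = MEDIA_BUS_FMT_RGB888_1X24,
};

/* ...plus a matching entry in panel-simple.c's platform_of_match[]: */
/* { .compatible = "vendor,example-dpi-panel", .data = &example_dpi }, */
```

A downstream overlay would then only need to instantiate a panel node with that compatible string and hook it up to the DPI output.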

fb performance in the console

fbcon performance in upstream is on par with the downstream fb driver and feels just as responsive. The downstream driver does have a custom ioctl for performing framebuffer copies with the DMA engine, but the kernel doesn't use it internally.

X11 with glamor uses so much more CMA

This is the toughest issue for the Pis: X11 with glamor puts pixmaps in GPU memory (CMA) instead of scatter-gather system memory. Our CMA allocations are also not pageable, though that's not particularly relevant for Pis, which are rarely configured with swap.

Some options here:

Make a new mode for glamor where pixmaps are allocated on the CPU until they're needed for DRI2/DRI3.

Back in the EXA days we tried dynamically moving pixmaps between CPU and GPU based on usage, and that was a disaster. However, my changes to promote buffers from GPU-able to GPU-dmabuf-shareable have worked nicely. Maybe being conservative and starting buffers on the CPU until they need to be scanout-capable would be usable? We could still probably use the Render glyph cache and core text glyph cache, though that increases the chance of X11 hitting a nasty out-of-memory. Expected performance changes relative to today's glamor:

  • CopyArea screen-to-screen (window moves) stays the same
  • Present (glXSwapBuffers()) stays the same
  • CopyArea pix-to-screen (GTK presentation) moves to a glTexSubImage(), which should hit 60fps fullscreen at the expense of CPU.
  • PutImage to screen stays the same
  • GTK pixmap drawing moves to CPU, which is fast (matches current downstream behavior)
  • Onscreen non-pixmap-using drawing stays glamor-accelerated (xterm)
  • Onscreen drawing using pixmap sources (Render) starts using the Render temporary upload path.

That last category includes xcompmgr and other non-GL compositors. Given our experience trying to deploy xcompmgr/compton, I'm not sure it's really suitable for the Pi. This may be an appropriate cost to pay.
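As a toy model of the allocation policy being proposed here (this is not glamor code: the "CMA pool" is just a counter, and the helpers stand in for real BO management; the point is simply that ordinary pixmaps never touch CMA until they have to be shared):

```c
/*
 * Toy model of the "start pixmaps on the CPU, promote to GPU memory only when
 * DRI2/DRI3 needs to share them" idea.  Not glamor code; everything here is
 * a stand-in for the real pixmap/BO machinery.
 */
#include <stdio.h>
#include <stdlib.h>

#define CMA_POOL_BYTES (64u * 1024 * 1024)	/* pretend we have a 64MB CMA pool */

static size_t cma_used;

struct pixmap {
	size_t size;
	void *cpu_data;		/* system-memory backing, NULL once promoted */
	int on_gpu;		/* would be a GEM BO handle in real life */
};

static struct pixmap pixmap_create(size_t size)
{
	/* Every pixmap starts life in ordinary, pageable system memory. */
	struct pixmap p = { .size = size, .cpu_data = calloc(1, size) };
	return p;
}

static int pixmap_promote(struct pixmap *p)
{
	if (p->on_gpu)
		return 0;
	if (cma_used + p->size > CMA_POOL_BYTES)
		return -1;	/* out of CMA; the caller can fall back to software */

	/*
	 * Only pixmaps that must be shareable or scanout-capable ever consume
	 * CMA.  (In the real thing the contents would be uploaded to the BO
	 * before the system-memory copy is dropped.)
	 */
	cma_used += p->size;
	p->on_gpu = 1;
	free(p->cpu_data);
	p->cpu_data = NULL;
	return 0;
}

int main(void)
{
	/* A pile of GTK-style pixmaps: cheap, and they never touch CMA. */
	struct pixmap small[100];
	for (int i = 0; i < 100; i++)
		small[i] = pixmap_create(256 * 1024);

	/* A window buffer handed to DRI3 for glXSwapBuffers(): promoted. */
	struct pixmap winbuf = pixmap_create(1920 * 1080 * 4);
	if (pixmap_promote(&winbuf) == 0)
		printf("promoted %zu bytes, CMA in use: %zu\n",
		       winbuf.size, cma_used);

	size_t cpu_bytes = 0;
	for (int i = 0; i < 100; i++)
		cpu_bytes += small[i].size;
	printf("%zu bytes of pixmaps stayed in system memory\n", cpu_bytes);

	return 0;
}
```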

Identify excess allocations and resolve them somehow

Maybe GL apps are over-allocating memory?

Aligning all allocations to a 4KB base and 4KB size may be wasting memory? I think I've only seen ~1000 GEM objects ever allocated in normal usage, and even at a worst case of ~4KB of padding per object that's only ~4MB, so this is probably not the problem.

Are 4KB GEM allocations eating our CMA?

Following on from the above: are we scattering small GEM allocations throughout CMA, fragmenting the area, when we could have allocated them from general system memory? We should check by looking at the addresses of the 4KB buffers we allocate.
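One way to do that check would be a debug print at BO creation time, something along these lines. This is a hypothetical helper sketched against the ~4.19 vc4_bo.c layout (where struct vc4_bo embeds a drm_gem_cma_object); it would be called from vc4_bo_create() right after the CMA allocation succeeds.

```c
/*
 * Rough debugging sketch, assuming the ~4.19 vc4_bo.c layout where struct
 * vc4_bo embeds a drm_gem_cma_object.  vc4_debug_log_small_bo() is a
 * hypothetical helper, called from vc4_bo_create() once the CMA allocation
 * succeeds, so we can see whether 4KB BOs end up sprinkled across the whole
 * CMA area.
 */
static void vc4_debug_log_small_bo(struct drm_device *drm, struct vc4_bo *bo,
				   size_t size)
{
	if (size > PAGE_SIZE)
		return;

	/* drm_gem_cma_object::paddr is the BO's physical/DMA address. */
	DRM_DEV_INFO(drm->dev, "small BO: %zu bytes at paddr %pad\n",
		     size, &bo->base.paddr);
}
```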

Hang on to the bin BO long term.

The most common first memory allocation failure when running some large GL application (which includes X11 with glamor today) is the attempt to allocate the 16MB binner BO. Right now, when that point is reached, X11 fails to render anything (regardless of which app caused it), so it basically looks like the system has gone down.

I've always freed the buffer because:

  1. fullscreen apps need to allocate a couple of 8MB buffers anyway, so the failure is coming soon, and
  2. X11 glamor rendering requires basically arbitrary allocations.

However, there's a big difference between "my app fails to render and dmesg gets spam but my system otherwise keeps working" and "X11 stops rendering anything". If X11 isn't allocating heaps of memory, and we hung on to the bin BO in the kernel, probably X11 would keep working. This would be a matter of just allocating it at boot and not freeing it in vc4_v3d.c's runtime PM.
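Roughly, that would mean keeping the vc4->bin_bo reference across the runtime-PM hooks, e.g. the following sketch against the ~4.19 vc4_v3d.c layout (field and helper names are from memory and may differ between kernel versions); the resume path would correspondingly skip the reallocation.

```c
/*
 * Sketch against the ~4.19 vc4_v3d.c layout (field and helper names from
 * memory, they may differ between kernel versions): stop dropping the binner
 * BO in runtime suspend, so it is allocated once at bind time and can only
 * fail while CMA is still mostly empty.
 */
static int vc4_v3d_runtime_suspend(struct device *dev)
{
	struct vc4_v3d *v3d = dev_get_drvdata(dev);
	struct vc4_dev *vc4 = v3d->vc4;

	vc4_irq_uninstall(vc4->dev);

	/*
	 * Previously the 16MB vc4->bin_bo was released here and reallocated in
	 * runtime resume, which is exactly the allocation that fails once CMA
	 * has filled up.  Keeping the reference avoids that failure mode.
	 */

	clk_disable_unprepare(v3d->clk);

	return 0;
}
```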

An ongoing patch series addresses this issue by allocating the binner BO at firstopen/lastclose. This ensures two things:

  1. the buffer is never allocated before a userspace client opens the DRM device, avoiding the allocation for fbcon usage (which doesn't need it);
  2. the buffer is kept alive for as long as userspace needs it, so we don't later fall short of CMA memory when reallocating it, which would result in a hard crash.

With these changes and recent patches contributed to glamor, there is no longer any crash when running out of CMA memory for GL: glamor falls back to software rendering without interruption. However, some font rendering issues were observed when this transition takes place.

Has CMA just been broken and the problem isn't actually that bad once it's fixed?

See https://github.com/raspberrypi/linux/pull/2699 for some relevant debugging.
