Skip to content

Unicam dummy buffer overrun fix #5157

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 2, 2022
Merged

Conversation

naushir
Copy link
Contributor

@naushir naushir commented Aug 31, 2022

No description provided.

@naushir
Copy link
Contributor Author

naushir commented Aug 31, 2022

The second patch is mostly speculative, but does no harm... apart from using up a bit more cma memory!

@6by9
Copy link
Contributor

6by9 commented Aug 31, 2022

Both look reasonable, but it would be nice to get response from kralo on #5138 before merging.

@pelwell
Copy link
Contributor

pelwell commented Aug 31, 2022

"Circular" is misspelled in the second commit message, and the replicated /2 is a bit clunky - I prefer my buffer size vs padded size approach - but otherwise it looks OK.

@pelwell
Copy link
Contributor

pelwell commented Aug 31, 2022

Some feedback would be nice, but LGTM.

@kralo
Copy link
Contributor

kralo commented Sep 1, 2022

out of ~100 boot runs today, I got the guard word overwritten in every case, but the system never crashed.

[   40.293884] unicam fe801000.csi: Running with 2 data lanes
[   55.253116] unicam_process_buffer_complete: guard pattern corrupted:      ff00ff00ff00ff00ff00ff00ff00ff00
[   55.253215] unicam_process_buffer_complete: guard pattern is :            34125a5affffffffffffffffa5a57856
[   55.253254]  guard memcmp result 203
[   71.837750] unicam_process_buffer_complete: guard pattern corrupted:      ff00ff00ff00ff00ff00ff00ff00ff00
[   71.837864] unicam_process_buffer_complete: guard pattern is :            34125a5affffffffffffffffa5a57856
[   71.837901]  guard memcmp result 203
[   88.457060] unicam_process_buffer_complete: guard pattern corrupted:      ff00ff00ff00ff00ff00ff00ff00ff00
[   88.457173] unicam_process_buffer_complete: guard pattern is :            34125a5affffffffffffffffa5a57856
[   88.457212]  guard memcmp result 203
[  106.572008] cma: cma_release(page fffffffe006e0000, count 225)

Quite interesting, as you note it really happens by chance and not on every dummy buffer write.

I support merging this patch.

My two comments on this patch:

  • Due to a HW bug causing buffer overruns in circular buffer mode under certain
  • (not yet fully known) conditions, the dummy buffer allocation is twice as big

I suggest not to write "circular buffer", but "dummy buffer".

A circular buffer is to me a data structure that has an organized wrap-around on or after the last writable (entry/byte). This does not seem to be the case here, it is really just a dummy area that gets written to "at will" of the hardware.

  • Size of the dummy buffer. This is currently sized to fill a single line of
  • 64 Mpix resolution at 10bpp.

Your sizing of the dummy buffer currently encodes one known "maximum" case. I would suggest you program the size restrictions, so that upon driver additions or downstream changes, everyone gets a warning.

For example (pseudocode!):

#define DUMMY_BUF_SIZE		ceil(MAX_BYTESPERLINE / PAGE_SIZE)
#define DUMMY_BUF_ALLOC_SIZE	    (DUMMY_BUF_SIZE * 2)

or do a hard-wired check

	if (f->fmt.pix.bytesperline > min_bytesperline &&
	    f->fmt.pix.bytesperline <= MAX_BYTESPERLINE)
		f->fmt.pix.bytesperline = ALIGN(f->fmt.pix.bytesperline,
						BPL_ALIGNMENT);
	else
		f->fmt.pix.bytesperline = min_bytesperline;

	f->fmt.pix.sizeimage = f->fmt.pix.height * f->fmt.pix.bytesperline;

+        if (f->fmt.pix.bytesperline > sizeof(DUMMY_BUF_ALLOC_SIZE)) 
+ 	   unicam_err(dev, "bytesperline > DUMMY_BUF_ALLOC_SIZE, you risk DMA overflow");

@6by9
Copy link
Contributor

6by9 commented Sep 1, 2022

Due to a HW bug causing buffer overruns in circular buffer mode under certain (not yet fully known) conditions, the dummy buffer allocation is twice as big

I suggest not to write "circular buffer", but "dummy buffer".

A circular buffer is to me a data structure that has an organized wrap-around on or after the last writable (entry/byte). This does not seem to be the case here, it is really just a dummy area that gets written to "at will" of the hardware.

It is a circular buffer in the hardware.
(In theory) once the hardware hits the end address (UNICAM_IBEA0), it wraps back to the start address (UNICAM_IBSA0).
The fact that we only the main image buffer is big enough for a complete frame and therefore doesn't wrap doesn't stop the peripheral being in circular buffer mode.
Indeed should the sensor be misconfigured and write more lines than Unicam is configured for, those lines will appear at the top of the image (try it by leaving the registers for your OV9281 the same, but define the image as only being 1280x600 or similar).

@naushir
Copy link
Contributor Author

naushir commented Sep 1, 2022

out of ~100 boot runs today, I got the guard word overwritten in every case, but the system never crashed.

[   40.293884] unicam fe801000.csi: Running with 2 data lanes
[   55.253116] unicam_process_buffer_complete: guard pattern corrupted:      ff00ff00ff00ff00ff00ff00ff00ff00
[   55.253215] unicam_process_buffer_complete: guard pattern is :            34125a5affffffffffffffffa5a57856
[   55.253254]  guard memcmp result 203
[   71.837750] unicam_process_buffer_complete: guard pattern corrupted:      ff00ff00ff00ff00ff00ff00ff00ff00
[   71.837864] unicam_process_buffer_complete: guard pattern is :            34125a5affffffffffffffffa5a57856
[   71.837901]  guard memcmp result 203
[   88.457060] unicam_process_buffer_complete: guard pattern corrupted:      ff00ff00ff00ff00ff00ff00ff00ff00
[   88.457173] unicam_process_buffer_complete: guard pattern is :            34125a5affffffffffffffffa5a57856
[   88.457212]  guard memcmp result 203
[  106.572008] cma: cma_release(page fffffffe006e0000, count 225)

Quite interesting, as you note it really happens by chance and not on every dummy buffer write.

I support merging this patch.

Thanks for testing this, and confirming the fix!

My two comments on this patch:

  • Due to a HW bug causing buffer overruns in circular buffer mode under certain
  • (not yet fully known) conditions, the dummy buffer allocation is twice as big

I suggest not to write "circular buffer", but "dummy buffer".

A circular buffer is to me a data structure that has an organized wrap-around on or after the last writable (entry/byte). This does not seem to be the case here, it is really just a dummy area that gets written to "at will" of the hardware.

This dummy buffer does act like a circular buffer to Unicam. Unicam ought to reset it's write pointer to the start of the buffer if it thinks it will write past the end address - although it clearly doesn't in certain conditions as we see here...

  • Size of the dummy buffer. This is currently sized to fill a single line of
  • 64 Mpix resolution at 10bpp.

Your sizing of the dummy buffer currently encodes one known "maximum" case. I would suggest you program the size restrictions, so that upon driver additions or downstream changes, everyone gets a warning.

For example (pseudocode!):

#define DUMMY_BUF_SIZE		ceil(MAX_BYTESPERLINE / PAGE_SIZE)
#define DUMMY_BUF_ALLOC_SIZE	    (DUMMY_BUF_SIZE * 2)

Using MAX_BYTESPERLINE for sizing the dummy buffer uses more memory from the CMA pool than we probably need for this fix. This is not such a big deal really, @6by9, what are your preferences here?

@kralo
Copy link
Contributor

kralo commented Sep 1, 2022

It is a circular buffer in the hardware. (In theory) once the hardware hits the end address (UNICAM_IBEA0), it wraps back to the start address (UNICAM_IBSA0). The fact that we only the main image buffer is big enough for a complete frame and therefore doesn't wrap doesn't stop the peripheral being in circular buffer mode. Indeed should the sensor be misconfigured and write more lines than Unicam is configured for, those lines will appear at the top of the image (try it by leaving the registers for your OV9281 the same, but define the image as only being 1280x600 or similar).

Aha! Thank you both for the explanation.

@6by9
Copy link
Contributor

6by9 commented Sep 1, 2022

Using MAX_BYTESPERLINE for sizing the dummy buffer uses more memory from the CMA pool than we probably need for this fix. This is not such a big deal really, @6by9, what are your preferences here?

I'm just compiling a kernel with these mods to see how far the OV9281 is actually overwriting.

My other thought is to program the end address to be the stride, not DUMMY_BUF_SIZE, therefore it should wrap sooner.
If so then we really should set DUMMY_BUF_SIZE as MAX_BYTESPERLINE (aligned + 1 page?)

@naushir
Copy link
Contributor Author

naushir commented Sep 1, 2022

My other thought is to program the end address to be the stride, not DUMMY_BUF_SIZE, therefore it should wrap sooner. If so then we really should set DUMMY_BUF_SIZE as MAX_BYTESPERLINE (aligned + 1 page?)

This should be fine, but I would probably want stride (end address) <= DUMMY_BUF_SIZE / 2 given we do not have this characterised fully.

EDIT: Ignore that. Setting DUMMY_BUF_SIZE to MAX_BYTESPERLINE is enough.

@6by9
Copy link
Contributor

6by9 commented Sep 1, 2022

Added some extra debug to log how much of the buffer gets corrupted. I've also deliberately added in a sleep in the app before calling QBUF, so we are deliberately dropping buffers.

Setting IBEA to BPL (1280 as this is ~180fps 1280x720 Y8), we get an overrun of 256 bytes. (Base buffer address dad42000, corruption starts at dad42500, and ends at dad42600).

Setting IBEA to BPL*2, we still get an overrun of 256 bytes. (Base buffer address dad42000, corruption starts at dad42a00, and ends at dad42b00).

Setting IBEA to BPL*2 and Y10, we still get an overrun of 256 bytes. (Base buffer address dad42000, corruption starts at dad42c80, and ends at dad42d80).

Requesting a stride of 1632 and Y10, still has an overrun of 256 bytes (Base buffer address dad42000, corruption starts at dad42cc0, and ends at dad42dc0).

Docs say that IBEA must be a multiple of 16. Using 2*BPL - 16 works without corruption for both 8 and 10 bit, but that feels a little nasty.
I'll have a quick test with IMX462 which runs the MIPI interface 890Mbit/s 2 lanes (just above the 800Mbit/s 2 lanes that OV9281 uses) to see what that does.

I have noted that the original script implies the use of CAM0 (address fe800000), but I'm reproducing this on CAM1 anyway.

@6by9
Copy link
Contributor

6by9 commented Sep 1, 2022

FWIW IMX462 is running happily with no corruptions as RGGB10, so I guess OV9281 is doing something slightly odd on the bus.

Edit: I forgot that it was only running at 60fps, so the delay in the app needed to be increased.
When increased it also overwrites the defined bounds. Need to grab more info as the increased stride (1920x1080 sensor) confused my current logging.

@6by9
Copy link
Contributor

6by9 commented Sep 1, 2022

So IMX462, 1920x1080 RAW10, bytesperline 2400, IBEA sized as BPL.
Base dc300000, corrupt from dc300960, OK from dc300a60 - again 256 bytes of corruption beyond the end pointer.

Following discussions I've just tried reducing the buffer size programmed into the hardware to 256 bytes.
Looking for changes to the guardword across the whole buffer, it is writing to a maximum of 512 bytes (256 byte overrun)
Setting a length of 0 (ie IBEA = IBSA), gives 256 bytes of altered data.
So unless we expect hitting the same block of memory repeatedly to have performance issues, either of those would appear to be an option, and remain with the buffer at PAGE_SIZE.

@pelwell
Copy link
Contributor

pelwell commented Sep 1, 2022

A length of 0 seems "purer" in some way, and is the least-magic magic number. If anything, hitting the same block repeatedly ought to be more efficient, not less.

@naushir
Copy link
Contributor Author

naushir commented Sep 1, 2022

Following discussions I've just tried reducing the buffer size programmed into the hardware to 256 bytes. Looking for changes to the guardword across the whole buffer, it is writing to a maximum of 512 bytes (256 byte overrun) Setting a length of 0 (ie IBEA = IBSA), gives 256 bytes of altered data. So unless we expect hitting the same block of memory repeatedly to have performance issues, either of those would appear to be an option, and remain with the buffer at PAGE_SIZE.

What stride value did you use for buffer size of 0?

@6by9
Copy link
Contributor

6by9 commented Sep 1, 2022

What stride value did you use for buffer size of 0?

Standard as defined by bytesperline. UNICAM_IBLS is only set in unicam_start_rx.

The Unicam hardware has been observed to cause a buffer overrun when using the
dummy buffer as a circular buffer. The conditions that cause the overrun are not
fully known, but it seems to occur when the memory bus is heavily loaded.

To avoid the overrun, program the hardware with a buffer size of 0 when using
the dummy buffer. This will cause overrun into the allocated dummy buffer, but
avoid out of bounds writes.

Signed-off-by: Naushir Patuck <[email protected]>
@naushir
Copy link
Contributor Author

naushir commented Sep 2, 2022

Latest revision sets the programmed dummy buffer size to 0, but keeps the allocation at a single page size.

@pelwell pelwell merged commit d331f45 into raspberrypi:rpi-5.15.y Sep 2, 2022
popcornmix added a commit to raspberrypi/firmware that referenced this pull request Sep 5, 2022
kernel: configs: Add LAN7430 driver on BCM2711
See: raspberrypi/linux#4117

kernel: Add a pullup/down parameter to PPS DTBO
See: raspberrypi/linux#5154

kernel: hwmon: (lm75) Add Atmel AT30TS74 support
See: raspberrypi/linux#5158

kernel: vc4-kms-v3d COB and interrupt fixes
See: raspberrypi/linux#5121

kernel: Unicam dummy buffer overrun fix
See: raspberrypi/linux#5157

kernel: ASoC:ma120x0p: Extend the volume range to -144dB (mute)
See: raspberrypi/linux#5160
popcornmix added a commit to raspberrypi/rpi-firmware that referenced this pull request Sep 5, 2022
kernel: configs: Add LAN7430 driver on BCM2711
See: raspberrypi/linux#4117

kernel: Add a pullup/down parameter to PPS DTBO
See: raspberrypi/linux#5154

kernel: hwmon: (lm75) Add Atmel AT30TS74 support
See: raspberrypi/linux#5158

kernel: vc4-kms-v3d COB and interrupt fixes
See: raspberrypi/linux#5121

kernel: Unicam dummy buffer overrun fix
See: raspberrypi/linux#5157

kernel: ASoC:ma120x0p: Extend the volume range to -144dB (mute)
See: raspberrypi/linux#5160
@kralo
Copy link
Contributor

kralo commented Sep 27, 2022

This fix has been running very nicely on my systems with self-compiled kernel and those updated with rpi-update for a while now.

Is there a cadence or projected date when this will make it into the "raspberrypi-kernel".deb archive ?

@pelwell
Copy link
Contributor

pelwell commented Sep 27, 2022

The patch just missed the last stable bump. Although there's no official cadence, changes to the stable branch seem to happen roughly every month. Image releases are less frequent.

There's another camera-related fix likely in the near future, and perhaps a fix to the NVME booting, but once we've decided about those an update to stable seems reasonable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants