-
Notifications
You must be signed in to change notification settings - Fork 5.2k
HQ Camera stream crashed with a kernel stack trace - vchiq_core.c:843 recycle_func+0x13c/0x4f8 #5133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Has this happened more than once? I only see one stack trace in the attached log. |
This has happened only once, just today and I haven't really experienced a similar bug earlier at all, but it will most likely come up eventually again. |
The bit of code in which the error occurred - the VCHIQ communications protocol - has been in all Raspberry Pis since day 1, largely unaltered (at least in the basic message handling), and I've never seen an error message like this before. The nature of the error smacks of memory corruption, which means that VCHIQ is likely to be an innocent victim - collateral damage - rather than the culprit. I say this because it implies that the cause of the crash will be much harder to discover, but I'd put money on it being the CSI peripheral. |
The only thing that comes to my mind that could cause a collateral bug, would be heat and overheating in general. Any ideas what are the downclocking and critical temps for Pi4B? I know there's a way to just display if it overheated in general since boot or not, but is there something that would allow me to link the overheating moment with specific temperature or at least time of throttling? Perhaps there are some traces of overheating in this dmesg log above? |
Overheating and under-voltage are possible explanations for memory corruption, but without any evidence for either I stand by my earlier comment:
by which I mean that data arriving from the camera is like water from a firehose which is being liberally sprayed by DMA into system memory, Any hiccups could easily cause image data to be written to the wrong location. Stream shutdown is the most likely scenario for a problem, which I don't think is the case here, but for me it's still the number one suspect. |
pi-crash.zip |
In addition, there is some not yet discovered issue, which causes around 1% frames drop at 120FPS mode. More details here: CTRL+F I've had RTSP stream opened on another PC on LAN network and it reports some dropped frames: But keep in mind that the above dropped frames on my screenshot, might be because of my network, using a wifi repeater that then connects to Pi via ethernet and it won't get much better than that, because of the placement of Pi being very far away from the original router and pulling RJ45 isn't viable right now. Maybe there's some command that I could use with I was thinking of testing this with |
It has just crashed once again for me around 2am, with basically the same error message as before. Same place in code where it crashed aswell: Here are fresh logs again: What makes me wonder about those crashes is this: What does the dmesg refer to when mentioning "driver imx477"? I am most likely running the most recent kernel release, also on the most modern camera stack, RPi eeprom is also up-to-date. Is the "imx477 driver" referring to the driver within the kernel source code, which is out of my reach, unless I build it myself? I think we can also dismiss thermals causing this error, since this time it happened at 02:45am, so it was most likely the coldest time of the day, while it was running perfectly fine at 70-80*C max and did not crash. (don't worry, I will soon get an additional fan, that will move the air around the heatsink, so those temps will drop) I noticed something very interesting, as somebody who used to do kernel development for Android (Xperia Z1), namely when there's a kernel panic, the entire system should lock-up unless power cycled. Since I didn't have much time in the morning, I did I might be a bit wrong about my last 2 paragraphs above, it might be due to some different issue that the camera doesn't come back up as expected after the crash, but when the system runs as normal and I do |
For several days the system hasn't crashed yet, despite running at 100FPS now, there was 0.9% frames dropped, but I think it's still a good result. I couldn't get a stream above 100fps to output any data |
I should have placed that bet. See d331f45 for the fix. |
Awesome news, I was just reading on the other issue! Good job @pelwell ! |
I just whipped the other engineers until they fixed it. We haven't yet released a kernel build including this fix, but there should be one in the next few days. |
Thank You! :D |
Yes, the fix went in on the 5th - see "Bump to 5.15.63" in this list of releases: https://github.com/raspberrypi/rpi-firmware/commits/master |
Describe the bug
I have a Pi 4B 4GB with HQ Camera hooked up via the ribbon cable
I am doing almost 24/7 recording with libcamera-vid timeout being 0
This setup works very stable with hours on end, but today the stream crashed and there was a stack trace in dmesg
Steps to reproduce the behaviour
libcamera-vid --denoise cdn_off --mode 1332:990:10 --framerate 90 --width 1332 --height 990 --level 4.2 -n -t 0 --tuning-file /usr/share/libcamera/ipa/raspberrypi/imx477.json --inline -o - | ffmpeg -i pipe: -codec copy -f rtsp rtsp://127.0.0.1:8554/mystream
which utilizes
https://github.com/aler9/rtsp-simple-server/ in order to get a valid RTSP stream out of this.
and then just wait until it crashes.
Device (s)
Raspberry Pi 4 Mod. B
System
Raspberry Pi reference 2022-04-04
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 226b479f8d32919c9fe36dd5b4c20c02682f8180, stage4
Aug 9 2022 13:44:40
Copyright (c) 2012 Broadcom
version 273b410636cf8854ca35af91fd738a3d5f8b39b6 (clean) (release) (start)
Linux pi 5.15.56-v7l+ #1575 SMP Fri Jul 22 20:29:46 BST 2022 armv7l GNU/Linux
Logs
messages-cropped.log
The error message is in line 717.
rtsp.log
Above is the log from the rtsp-simple-server which also provided an error message, but that was the only error message, previous lines are normal operation, until that error comes.
Additional context
I did one reboot with
sudo reboot
but I think that's the last reboot visible in the logs abovelibcamera-apps build: e1beb4512178-intree 14-08-2022 (22:37:46)
libcamera build: v0.0.0+3830-dfc6d711
If logs from
The text was updated successfully, but these errors were encountered: