
Machine unresponsive after Internal error: Oops: 0000000096000045 [#1] PREEMPT SMP #5539

Open
sfatula opened this issue Jul 12, 2023 · 11 comments

Comments

@sfatula

sfatula commented Jul 12, 2023

Describe the bug

Happens every few weeks, seemingly at random. The GUI becomes unresponsive. I can connect from another machine via ssh but can't reboot, as that hangs. So I end up having to power off by removing power.

Steps to reproduce the behaviour

I don't really know how to reproduce it. I am using autofs, if that matters. It just seems to happen every few weeks. There is always plenty of memory available on the 8 GB Pi.

Device(s)

Raspberry Pi 4 Mod. B

System

cat /etc/rpi-issue
Raspberry Pi reference 2022-04-04
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 27a8050c3c06e567c794620394a8c2d74262a516, stage4
vcgencmd version
Mar 17 2023 10:50:39 
Copyright (c) 2012 Broadcom
version 82f3750a65fadae9a38077e3c2e217ad158c8d54 (clean) (release) (start)
uname -a
Linux stevepi 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr  3 17:24:16 BST 2023 aarch64 GNU/Linux

Logs

From syslog:

Jul 12 12:37:40 stevepi kernel: [391018.643998] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
Jul 12 12:37:40 stevepi kernel: [391018.652983] Mem abort info:
Jul 12 12:37:40 stevepi kernel: [391018.655972] ESR = 0x0000000096000045
Jul 12 12:37:40 stevepi kernel: [391018.659867] EC = 0x25: DABT (current EL), IL = 32 bits
Jul 12 12:37:40 stevepi kernel: [391018.665311] SET = 0, FnV = 0
Jul 12 12:37:40 stevepi kernel: [391018.669818] EA = 0, S1PTW = 0
Jul 12 12:37:40 stevepi kernel: [391018.673179] FSC = 0x05: level 1 translation fault
Jul 12 12:37:40 stevepi kernel: [391018.678651] Data abort info:
Jul 12 12:37:40 stevepi kernel: [391018.681671] ISV = 0, ISS = 0x00000045
Jul 12 12:37:40 stevepi kernel: [391018.685606] CM = 0, WnR = 1
Jul 12 12:37:40 stevepi kernel: [391018.688657] user pgtable: 4k pages, 39-bit VAs, pgdp=00000001d7d4a000
Jul 12 12:37:40 stevepi kernel: [391018.695194] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
Jul 12 12:37:40 stevepi systemd[1]: NAS-Data-ClawsAddresses.mount: Succeeded.
Jul 12 12:37:40 stevepi systemd[1245]: NAS-Data-ClawsAddresses.mount: Succeeded.
Jul 12 12:37:40 stevepi kernel: [391018.704047] Internal error: Oops: 0000000096000045 [#1] PREEMPT SMP
Jul 12 12:37:40 stevepi kernel: [391018.710405] Modules linked in: md5 cmac aes_arm64 aes_generic libaes hmac nls_utf8 cifs cifs_arc4 cifs_md4 xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc rfkill overlay zstd zstd_compress hid_logitech_hidpp binfmt_misc rtc_ds1307 regmap_i2c hid_logitech_dj joydev sg v3d vc4 gpu_sched drm_shmem_helper snd_soc_hdmi_codec raspberrypi_hwmon bcm2835_codec(C) bcm2835_v4l2(C) rpivid_hevc(C) bcm2835_isp(C) drm_display_helper v4l2_mem2mem bcm2835_mmal_vchiq(C) i2c_bcm2835 videobuf2_dma_contig videobuf2_vmalloc cec drm_dma_helper videobuf2_memops i2c_brcmstb drm_kms_helper videobuf2_v4l2 snd_soc_core snd_compress videobuf2_common videodev snd_pcm_dmaengine snd_bcm2835(C) snd_pcm vc_sm_cma(C) mc snd_timer snd syscopyarea sysfillrect sysimgblt fb_sys_fops uio_pdrv_genirq uio nvmem_rmem squashfs drm fuse zram zsmalloc i2c_dev
Jul 12 12:37:40 stevepi kernel: [391018.710543] drm_panel_orientation_quirks backlight ip_tables x_tables ipv6
Jul 12 12:37:40 stevepi kernel: [391018.803203] CPU: 3 PID: 58486 Comm: kworker/3:2 Tainted: G C 6.1.21-v8+ #1642
Jul 12 12:37:40 stevepi kernel: [391018.811807] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
Jul 12 12:37:40 stevepi kernel: [391018.817718] Workqueue: cifsiod smb2_reconnect_server [cifs]
Jul 12 12:37:40 stevepi kernel: [391018.823441] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
Jul 12 12:37:40 stevepi kernel: [391018.830481] pc : smb2_reconnect_server+0x35c/0x420 [cifs]
Jul 12 12:37:40 stevepi kernel: [391018.835993] lr : smb2_reconnect_server+0x354/0x420 [cifs]
Jul 12 12:37:40 stevepi kernel: [391018.841504] sp : ffffffc01cde3d20
Jul 12 12:37:40 stevepi kernel: [391018.844895] x29: ffffffc01cde3d20 x28: ffffffc01cde3da8 x27: ffffff815fb60628
Jul 12 12:37:40 stevepi kernel: [391018.852110] x26: 0000000000000001 x25: 0000000000000000 x24: ffffff810e1e8800
Jul 12 12:37:40 stevepi kernel: [391018.859325] x23: ffffffc01cde3db8 x22: ffffff815fb60000 x21: ffffff815fb62010
Jul 12 12:37:40 stevepi kernel: [391018.866540] x20: 0000000000000001 x19: ffffff815fb62000 x18: 33e1a6f93ac72bfa
Jul 12 12:37:40 stevepi kernel: [391018.873756] x17: 000000000000004a x16: ffffffdbf5efce18 x15: 0000000000000030
Jul 12 12:37:40 stevepi kernel: [391018.880970] x14: 000000000000002e x13: ffffff815fb60074 x12: 0000000000000049
Jul 12 12:37:40 stevepi kernel: [391018.888185] x11: ffffffdbf68ad858 x10: 0000000000000001 x9 : ffffffdbf5b20f88
Jul 12 12:37:40 stevepi kernel: [391018.895399] x8 : 0000000000000003 x7 : 000000000000000c x6 : 00000000ffffffff
Jul 12 12:37:40 stevepi kernel: [391018.902614] x5 : 0000000000000000 x4 : ffffffdb8af823b8 x3 : 0000000000000001
Jul 12 12:37:40 stevepi kernel: [391018.909828] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
Jul 12 12:37:40 stevepi kernel: [391018.917043] Call trace:
Jul 12 12:37:40 stevepi kernel: [391018.919566] smb2_reconnect_server+0x35c/0x420 [cifs]
Jul 12 12:37:40 stevepi kernel: [391018.924756] process_one_work+0x208/0x480
Jul 12 12:37:40 stevepi kernel: [391018.928851] worker_thread+0x50/0x428
Jul 12 12:37:40 stevepi kernel: [391018.932591] kthread+0xfc/0x110
Jul 12 12:37:40 stevepi kernel: [391018.935811] ret_from_fork+0x10/0x20
Jul 12 12:37:40 stevepi kernel: [391018.939467] Code: 12800000 97ffe53c 7100001f a9410662 (f9000441)
Jul 12 12:37:40 stevepi kernel: [391018.945638] ---[ end trace 0000000000000000 ]---
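
The "Mem abort info" and "Data abort info" fields in the log above can be recovered from the raw ESR value. The following is a minimal sketch, assuming the standard AArch64 ESR layout for data aborts (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, with WnR in bit 6 and the fault status code in bits 5:0). The function name `decode_esr` is illustrative only and is not part of any kernel tooling.

```python
def decode_esr(esr: int) -> dict:
    """Split an AArch64 ESR value into a few of the fields the kernel prints."""
    iss = esr & 0x1FFFFFF  # instruction specific syndrome, bits 24:0
    return {
        "EC": (esr >> 26) & 0x3F,   # exception class, bits 31:26
        "IL": (esr >> 25) & 0x1,    # instruction length bit (1 = 32-bit)
        "ISS": iss,
        "WnR": (iss >> 6) & 0x1,    # 1 = fault caused by a write
        "FSC": iss & 0x3F,          # fault status code, bits 5:0
    }


# ESR value taken from the oops in this report
fields = decode_esr(0x96000045)
print({name: hex(value) for name, value in fields.items()})
```

For the value in this report the sketch gives EC = 0x25, ISS = 0x45, WnR = 1 and FSC = 0x05, matching the log: a write to address 0x8 that hit a level 1 translation fault.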

Additional context

No response

@pelwell
Contributor

pelwell commented Jul 14, 2023

It looks as though the CIFS driver in 6.1 may have a stale connection problem that manifests as a crash in smb2_reconnect_server. I'm presuming that reconnection is not a common activity, so it may be very rare and always fatal, or perhaps less rare with a chance of failure each time it is called.

The latest upstream kernels have two patches that seem to address this area, and #5542 adds them to the 6.1 kernel as back-ports. Once the auto-builds have completed (about an hour, usually) you'll be able to install it as a beta test using sudo rpi-update pulls/5542. Given the rarity of the failures, I'm more interested in knowing that the addition of these patches hasn't broken CIFS in a more fundamental way.

@pelwell
Contributor

pelwell commented Aug 5, 2023

Do you have any feedback on my PR (#5542)?

@sfatula
Author

sfatula commented Aug 5, 2023

Sorry, was hospitalized and am still ill. Don't have access to a machine really at this time.

@szmarczak

szmarczak commented Dec 21, 2023

I can reproduce this by running ffmpeg (Jellyfin).

@popcornmix
Collaborator

> I can reproduce this by running ffmpeg (Jellyfin).

What is your kernel version (i.e. the output of uname -a)?

@szmarczak

Linux raspberrypi 6.6.8-v8+ #1709 SMP PREEMPT Thu Dec 21 12:54:25 GMT 2023 aarch64 GNU/Linux

Previously I was running 6.1 and saw the same behaviour: v4l2 doesn't respond after the kernel oops message.

@popcornmix
Collaborator

Can you post your dmesg oops output to confirm it's the same issue?

@szmarczak

szmarczak commented Dec 21, 2023

dmesg
[  153.310804] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[  153.310841] Mem abort info:
[  153.310846]   ESR = 0x0000000096000045
[  153.310852]   EC = 0x25: DABT (current EL), IL = 32 bits
[  153.310859]   SET = 0, FnV = 0
[  153.310865]   EA = 0, S1PTW = 0
[  153.310870]   FSC = 0x05: level 1 translation fault
[  153.310877] Data abort info:
[  153.310881]   ISV = 0, ISS = 0x00000045, ISS2 = 0x00000000
[  153.310888]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[  153.310894]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  153.310901] user pgtable: 4k pages, 39-bit VAs, pgdp=000000011e6e8000
[  153.310909] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[  153.310927] Internal error: Oops: 0000000096000045 [#1] PREEMPT SMP
[  153.310935] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge cmac algif_hash aes_arm64 aes_generic algif_skcipher af_alg bnep overlay 8021q garp stp llc binfmt_misc brcmfmac_wcc vc4 brcmfmac snd_soc_hdmi_codec brcmutil sg cfg80211 raspberrypi_hwmon drm_display_helper hci_uart cec drm_dma_helper drm_kms_helper btbcm v3d rpivid_hevc(C) bluetooth snd_soc_core bcm2835_codec(C) bcm2835_isp(C) bcm2835_v4l2(C) gpu_sched bcm2835_mmal_vchiq(C) snd_compress ecdh_generic v4l2_mem2mem snd_bcm2835(C) ecc snd_pcm_dmaengine drm_shmem_helper videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 rfkill videodev i2c_brcmstb snd_pcm libaes snd_timer videobuf2_common snd mc vc_sm_cma(C) raspberrypi_gpiomem nvmem_rmem uio_pdrv_genirq uio drm fuse drm_panel_orientation_quirks backlight ip_tables x_tables ipv6
[  153.311108] CPU: 1 PID: 5533 Comm: ffmpeg Tainted: G         C         6.6.8-v8+ #1709
[  153.311117] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[  153.311123] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  153.311130] pc : vchiq_mmal_port_enable+0xd4/0x158 [bcm2835_mmal_vchiq]
[  153.311149] lr : vchiq_mmal_port_enable+0x110/0x158 [bcm2835_mmal_vchiq]
[  153.311161] sp : ffffffc08228ba70
[  153.311164] x29: ffffffc08228ba70 x28: 0000000000000002 x27: dead000000000100
[  153.311175] x26: dead000000000122 x25: ffffff8106a40218 x24: ffffff8106a40008
[  153.311185] x23: ffffff8106a40000 x22: ffffff8106a40170 x21: 0000000000000000
[  153.311194] x20: 0000000000000000 x19: ffffff811cb52920 x18: 0000000000000011
[  153.311203] x17: 0000000100000002 x16: ffffffe8dc29bde8 x15: ffffffffffffffff
[  153.311213] x14: ffffffe8dc8d0de8 x13: ffffffc10228b7e7 x12: ffffffc08228b7ed
[  153.311222] x11: bcbcbcbcbcbcbcbc x10: 000000000000003a x9 : ffffffe8dc29be24
[  153.311232] x8 : 00000000000000aa x7 : 0000000000000002 x6 : 0000000000000003
[  153.311240] x5 : 0000000000000000 x4 : 0000000000000001 x3 : ffffff8105a6d81c
[  153.311250] x2 : 8791a9f923610700 x1 : 0000000000000000 x0 : 0000000000000000
[  153.311259] Call trace:
[  153.311263]  vchiq_mmal_port_enable+0xd4/0x158 [bcm2835_mmal_vchiq]
[  153.311275]  bcm2835_codec_start_streaming+0x1c4/0x438 [bcm2835_codec]
[  153.311288]  vb2_start_streaming+0x74/0x170 [videobuf2_common]
[  153.311307]  vb2_core_streamon+0x120/0x1e8 [videobuf2_common]
[  153.311322]  vb2_streamon+0x24/0x80 [videobuf2_v4l2]
[  153.311337]  v4l2_m2m_streamon+0x34/0x90 [v4l2_mem2mem]
[  153.311356]  v4l2_m2m_ioctl_streamon+0x20/0x38 [v4l2_mem2mem]
[  153.311369]  v4l_streamon+0x2c/0x40 [videodev]
[  153.311419]  __video_do_ioctl+0x17c/0x3f8 [videodev]
[  153.311447]  video_usercopy+0x320/0x750 [videodev]
[  153.311474]  video_ioctl2+0x20/0x40 [videodev]
[  153.311501]  v4l2_ioctl+0x48/0x70 [videodev]
[  153.311529]  __arm64_sys_ioctl+0xb0/0xf8
[  153.311538]  invoke_syscall+0x4c/0x118
[  153.311547]  el0_svc_common.constprop.1+0x88/0xf8
[  153.311555]  do_el0_svc+0x24/0x38
[  153.311561]  el0_svc+0x50/0xf8
[  153.311569]  el0t_64_sync_handler+0xa0/0xc8
[  153.311576]  el0t_64_sync+0x190/0x198
[  153.311584] Code: f2fbd5bb f2fbd5ba 1400000c a9400261 (f9000420)
[  153.311591] ---[ end trace 0000000000000000 ]---
[  178.286169] bcm2835-codec bcm2835-codec: Mutex fail

@szmarczak

v4l2-ctl --list-devices hangs with no output.

@popcornmix
Collaborator

@szmarczak your issue doesn't look related to the one posted (which is a hang related to connecting to remote SMB drives).

Yours looks more like this issue: #4920, so probably best you comment there.
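
One quick way to compare two oops reports such as the ones in this thread is to look at the faulting symbol and module on the `pc` line. The following is a minimal sketch, assuming the `pc : symbol+0xoffset/0xsize [module]` format seen in the logs above. The helper name `faulting_symbol` is hypothetical.

```python
import re

# Matches e.g. "pc : smb2_reconnect_server+0x35c/0x420 [cifs]"
PC_LINE = re.compile(
    r"pc : (?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?"
)


def faulting_symbol(log: str):
    """Return (symbol, module) from the first pc line in an oops, or None."""
    match = PC_LINE.search(log)
    if match is None:
        return None
    return match.group("symbol"), match.group("module")


# pc lines copied from the two oops reports in this thread
first = "kernel: [391018.830481] pc : smb2_reconnect_server+0x35c/0x420 [cifs]"
second = "[  153.311130] pc : vchiq_mmal_port_enable+0xd4/0x158 [bcm2835_mmal_vchiq]"

print(faulting_symbol(first))
print(faulting_symbol(second))
```

The first report faults in the cifs module and the second in bcm2835_mmal_vchiq, even though both share the same ESR value and faulting address.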

@popcornmix
Collaborator

@sfatula have you been able to test this with an updated kernel?
