odroid-m1: kernel panic: cpu serror #36

paralin · 2022-05-20T07:07:46Z

I saw this kernel panic once and have not been able to reproduce it reliably:

[ 1569.468061] systemd-journald[30]: Received client request to flush runtime journal.
[ 1569.709490] rockchip-pm-domain fdd90000.power-management:power-controller: failed to get ack on domain 'gpu', val=0x9fe
[ 1569.710500] SError Interrupt on CPU3, code 0xbe000011 -- SError
[ 1569.710513] CPU: 3 PID: 2930 Comm: Xorg Not tainted 5.18.0-rc7 #1
[ 1569.710518] Hardware name: Hardkernel ODROID-M1 (DT)
[ 1569.710521] pstate: 204000c9 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1569.710526] pc : _regmap_bus_reg_write+0x20/0x30
[ 1569.710540] lr : _regmap_write+0x5c/0xb0
[ 1569.710544] sp : ffff80000aea3900
[ 1569.710545] x29: ffff80000aea3900 x28: ffff00009ac7d700 x27: 0000000000000000
[ 1569.710554] x26: ffff0001008a9938 x25: ffff0001001dc080 x24: ffff0001000f2298
[ 1569.710559] x23: 0000000000000001 x22: ffff0001008a3000 x21: 0000000080000000
[ 1569.710565] x20: 0000000000000008 x19: ffff0001008a3000 x18: ffffffffffffffff
[ 1569.710570] x17: 66203a72656c6c6f x16: 72746e6f632d7265 x15: 776f703a746e656d
[ 1569.710575] x14: 6567616e616d2d72 x13: 65663978303d6c61 x12: 76202c2775706727
[ 1569.710580] x11: ffff800009eb3388 x10: ffff800009eb3388 x9 : 00000000ffffefff
[ 1569.710585] x8 : ffff800009f0b388 x7 : 0000000000017fe8 x6 : 00000000fffff000
[ 1569.710590] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff800008873cc0
[ 1569.710595] x2 : 0000000080000000 x1 : ffff80000a293008 x0 : 0000000000000000
[ 1569.710601] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 1569.710604] CPU: 3 PID: 2930 Comm: Xorg Not tainted 5.18.0-rc7 #1
[ 1569.710607] Hardware name: Hardkernel ODROID-M1 (DT)
[ 1569.710609] Call trace:
[ 1569.710612]  dump_backtrace.part.0+0xc8/0xe0
[ 1569.710621]  show_stack+0x18/0x70
[ 1569.710625]  dump_stack_lvl+0x68/0x84
[ 1569.710632]  dump_stack+0x18/0x34
[ 1569.710636]  panic+0x168/0x328
[ 1569.710639]  nmi_panic+0x88/0x90
[ 1569.710643]  arm64_serror_panic+0x6c/0x80
[ 1569.710647]  arm64_is_fatal_ras_serror+0x84/0x90
[ 1569.710650]  do_serror+0x34/0x60
[ 1569.710653]  el1h_64_error_handler+0x30/0x50
[ 1569.710659]  el1h_64_error+0x64/0x68
[ 1569.710662]  _regmap_bus_reg_write+0x20/0x30
[ 1569.710667]  regmap_write+0x4c/0x80
[ 1569.710671]  rockchip_pd_power+0x220/0x2d0
[ 1569.710677]  rockchip_pd_power_on+0x14/0x20
[ 1569.710681]  _genpd_power_on+0xc0/0x140
[ 1569.710685]  genpd_power_on.part.0+0xa4/0x1f0
[ 1569.710689]  genpd_runtime_resume+0xe4/0x280
[ 1569.710693]  __rpm_callback+0x48/0x170
[ 1569.710698]  rpm_callback+0x6c/0x80
[ 1569.710702]  rpm_resume+0x364/0x5e0
[ 1569.710706]  __pm_runtime_resume+0x4c/0x80
[ 1569.710710]  panfrost_perfcnt_close+0x34/0xa0 [panfrost]
[ 1569.710730]  panfrost_postclose+0x1c/0x50 [panfrost]
[ 1569.710739]  drm_file_free.part.0+0x1a4/0x290 [drm]
[ 1569.710853]  drm_close_helper.isra.0+0x5c/0x70 [drm]
[ 1569.710949]  drm_release+0x68/0x110 [drm]
[ 1569.711044]  __fput+0x70/0x230
[ 1569.711050]  ____fput+0x10/0x20
[ 1569.711053]  task_work_run+0x80/0x180
[ 1569.711059]  do_notify_resume+0x1ec/0x1120
[ 1569.711066]  el0_svc+0x9c/0xb0
[ 1569.711073]  el0t_64_sync_handler+0xa4/0x130
[ 1569.711077]  el0t_64_sync+0x18c/0x190
[ 1569.711084] SMP: stopping secondary CPUs
[ 1569.711097] Kernel Offset: disabled
[ 1569.711099] CPU features: 0x100,0000100d,19801c86
[ 1569.711103] Memory Limit: 4096 MB
[ 1569.735186] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---

Figured this might be worth reporting.

Linux m1 5.18.0-rc7 #1 SMP PREEMPT Thu May 19 23:15:47 PDT 2022 aarch64 GNU/Linux

Kernel odroid-5.18.y commit 035eaa6

Vebryn · 2022-06-08T19:17:28Z

Same here: the error appears at boot with kernel 5.18.0-202205181718~jammy.

I rebuilt the boot partition using flash-kernel and the system now boots normally. Strange. Was the boot partition corrupted?

paralin · 2022-07-19T23:24:40Z

This happened again, this time with a brand-new SD card:

[   73.812101] rockchip-pm-domain fdd90000.power-management:power-controller: failed to get ack on domain 'gpu', val=0x1fe
[   73.813091] SError Interrupt on CPU1, code 0xbe000011 -- SError
[   73.813104] CPU: 1 PID: 492 Comm: systemd-udevd Not tainted 5.18.12 #1
[   73.813110] Hardware name: Hardkernel ODROID-M1 (DT)
[   73.813113] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   73.813118] pc : _raw_spin_unlock_irqrestore+0x18/0x50
[   73.813130] lr : regmap_unlock_spinlock+0x14/0x20
[   73.813138] sp : ffff800009ce3720
[   73.813139] x29: ffff800009ce3720 x28: 0000000000000013 x27: 0000000000000100
[   73.813148] x26: ffff800000e04440 x25: ffff0001001ed080 x24: ffff0001000f1898
[   73.813154] x23: 0000000000000001 x22: 0000000000000000 x21: 0000000080000000
[   73.813159] x20: 0000000000000000 x19: ffff0001008a2c00 x18: ffffffffffffffff
[   73.813164] x17: 66203a72656c6c6f x16: 72746e6f632d7265 x15: 0720072007200720
[   73.813170] x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720
[   73.813175] x11: ffff8000094d8250 x10: ffff8000094d8250 x9 : 00000000ffffefff
[   73.813181] x8 : ffff800009530250 x7 : 0000000000017fe8 x6 : 00000000fffff000
[   73.813186] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff800008780810
[   73.813191] x2 : 0000000000000000 x1 : ffff000010928ec0 x0 : 0000000100000001
[   73.813198] Kernel panic - not syncing: Asynchronous SError Interrupt
[   73.813201] CPU: 1 PID: 492 Comm: systemd-udevd Not tainted 5.18.12 #1
[   73.813205] Hardware name: Hardkernel ODROID-M1 (DT)
[   73.813207] Call trace:
[   73.813208]  dump_backtrace+0xb0/0x120
[   73.813216]  show_stack+0x18/0x70
[   73.813221]  dump_stack_lvl+0x68/0x84
[   73.813226]  dump_stack+0x18/0x34
[   73.813230]  panic+0x168/0x328
[   73.813233]  nmi_panic+0x88/0x90
[   73.813237]  arm64_serror_panic+0x6c/0x80
[   73.813241]  arm64_is_fatal_ras_serror+0x84/0x90
[   73.813245]  do_serror+0x34/0x60
[   73.813248]  el1h_64_error_handler+0x30/0x50
[   73.813253]  el1h_64_error+0x64/0x68
[   73.813256]  _raw_spin_unlock_irqrestore+0x18/0x50
[   73.813259]  regmap_write+0x58/0x80
[   73.813263]  rockchip_pd_power+0x220/0x2d0
[   73.813270]  rockchip_pd_power_on+0x14/0x20
[   73.813274]  _genpd_power_on+0xc0/0x170
[   73.813278]  genpd_power_on.part.0+0xa4/0x1f0
[   73.813283]  __genpd_dev_pm_attach+0x100/0x2b0
[   73.813287]  genpd_dev_pm_attach+0x60/0x70
[   73.813291]  dev_pm_domain_attach+0x24/0x40
[   73.813297]  platform_probe+0x50/0xe0
[   73.813302]  really_probe+0x17c/0x3d0
[   73.813308]  __driver_probe_device+0x114/0x190
[   73.813312]  driver_probe_device+0x3c/0xf0
[   73.813316]  __driver_attach+0xcc/0x1e0
[   73.813321]  bus_for_each_dev+0x70/0xd0
[   73.813325]  driver_attach+0x24/0x30
[   73.813329]  bus_add_driver+0x144/0x230
[   73.813333]  driver_register+0x78/0x130
[   73.813337]  __platform_driver_register+0x28/0x40
[   73.813340]  panfrost_driver_init+0x20/0x1000 [panfrost]
[   73.813369]  do_one_initcall+0x50/0x1c0
[   73.813373]  do_init_module+0x44/0x240
[   73.813380]  load_module+0x2078/0x2930
[   73.813384]  __do_sys_finit_module+0xac/0x130
[   73.813388]  __arm64_sys_finit_module+0x24/0x30
[   73.813392]  invoke_syscall+0x48/0x120
[   73.813396]  el0_svc_common.constprop.0+0xd4/0x100
[   73.813400]  do_el0_svc+0x28/0x90
[   73.813404]  el0_svc+0x34/0xb0
[   73.813408]  el0t_64_sync_handler+0xa4/0x130
[   73.813412]  el0t_64_sync+0x18c/0x190
[   73.813418] SMP: stopping secondary CPUs
[   73.813431] Kernel Offset: disabled
[   73.813433] CPU features: 0x100,0000100d,19801c86
[   73.813436] Memory Limit: 4096 MB
[   73.840551] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
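
Both panics report the same syndrome value, `code 0xbe000011`. Below is a minimal sketch of decoding it, assuming the ESR_ELx field layout for SError exceptions from the Arm architecture reference (EC in bits 31:26, IL in bit 25, IDS in bit 24, AET in bits 12:10, EA in bit 9, DFSC in bits 5:0). The helper name `decode_serror_esr` is hypothetical and not part of any kernel tooling.

```python
def decode_serror_esr(esr: int) -> dict:
    """Split an arm64 ESR_ELx value for an SError into its fields.

    Bit positions are an assumption based on the Arm ARM layout,
    not something stated in this thread.
    """
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class
        "il": (esr >> 25) & 0x1,    # instruction length bit
        "ids": (esr >> 24) & 0x1,   # 1 = implementation-defined syndrome
        "aet": (esr >> 10) & 0x7,   # asynchronous error type (severity)
        "ea": (esr >> 9) & 0x1,     # external abort type
        "dfsc": esr & 0x3F,         # data fault status code
    }


# Value taken from the "SError Interrupt on CPUn, code ..." lines above.
fields = decode_serror_esr(0xBE000011)
for name, value in fields.items():
    print(f"{name:>4} = {value:#x}")
```

If that layout is right, the value decodes to EC 0x2f (SError), IDS 0, DFSC 0x11 and AET 0. That would be an architecturally defined syndrome with the "uncontainable" severity, consistent with the `arm64_is_fatal_ras_serror` and `arm64_serror_panic` frames in both traces. It does not by itself say which device caused the error.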

paralin changed the title ~~odroid-m1: kernel panic with panfrost: cpu serror~~ odroid-m1: kernel panic: cpu serror Jul 19, 2022
