-
Notifications
You must be signed in to change notification settings - Fork 450
ppc64le fails on RUST_OPT_LEVEL_{0,1}
#155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This is caused by stack overflow. We have 16K stacks, but there are functions generated with ~20K frames! The stack trace for that panic is:
Looking up the stack I see this function:
The disassembly is:
That Building with |
I can fix the CI failure at opt level 0 by increasing the stack size, do we want to do that for now? |
It would be nice to do it, yeah. We should also add a comment in Also please update |
For the record here are the functions I see with frames > 2KB:
|
New elements that reside in the clone are not released in case that the transaction is aborted. [16302.231754] ------------[ cut here ]------------ [16302.231756] WARNING: CPU: 0 PID: 100509 at net/netfilter/nf_tables_api.c:1864 nf_tables_chain_destroy+0x26/0x127 [nf_tables] [...] [16302.231882] CPU: 0 PID: 100509 Comm: nft Tainted: G W 5.19.0-rc3+ #155 [...] [16302.231887] RIP: 0010:nf_tables_chain_destroy+0x26/0x127 [nf_tables] [16302.231899] Code: f3 fe ff ff 41 55 41 54 55 53 48 8b 6f 10 48 89 fb 48 c7 c7 82 96 d9 a0 8b 55 50 48 8b 75 58 e8 de f5 92 e0 83 7d 50 00 74 09 <0f> 0b 5b 5d 41 5c 41 5d c3 4c 8b 65 00 48 8b 7d 08 49 39 fc 74 05 [...] [16302.231917] Call Trace: [16302.231919] <TASK> [16302.231921] __nf_tables_abort.cold+0x23/0x28 [nf_tables] [16302.231934] nf_tables_abort+0x30/0x50 [nf_tables] [16302.231946] nfnetlink_rcv_batch+0x41a/0x840 [nfnetlink] [16302.231952] ? __nla_validate_parse+0x48/0x190 [16302.231959] nfnetlink_rcv+0x110/0x129 [nfnetlink] [16302.231963] netlink_unicast+0x211/0x340 [16302.231969] netlink_sendmsg+0x21e/0x460 Add nft_set_pipapo_match_destroy() helper function to release the elements in the lookup tables. Stefano Brivio says: "We additionally look for elements pointers in the cloned matching data if priv->dirty is set, because that means that cloned data might point to additional elements we did not commit to the working copy yet (such as the abort path case, but perhaps not limited to it)." Fixes: 3c4287f ("nf_tables: Add set type for arbitrary concatenation of ranges") Reviewed-by: Stefano Brivio <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
If *any* object of a certain WW mutex class is locked, lockdep will consider *all* mutexes of that class as locked. Also the lock allocation tracking code will apparently register only the address of the first mutex of a given class locked in a sequence. This has the odd consequence that if that first mutex is unlocked while other mutexes of the same class remain locked and then its memory then freed, the lock alloc tracking code will incorrectly assume that memory is freed with a held lock in there. For now, work around that for drm_exec by releasing the first grabbed object lock last. v2: - Fix a typo (Danilo Krummrich) - Reword the commit message a bit. - Add a Fixes: tag Related lock alloc tracking warning: [ 322.660067] ========================= [ 322.660070] WARNING: held lock freed! [ 322.660074] 6.5.0-rc7+ #155 Tainted: G U N [ 322.660078] ------------------------- [ 322.660081] kunit_try_catch/4981 is freeing memory ffff888112adc000-ffff888112adc3ff, with a lock still held there! [ 322.660089] ffff888112adc1a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_exec_lock_obj+0x11a/0x600 [drm_exec] [ 322.660104] 2 locks held by kunit_try_catch/4981: [ 322.660108] #0: ffffc9000343fe18 (reservation_ww_class_acquire){+.+.}-{0:0}, at: test_early_put+0x22f/0x490 [drm_exec_test] [ 322.660123] #1: ffff888112adc1a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_exec_lock_obj+0x11a/0x600 [drm_exec] [ 322.660135] stack backtrace: [ 322.660139] CPU: 7 PID: 4981 Comm: kunit_try_catch Tainted: G U N 6.5.0-rc7+ #155 [ 322.660146] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021 [ 322.660152] Call Trace: [ 322.660155] <TASK> [ 322.660158] dump_stack_lvl+0x57/0x90 [ 322.660164] debug_check_no_locks_freed+0x20b/0x2b0 [ 322.660172] slab_free_freelist_hook+0xa1/0x160 [ 322.660179] ? drm_exec_unlock_all+0x168/0x2a0 [drm_exec] [ 322.660186] __kmem_cache_free+0xb2/0x290 [ 322.660192] drm_exec_unlock_all+0x168/0x2a0 [drm_exec] [ 322.660200] drm_exec_fini+0xf/0x1c0 [drm_exec] [ 322.660206] test_early_put+0x289/0x490 [drm_exec_test] [ 322.660215] ? __pfx_test_early_put+0x10/0x10 [drm_exec_test] [ 322.660222] ? __kasan_check_byte+0xf/0x40 [ 322.660227] ? __ksize+0x63/0x140 [ 322.660233] ? drmm_add_final_kfree+0x3e/0xa0 [drm] [ 322.660289] ? _raw_spin_unlock_irqrestore+0x30/0x60 [ 322.660294] ? lockdep_hardirqs_on+0x7d/0x100 [ 322.660301] ? __pfx_kunit_try_run_case+0x10/0x10 [kunit] [ 322.660310] ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10 [kunit] [ 322.660319] kunit_generic_run_threadfn_adapter+0x4a/0x90 [kunit] [ 322.660328] kthread+0x2e7/0x3c0 [ 322.660334] ? __pfx_kthread+0x10/0x10 [ 322.660339] ret_from_fork+0x2d/0x70 [ 322.660345] ? __pfx_kthread+0x10/0x10 [ 322.660349] ret_from_fork_asm+0x1b/0x30 [ 322.660358] </TASK> [ 322.660818] ok 8 test_early_put Cc: Christian König <[email protected]> Cc: Boris Brezillon <[email protected]> Cc: Danilo Krummrich <[email protected]> Cc: [email protected] Fixes: 0959321 ("drm: execution context for GEM buffers v7") Signed-off-by: Thomas Hellström <[email protected]> Reviewed-by: Boris Brezillon <[email protected]> Reviewed-by: Danilo Krummrich <[email protected]> Reviewed-by: Christian König <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
when using __drm_kunit_helper_alloc_drm_device() the driver may be dereferenced by device-managed resources up until the device is freed, which is typically later than the kunit-managed resource code frees it. Fix this by simply make the driver device-managed as well. In short, the sequence leading to the UAF is as follows: INIT: Code allocates a struct device as a kunit-managed resource. Code allocates a drm driver as a kunit-managed resource. Code allocates a drm device as a device-managed resource. EXIT: Kunit resource cleanup frees the drm driver Kunit resource cleanup puts the struct device, which starts a device-managed resource cleanup device-managed cleanup calls drm_dev_put() drm_dev_put() dereferences the (now freed) drm driver -> Boom. Related KASAN message: [55272.551542] ================================================================== [55272.551551] BUG: KASAN: slab-use-after-free in drm_dev_put.part.0+0xd4/0xe0 [drm] [55272.551603] Read of size 8 at addr ffff888127502828 by task kunit_try_catch/10353 [55272.551612] CPU: 4 PID: 10353 Comm: kunit_try_catch Tainted: G U N 6.5.0-rc7+ #155 [55272.551620] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021 [55272.551626] Call Trace: [55272.551629] <TASK> [55272.551633] dump_stack_lvl+0x57/0x90 [55272.551639] print_report+0xcf/0x630 [55272.551645] ? _raw_spin_lock_irqsave+0x5f/0x70 [55272.551652] ? drm_dev_put.part.0+0xd4/0xe0 [drm] [55272.551694] kasan_report+0xd7/0x110 [55272.551699] ? drm_dev_put.part.0+0xd4/0xe0 [drm] [55272.551742] drm_dev_put.part.0+0xd4/0xe0 [drm] [55272.551783] devres_release_all+0x15d/0x1f0 [55272.551790] ? __pfx_devres_release_all+0x10/0x10 [55272.551797] device_unbind_cleanup+0x16/0x1a0 [55272.551802] device_release_driver_internal+0x3e5/0x540 [55272.551808] ? kobject_put+0x5d/0x4b0 [55272.551814] bus_remove_device+0x1f1/0x3f0 [55272.551819] device_del+0x342/0x910 [55272.551826] ? __pfx_device_del+0x10/0x10 [55272.551830] ? lock_release+0x339/0x5e0 [55272.551836] ? kunit_remove_resource+0x128/0x290 [kunit] [55272.551845] ? __pfx_lock_release+0x10/0x10 [55272.551851] platform_device_del.part.0+0x1f/0x1e0 [55272.551856] ? _raw_spin_unlock_irqrestore+0x30/0x60 [55272.551863] kunit_remove_resource+0x195/0x290 [kunit] [55272.551871] ? _raw_spin_unlock_irqrestore+0x30/0x60 [55272.551877] kunit_cleanup+0x78/0x120 [kunit] [55272.551885] ? __kthread_parkme+0xc1/0x1f0 [55272.551891] ? __pfx_kunit_try_run_case_cleanup+0x10/0x10 [kunit] [55272.551900] ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10 [kunit] [55272.551909] kunit_generic_run_threadfn_adapter+0x4a/0x90 [kunit] [55272.551919] kthread+0x2e7/0x3c0 [55272.551924] ? __pfx_kthread+0x10/0x10 [55272.551929] ret_from_fork+0x2d/0x70 [55272.551935] ? __pfx_kthread+0x10/0x10 [55272.551940] ret_from_fork_asm+0x1b/0x30 [55272.551948] </TASK> [55272.551953] Allocated by task 10351: [55272.551956] kasan_save_stack+0x1c/0x40 [55272.551962] kasan_set_track+0x21/0x30 [55272.551966] __kasan_kmalloc+0x8b/0x90 [55272.551970] __kmalloc+0x5e/0x160 [55272.551976] kunit_kmalloc_array+0x1c/0x50 [kunit] [55272.551984] drm_exec_test_init+0xfa/0x2c0 [drm_exec_test] [55272.551991] kunit_try_run_case+0xdd/0x250 [kunit] [55272.551999] kunit_generic_run_threadfn_adapter+0x4a/0x90 [kunit] [55272.552008] kthread+0x2e7/0x3c0 [55272.552012] ret_from_fork+0x2d/0x70 [55272.552017] ret_from_fork_asm+0x1b/0x30 [55272.552024] Freed by task 10353: [55272.552027] kasan_save_stack+0x1c/0x40 [55272.552032] kasan_set_track+0x21/0x30 [55272.552036] kasan_save_free_info+0x27/0x40 [55272.552041] __kasan_slab_free+0x106/0x180 [55272.552046] slab_free_freelist_hook+0xb3/0x160 [55272.552051] __kmem_cache_free+0xb2/0x290 [55272.552056] kunit_remove_resource+0x195/0x290 [kunit] [55272.552064] kunit_cleanup+0x78/0x120 [kunit] [55272.552072] kunit_generic_run_threadfn_adapter+0x4a/0x90 [kunit] [55272.552080] kthread+0x2e7/0x3c0 [55272.552085] ret_from_fork+0x2d/0x70 [55272.552089] ret_from_fork_asm+0x1b/0x30 [55272.552096] The buggy address belongs to the object at ffff888127502800 which belongs to the cache kmalloc-512 of size 512 [55272.552105] The buggy address is located 40 bytes inside of freed 512-byte region [ffff888127502800, ffff888127502a00) [55272.552115] The buggy address belongs to the physical page: [55272.552119] page:00000000af6c70ff refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x127500 [55272.552127] head:00000000af6c70ff order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [55272.552133] anon flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff) [55272.552141] page_type: 0xffffffff() [55272.552145] raw: 0017ffffc0010200 ffff888100042c80 0000000000000000 dead000000000001 [55272.552152] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000 [55272.552157] page dumped because: kasan: bad access detected [55272.552163] Memory state around the buggy address: [55272.552167] ffff888127502700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [55272.552173] ffff888127502780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [55272.552178] >ffff888127502800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [55272.552184] ^ [55272.552187] ffff888127502880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [55272.552193] ffff888127502900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [55272.552198] ================================================================== [55272.552203] Disabling lock debugging due to kernel taint v2: - Update commit message, add Fixes: tag and Cc stable. v3: - Further commit message updates (Maxime Ripard). Cc: Maarten Lankhorst <[email protected]> Cc: Maxime Ripard <[email protected]> Cc: Thomas Zimmermann <[email protected]> Cc: David Airlie <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: [email protected] Cc: [email protected] # v6.3+ Fixes: d987803 ("drm/tests: helpers: Allow to pass a custom drm_driver") Signed-off-by: Thomas Hellström <[email protected]> Reviewed-by: Francois Dugast <[email protected]> Acked-by: Maxime Ripard <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maxime Ripard <[email protected]>
Removing the large array does not fix it -- something else is going on.
When this is fixed, switch the CI debug config back to
RUST_OPT_LEVEL_0
.The text was updated successfully, but these errors were encountered: