idpf-linux: block changing ring params while af_xdp is active #25

michalQb · 2024-07-18T15:45:25Z

Changing ring parameters, especially ring size, should not be modified while AF_XDP socket is assigned to any Rx ring.

Implement a function for checking all Rx queues for AF_XDP socket assign and block changing queue parameters if at least one Rx queue has AF_XDP socket.

Make dev->priv_flags `u32` back and define bits higher than 31 as bitfield booleans as per Jakub's suggestion. This simplifies code which accesses these bits with no optimization loss (testb both before/after), allows to not extend &netdev_priv_flags each time, but also scales better as bits > 63 in the future would only add a new u64 to the structure with no complications, comparing to that extending ::priv_flags would require converting it to a bitmap. Note that I picked `unsigned long :1` to not lose any potential optimizations comparing to `bool :1` etc. Suggested-by: Jakub Kicinski <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]>

NETIF_F_NO_CSUM was removed in 3.2-rc2 by commit 34324dc ("net: remove NETIF_F_NO_CSUM feature bit") and became __UNUSED_NETIF_F_1. It's not used anywhere in the code. Remove this bit waste. It wasn't needed to rename the flag instead of removing it as netdev features are not uAPI/ABI. Ethtool passes their names and values separately with no fixed positions and the userspace Ethtool code doesn't have any hardcoded feature names/bits, so that new Ethtool will work on older kernels and vice versa. Signed-off-by: Alexander Lobakin <[email protected]>

NETIF_F_LLTX can't be changed via Ethtool and is not a feature, rather an attribute, very similar to IFF_NO_QUEUE (and hot). Free one netdev_features_t bit and make it a "hot" private flag. Signed-off-by: Alexander Lobakin <[email protected]>

"Interface can't change network namespaces" is rather an attribute, not a feature, and it can't be changed via Ethtool. Make it a "cold" private flag instead of a netdev_feature and free one more bit. Signed-off-by: Alexander Lobakin <[email protected]>

Ability to handle maximum FCoE frames of 2158 bytes can never be changed and thus more of an attribute, not a toggleable feature. Move it from netdev_features_t to "cold" priv flags (bitfield bool) and free yet another feature bit. Signed-off-by: Alexander Lobakin <[email protected]>

NETIF_F_ALL_FCOE is used only in vlan_dev.c, 2 times. Now that it's only 2 bits, open-code it and remove the definition from netdev_features.h. Suggested-by: Jakub Kicinski <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]>

The second tagged commit introduced a UAF, as it removed restoring q_vector->vport pointers after reinitializating the structures. This is due to that all queue allocation functions are performed here with the new temporary vport structure and those functions rewrite the backpointers to the vport. Then, this new struct is freed and the pointers start leading to nowhere. But generally speaking, the current logic is very fragile. It claims to be more reliable when the system is low on memory, but in fact, it consumes two times more memory as at the moment of running this function, there are two vports allocated with their queues and vectors. Moreover, it claims to prevent the driver from running into "bad state", but in fact, any error during the rebuild leaves the old vport in the partially allocated state. Finally, if the interface is down when the function is called, it always allocates a new queue set, but when the user decides to enable the interface later on, vport_open() allocates them once again, IOW there's a clear memory leak here. There's now oneliner way to fix this all. Instead, rewrite the function from scratch without playing with two vports and memcpy()s. Just perform everything on the current structure and do a minimum set of stuff needed to rebuild the vport. Don't allocate the queues at all, as vport_open(), no matter if it will be called here or during the next ifup, will do that for us. Fixes: 02cbfba ("idpf: add ethtool callbacks") Fixes: e4891e4 ("idpf: split &idpf_queue into 4 strictly-typed queue structures") Signed-off-by: Alexander Lobakin <[email protected]>

The initialization of vport interrupt consists of two functions: 1) idpf_vport_intr_init() where a generic configuration is done 2) idpf_vport_intr_req_irq() where the irq for each q_vector is requested. The first function used to create a base name for each interrupt using "kasprintf()" call. Unfortunately, although that call allocated memory for a text buffer, that memory was never released. Fix this by removing creating the interrupt base name in 1). Instead, always create a full interrupt name in the function 2), because there is no need to create a base name separately, considering that the function 2) is never called out of idpf_vport_intr_init() context. Fixes: d4d5587 ("idpf: initialize interrupts and enable vport") Cc: [email protected] # 6.7 Signed-off-by: Michal Kubiak <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]>

The second tagged commit started sometimes (very rarely, but possible) throwing WARNs from net/core/page_pool.c:page_pool_disable_direct_recycling(). Turned out idpf frees interrupt vectors with embedded NAPIs *before* freeing the queues making page_pools' NAPI pointers lead to freed memory before these pools are destroyed by libeth. It's not clear whether there are other accesses to the freed vectors when destroying the queues, but anyway, we usually free queue/interrupt vectors only when the queues are destroyed and the NAPIs are guaranteed to not be referenced anywhere. Invert the allocation and freeing logic making queue/interrupt vectors be allocated first and freed last. Vectors don't require queues to be present, so this is safe. Additionally, this change allows to remove that useless queue->q_vector pointer cleanup, as vectors are still valid when freeing the queues (+ both are freed within one function, so it's not clear why nullify the pointers at all). Fixes: 1c325aa ("idpf: configure resources for TX queues") Fixes: 90912f9 ("idpf: convert header split mode to libeth + napi_build_skb()") Reported-by: Michal Kubiak <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]>

There are cases when we need to explicitly unroll loops. For example, cache operations, filling DMA descriptors on very high speeds etc. Make MIPS' unroll header a generic one to have "unroll always" macro, which would work on any compiler and system, and add compiler-specific attribute macros. Example usage: #define UNROLL_BATCH 8 unrolled_count(UNROLL_BATCH) for (u32 i = 0; i < UNROLL_BATCH; i++) op(var, i); Not that sometimes the compilers won't unroll loops if they think that would have worse optimization and perf than with a loop, and that unroll attributes are available only starting GCC 8. In this case, you can still use unrolled_call(UNROLL_BATCH, op), which works in the range of [1...32] iterations. For better unrolling/parallelization, don't have any variables that interfere between iterations except for the iterator itself. Co-developed-by: Jose E. Marchesi <[email protected]> # pragmas Signed-off-by: Jose E. Marchesi <[email protected]> Co-developed-by: Paul Burton <[email protected]> # unrolled_call() Signed-off-by: Paul Burton <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]>

Define common structures, inline helpers and Ethtool helpers to collect, update and export the statistics (RQ, SQ, XDPSQ). Use u64_stats_t right from the start, as well as the corresponding helpers to ensure tear-free operations. For the NAPI parts of both Rx and Tx, also define small onstack containers to update them in polling loops and then sync the actual containers once a loop ends. In order to implement fully generic Netlink per-queue stats callbacks, &libeth_netdev_priv is introduced and is required to be embedded at the start of the driver's netdev_priv structure. Signed-off-by: Alexander Lobakin <[email protected]>

Software-side Tx buffers for storing DMA, frame size, skb pointers etc. are pretty much generic and every driver defines them the same way. The same can be said for software Tx completions -- same napi_consume_skb()s and all that... Add a couple simple wrappers for doing that to stop repeating the old tale at least within the Intel code. Drivers are free to use 'priv' member at the end of the structure. Signed-off-by: Alexander Lobakin <[email protected]>

&idpf_tx_buffer is almost identical to the previous generations, as well as the way it's handled. Moreover, relying on dma_unmap_addr() and !!buf->skb instead of explicit defining of buffer's type was never good. Use the newly added libie helpers to do it properly and reduce the copy-paste around the Tx code. Signed-off-by: Alexander Lobakin <[email protected]>

Add a shorthand similar to other net*_subqueue() helpers for resetting the queue by its index w/o obtaining &netdev_tx_queue beforehand manually. Signed-off-by: Alexander Lobakin <[email protected]>

This patch adds a mechanism to guard against stashing partial packets into the hash table. This makes the driver more robust, leads to more efficient decision making when cleaning. Doon't stash partial packets. This can happen when an RE completion is received in flow scheduling mode, or when an out of order RS completion is received. The first buffer with the skb is stashed, but some or all of its frags are not because the stack is out of reserve buffers. This leaves the ring in a weird state since the frags are still on the ring. Use the field to track the number of fragments/ tx_bufs representing the packet. The clean routines check to make sure there are enough reserve buffers on the stack before stashing any part of the packet. If there are not, next_to_clean is left pointing to the first buffer of the packet that failed to be stashed. This leaves the whole packet on the ring, and the next time around, cleaning will start from this packet. An RS completion is still expected for this packet in either case. So instead of being cleaned from the hash table, it will be cleaned from the ring directly. This should all still be fine since the DESC_UNUSED and BUFS_UNUSED will reflect the state of the ring. If we ever fall below the thresholds, the TXQ will still be stopped, giving the completion queue time to catch up. This may lead to stopping the queue more frequently, but it guarantees the TX ring will always be in a good state. Also, always use the idpf_tx_splitq_clean function to clean descriptors, i.e. use it from clean_buf_ring as well. This way we avoid duplicating the logic and make sure we're using the same reserve buffers guard rail. This does require a switch from the s16 next_to_clean overflow descriptor ring wrap calculation to u16 and the normal ring size check. Signed-off-by: Joshua Hay <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]>

netif_txq_maybe_stop() returns -1, 0, or 1, while idpf_tx_maybe_stop_common() says it returns 0 or -EBUSY. As a result, there sometimes are Tx queue timeout warnings despite that the queue is empty or there is at least enough space to restart it. Make idpf_tx_maybe_stop_common() inline and returning true or false, handling the return of netif_txq_maybe_stop() properly. Use a correct goto in idpf_tx_maybe_stop_splitq() to avoid stopping the queue or incrementing the stops counter twice. Fixes: 6818c4d ("idpf: add splitq start_xmit") Fixes: a5ab9ee ("idpf: add singleq start_xmit and napi poll") Signed-off-by: Michal Kubiak <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]>

Tell hardware to write back completed descriptors even when interrupts are disabled. Otherwise, descriptors might not be written back until the hardware can flush a full cacheline of descriptors. This can cause unnecessary delays when traffic is light (or even trigger Tx queue timeout). The example scenario to reproduce the Tx timeout if the fix is not applied: - configure at least 2 Tx queues to be assigned to the same q_vector, - generate a huge Tx traffic on the first Tx queue - try to send a few packets using the second Tx queue. In such a case Tx timeout will appear on the second Tx queue because no completion descriptors are written back for that queue while interrupts are disabled due to NAPI polling. The patch is necessary to start work on the AF_XDP implementation for the idpf driver, because there may be a case where a regular LAN Tx queue and an XDP queue share the same NAPI. Fixes: c2d548c ("idpf: add TX splitq napi poll support") Fixes: a5ab9ee ("idpf: add singleq start_xmit and napi poll") Reviewed-by: Przemek Kitszel <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Signed-off-by: Joshua Hay <[email protected]> Co-developed-by: Michal Kubiak <[email protected]> Signed-off-by: Michal Kubiak <[email protected]>

Fully reimplement idpf's per-queue stats using the libeth infra. Embed &libeth_netdev_priv to the beginning of &idpf_netdev_priv(), call the necessary init/deinit helpers and the corresponding Ethtool helpers. Update hotpath counters such as hsplit and tso/gso using the onstack containers instead of direct accesses to queue->stats. Signed-off-by: Alexander Lobakin <[email protected]>

In lots of places, bpf_prog pointer is used only for tracing or other stuff that doesn't modify the structure itself. Same for net_device. Address at least some of them and add `const` attributes there. The object code didn't change, but that may prevent unwanted data modifications and also allow more helpers to have const arguments. Signed-off-by: Alexander Lobakin <[email protected]>

Lots of read-only helpers for &xdp_buff and &xdp_frame, such as getting the frame length, skb_shared_info etc., don't have their arguments marked with `const` for no reason. Add the missing annotations to leave less place for mistakes and more for optimization. Signed-off-by: Alexander Lobakin <[email protected]>

One may need to register memory model separately from xdp_rxq_info. One simple example may be XDP test run code, but in general, it might be useful when memory model registering is managed by one layer and then XDP RxQ info by a different one. Allow such scenarios by adding a simple helper which "attaches" an already registered memory model to the desired xdp_rxq_info. As this is mostly needed for Page Pool, add a special function to do that for a &page_pool pointer. Signed-off-by: Alexander Lobakin <[email protected]>

To make the system page pool usable as a source for allocating XDP frames, we need to register it with xdp_reg_mem_model(), so that page return works correctly. This is done in preparation for using the system page pool for the XDP live frame mode in BPF_TEST_RUN; for the same reason, make the per-cpu variable non-static so we can access it from the test_run code as well. Reviewed-by: Alexander Lobakin <[email protected]> Tested-by: Alexander Lobakin <[email protected]> Signed-off-by: Toke Høiland-Jørgensen <[email protected]>

Currently, page_pool_put_page_bulk() indeed takes an array of pointers to the data, not pages, despite the name. As one side effect, when you're freeing frags from &skb_shared_info, xdp_return_frame_bulk() converts page pointers to virtual addresses and then page_pool_put_page_bulk() converts them back. Make page_pool_put_page_bulk() actually handle array of pages. Pass frags directly and use virt_to_page() when freeing xdpf->data, so that the PP core will then get the compound head and take care of the rest. Signed-off-by: Alexander Lobakin <[email protected]>

The main reason for this change was to allow mixing pages from different &page_pools within one &xdp_buff/&xdp_frame. Why not? Adjust xdp_return_frame_bulk() and page_pool_put_page_bulk(), so that they won't be tied to a particular pool. Let the latter splice the bulk when it encounters a page whichs PP is different and flush it recursively. This greatly optimizes xdp_return_frame_bulk(): no more hashtable lookups. Also make xdp_flush_frame_bulk() inline, as it's just one if + function call + one u32 read, not worth extending the call ladder. Signed-off-by: Alexander Lobakin <[email protected]>

Initially, xdp_frame::mem.id was used to search for the corresponding &page_pool to return the page correctly. However, after that struct page now contains a direct pointer to its PP, further keeping of this field makes no sense. xdp_return_frame_bulk() still uses it to do a lookup, but this is rather a leftover. Remove xdp_frame::mem and replace it with ::mem_type, as only memory type still matters and we need to know it to be able to free the frame correctly. As a cute side effect, we can now make every scalar field in &xdp_frame of 4 byte width, speeding up accesses to them. Signed-off-by: Alexander Lobakin <[email protected]>

The code piece which would attach a frag to &xdp_buff is almost identical across the drivers supporting XDP multi-buffer on Rx. Make it a generic elegant onelner. Also, I see lots of drivers calculating frags_truesize as `xdp->frame_sz * nr_frags`. I can't say this is fully correct, since frags might be backed by chunks of different sizes, especially with stuff like the header split. Even page_pool_alloc() can give you two different truesizes on two subsequent requests to allocate the same buffer size. Add a field to &skb_shared_info (unionized as there's no free slot currently on x6_64) to track the "true" truesize. It can be used later when updating an skb. Signed-off-by: Alexander Lobakin <[email protected]>

The code which builds an skb from an &xdp_buff keeps multiplying itself around the drivers with almost no changes. Let's try to stop that by adding a generic function. There's __xdp_build_skb_from_frame() already, so just convert it to take &xdp_buff instead, while making the original one a wrapper. The original one always took an already allocated skb, allow both variants here -- if no skb passed, which is expected when calling from a driver, pick one via napi_build_skb(). Signed-off-by: Alexander Lobakin <[email protected]>

When you register an XSk pool as XDP Rxq info memory model, you then need to manually attach it after the registration. Let the user combine both actions into one by just passing a pointer to the pool directly to xdp_rxq_info_reg_mem_model(), which will take care of calling xsk_pool_set_rxq_info(). This looks similar to how a &page_pool gets registered and reduce repeating driver code. Signed-off-by: Alexander Lobakin <[email protected]>

Currently, xsk_buff_add_frag() only adds a frag to the pool linked list, not doing anythig with the &xdp_buff. The drivers do that manually and the logic is the same. Make it really add an skb frag, just like xdp_buff_add_frag() does that, and freeing frags on error if needed. This allows to remove repeating code from i40e and ice and not add the same code again and again. Signed-off-by: Alexander Lobakin <[email protected]>

Same as with converting &xdp_buff to skb on Rx, the code which allocates a new skb and copies the XSk frame there is identical across the drivers, so make it generic. This includes copying all the frags if they are present in the original buff. System percpu Page Pools help here a lot: when available, allocate pages from there instead of the MM layer. This greatly improves XDP_PASS performance on XSk: instead of page_alloc() + page_free(), the net core recycles the same pages, so the only overhead left is memcpy()s. Note that the passed buff gets freed if the conversion is done w/o any error, assuming you don't need this buffer after you convert it to an skb. Signed-off-by: Alexander Lobakin <[email protected]>

A dentry leak may be caused when a lookup cookie and a cull are concurrent: P1 | P2 ----------------------------------------------------------- cachefiles_lookup_cookie cachefiles_look_up_object lookup_one_positive_unlocked // get dentry cachefiles_cull inode->i_flags |= S_KERNEL_FILE; cachefiles_open_file cachefiles_mark_inode_in_use __cachefiles_mark_inode_in_use can_use = false if (!(inode->i_flags & S_KERNEL_FILE)) can_use = true return false return false // Returns an error but doesn't put dentry After that the following WARNING will be triggered when the backend folder is umounted: ================================================================== BUG: Dentry 000000008ad87947{i=7a,n=Dx_1_1.img} still in use (1) [unmount of ext4 sda] WARNING: CPU: 4 PID: 359261 at fs/dcache.c:1767 umount_check+0x5d/0x70 CPU: 4 PID: 359261 Comm: umount Not tainted 6.6.0-dirty #25 RIP: 0010:umount_check+0x5d/0x70 Call Trace: <TASK> d_walk+0xda/0x2b0 do_one_tree+0x20/0x40 shrink_dcache_for_umount+0x2c/0x90 generic_shutdown_super+0x20/0x160 kill_block_super+0x1a/0x40 ext4_kill_sb+0x22/0x40 deactivate_locked_super+0x35/0x80 cleanup_mnt+0x104/0x160 ================================================================== Whether cachefiles_open_file() returns true or false, the reference count obtained by lookup_positive_unlocked() in cachefiles_look_up_object() should be released. Therefore release that reference count in cachefiles_look_up_object() to fix the above issue and simplify the code. Fixes: 1f08c92 ("cachefiles: Implement backing file wrangling") Cc: [email protected] Signed-off-by: Baokun Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Acked-by: David Howells <[email protected]> Signed-off-by: Christian Brauner <[email protected]>

Due to incorrect dev->product reporting by certain devices, null pointer dereferences occur when dev->product is empty, leading to potential system crashes. This issue was found on EXCELSIOR DL37-D05 device with Loongson-LS3A6000-7A2000-DL37 motherboard. Kernel logs: [ 56.470885] usb 4-3: new full-speed USB device number 4 using ohci-pci [ 56.671638] usb 4-3: string descriptor 0 read error: -22 [ 56.671644] usb 4-3: New USB device found, idVendor=056a, idProduct=0374, bcdDevice= 1.07 [ 56.671647] usb 4-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [ 56.678839] hid-generic 0003:056A:0374.0004: hiddev0,hidraw3: USB HID v1.10 Device [HID 056a:0374] on usb-0000:00:05.0-3/input0 [ 56.697719] CPU 2 Unable to handle kernel paging request at virtual address 0000000000000000, era == 90000000066e35c8, ra == ffff800004f98a80 [ 56.697732] Oops[#1]: [ 56.697734] CPU: 2 PID: 2742 Comm: (udev-worker) Tainted: G OE 6.6.0-loong64-desktop #25.00.2000.015 [ 56.697737] Hardware name: Inspur CE520L2/C09901N000000000, BIOS 2.09.00 10/11/2024 [ 56.697739] pc 90000000066e35c8 ra ffff800004f98a80 tp 9000000125478000 sp 900000012547b8a0 [ 56.697741] a0 0000000000000000 a1 ffff800004818b28 a2 0000000000000000 a3 0000000000000000 [ 56.697743] a4 900000012547b8f0 a5 0000000000000000 a6 0000000000000000 a7 0000000000000000 [ 56.697745] t0 ffff800004818b2d t1 0000000000000000 t2 0000000000000003 t3 0000000000000005 [ 56.697747] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000 [ 56.697748] t8 0000000000000000 u0 0000000000000000 s9 0000000000000000 s0 900000011aa48028 [ 56.697750] s1 0000000000000000 s2 0000000000000000 s3 ffff800004818e80 s4 ffff800004810000 [ 56.697751] s5 90000001000b98d0 s6 ffff800004811f88 s7 ffff800005470440 s8 0000000000000000 [ 56.697753] ra: ffff800004f98a80 wacom_update_name+0xe0/0x300 [wacom] [ 56.697802] ERA: 90000000066e35c8 strstr+0x28/0x120 [ 56.697806] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 56.697816] PRMD: 0000000c (PPLV0 +PIE +PWE) [ 56.697821] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) [ 56.697827] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 56.697831] ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0) [ 56.697835] BADV: 0000000000000000 [ 56.697836] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000) [ 56.697838] Modules linked in: wacom(+) bnep bluetooth rfkill qrtr nls_iso8859_1 nls_cp437 snd_hda_codec_conexant snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore input_leds mousedev led_class joydev deepin_netmonitor(OE) fuse nfnetlink dmi_sysfs ip_tables x_tables overlay amdgpu amdxcp drm_exec gpu_sched drm_buddy radeon drm_suballoc_helper i2c_algo_bit drm_ttm_helper r8169 ttm drm_display_helper spi_loongson_pci xhci_pci cec xhci_pci_renesas spi_loongson_core hid_generic realtek gpio_loongson_64bit [ 56.697887] Process (udev-worker) (pid: 2742, threadinfo=00000000aee0d8b4, task=00000000a9eff1f3) [ 56.697890] Stack : 0000000000000000 ffff800004817e00 0000000000000000 0000251c00000000 [ 56.697896] 0000000000000000 00000011fffffffd 0000000000000000 0000000000000000 [ 56.697901] 0000000000000000 1b67a968695184b9 0000000000000000 90000001000b98d0 [ 56.697906] 90000001000bb8d0 900000011aa48028 0000000000000000 ffff800004f9d74c [ 56.697911] 90000001000ba000 ffff800004f9ce58 0000000000000000 ffff800005470440 [ 56.697916] ffff800004811f88 90000001000b98d0 9000000100da2aa8 90000001000bb8d0 [ 56.697921] 0000000000000000 90000001000ba000 900000011aa48028 ffff800004f9d74c [ 56.697926] ffff8000054704e8 90000001000bb8b8 90000001000ba000 0000000000000000 [ 56.697931] 90000001000bb8d0 9000000006307564 9000000005e666e0 90000001752359b8 [ 56.697936] 9000000008cbe400 900000000804d000 9000000005e666e0 0000000000000000 [ 56.697941] ... [ 56.697944] Call Trace: [ 56.697945] [<90000000066e35c8>] strstr+0x28/0x120 [ 56.697950] [<ffff800004f98a80>] wacom_update_name+0xe0/0x300 [wacom] [ 56.698000] [<ffff800004f9ce58>] wacom_parse_and_register+0x338/0x900 [wacom] [ 56.698050] [<ffff800004f9d74c>] wacom_probe+0x32c/0x420 [wacom] [ 56.698099] [<9000000006307564>] hid_device_probe+0x144/0x260 [ 56.698103] [<9000000005e65d68>] really_probe+0x208/0x540 [ 56.698109] [<9000000005e661dc>] __driver_probe_device+0x13c/0x1e0 [ 56.698112] [<9000000005e66620>] driver_probe_device+0x40/0x100 [ 56.698116] [<9000000005e6680c>] __device_attach_driver+0x12c/0x180 [ 56.698119] [<9000000005e62bc8>] bus_for_each_drv+0x88/0x160 [ 56.698123] [<9000000005e66468>] __device_attach+0x108/0x260 [ 56.698126] [<9000000005e63918>] device_reprobe+0x78/0x100 [ 56.698129] [<9000000005e62a68>] bus_for_each_dev+0x88/0x160 [ 56.698132] [<9000000006304e54>] __hid_bus_driver_added+0x34/0x80 [ 56.698134] [<9000000005e62bc8>] bus_for_each_drv+0x88/0x160 [ 56.698137] [<9000000006304df0>] __hid_register_driver+0x70/0xa0 [ 56.698142] [<9000000004e10fe4>] do_one_initcall+0x104/0x320 [ 56.698146] [<9000000004f38150>] do_init_module+0x90/0x2c0 [ 56.698151] [<9000000004f3a3d8>] init_module_from_file+0xb8/0x120 [ 56.698155] [<9000000004f3a590>] idempotent_init_module+0x150/0x3a0 [ 56.698159] [<9000000004f3a890>] sys_finit_module+0xb0/0x140 [ 56.698163] [<900000000671e4e8>] do_syscall+0x88/0xc0 [ 56.698166] [<9000000004e12404>] handle_syscall+0xc4/0x160 [ 56.698171] Code: 0011958f 00150224 5800cd85 <2a00022c> 00150004 4000c180 0015022c 03400000 03400000 [ 56.698192] ---[ end trace 0000000000000000 ]--- Fixes: 09dc28a ("HID: wacom: Improve generic name generation") Reported-by: Zhenxing Chen <[email protected]> Co-developed-by: Xu Rao <[email protected]> Signed-off-by: Xu Rao <[email protected]> Signed-off-by: WangYuli <[email protected]> Link: https://patch.msgid.link/[email protected] Cc: [email protected] Signed-off-by: Benjamin Tissoires <[email protected]>

Without the change `perf `hangs up on charaster devices. On my system it's enough to run system-wide sampler for a few seconds to get the hangup: $ perf record -a -g --call-graph=dwarf $ perf report # hung `strace` shows that hangup happens on reading on a character device `/dev/dri/renderD128` $ strace -y -f -p 2780484 strace: Process 2780484 attached pread64(101</dev/dri/renderD128>, strace: Process 2780484 detached It's call trace descends into `elfutils`: $ gdb -p 2780484 (gdb) bt #0 0x00007f5e508f04b7 in __libc_pread64 (fd=101, buf=0x7fff9df7edb0, count=0, offset=0) at ../sysdeps/unix/sysv/linux/pread64.c:25 #1 0x00007f5e52b79515 in read_file () from /<<NIX>>/elfutils-0.192/lib/libelf.so.1 #2 0x00007f5e52b25666 in libdw_open_elf () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1 #3 0x00007f5e52b25907 in __libdw_open_file () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1 #4 0x00007f5e52b120a9 in dwfl_report_elf@@ELFUTILS_0.156 () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1 #5 0x000000000068bf20 in __report_module (al=al@entry=0x7fff9df80010, ip=ip@entry=139803237033216, ui=ui@entry=0x5369b5e0) at util/dso.h:537 #6 0x000000000068c3d1 in report_module (ip=139803237033216, ui=0x5369b5e0) at util/unwind-libdw.c:114 #7 frame_callback (state=0x535aef10, arg=0x5369b5e0) at util/unwind-libdw.c:242 #8 0x00007f5e52b261d3 in dwfl_thread_getframes () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1 #9 0x00007f5e52b25bdb in get_one_thread_cb () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1 #10 0x00007f5e52b25faa in dwfl_getthreads () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1 #11 0x00007f5e52b26514 in dwfl_getthread_frames () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1 #12 0x000000000068c6ce in unwind__get_entries (cb=cb@entry=0x5d4620 <unwind_entry>, arg=arg@entry=0x10cd5fa0, thread=thread@entry=0x1076a290, data=data@entry=0x7fff9df80540, max_stack=max_stack@entry=127, best_effort=best_effort@entry=false) at util/thread.h:152 #13 0x00000000005dae95 in thread__resolve_callchain_unwind (evsel=0x106006d0, thread=0x1076a290, cursor=0x10cd5fa0, sample=0x7fff9df80540, max_stack=127, symbols=true) at util/machine.c:2939 #14 thread__resolve_callchain_unwind (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, sample=0x7fff9df80540, max_stack=127, symbols=true) at util/machine.c:2920 #15 __thread__resolve_callchain (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, evsel@entry=0x7fff9df80440, sample=0x7fff9df80540, parent=parent@entry=0x7fff9df804a0, root_al=root_al@entry=0x7fff9df80440, max_stack=127, symbols=true) at util/machine.c:2970 #16 0x00000000005d0cb2 in thread__resolve_callchain (thread=<optimized out>, cursor=<optimized out>, evsel=0x7fff9df80440, sample=<optimized out>, parent=0x7fff9df804a0, root_al=0x7fff9df80440, max_stack=127) at util/machine.h:198 #17 sample__resolve_callchain (sample=<optimized out>, cursor=<optimized out>, parent=parent@entry=0x7fff9df804a0, evsel=evsel@entry=0x106006d0, al=al@entry=0x7fff9df80440, max_stack=max_stack@entry=127) at util/callchain.c:1127 #18 0x0000000000617e08 in hist_entry_iter__add (iter=iter@entry=0x7fff9df80480, al=al@entry=0x7fff9df80440, max_stack_depth=127, arg=arg@entry=0x7fff9df81ae0) at util/hist.c:1255 #19 0x000000000045d2d0 in process_sample_event (tool=0x7fff9df81ae0, event=<optimized out>, sample=0x7fff9df80540, evsel=0x106006d0, machine=<optimized out>) at builtin-report.c:334 #20 0x00000000005e3bb1 in perf_session__deliver_event (session=0x105ff2c0, event=0x7f5c7d735ca0, tool=0x7fff9df81ae0, file_offset=2914716832, file_path=0x105ffbf0 "perf.data") at util/session.c:1367 #21 0x00000000005e8d93 in do_flush (oe=0x105ffa50, show_progress=false) at util/ordered-events.c:245 #22 __ordered_events__flush (oe=0x105ffa50, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:324 #23 0x00000000005e1f64 in perf_session__process_user_event (session=0x105ff2c0, event=0x7f5c7d752b18, file_offset=2914835224, file_path=0x105ffbf0 "perf.data") at util/session.c:1419 #24 0x00000000005e47c7 in reader__read_event (rd=rd@entry=0x7fff9df81260, session=session@entry=0x105ff2c0, --Type <RET> for more, q to quit, c to continue without paging-- quit prog=prog@entry=0x7fff9df81220) at util/session.c:2132 #25 0x00000000005e4b37 in reader__process_events (rd=0x7fff9df81260, session=0x105ff2c0, prog=0x7fff9df81220) at util/session.c:2181 #26 __perf_session__process_events (session=0x105ff2c0) at util/session.c:2226 #27 perf_session__process_events (session=session@entry=0x105ff2c0) at util/session.c:2390 #28 0x0000000000460add in __cmd_report (rep=0x7fff9df81ae0) at builtin-report.c:1076 torvalds#29 cmd_report (argc=<optimized out>, argv=<optimized out>) at builtin-report.c:1827 torvalds#30 0x00000000004c5a40 in run_builtin (p=p@entry=0xd8f7f8 <commands+312>, argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0) at perf.c:351 torvalds#31 0x00000000004c5d63 in handle_internal_command (argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0) at perf.c:404 torvalds#32 0x0000000000442de3 in run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:448 torvalds#33 main (argc=<optimized out>, argv=0x7fff9df844b0) at perf.c:556 The hangup happens because nothing in` perf` or `elfutils` checks if a mapped file is easily readable. The change conservatively skips all non-regular files. Signed-off-by: Sergei Trofimovich <[email protected]> Acked-by: Namhyung Kim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]>

If we hit an error path in GEM obj creation before msm_gem_new_handle() updates obj->resv to point to the gpuvm resv object, then obj->resv still points to &obj->_resv. In this case we don't want to decrement the refcount of the object being freed (since the refcnt is already zero). This fixes the following splat: ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 9 PID: 7013 at lib/refcount.c:28 refcount_warn_saturate+0xf4/0x148 Modules linked in: uinput snd_seq_dummy snd_hrtimer aes_ce_ccm snd_soc_wsa884x regmap_sdw q6prm_clocks q6apm_lpass_da> qcom_pil_info i2c_hid drm_kms_helper qcom_common qcom_q6v5 phy_snps_eusb2 qcom_geni_serial drm qcom_sysmon pinctrl_s> CPU: 9 UID: 1000 PID: 7013 Comm: deqp-vk Not tainted 6.16.0-rc4-debug+ #25 PREEMPT(voluntary) Hardware name: LENOVO 83ED/LNVNB161216, BIOS NHCN53WW 08/02/2024 pstate: 6140000 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) pc : refcount_warn_saturate+0xf4/0x148 lr : refcount_warn_saturate+0xf4/0x148 sp : ffff8000a2073920 x29: ffff8000a2073920 x28: 0000000000000010 x27: 0000000000000010 x26: 0000000000000042 x25: ffff000810e09800 x24: 0000000000000010 x23: ffff8000a2073b94 x22: ffff000ddb22de00 x21: ffff000ddb22dc00 x20: ffff000ddb22ddf8 x19: ffff0008024934e0 x18: 000000000000000a x17: 0000000000000000 x16: ffff9f8c67d77340 x15: 0000000000000000 x14: 00000000ffffffff x13: 2e656572662d7265 x12: 7466612d65737520 x11: 3b776f6c66726564 x10: 00000000ffff7fff x9 : ffff9f8c67506c70 x8 : ffff9f8c69fa26f0 x7 : 00000000000bffe8 x6 : c0000000ffff7fff x5 : ffff000f53e14548 x4 : ffff6082ea2b2000 x3 : ffff0008b86ab080 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0008b86ab080 Call trace: refcount_warn_saturate+0xf4/0x148 (P) msm_gem_free_object+0x248/0x260 [msm] drm_gem_object_free+0x24/0x40 [drm] msm_gem_new+0x1c4/0x1e0 [msm] msm_gem_new_handle+0x3c/0x1a0 [msm] msm_ioctl_gem_new+0x38/0x70 [msm] drm_ioctl_kernel+0xc8/0x138 [drm] drm_ioctl+0x2c8/0x618 [drm] __arm64_sys_ioctl+0xac/0x108 invoke_syscall.constprop.0+0x64/0xe8 el0_svc_common.constprop.0+0x40/0xe8 do_el0_svc+0x24/0x38 el0_svc+0x54/0x1d8 el0t_64_sync_handler+0x10c/0x138 el0t_64_sync+0x19c/0x1a0 irq event stamp: 3698694 hardirqs last enabled at (3698693): [<ffff9f8c675021dc>] __up_console_sem+0x74/0x90 hardirqs last disabled at (3698694): [<ffff9f8c68ce8164>] el1_dbg+0x24/0x90 softirqs last enabled at (3697578): [<ffff9f8c6744ec5c>] handle_softirqs+0x454/0x4b0 softirqs last disabled at (3697567): [<ffff9f8c67360244>] __do_softirq+0x1c/0x28 ---[ end trace 0000000000000000 ]--- Fixes: b58e12a ("drm/msm: Add _NO_SHARE flag") Signed-off-by: Rob Clark <[email protected]> Patchwork: https://patchwork.freedesktop.org/patch/665355/

submit_unpin_objects() should come before we unlock the objects. This fixes the splat: WARNING: CPU: 2 PID: 2171 at drivers/gpu/drm/msm/msm_gem.h:395 msm_gem_unpin_locked+0x8c/0xd8 [msm] Modules linked in: uinput snd_seq_dummy snd_hrtimer aes_ce_ccm snd_soc_wsa884x regmap_sdw q6prm_clocks q6apm_lpass_dais q6apm_dai snd_q6dsp_common q6prm snd_q6apm qcom_pd_mapper cdc_mbim cdc_wdm cdc_ncm r8153_ecm cdc_ether usbnet sunrpc nls_ascii nls_cp437 vfat fat snd_soc_x1e80100 snd_soc_lpass_rx_macro snd_soc_lpass_tx_macro snd_soc_lpass_va_macro snd_soc_lpass_wsa_macro snd_soc_qcom_common soundwire_qcom snd_soc_lpass_macro_common snd_soc_hdmi_codec snd_soc_qcom_sdw ext4 snd_soc_core snd_compress soundwire_bus snd_pcm_dmaengine snd_seq mbcache jbd2 snd_seq_device snd_pcm pm8941_pwrkey snd_timer r8152 qcom_spmi_temp_alarm industrialio snd lenovo_yoga_slim7x ath12k mii arm_smccc_trng soundcore rng_core evdev loop panel_samsung_atna33xc20 msm ubwc_config drm_client_lib drm_gpuvm drm_exec gpu_sched drm_display_helper pmic_glink_altmode aux_hpd_bridge ucsi_glink qcom_battmgr phy_qcom_qmp_combo ps883x cec aux_bridge drm_dp_aux_bus i2c_hid_of aes_ce_blk drm_kms_helper aes_ce_cipher i2c_hid qcom_q6v5_pas ghash_ce qcom_pil_info drm sha1_ce qcom_common phy_snps_eusb2 qcom_geni_serial qcom_q6v5 qcom_sysmon pinctrl_sm8550_lpass_lpi lpasscc_sc8280xp sbsa_gwdt mdt_loader gpio_keys pmic_glink i2c_dev efivarfs autofs4 CPU: 2 UID: 1000 PID: 2171 Comm: gnome-shell Not tainted 6.16.0-rc4-debug+ #25 PREEMPT(voluntary) Hardware name: LENOVO 83ED/LNVNB161216, BIOS NHCN53WW 08/02/2024 pstate: 6140000 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) pc : msm_gem_unpin_locked+0x8c/0xd8 [msm] lr : msm_gem_unpin_locked+0x88/0xd8 [msm] sp : ffff80009c963820 x29: ffff80009c963820 x28: ffff80009c9639f8 x27: ffff00080552a830 x26: 0000000000000000 x25: ffff0009d5655800 x24: 0000000000000000 x23: 0000000000000000 x22: 0000000000000000 x21: 0000000000000000 x20: ffff000831db5480 x19: ffff000816e74400 x18: 0000000000000000 x17: 0000000000000000 x16: ffffc1396afdd720 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: ffff0008c065bc00 x11: ffff0008c065c000 x10: 0000000000000000 x9 : ffffc13945b19074 x8 : 0000000000000000 x7 : 0000000000000209 x6 : 0000000000000002 x5 : 0000000000019d01 x4 : ffff0008ba8db080 x3 : 000000000004093f x2 : ffff3ed5e727f000 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: msm_gem_unpin_locked+0x8c/0xd8 [msm] (P) msm_ioctl_gem_submit+0x32c/0x1760 [msm] drm_ioctl_kernel+0xc8/0x138 [drm] drm_ioctl+0x2c8/0x618 [drm] __arm64_sys_ioctl+0xac/0x108 invoke_syscall.constprop.0+0x64/0xe8 el0_svc_common.constprop.0+0x40/0xe8 do_el0_svc+0x24/0x38 el0_svc+0x54/0x1d8 el0t_64_sync_handler+0x10c/0x138 el0t_64_sync+0x19c/0x1a0 irq event stamp: 2185036 hardirqs last enabled at (2185035): [<ffffc1396afeef9c>] _raw_spin_unlock_irqrestore+0x74/0x80 hardirqs last disabled at (2185036): [<ffffc1396afd8164>] el1_dbg+0x24/0x90 softirqs last enabled at (2184778): [<ffffc13969675e44>] fpsimd_restore_current_state+0x3c/0x328 softirqs last disabled at (2184776): [<ffffc13969675e14>] fpsimd_restore_current_state+0xc/0x328 ---[ end trace 0000000000000000 ]--- Fixes: 111fdd2 ("drm/msm: drm_gpuvm conversion") Signed-off-by: Rob Clark <[email protected]> Patchwork: https://patchwork.freedesktop.org/patch/665357/

…machine After GPU reset with VRAM loss, a general protection fault occurs during user queue restoration when accessing vm_bo->vm after spinlock release in amdgpu_vm_bo_reset_state_machine. The root cause is that vm_bo points to the last entry from the list_for_each_entry loop, but this becomes invalid after the spinlock is released. Accessing vm_bo->vm at this point leads to memory corruption. Crash log shows: [ 326.981811] Oops: general protection fault, probably for non-canonical address 0x4156415741e58ac8: 0000 [#1] SMP NOPTI [ 326.981820] CPU: 13 UID: 0 PID: 1035 Comm: kworker/13:3 Tainted: G E 6.16.0+ #25 PREEMPT(voluntary) [ 326.981826] Tainted: [E]=UNSIGNED_MODULE [ 326.981827] Hardware name: Gigabyte Technology Co., Ltd. X870E AORUS PRO ICE/X870E AORUS PRO ICE, BIOS F3i 12/19/2024 [ 326.981831] Workqueue: events amdgpu_userq_restore_worker [amdgpu] [ 326.981999] RIP: 0010:amdgpu_vm_assert_locked+0x16/0x70 [amdgpu] [ 326.982094] Code: 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 85 ff 74 45 48 8b 87 80 03 00 00 48 85 c0 74 40 <48> 8b b8 80 01 00 00 48 85 ff 74 3b 8b 05 0c b7 0e f0 85 c0 75 05 [ 326.982098] RSP: 0018:ffffaa91c2a6bc20 EFLAGS: 00010206 [ 326.982100] RAX: 4156415741e58948 RBX: ffff9e8f013e8330 RCX: 0000000000000000 [ 326.982102] RDX: 0000000000000005 RSI: 000000001d254e88 RDI: ffffffffc144814a [ 326.982104] RBP: ffffaa91c2a6bc68 R08: 0000004c21a25674 R09: 0000000000000001 [ 326.982106] R10: 0000000000000001 R11: dccaf3f2f82863fc R12: ffff9e8f013e8000 [ 326.982108] R13: ffff9e8f013e8000 R14: 0000000000000000 R15: ffff9e8f09980000 [ 326.982110] FS: 0000000000000000(0000) GS:ffff9e9e79995000(0000) knlGS:0000000000000000 [ 326.982112] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 326.982114] CR2: 000055ed6c9caa80 CR3: 0000000797060000 CR4: 0000000000750ef0 [ 326.982116] PKRU: 55555554 Reviewed-by: Christian König <[email protected]> Signed-off-by: Jesse Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>

alobakin and others added 30 commits July 16, 2024 15:30

netdevice: add netdev_tx_reset_subqueue() shorthand

9194f24

Add a shorthand similar to other net*_subqueue() helpers for resetting the queue by its index w/o obtaining &netdev_tx_queue beforehand manually. Signed-off-by: Alexander Lobakin <[email protected]>

alobakin force-pushed the idpf-libie-new branch 2 times, most recently from 5004eb5 to 597ed35 Compare July 29, 2024 14:44

alobakin force-pushed the idpf-libie-new branch 6 times, most recently from 02af331 to 0794f39 Compare August 12, 2024 14:11

alobakin force-pushed the idpf-libie-new branch 10 times, most recently from afb0ef4 to 448e1cc Compare August 20, 2024 13:34

alobakin force-pushed the idpf-libie-new branch 5 times, most recently from 205aad8 to 07f2c7b Compare September 3, 2024 14:35

alobakin closed this Jan 30, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

idpf-linux: block changing ring params while af_xdp is active #25

idpf-linux: block changing ring params while af_xdp is active #25

Uh oh!

michalQb commented Jul 18, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

idpf-linux: block changing ring params while af_xdp is active #25

idpf-linux: block changing ring params while af_xdp is active #25

Uh oh!

Conversation

michalQb commented Jul 18, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants