Skip to content

Ciqcbr7 9 pr #2

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Ciqcbr7 9 pr #2

wants to merge 2 commits into from

Conversation

bmastbergen
Copy link
Owner

No description provided.

bmastbergen pushed a commit that referenced this pull request Apr 25, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-5.14.0-284.30.1.el9_2
commit-author minoura makoto <[email protected]>
commit b18cba0

Commit 9130b8d ("SUNRPC: allow for upcalls for the same uid
but different gss service") introduced `auth` argument to
__gss_find_upcall(), but in gss_pipe_downcall() it was left as NULL
since it (and auth->service) was not (yet) determined.

When multiple upcalls with the same uid and different service are
ongoing, it could happen that __gss_find_upcall(), which returns the
first match found in the pipe->in_downcall list, could not find the
correct gss_msg corresponding to the downcall we are looking for.
Moreover, it might return a msg which is not sent to rpc.gssd yet.

We could see mount.nfs process hung in D state with multiple mount.nfs
are executed in parallel.  The call trace below is of CentOS 7.9
kernel-3.10.0-1160.24.1.el7.x86_64 but we observed the same hang w/
elrepo kernel-ml-6.0.7-1.el7.

PID: 71258  TASK: ffff91ebd4be0000  CPU: 36  COMMAND: "mount.nfs"
 #0 [ffff9203ca3234f8] __schedule at ffffffffa3b8899f
 #1 [ffff9203ca323580] schedule at ffffffffa3b88eb9
 #2 [ffff9203ca323590] gss_cred_init at ffffffffc0355818 [auth_rpcgss]
 #3 [ffff9203ca323658] rpcauth_lookup_credcache at ffffffffc0421ebc
[sunrpc]
 #4 [ffff9203ca3236d8] gss_lookup_cred at ffffffffc0353633 [auth_rpcgss]
 #5 [ffff9203ca3236e8] rpcauth_lookupcred at ffffffffc0421581 [sunrpc]
 #6 [ffff9203ca323740] rpcauth_refreshcred at ffffffffc04223d3 [sunrpc]
 ctrliq#7 [ffff9203ca3237a0] call_refresh at ffffffffc04103dc [sunrpc]
 ctrliq#8 [ffff9203ca3237b8] __rpc_execute at ffffffffc041e1c9 [sunrpc]
 ctrliq#9 [ffff9203ca323820] rpc_execute at ffffffffc0420a48 [sunrpc]

The scenario is like this. Let's say there are two upcalls for
services A and B, A -> B in pipe->in_downcall, B -> A in pipe->pipe.

When rpc.gssd reads pipe to get the upcall msg corresponding to
service B from pipe->pipe and then writes the response, in
gss_pipe_downcall the msg corresponding to service A will be picked
because only uid is used to find the msg and it is before the one for
B in pipe->in_downcall.  And the process waiting for the msg
corresponding to service A will be woken up.

Actual scheduing of that process might be after rpc.gssd processes the
next msg.  In rpc_pipe_generic_upcall it clears msg->errno (for A).
The process is scheduled to see gss_msg->ctx == NULL and
gss_msg->msg.errno == 0, therefore it cannot break the loop in
gss_create_upcall and is never woken up after that.

This patch adds a simple check to ensure that a msg which is not
sent to rpc.gssd yet is not chosen as the matching upcall upon
receiving a downcall.

	Signed-off-by: minoura makoto <[email protected]>
	Signed-off-by: Hiroshi Shimamoto <[email protected]>
	Tested-by: Hiroshi Shimamoto <[email protected]>
	Cc: Trond Myklebust <[email protected]>
Fixes: 9130b8d ("SUNRPC: allow for upcalls for same uid but different gss service")
	Signed-off-by: Trond Myklebust <[email protected]>
(cherry picked from commit b18cba0)
	Signed-off-by: Jonathan Maple <[email protected]>
bmastbergen pushed a commit that referenced this pull request Apr 25, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-5.14.0-284.30.1.el9_2
commit-author David Howells <[email protected]>
commit ad25f5c

There's a locking issue with the per-netns list of calls in rxrpc.  The
pieces of code that add and remove a call from the list use write_lock()
and the calls procfile uses read_lock() to access it.  However, the timer
callback function may trigger a removal by trying to queue a call for
processing and finding that it's already queued - at which point it has a
spare refcount that it has to do something with.  Unfortunately, if it puts
the call and this reduces the refcount to 0, the call will be removed from
the list.  Unfortunately, since the _bh variants of the locking functions
aren't used, this can deadlock.

================================
WARNING: inconsistent lock state
5.18.0-rc3-build4+ ctrliq#10 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/2/25 [HC0[0]:SC1[1]:HE1:SE0] takes:
ffff888107ac4038 (&rxnet->call_lock){+.?.}-{2:2}, at: rxrpc_put_call+0x103/0x14b
{SOFTIRQ-ON-W} state was registered at:
...
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&rxnet->call_lock);
  <Interrupt>
    lock(&rxnet->call_lock);

 *** DEADLOCK ***

1 lock held by ksoftirqd/2/25:
 #0: ffff8881008ffdb0 ((&call->timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x23d

Changes
=======
ver #2)
 - Changed to using list_next_rcu() rather than rcu_dereference() directly.

Fixes: 17926a7 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
	Signed-off-by: David Howells <[email protected]>
cc: Marc Dionne <[email protected]>
cc: [email protected]
	Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit ad25f5c)
	Signed-off-by: Jonathan Maple <[email protected]>
bmastbergen pushed a commit that referenced this pull request Apr 25, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-5.14.0-284.30.1.el9_2
commit-author David Howells <[email protected]>
commit 75bfdbf

Implement an in-kernel rxperf server to allow kernel-based rxrpc services
to be tested directly, unlike with AFS where they're accessed by the
fileserver when the latter decides it wants to.

This is implemented as a module that, if loaded, opens UDP port 7009
(afs3-rmtsys) and listens on it for incoming calls.  Calls can be generated
using the rxperf command shipped with OpenAFS, for example.

Changes
=======
ver #2)
 - Use min_t() instead of min().

	Signed-off-by: David Howells <[email protected]>
cc: Marc Dionne <[email protected]>
cc: [email protected]
cc: Jakub Kicinski <[email protected]>
(cherry picked from commit 75bfdbf)
	Signed-off-by: Jonathan Maple <[email protected]>
bmastbergen pushed a commit that referenced this pull request Apr 25, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-5.14.0-284.30.1.el9_2
commit-author David Howells <[email protected]>
commit 5e6ef4f
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-5.14.0-284.30.1.el9_2/5e6ef4f1.failed

Move the functions from the call->processor and local->processor work items
into the domain of the I/O thread.

The call event processor, now called from the I/O thread, then takes over
the job of cranking the call state machine, processing incoming packets and
transmitting DATA, ACK and ABORT packets.  In a future patch,
rxrpc_send_ACK() will transmit the ACK on the spot rather than queuing it
for later transmission.

The call event processor becomes purely received-skb driven.  It only
transmits things in response to events.  We use "pokes" to queue a dummy
skb to make it do things like start/resume transmitting data.  Timer expiry
also results in pokes.

The connection event processor, becomes similar, though crypto events, such
as dealing with CHALLENGE and RESPONSE packets is offloaded to a work item
to avoid doing crypto in the I/O thread.

The local event processor is removed and VERSION response packets are
generated directly from the packet parser.  Similarly, ABORTs generated in
response to protocol errors will be transmitted immediately rather than
being pushed onto a queue for later transmission.

Changes:
========
ver #2)
 - Fix a couple of introduced lock context imbalances.

	Signed-off-by: David Howells <[email protected]>
cc: Marc Dionne <[email protected]>
cc: [email protected]
(cherry picked from commit 5e6ef4f)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	include/trace/events/rxrpc.h
#	net/rxrpc/ar-internal.h
#	net/rxrpc/call_accept.c
#	net/rxrpc/call_event.c
#	net/rxrpc/call_object.c
#	net/rxrpc/conn_object.c
#	net/rxrpc/input.c
#	net/rxrpc/io_thread.c
#	net/rxrpc/local_event.c
#	net/rxrpc/local_object.c
#	net/rxrpc/output.c
#	net/rxrpc/peer_event.c
#	net/rxrpc/sendmsg.c
bmastbergen pushed a commit that referenced this pull request Apr 25, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-5.14.0-284.30.1.el9_2
commit-author David Howells <[email protected]>
commit 608aecd
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-5.14.0-284.30.1.el9_2/608aecd1.failed

Now that rxrpc_put_local() may call kthread_stop(), it can't be called
under spinlock as it might sleep.  This can cause a problem in the peer
keepalive code in rxrpc as it tries to avoid dropping the peer_hash_lock
from the point it needs to re-add peer->keepalive_link to going round the
loop again in rxrpc_peer_keepalive_dispatch().

Fix this by just dropping the lock when we don't need it and accepting that
we'll have to take it again.  This code is only called about every 20s for
each peer, so not very often.

This allows rxrpc_put_peer_unlocked() to be removed also.

If triggered, this bug produces an oops like the following, as reproduced
by a syzbot reproducer for a different oops[1]:

BUG: sleeping function called from invalid context at kernel/sched/completion.c:101
...
RCU nest depth: 0, expected: 0
3 locks held by kworker/u9:0/50:
 #0: ffff88810e74a138 ((wq_completion)krxrpcd){+.+.}-{0:0}, at: process_one_work+0x294/0x636
 #1: ffff8881013a7e20 ((work_completion)(&rxnet->peer_keepalive_work)){+.+.}-{0:0}, at: process_one_work+0x294/0x636
 #2: ffff88817d366390 (&rxnet->peer_hash_lock){+.+.}-{2:2}, at: rxrpc_peer_keepalive_dispatch+0x2bd/0x35f
...
Call Trace:
 <TASK>
 dump_stack_lvl+0x4c/0x5f
 __might_resched+0x2cf/0x2f2
 __wait_for_common+0x87/0x1e8
 kthread_stop+0x14d/0x255
 rxrpc_peer_keepalive_dispatch+0x333/0x35f
 rxrpc_peer_keepalive_worker+0x2e9/0x449
 process_one_work+0x3c1/0x636
 worker_thread+0x25f/0x359
 kthread+0x1a6/0x1b5
 ret_from_fork+0x1f/0x30

Fixes: a275da6 ("rxrpc: Create a per-local endpoint receive queue and I/O thread")
	Signed-off-by: David Howells <[email protected]>
cc: Marc Dionne <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ [1]
	Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 608aecd)
	Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
#	net/rxrpc/ar-internal.h
#	net/rxrpc/peer_event.c
#	net/rxrpc/peer_object.c
bmastbergen pushed a commit that referenced this pull request Apr 25, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-5.14.0-284.30.1.el9_2
commit-author Stefan Assmann <[email protected]>
commit 4e264be

When a system with E810 with existing VFs gets rebooted the following
hang may be observed.

 Pid 1 is hung in iavf_remove(), part of a network driver:
 PID: 1        TASK: ffff965400e5a340  CPU: 24   COMMAND: "systemd-shutdow"
  #0 [ffffaad04005fa50] __schedule at ffffffff8b3239cb
  #1 [ffffaad04005fae8] schedule at ffffffff8b323e2d
  #2 [ffffaad04005fb00] schedule_hrtimeout_range_clock at ffffffff8b32cebc
  #3 [ffffaad04005fb80] usleep_range_state at ffffffff8b32c930
  #4 [ffffaad04005fbb0] iavf_remove at ffffffffc12b9b4c [iavf]
  #5 [ffffaad04005fbf0] pci_device_remove at ffffffff8add7513
  #6 [ffffaad04005fc10] device_release_driver_internal at ffffffff8af08baa
  ctrliq#7 [ffffaad04005fc40] pci_stop_bus_device at ffffffff8adcc5fc
  ctrliq#8 [ffffaad04005fc60] pci_stop_and_remove_bus_device at ffffffff8adcc81e
  ctrliq#9 [ffffaad04005fc70] pci_iov_remove_virtfn at ffffffff8adf9429
 ctrliq#10 [ffffaad04005fca8] sriov_disable at ffffffff8adf98e4
 ctrliq#11 [ffffaad04005fcc8] ice_free_vfs at ffffffffc04bb2c8 [ice]
 ctrliq#12 [ffffaad04005fd10] ice_remove at ffffffffc04778fe [ice]
 ctrliq#13 [ffffaad04005fd38] ice_shutdown at ffffffffc0477946 [ice]
 ctrliq#14 [ffffaad04005fd50] pci_device_shutdown at ffffffff8add58f1
 ctrliq#15 [ffffaad04005fd70] device_shutdown at ffffffff8af05386
 ctrliq#16 [ffffaad04005fd98] kernel_restart at ffffffff8a92a870
 ctrliq#17 [ffffaad04005fda8] __do_sys_reboot at ffffffff8a92abd6
 ctrliq#18 [ffffaad04005fee0] do_syscall_64 at ffffffff8b317159
 ctrliq#19 [ffffaad04005ff08] __context_tracking_enter at ffffffff8b31b6fc
 ctrliq#20 [ffffaad04005ff18] syscall_exit_to_user_mode at ffffffff8b31b50d
 ctrliq#21 [ffffaad04005ff28] do_syscall_64 at ffffffff8b317169
 ctrliq#22 [ffffaad04005ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b40009b
     RIP: 00007f1baa5c13d7  RSP: 00007fffbcc55a98  RFLAGS: 00000202
     RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007f1baa5c13d7
     RDX: 0000000001234567  RSI: 0000000028121969  RDI: 00000000fee1dead
     RBP: 00007fffbcc55ca0   R8: 0000000000000000   R9: 00007fffbcc54e90
     R10: 00007fffbcc55050  R11: 0000000000000202  R12: 0000000000000005
     R13: 0000000000000000  R14: 00007fffbcc55af0  R15: 0000000000000000
     ORIG_RAX: 00000000000000a9  CS: 0033  SS: 002b

During reboot all drivers PM shutdown callbacks are invoked.
In iavf_shutdown() the adapter state is changed to __IAVF_REMOVE.
In ice_shutdown() the call chain above is executed, which at some point
calls iavf_remove(). However iavf_remove() expects the VF to be in one
of the states __IAVF_RUNNING, __IAVF_DOWN or __IAVF_INIT_FAILED. If
that's not the case it sleeps forever.
So if iavf_shutdown() gets invoked before iavf_remove() the system will
hang indefinitely because the adapter is already in state __IAVF_REMOVE.

Fix this by returning from iavf_remove() if the state is __IAVF_REMOVE,
as we already went through iavf_shutdown().

Fixes: 9745780 ("iavf: Add waiting so the port is initialized in remove")
Fixes: a841733 ("iavf: Fix race condition between iavf_shutdown and iavf_remove")
	Reported-by: Marius Cornea <[email protected]>
	Signed-off-by: Stefan Assmann <[email protected]>
	Reviewed-by: Michal Kubiak <[email protected]>
	Tested-by: Rafal Romanowski <[email protected]>
	Signed-off-by: Tony Nguyen <[email protected]>
(cherry picked from commit 4e264be)
	Signed-off-by: Jonathan Maple <[email protected]>
bmastbergen pushed a commit that referenced this pull request Apr 25, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-5.14.0-284.30.1.el9_2
commit-author Michael Ellerman <[email protected]>
commit 6d65028

As reported by Alan, the CFI (Call Frame Information) in the VDSO time
routines is incorrect since commit ce7d805 ("powerpc/vdso: Prepare
for switching VDSO to generic C implementation.").

DWARF has a concept called the CFA (Canonical Frame Address), which on
powerpc is calculated as an offset from the stack pointer (r1). That
means when the stack pointer is changed there must be a corresponding
CFI directive to update the calculation of the CFA.

The current code is missing those directives for the changes to r1,
which prevents gdb from being able to generate a backtrace from inside
VDSO functions, eg:

  Breakpoint 1, 0x00007ffff7f804dc in __kernel_clock_gettime ()
  (gdb) bt
  #0  0x00007ffff7f804dc in __kernel_clock_gettime ()
  #1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  #2  0x00007fffffffd960 in ?? ()
  #3  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  Backtrace stopped: frame did not save the PC

Alan helpfully describes some rules for correctly maintaining the CFI information:

  1) Every adjustment to the current frame address reg (ie. r1) must be
     described, and exactly at the instruction where r1 changes. Why?
     Because stack unwinding might want to access previous frames.

  2) If a function changes LR or any non-volatile register, the save
     location for those regs must be given. The CFI can be at any
     instruction after the saves up to the point that the reg is
     changed.
     (Exception: LR save should be described before a bl. not after)

  3) If asychronous unwind info is needed then restores of LR and
     non-volatile regs must also be described. The CFI can be at any
     instruction after the reg is restored up to the point where the
     save location is (potentially) trashed.

Fix the inability to backtrace by adding CFI directives describing the
changes to r1, ie. satisfying rule 1.

Also change the information for LR to point to the copy saved on the
stack, not the value in r0 that will be overwritten by the function
call.

Finally, add CFI directives describing the save/restore of r2.

With the fix gdb can correctly back trace and navigate up and down the stack:

  Breakpoint 1, 0x00007ffff7f804dc in __kernel_clock_gettime ()
  (gdb) bt
  #0  0x00007ffff7f804dc in __kernel_clock_gettime ()
  #1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  #2  0x0000000100015b60 in gettime ()
  #3  0x000000010000c8bc in print_long_format ()
  #4  0x000000010000d180 in print_current_files ()
  #5  0x00000001000054ac in main ()
  (gdb) up
  #1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  (gdb)
  #2  0x0000000100015b60 in gettime ()
  (gdb)
  #3  0x000000010000c8bc in print_long_format ()
  (gdb)
  #4  0x000000010000d180 in print_current_files ()
  (gdb)
  #5  0x00000001000054ac in main ()
  (gdb)
  Initial frame selected; you cannot go up.
  (gdb) down
  #4  0x000000010000d180 in print_current_files ()
  (gdb)
  #3  0x000000010000c8bc in print_long_format ()
  (gdb)
  #2  0x0000000100015b60 in gettime ()
  (gdb)
  #1  0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
  (gdb)
  #0  0x00007ffff7f804dc in __kernel_clock_gettime ()
  (gdb)

Fixes: ce7d805 ("powerpc/vdso: Prepare for switching VDSO to generic C implementation.")
	Cc: [email protected] # v5.11+
	Reported-by: Alan Modra <[email protected]>
	Signed-off-by: Michael Ellerman <[email protected]>
	Reviewed-by: Segher Boessenkool <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit 6d65028)
	Signed-off-by: Jonathan Maple <[email protected]>
bmastbergen pushed a commit that referenced this pull request Apr 25, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-5.14.0-284.30.1.el9_2
commit-author Eelco Chaudron <[email protected]>
commit de9df6c

Currently, the per cpu upcall counters are allocated after the vport is
created and inserted into the system. This could lead to the datapath
accessing the counters before they are allocated resulting in a kernel
Oops.

Here is an example:

  PID: 59693    TASK: ffff0005f4f51500  CPU: 0    COMMAND: "ovs-vswitchd"
   #0 [ffff80000a39b5b0] __switch_to at ffffb70f0629f2f4
   #1 [ffff80000a39b5d0] __schedule at ffffb70f0629f5cc
   #2 [ffff80000a39b650] preempt_schedule_common at ffffb70f0629fa60
   #3 [ffff80000a39b670] dynamic_might_resched at ffffb70f0629fb58
   #4 [ffff80000a39b680] mutex_lock_killable at ffffb70f062a1388
   #5 [ffff80000a39b6a0] pcpu_alloc at ffffb70f0594460c
   #6 [ffff80000a39b750] __alloc_percpu_gfp at ffffb70f05944e68
   ctrliq#7 [ffff80000a39b760] ovs_vport_cmd_new at ffffb70ee6961b90 [openvswitch]
   ...

  PID: 58682    TASK: ffff0005b2f0bf00  CPU: 0    COMMAND: "kworker/0:3"
   #0 [ffff80000a5d2f40] machine_kexec at ffffb70f056a0758
   #1 [ffff80000a5d2f70] __crash_kexec at ffffb70f057e2994
   #2 [ffff80000a5d3100] crash_kexec at ffffb70f057e2ad8
   #3 [ffff80000a5d3120] die at ffffb70f0628234c
   #4 [ffff80000a5d31e0] die_kernel_fault at ffffb70f062828a8
   #5 [ffff80000a5d3210] __do_kernel_fault at ffffb70f056a31f4
   #6 [ffff80000a5d3240] do_bad_area at ffffb70f056a32a4
   ctrliq#7 [ffff80000a5d3260] do_translation_fault at ffffb70f062a9710
   ctrliq#8 [ffff80000a5d3270] do_mem_abort at ffffb70f056a2f74
   ctrliq#9 [ffff80000a5d32a0] el1_abort at ffffb70f06297dac
  ctrliq#10 [ffff80000a5d32d0] el1h_64_sync_handler at ffffb70f06299b24
  ctrliq#11 [ffff80000a5d3410] el1h_64_sync at ffffb70f056812dc
  ctrliq#12 [ffff80000a5d3430] ovs_dp_upcall at ffffb70ee6963c84 [openvswitch]
  ctrliq#13 [ffff80000a5d3470] ovs_dp_process_packet at ffffb70ee6963fdc [openvswitch]
  ctrliq#14 [ffff80000a5d34f0] ovs_vport_receive at ffffb70ee6972c78 [openvswitch]
  ctrliq#15 [ffff80000a5d36f0] netdev_port_receive at ffffb70ee6973948 [openvswitch]
  ctrliq#16 [ffff80000a5d3720] netdev_frame_hook at ffffb70ee6973a28 [openvswitch]
  ctrliq#17 [ffff80000a5d3730] __netif_receive_skb_core.constprop.0 at ffffb70f06079f90

We moved the per cpu upcall counter allocation to the existing vport
alloc and free functions to solve this.

Fixes: 95637d9 ("net: openvswitch: release vport resources on failure")
Fixes: 1933ea3 ("net: openvswitch: Add support to count upcall packets")
	Signed-off-by: Eelco Chaudron <[email protected]>
	Reviewed-by: Simon Horman <[email protected]>
	Acked-by: Aaron Conole <[email protected]>
	Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit de9df6c)
	Signed-off-by: Jonathan Maple <[email protected]>
bmastbergen pushed a commit that referenced this pull request Apr 25, 2025
jira LE-1907
Rebuild_History Non-Buildable kernel-5.14.0-284.30.1.el9_2
commit-author Ding Hui <[email protected]>
commit 5f4fa16

We do netif_napi_add() for all allocated q_vectors[], but potentially
do netif_napi_del() for part of them, then kfree q_vectors and leave
invalid pointers at dev->napi_list.

Reproducer:

  [root@host ~]# cat repro.sh
  #!/bin/bash

  pf_dbsf="0000:41:00.0"
  vf0_dbsf="0000:41:02.0"
  g_pids=()

  function do_set_numvf()
  {
      echo 2 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
      sleep $((RANDOM%3+1))
      echo 0 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
      sleep $((RANDOM%3+1))
  }

  function do_set_channel()
  {
      local nic=$(ls -1 --indicator-style=none /sys/bus/pci/devices/${vf0_dbsf}/net/)
      [ -z "$nic" ] && { sleep $((RANDOM%3)) ; return 1; }
      ifconfig $nic 192.168.18.5 netmask 255.255.255.0
      ifconfig $nic up
      ethtool -L $nic combined 1
      ethtool -L $nic combined 4
      sleep $((RANDOM%3))
  }

  function on_exit()
  {
      local pid
      for pid in "${g_pids[@]}"; do
          kill -0 "$pid" &>/dev/null && kill "$pid" &>/dev/null
      done
      g_pids=()
  }

  trap "on_exit; exit" EXIT

  while :; do do_set_numvf ; done &
  g_pids+=($!)
  while :; do do_set_channel ; done &
  g_pids+=($!)

  wait

Result:

[ 4093.900222] ==================================================================
[ 4093.900230] BUG: KASAN: use-after-free in free_netdev+0x308/0x390
[ 4093.900232] Read of size 8 at addr ffff88b4dc145640 by task repro.sh/6699
[ 4093.900233]
[ 4093.900236] CPU: 10 PID: 6699 Comm: repro.sh Kdump: loaded Tainted: G           O     --------- -t - 4.18.0 #1
[ 4093.900238] Hardware name: Powerleader PR2008AL/H12DSi-N6, BIOS 2.0 04/09/2021
[ 4093.900239] Call Trace:
[ 4093.900244]  dump_stack+0x71/0xab
[ 4093.900249]  print_address_description+0x6b/0x290
[ 4093.900251]  ? free_netdev+0x308/0x390
[ 4093.900252]  kasan_report+0x14a/0x2b0
[ 4093.900254]  free_netdev+0x308/0x390
[ 4093.900261]  iavf_remove+0x825/0xd20 [iavf]
[ 4093.900265]  pci_device_remove+0xa8/0x1f0
[ 4093.900268]  device_release_driver_internal+0x1c6/0x460
[ 4093.900271]  pci_stop_bus_device+0x101/0x150
[ 4093.900273]  pci_stop_and_remove_bus_device+0xe/0x20
[ 4093.900275]  pci_iov_remove_virtfn+0x187/0x420
[ 4093.900277]  ? pci_iov_add_virtfn+0xe10/0xe10
[ 4093.900278]  ? pci_get_subsys+0x90/0x90
[ 4093.900280]  sriov_disable+0xed/0x3e0
[ 4093.900282]  ? bus_find_device+0x12d/0x1a0
[ 4093.900290]  i40e_free_vfs+0x754/0x1210 [i40e]
[ 4093.900298]  ? i40e_reset_all_vfs+0x880/0x880 [i40e]
[ 4093.900299]  ? pci_get_device+0x7c/0x90
[ 4093.900300]  ? pci_get_subsys+0x90/0x90
[ 4093.900306]  ? pci_vfs_assigned.part.7+0x144/0x210
[ 4093.900309]  ? __mutex_lock_slowpath+0x10/0x10
[ 4093.900315]  i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e]
[ 4093.900318]  sriov_numvfs_store+0x214/0x290
[ 4093.900320]  ? sriov_totalvfs_show+0x30/0x30
[ 4093.900321]  ? __mutex_lock_slowpath+0x10/0x10
[ 4093.900323]  ? __check_object_size+0x15a/0x350
[ 4093.900326]  kernfs_fop_write+0x280/0x3f0
[ 4093.900329]  vfs_write+0x145/0x440
[ 4093.900330]  ksys_write+0xab/0x160
[ 4093.900332]  ? __ia32_sys_read+0xb0/0xb0
[ 4093.900334]  ? fput_many+0x1a/0x120
[ 4093.900335]  ? filp_close+0xf0/0x130
[ 4093.900338]  do_syscall_64+0xa0/0x370
[ 4093.900339]  ? page_fault+0x8/0x30
[ 4093.900341]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 4093.900357] RIP: 0033:0x7f16ad4d22c0
[ 4093.900359] Code: 73 01 c3 48 8b 0d d8 cb 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 89 24 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 fe dd 01 00 48 89 04 24
[ 4093.900360] RSP: 002b:00007ffd6491b7f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 4093.900362] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f16ad4d22c0
[ 4093.900363] RDX: 0000000000000002 RSI: 0000000001a41408 RDI: 0000000000000001
[ 4093.900364] RBP: 0000000001a41408 R08: 00007f16ad7a1780 R09: 00007f16ae1f2700
[ 4093.900364] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002
[ 4093.900365] R13: 0000000000000001 R14: 00007f16ad7a0620 R15: 0000000000000001
[ 4093.900367]
[ 4093.900368] Allocated by task 820:
[ 4093.900371]  kasan_kmalloc+0xa6/0xd0
[ 4093.900373]  __kmalloc+0xfb/0x200
[ 4093.900376]  iavf_init_interrupt_scheme+0x63b/0x1320 [iavf]
[ 4093.900380]  iavf_watchdog_task+0x3d51/0x52c0 [iavf]
[ 4093.900382]  process_one_work+0x56a/0x11f0
[ 4093.900383]  worker_thread+0x8f/0xf40
[ 4093.900384]  kthread+0x2a0/0x390
[ 4093.900385]  ret_from_fork+0x1f/0x40
[ 4093.900387]  0xffffffffffffffff
[ 4093.900387]
[ 4093.900388] Freed by task 6699:
[ 4093.900390]  __kasan_slab_free+0x137/0x190
[ 4093.900391]  kfree+0x8b/0x1b0
[ 4093.900394]  iavf_free_q_vectors+0x11d/0x1a0 [iavf]
[ 4093.900397]  iavf_remove+0x35a/0xd20 [iavf]
[ 4093.900399]  pci_device_remove+0xa8/0x1f0
[ 4093.900400]  device_release_driver_internal+0x1c6/0x460
[ 4093.900401]  pci_stop_bus_device+0x101/0x150
[ 4093.900402]  pci_stop_and_remove_bus_device+0xe/0x20
[ 4093.900403]  pci_iov_remove_virtfn+0x187/0x420
[ 4093.900404]  sriov_disable+0xed/0x3e0
[ 4093.900409]  i40e_free_vfs+0x754/0x1210 [i40e]
[ 4093.900415]  i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e]
[ 4093.900416]  sriov_numvfs_store+0x214/0x290
[ 4093.900417]  kernfs_fop_write+0x280/0x3f0
[ 4093.900418]  vfs_write+0x145/0x440
[ 4093.900419]  ksys_write+0xab/0x160
[ 4093.900420]  do_syscall_64+0xa0/0x370
[ 4093.900421]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 4093.900422]  0xffffffffffffffff
[ 4093.900422]
[ 4093.900424] The buggy address belongs to the object at ffff88b4dc144200
                which belongs to the cache kmalloc-8k of size 8192
[ 4093.900425] The buggy address is located 5184 bytes inside of
                8192-byte region [ffff88b4dc144200, ffff88b4dc146200)
[ 4093.900425] The buggy address belongs to the page:
[ 4093.900427] page:ffffea00d3705000 refcount:1 mapcount:0 mapping:ffff88bf04415c80 index:0x0 compound_mapcount: 0
[ 4093.900430] flags: 0x10000000008100(slab|head)
[ 4093.900433] raw: 0010000000008100 dead000000000100 dead000000000200 ffff88bf04415c80
[ 4093.900434] raw: 0000000000000000 0000000000030003 00000001ffffffff 0000000000000000
[ 4093.900434] page dumped because: kasan: bad access detected
[ 4093.900435]
[ 4093.900435] Memory state around the buggy address:
[ 4093.900436]  ffff88b4dc145500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 4093.900437]  ffff88b4dc145580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 4093.900438] >ffff88b4dc145600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 4093.900438]                                            ^
[ 4093.900439]  ffff88b4dc145680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 4093.900440]  ffff88b4dc145700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 4093.900440] ==================================================================

Although the patch #2 (of 2) can avoid the issue triggered by this
repro.sh, there still are other potential risks that if num_active_queues
is changed to less than allocated q_vectors[] by unexpected, the
mismatched netif_napi_add/del() can also cause UAF.

Since we actually call netif_napi_add() for all allocated q_vectors
unconditionally in iavf_alloc_q_vectors(), so we should fix it by
letting netif_napi_del() match to netif_napi_add().

Fixes: 5eae00c ("i40evf: main driver core")
	Signed-off-by: Ding Hui <[email protected]>
	Cc: Donglin Peng <[email protected]>
	Cc: Huang Cun <[email protected]>
	Reviewed-by: Simon Horman <[email protected]>
	Reviewed-by: Madhu Chittim <[email protected]>
	Reviewed-by: Leon Romanovsky <[email protected]>
	Tested-by: Rafal Romanowski <[email protected]>
	Signed-off-by: Tony Nguyen <[email protected]>
(cherry picked from commit 5f4fa16)
	Signed-off-by: Jonathan Maple <[email protected]>
bmastbergen pushed a commit that referenced this pull request Apr 25, 2025
jira LE-1907
cve CVE-2023-21102
Rebuild_History Non-Buildable kernel-5.14.0-284.30.1.el9_2
commit-author Pierre Gondois <[email protected]>
commit 0e68b55

Running a rt-kernel base on 6.2.0-rc3-rt1 on an Ampere Altra outputs
the following:
  BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
  in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 9, name: kworker/u320:0
  preempt_count: 2, expected: 0
  RCU nest depth: 0, expected: 0
  3 locks held by kworker/u320:0/9:
  #0: ffff3fff8c27d128 ((wq_completion)efi_rts_wq){+.+.}-{0:0}, at: process_one_work (./include/linux/atomic/atomic-long.h:41)
  #1: ffff80000861bdd0 ((work_completion)(&efi_rts_work.work)){+.+.}-{0:0}, at: process_one_work (./include/linux/atomic/atomic-long.h:41)
  #2: ffffdf7e1ed3e460 (efi_rt_lock){+.+.}-{3:3}, at: efi_call_rts (drivers/firmware/efi/runtime-wrappers.c:101)
  Preemption disabled at:
  efi_virtmap_load (./arch/arm64/include/asm/mmu_context.h:248)
  CPU: 0 PID: 9 Comm: kworker/u320:0 Tainted: G        W          6.2.0-rc3-rt1
  Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
  Workqueue: efi_rts_wq efi_call_rts
  Call trace:
  dump_backtrace (arch/arm64/kernel/stacktrace.c:158)
  show_stack (arch/arm64/kernel/stacktrace.c:165)
  dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
  dump_stack (lib/dump_stack.c:114)
  __might_resched (kernel/sched/core.c:10134)
  rt_spin_lock (kernel/locking/rtmutex.c:1769 (discriminator 4))
  efi_call_rts (drivers/firmware/efi/runtime-wrappers.c:101)
  [...]

This seems to come from commit ff7a167 ("arm64: efi: Execute
runtime services from a dedicated stack") which adds a spinlock. This
spinlock is taken through:
efi_call_rts()
\-efi_call_virt()
  \-efi_call_virt_pointer()
    \-arch_efi_call_virt_setup()

Make 'efi_rt_lock' a raw_spinlock to avoid being preempted.

[ardb: The EFI runtime services are called with a different set of
       translation tables, and are permitted to use the SIMD registers.
       The context switch code preserves/restores neither, and so EFI
       calls must be made with preemption disabled, rather than only
       disabling migration.]

Fixes: ff7a167 ("arm64: efi: Execute runtime services from a dedicated stack")
	Signed-off-by: Pierre Gondois <[email protected]>
	Cc: <[email protected]> # v6.1+
	Signed-off-by: Ard Biesheuvel <[email protected]>
(cherry picked from commit 0e68b55)
	Signed-off-by: Jonathan Maple <[email protected]>
github-actions bot pushed a commit that referenced this pull request Apr 26, 2025
…ux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.15, round #2

 - Single fix for broken usage of 'multi-MIDR' infrastructure in PI
   code, adding an open-coded erratum check for everyone's favorite pile
   of sand: Cavium ThunderX
github-actions bot pushed a commit that referenced this pull request Apr 26, 2025
[ Upstream commit a104042 ]

The ieee80211 skb control block key (set when skb was queued) could have
been removed before ieee80211_tx_dequeue() call. ieee80211_tx_dequeue()
already called ieee80211_tx_h_select_key() to get the current key, but
the latter do not update the key in skb control block in case it is
NULL. Because some drivers actually use this key in their TX callbacks
(e.g. ath1{1,2}k_mac_op_tx()) this could lead to the use after free
below:

  BUG: KASAN: slab-use-after-free in ath11k_mac_op_tx+0x590/0x61c
  Read of size 4 at addr ffffff803083c248 by task kworker/u16:4/1440

  CPU: 3 UID: 0 PID: 1440 Comm: kworker/u16:4 Not tainted 6.13.0-ge128f627f404 #2
  Hardware name: HW (DT)
  Workqueue: bat_events batadv_send_outstanding_bcast_packet
  Call trace:
   show_stack+0x14/0x1c (C)
   dump_stack_lvl+0x58/0x74
   print_report+0x164/0x4c0
   kasan_report+0xac/0xe8
   __asan_report_load4_noabort+0x1c/0x24
   ath11k_mac_op_tx+0x590/0x61c
   ieee80211_handle_wake_tx_queue+0x12c/0x1c8
   ieee80211_queue_skb+0xdcc/0x1b4c
   ieee80211_tx+0x1ec/0x2bc
   ieee80211_xmit+0x224/0x324
   __ieee80211_subif_start_xmit+0x85c/0xcf8
   ieee80211_subif_start_xmit+0xc0/0xec4
   dev_hard_start_xmit+0xf4/0x28c
   __dev_queue_xmit+0x6ac/0x318c
   batadv_send_skb_packet+0x38c/0x4b0
   batadv_send_outstanding_bcast_packet+0x110/0x328
   process_one_work+0x578/0xc10
   worker_thread+0x4bc/0xc7c
   kthread+0x2f8/0x380
   ret_from_fork+0x10/0x20

  Allocated by task 1906:
   kasan_save_stack+0x28/0x4c
   kasan_save_track+0x1c/0x40
   kasan_save_alloc_info+0x3c/0x4c
   __kasan_kmalloc+0xac/0xb0
   __kmalloc_noprof+0x1b4/0x380
   ieee80211_key_alloc+0x3c/0xb64
   ieee80211_add_key+0x1b4/0x71c
   nl80211_new_key+0x2b4/0x5d8
   genl_family_rcv_msg_doit+0x198/0x240
  <...>

  Freed by task 1494:
   kasan_save_stack+0x28/0x4c
   kasan_save_track+0x1c/0x40
   kasan_save_free_info+0x48/0x94
   __kasan_slab_free+0x48/0x60
   kfree+0xc8/0x31c
   kfree_sensitive+0x70/0x80
   ieee80211_key_free_common+0x10c/0x174
   ieee80211_free_keys+0x188/0x46c
   ieee80211_stop_mesh+0x70/0x2cc
   ieee80211_leave_mesh+0x1c/0x60
   cfg80211_leave_mesh+0xe0/0x280
   cfg80211_leave+0x1e0/0x244
  <...>

Reset SKB control block key before calling ieee80211_tx_h_select_key()
to avoid that.

Fixes: bb42f2d ("mac80211: Move reorder-sensitive TX handlers to after TXQ dequeue")
Signed-off-by: Remi Pommarel <[email protected]>
Link: https://patch.msgid.link/06aa507b853ca385ceded81c18b0a6dd0f081bc8.1742833382.git.repk@triplefau.lt
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
github-actions bot pushed a commit that referenced this pull request Apr 26, 2025
[ Upstream commit 752e221 ]

SMC consists of two sockets: smc_sock and kernel TCP socket.

Currently, there are two ways of creating the sockets, and syzbot reported
a lockdep splat [0] for the newer way introduced by commit d25a92c
("net/smc: Introduce IPPROTO_SMC").

  socket(AF_SMC             , SOCK_STREAM, SMCPROTO_SMC or SMCPROTO_SMC6)
  socket(AF_INET or AF_INET6, SOCK_STREAM, IPPROTO_SMC)

When a socket is allocated, sock_lock_init() sets a lockdep lock class to
sk->sk_lock.slock based on its protocol family.  In the IPPROTO_SMC case,
AF_INET or AF_INET6 lock class is assigned to smc_sock.

The repro sets IPV6_JOIN_ANYCAST for IPv6 UDP and SMC socket and exercises
smc_switch_to_fallback() for IPPROTO_SMC.

  1. smc_switch_to_fallback() is called under lock_sock() and holds
     smc->clcsock_release_lock.

      sk_lock-AF_INET6 -> &smc->clcsock_release_lock
      (sk_lock-AF_SMC)

  2. Setting IPV6_JOIN_ANYCAST to SMC holds smc->clcsock_release_lock
     and calls setsockopt() for the kernel TCP socket, which holds RTNL
     and the kernel socket's lock_sock().

      &smc->clcsock_release_lock -> rtnl_mutex (-> k-sk_lock-AF_INET6)

  3. Setting IPV6_JOIN_ANYCAST to UDP holds RTNL and lock_sock().

      rtnl_mutex -> sk_lock-AF_INET6

Then, lockdep detects a false-positive circular locking,

  .-> sk_lock-AF_INET6 -> &smc->clcsock_release_lock -> rtnl_mutex -.
  `-----------------------------------------------------------------'

but IPPROTO_SMC should have the same locking rule as AF_SMC.

      sk_lock-AF_SMC   -> &smc->clcsock_release_lock -> rtnl_mutex -> k-sk_lock-AF_INET6

Let's set the same lock class for smc_sock.

Given AF_SMC uses the same lock class for SMCPROTO_SMC and SMCPROTO_SMC6,
we do not need to separate the class for AF_INET and AF_INET6.

[0]:
WARNING: possible circular locking dependency detected
6.14.0-rc3-syzkaller-00267-gff202c5028a1 #0 Not tainted

syz.4.1528/11571 is trying to acquire lock:
ffffffff8fef8de8 (rtnl_mutex){+.+.}-{4:4}, at: ipv6_sock_ac_close+0xd9/0x110 net/ipv6/anycast.c:220

but task is already holding lock:
ffff888027f596a8 (&smc->clcsock_release_lock){+.+.}-{4:4}, at: smc_clcsock_release+0x75/0xe0 net/smc/smc_close.c:30

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

 -> #2 (&smc->clcsock_release_lock){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/mutex.c:585 [inline]
       __mutex_lock+0x19b/0xb10 kernel/locking/mutex.c:730
       smc_switch_to_fallback+0x2d/0xa00 net/smc/af_smc.c:903
       smc_sendmsg+0x13d/0x520 net/smc/af_smc.c:2781
       sock_sendmsg_nosec net/socket.c:718 [inline]
       __sock_sendmsg net/socket.c:733 [inline]
       ____sys_sendmsg+0xaaf/0xc90 net/socket.c:2573
       ___sys_sendmsg+0x135/0x1e0 net/socket.c:2627
       __sys_sendmsg+0x16e/0x220 net/socket.c:2659
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

 -> #1 (sk_lock-AF_INET6){+.+.}-{0:0}:
       lock_sock_nested+0x3a/0xf0 net/core/sock.c:3645
       lock_sock include/net/sock.h:1624 [inline]
       sockopt_lock_sock net/core/sock.c:1133 [inline]
       sockopt_lock_sock+0x54/0x70 net/core/sock.c:1124
       do_ipv6_setsockopt+0x2160/0x4520 net/ipv6/ipv6_sockglue.c:567
       ipv6_setsockopt+0xcb/0x170 net/ipv6/ipv6_sockglue.c:993
       udpv6_setsockopt+0x7d/0xd0 net/ipv6/udp.c:1850
       do_sock_setsockopt+0x222/0x480 net/socket.c:2303
       __sys_setsockopt+0x1a0/0x230 net/socket.c:2328
       __do_sys_setsockopt net/socket.c:2334 [inline]
       __se_sys_setsockopt net/socket.c:2331 [inline]
       __x64_sys_setsockopt+0xbd/0x160 net/socket.c:2331
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

 -> #0 (rtnl_mutex){+.+.}-{4:4}:
       check_prev_add kernel/locking/lockdep.c:3163 [inline]
       check_prevs_add kernel/locking/lockdep.c:3282 [inline]
       validate_chain kernel/locking/lockdep.c:3906 [inline]
       __lock_acquire+0x249e/0x3c40 kernel/locking/lockdep.c:5228
       lock_acquire.part.0+0x11b/0x380 kernel/locking/lockdep.c:5851
       __mutex_lock_common kernel/locking/mutex.c:585 [inline]
       __mutex_lock+0x19b/0xb10 kernel/locking/mutex.c:730
       ipv6_sock_ac_close+0xd9/0x110 net/ipv6/anycast.c:220
       inet6_release+0x47/0x70 net/ipv6/af_inet6.c:485
       __sock_release net/socket.c:647 [inline]
       sock_release+0x8e/0x1d0 net/socket.c:675
       smc_clcsock_release+0xb7/0xe0 net/smc/smc_close.c:34
       __smc_release+0x5c2/0x880 net/smc/af_smc.c:301
       smc_release+0x1fc/0x5f0 net/smc/af_smc.c:344
       __sock_release+0xb0/0x270 net/socket.c:647
       sock_close+0x1c/0x30 net/socket.c:1398
       __fput+0x3ff/0xb70 fs/file_table.c:464
       task_work_run+0x14e/0x250 kernel/task_work.c:227
       resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
       exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
       __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
       syscall_exit_to_user_mode+0x27b/0x2a0 kernel/entry/common.c:218
       do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

other info that might help us debug this:

Chain exists of:
  rtnl_mutex --> sk_lock-AF_INET6 --> &smc->clcsock_release_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&smc->clcsock_release_lock);
                               lock(sk_lock-AF_INET6);
                               lock(&smc->clcsock_release_lock);
  lock(rtnl_mutex);

 *** DEADLOCK ***

2 locks held by syz.4.1528/11571:
 #0: ffff888077e88208 (&sb->s_type->i_mutex_key#10){+.+.}-{4:4}, at: inode_lock include/linux/fs.h:877 [inline]
 #0: ffff888077e88208 (&sb->s_type->i_mutex_key#10){+.+.}-{4:4}, at: __sock_release+0x86/0x270 net/socket.c:646
 #1: ffff888027f596a8 (&smc->clcsock_release_lock){+.+.}-{4:4}, at: smc_clcsock_release+0x75/0xe0 net/smc/smc_close.c:30

stack backtrace:
CPU: 0 UID: 0 PID: 11571 Comm: syz.4.1528 Not tainted 6.14.0-rc3-syzkaller-00267-gff202c5028a1 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
 print_circular_bug+0x490/0x760 kernel/locking/lockdep.c:2076
 check_noncircular+0x31a/0x400 kernel/locking/lockdep.c:2208
 check_prev_add kernel/locking/lockdep.c:3163 [inline]
 check_prevs_add kernel/locking/lockdep.c:3282 [inline]
 validate_chain kernel/locking/lockdep.c:3906 [inline]
 __lock_acquire+0x249e/0x3c40 kernel/locking/lockdep.c:5228
 lock_acquire.part.0+0x11b/0x380 kernel/locking/lockdep.c:5851
 __mutex_lock_common kernel/locking/mutex.c:585 [inline]
 __mutex_lock+0x19b/0xb10 kernel/locking/mutex.c:730
 ipv6_sock_ac_close+0xd9/0x110 net/ipv6/anycast.c:220
 inet6_release+0x47/0x70 net/ipv6/af_inet6.c:485
 __sock_release net/socket.c:647 [inline]
 sock_release+0x8e/0x1d0 net/socket.c:675
 smc_clcsock_release+0xb7/0xe0 net/smc/smc_close.c:34
 __smc_release+0x5c2/0x880 net/smc/af_smc.c:301
 smc_release+0x1fc/0x5f0 net/smc/af_smc.c:344
 __sock_release+0xb0/0x270 net/socket.c:647
 sock_close+0x1c/0x30 net/socket.c:1398
 __fput+0x3ff/0xb70 fs/file_table.c:464
 task_work_run+0x14e/0x250 kernel/task_work.c:227
 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
 exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
 __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
 syscall_exit_to_user_mode+0x27b/0x2a0 kernel/entry/common.c:218
 do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f8b4b38d169
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe4efd22d8 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4
RAX: 0000000000000000 RBX: 00000000000b14a3 RCX: 00007f8b4b38d169
RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003
RBP: 00007f8b4b5a7ba0 R08: 0000000000000001 R09: 000000114efd25cf
R10: 00007f8b4b200000 R11: 0000000000000246 R12: 00007f8b4b5a5fac
R13: 00007f8b4b5a5fa0 R14: ffffffffffffffff R15: 00007ffe4efd23f0
 </TASK>

Fixes: d25a92c ("net/smc: Introduce IPPROTO_SMC")
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=be6f4b383534d88989f7
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Wenjia Zhang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
github-actions bot pushed a commit that referenced this pull request Apr 26, 2025
commit d54d610 upstream.

Communicating with the hypervisor using the shared GHCB page requires
clearing the C bit in the mapping of that page. When executing in the
context of the EFI boot services, the page tables are owned by the
firmware, and this manipulation is not possible.

So switch to a different API for accepting memory in SEV-SNP guests, one
which is actually supported at the point during boot where the EFI stub
may need to accept memory, but the SEV-SNP init code has not executed
yet.

For simplicity, also switch the memory acceptance carried out by the
decompressor when not booting via EFI - this only involves the
allocation for the decompressed kernel, and is generally only called
after kexec, as normal boot will jump straight into the kernel from the
EFI stub.

Fixes: 6c32117 ("x86/sev: Add SNP-specific unaccepted memory support")
Tested-by: Tom Lendacky <[email protected]>
Co-developed-by: Tom Lendacky <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: <[email protected]>
Cc: Dionna Amalie Glaze <[email protected]>
Cc: Kevin Loughlin <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected] # discussion thread #1
Link: https://lore.kernel.org/r/[email protected] # discussion thread #2
Link: https://lore.kernel.org/r/[email protected] # final submission
Signed-off-by: Greg Kroah-Hartman <[email protected]>
github-actions bot pushed a commit that referenced this pull request Apr 26, 2025
commit afcdf51 upstream.

Commit 7da55c2 ("drm/amd/display: Remove incorrect FP context
start") removes the FP context protection of dml2_create(), and it said
"All the DC_FP_START/END should be used before call anything from DML2".

However, dml2_init()/dml21_init() are not protected from their callers,
causing such errors:

 do_fpu invoked from kernel context![#1]:
 CPU: 0 UID: 0 PID: 239 Comm: kworker/0:5 Not tainted 6.14.0-rc6+ #2
 Workqueue: events work_for_cpu_fn
 pc ffff80000319de80 ra ffff80000319de5c tp 900000010575c000 sp 900000010575f840
 a0 0000000000000000 a1 900000012f210130 a2 900000012f000000 a3 ffff80000357e268
 a4 ffff80000357e260 a5 900000012ea52cf0 a6 0000000400000004 a7 0000012c00001388
 t0 00001900000015e0 t1 ffff80000379d000 t2 0000000010624dd3 t3 0000006400000014
 t4 00000000000003e8 t5 0000005000000018 t6 0000000000000020 t7 0000000f00000064
 t8 000000000000002f u0 5f5e9200f8901912 s9 900000012d380010 s0 900000012ea51fd8
 s1 900000012f000000 s2 9000000109296000 s3 0000000000000001 s4 0000000000001fd8
 s5 0000000000000001 s6 ffff800003415000 s7 900000012d390000 s8 ffff800003211f80
    ra: ffff80000319de5c dml21_apply_soc_bb_overrides+0x3c/0x960 [amdgpu]
   ERA: ffff80000319de80 dml21_apply_soc_bb_overrides+0x60/0x960 [amdgpu]
  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
  PRMD: 00000004 (PPLV0 +PIE -PWE)
  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
 ESTAT: 000f0000 [FPD] (IS= ECode=15 EsubCode=0)
  PRID: 0014d010 (Loongson-64bit, Loongson-3C6000/S)
 Process kworker/0:5 (pid: 239, threadinfo=00000000927eadc6, task=000000008fd31682)
 Stack : 00040dc000003164 0000000000000001 900000012f210130 900000012eabeeb8
         900000012f000000 ffff80000319fe48 900000012f210000 900000012f210130
         900000012f000000 900000012eabeeb8 0000000000000001 ffff8000031a0064
         900000010575f9f0 900000012f210130 900000012eac0000 900000012ea80000
         900000012f000000 ffff8000031cefc4 900000010575f9f0 ffff8000035859c0
         ffff800003414000 900000010575fa78 900000012f000000 ffff8000031b4c50
         0000000000000000 9000000101c9d700 9000000109c40000 5f5e9200f8901912
         900000012d3c4bd0 900000012d3c5000 ffff8000034aed18 900000012d380010
         900000012d3c4bd0 ffff800003414000 900000012d380000 ffff800002ea49dc
         0000000000000001 900000012d3c6000 00000000ffffe423 0000000000010000
         ...
 Call Trace:
 [<ffff80000319de80>] dml21_apply_soc_bb_overrides+0x60/0x960 [amdgpu]
 [<ffff80000319fe44>] dml21_init+0xa4/0x280 [amdgpu]
 [<ffff8000031a0060>] dml21_create+0x40/0x80 [amdgpu]
 [<ffff8000031cefc0>] dc_state_create+0x100/0x160 [amdgpu]
 [<ffff8000031b4c4c>] dc_create+0x44c/0x640 [amdgpu]
 [<ffff800002ea49d8>] amdgpu_dm_init+0x3f8/0x2060 [amdgpu]
 [<ffff800002ea6658>] dm_hw_init+0x18/0x60 [amdgpu]
 [<ffff800002b16738>] amdgpu_device_init+0x1938/0x27e0 [amdgpu]
 [<ffff800002b18e80>] amdgpu_driver_load_kms+0x20/0xa0 [amdgpu]
 [<ffff800002b0c8f0>] amdgpu_pci_probe+0x1b0/0x580 [amdgpu]
 [<900000000448eae4>] local_pci_probe+0x44/0xc0
 [<9000000003b02b18>] work_for_cpu_fn+0x18/0x40
 [<9000000003b05da0>] process_one_work+0x160/0x300
 [<9000000003b06718>] worker_thread+0x318/0x440
 [<9000000003b11b8c>] kthread+0x12c/0x220
 [<9000000003ac1484>] ret_from_kernel_thread+0x8/0xa4

Unfortunately, protecting dml2_init()/dml21_init() out of DML2 causes
"sleeping function called from invalid context", so protect them with
DC_FP_START() and DC_FP_END() inside.

Fixes: 7da55c2 ("drm/amd/display: Remove incorrect FP context start")
Cc: [email protected]
Signed-off-by: Huacai Chen <[email protected]>
Reviewed-by: Aurabindo Pillai <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
github-actions bot pushed a commit that referenced this pull request May 2, 2025
JIRA: https://issues.redhat.com/browse/RHEL-74483

CVE: CVE-2024-54191

commit 7a17308
Author: Iulia Tanasescu <[email protected]>
Date:   Mon Dec 9 11:42:18 2024 +0200

    Bluetooth: iso: Fix circular lock in iso_conn_big_sync

    This fixes the circular locking dependency warning below, by reworking
    iso_sock_recvmsg, to ensure that the socket lock is always released
    before calling a function that locks hdev.

    [  561.670344] ======================================================
    [  561.670346] WARNING: possible circular locking dependency detected
    [  561.670349] 6.12.0-rc6+ ctrliq#26 Not tainted
    [  561.670351] ------------------------------------------------------
    [  561.670353] iso-tester/3289 is trying to acquire lock:
    [  561.670355] ffff88811f600078 (&hdev->lock){+.+.}-{3:3},
                   at: iso_conn_big_sync+0x73/0x260 [bluetooth]
    [  561.670405]
                   but task is already holding lock:
    [  561.670407] ffff88815af58258 (sk_lock-AF_BLUETOOTH){+.+.}-{0:0},
                   at: iso_sock_recvmsg+0xbf/0x500 [bluetooth]
    [  561.670450]
                   which lock already depends on the new lock.

    [  561.670452]
                   the existing dependency chain (in reverse order) is:
    [  561.670453]
                   -> #2 (sk_lock-AF_BLUETOOTH){+.+.}-{0:0}:
    [  561.670458]        lock_acquire+0x7c/0xc0
    [  561.670463]        lock_sock_nested+0x3b/0xf0
    [  561.670467]        bt_accept_dequeue+0x1a5/0x4d0 [bluetooth]
    [  561.670510]        iso_sock_accept+0x271/0x830 [bluetooth]
    [  561.670547]        do_accept+0x3dd/0x610
    [  561.670550]        __sys_accept4+0xd8/0x170
    [  561.670553]        __x64_sys_accept+0x74/0xc0
    [  561.670556]        x64_sys_call+0x17d6/0x25f0
    [  561.670559]        do_syscall_64+0x87/0x150
    [  561.670563]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
    [  561.670567]
                   -> #1 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}:
    [  561.670571]        lock_acquire+0x7c/0xc0
    [  561.670574]        lock_sock_nested+0x3b/0xf0
    [  561.670577]        iso_sock_listen+0x2de/0xf30 [bluetooth]
    [  561.670617]        __sys_listen_socket+0xef/0x130
    [  561.670620]        __x64_sys_listen+0xe1/0x190
    [  561.670623]        x64_sys_call+0x2517/0x25f0
    [  561.670626]        do_syscall_64+0x87/0x150
    [  561.670629]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
    [  561.670632]
                   -> #0 (&hdev->lock){+.+.}-{3:3}:
    [  561.670636]        __lock_acquire+0x32ad/0x6ab0
    [  561.670639]        lock_acquire.part.0+0x118/0x360
    [  561.670642]        lock_acquire+0x7c/0xc0
    [  561.670644]        __mutex_lock+0x18d/0x12f0
    [  561.670647]        mutex_lock_nested+0x1b/0x30
    [  561.670651]        iso_conn_big_sync+0x73/0x260 [bluetooth]
    [  561.670687]        iso_sock_recvmsg+0x3e9/0x500 [bluetooth]
    [  561.670722]        sock_recvmsg+0x1d5/0x240
    [  561.670725]        sock_read_iter+0x27d/0x470
    [  561.670727]        vfs_read+0x9a0/0xd30
    [  561.670731]        ksys_read+0x1a8/0x250
    [  561.670733]        __x64_sys_read+0x72/0xc0
    [  561.670736]        x64_sys_call+0x1b12/0x25f0
    [  561.670738]        do_syscall_64+0x87/0x150
    [  561.670741]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
    [  561.670744]
                   other info that might help us debug this:

    [  561.670745] Chain exists of:
    &hdev->lock --> sk_lock-AF_BLUETOOTH-BTPROTO_ISO --> sk_lock-AF_BLUETOOTH

    [  561.670751]  Possible unsafe locking scenario:

    [  561.670753]        CPU0                    CPU1
    [  561.670754]        ----                    ----
    [  561.670756]   lock(sk_lock-AF_BLUETOOTH);
    [  561.670758]                                lock(sk_lock
                                                  AF_BLUETOOTH-BTPROTO_ISO);
    [  561.670761]                                lock(sk_lock-AF_BLUETOOTH);
    [  561.670764]   lock(&hdev->lock);
    [  561.670767]
                    *** DEADLOCK ***

    Fixes: 07a9342 ("Bluetooth: ISO: Send BIG Create Sync via hci_sync")
    Signed-off-by: Iulia Tanasescu <[email protected]>
    Signed-off-by: Luiz Augusto von Dentz <[email protected]>

Signed-off-by: Bastien Nocera <[email protected]>
github-actions bot pushed a commit that referenced this pull request May 2, 2025
[BUG]
There is a bug report that a syzbot reproducer can lead to the following
busy inode at unmount time:

  BTRFS info (device loop1): last unmount of filesystem 1680000e-3c1e-4c46-84b6-56bd3909af50
  VFS: Busy inodes after unmount of loop1 (btrfs)
  ------------[ cut here ]------------
  kernel BUG at fs/super.c:650!
  Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
  CPU: 0 UID: 0 PID: 48168 Comm: syz-executor Not tainted 6.15.0-rc2-00471-g119009db2674 #2 PREEMPT(full)
  Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
  RIP: 0010:generic_shutdown_super+0x2e9/0x390 fs/super.c:650
  Call Trace:
   <TASK>
   kill_anon_super+0x3a/0x60 fs/super.c:1237
   btrfs_kill_super+0x3b/0x50 fs/btrfs/super.c:2099
   deactivate_locked_super+0xbe/0x1a0 fs/super.c:473
   deactivate_super fs/super.c:506 [inline]
   deactivate_super+0xe2/0x100 fs/super.c:502
   cleanup_mnt+0x21f/0x440 fs/namespace.c:1435
   task_work_run+0x14d/0x240 kernel/task_work.c:227
   resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
   exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
   exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
   __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
   syscall_exit_to_user_mode+0x269/0x290 kernel/entry/common.c:218
   do_syscall_64+0xd4/0x250 arch/x86/entry/syscall_64.c:100
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
   </TASK>

[CAUSE]
When btrfs_alloc_path() failed, btrfs_iget() directly returned without
releasing the inode already allocated by btrfs_iget_locked().

This results the above busy inode and trigger the kernel BUG.

[FIX]
Fix it by calling iget_failed() if btrfs_alloc_path() failed.

If we hit error inside btrfs_read_locked_inode(), it will properly call
iget_failed(), so nothing to worry about.

Although the iget_failed() cleanup inside btrfs_read_locked_inode() is a
break of the normal error handling scheme, let's fix the obvious bug
and backport first, then rework the error handling later.

Reported-by: Penglei Jiang <[email protected]>
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
Fixes: 7c855e1 ("btrfs: remove conditional path allocation in btrfs_read_locked_inode()")
CC: [email protected] # 6.13+
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Penglei Jiang <[email protected]>
Signed-off-by: David Sterba <[email protected]>
github-actions bot pushed a commit that referenced this pull request May 9, 2025
commit ab680dc upstream.

Fix deadlock in job submission and abort handling.
When a thread aborts currently executing jobs due to a fault,
it first locks the global lock protecting submitted_jobs (#1).

After the last job is destroyed, it proceeds to release the related context
and locks file_priv (#2). Meanwhile, in the job submission thread,
the file_priv lock (#2) is taken first, and then the submitted_jobs
lock (#1) is obtained when a job is added to the submitted jobs list.

       CPU0                            CPU1
       ----                    	       ----
  (for example due to a fault)         (jobs submissions keep coming)

  lock(&vdev->submitted_jobs_lock) #1
  ivpu_jobs_abort_all()
  job_destroy()
                                      lock(&file_priv->lock)           #2
                                      lock(&vdev->submitted_jobs_lock) #1
  file_priv_release()
  lock(&vdev->context_list_lock)
  lock(&file_priv->lock)           #2

This order of locking causes a deadlock. To resolve this issue,
change the order of locking in ivpu_job_submit().

Signed-off-by: Karol Wachowski <[email protected]>
Signed-off-by: Maciej Falkowski <[email protected]>
Reviewed-by: Jacek Lawrynowicz <[email protected]>
Signed-off-by: Jacek Lawrynowicz <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
[ This backport required small adjustments to ivpu_job_submit(), which
  lacks support for explicit command queue creation added in 6.15. ]
Signed-off-by: Jacek Lawrynowicz <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
github-actions bot pushed a commit that referenced this pull request May 10, 2025
commit 93ae6e6
Author: Lu Baolu <[email protected]>
Date: Wed Mar 19 10:21:01 2025 +0800

    iommu/vt-d: Fix possible circular locking dependency

    We have recently seen report of lockdep circular lock dependency warnings
    on platforms like Skylake and Kabylake:

     ======================================================
     WARNING: possible circular locking dependency detected
     6.14.0-rc6-CI_DRM_16276-gca2c04fe76e8+ #1 Not tainted
     ------------------------------------------------------
     swapper/0/1 is trying to acquire lock:
     ffffffff8360ee48 (iommu_probe_device_lock){+.+.}-{3:3},
       at: iommu_probe_device+0x1d/0x70

     but task is already holding lock:
     ffff888102c7efa8 (&device->physical_node_lock){+.+.}-{3:3},
       at: intel_iommu_init+0xe75/0x11f0

     which lock already depends on the new lock.

     the existing dependency chain (in reverse order) is:

     -> #6 (&device->physical_node_lock){+.+.}-{3:3}:
	    __mutex_lock+0xb4/0xe40
	    mutex_lock_nested+0x1b/0x30
	    intel_iommu_init+0xe75/0x11f0
	    pci_iommu_init+0x13/0x70
	    do_one_initcall+0x62/0x3f0
	    kernel_init_freeable+0x3da/0x6a0
	    kernel_init+0x1b/0x200
	    ret_from_fork+0x44/0x70
	    ret_from_fork_asm+0x1a/0x30

     -> #5 (dmar_global_lock){++++}-{3:3}:
	    down_read+0x43/0x1d0
	    enable_drhd_fault_handling+0x21/0x110
	    cpuhp_invoke_callback+0x4c6/0x870
	    cpuhp_issue_call+0xbf/0x1f0
	    __cpuhp_setup_state_cpuslocked+0x111/0x320
	    __cpuhp_setup_state+0xb0/0x220
	    irq_remap_enable_fault_handling+0x3f/0xa0
	    apic_intr_mode_init+0x5c/0x110
	    x86_late_time_init+0x24/0x40
	    start_kernel+0x895/0xbd0
	    x86_64_start_reservations+0x18/0x30
	    x86_64_start_kernel+0xbf/0x110
	    common_startup_64+0x13e/0x141

     -> #4 (cpuhp_state_mutex){+.+.}-{3:3}:
	    __mutex_lock+0xb4/0xe40
	    mutex_lock_nested+0x1b/0x30
	    __cpuhp_setup_state_cpuslocked+0x67/0x320
	    __cpuhp_setup_state+0xb0/0x220
	    page_alloc_init_cpuhp+0x2d/0x60
	    mm_core_init+0x18/0x2c0
	    start_kernel+0x576/0xbd0
	    x86_64_start_reservations+0x18/0x30
	    x86_64_start_kernel+0xbf/0x110
	    common_startup_64+0x13e/0x141

     -> #3 (cpu_hotplug_lock){++++}-{0:0}:
	    __cpuhp_state_add_instance+0x4f/0x220
	    iova_domain_init_rcaches+0x214/0x280
	    iommu_setup_dma_ops+0x1a4/0x710
	    iommu_device_register+0x17d/0x260
	    intel_iommu_init+0xda4/0x11f0
	    pci_iommu_init+0x13/0x70
	    do_one_initcall+0x62/0x3f0
	    kernel_init_freeable+0x3da/0x6a0
	    kernel_init+0x1b/0x200
	    ret_from_fork+0x44/0x70
	    ret_from_fork_asm+0x1a/0x30

     -> #2 (&domain->iova_cookie->mutex){+.+.}-{3:3}:
	    __mutex_lock+0xb4/0xe40
	    mutex_lock_nested+0x1b/0x30
	    iommu_setup_dma_ops+0x16b/0x710
	    iommu_device_register+0x17d/0x260
	    intel_iommu_init+0xda4/0x11f0
	    pci_iommu_init+0x13/0x70
	    do_one_initcall+0x62/0x3f0
	    kernel_init_freeable+0x3da/0x6a0
	    kernel_init+0x1b/0x200
	    ret_from_fork+0x44/0x70
	    ret_from_fork_asm+0x1a/0x30

     -> #1 (&group->mutex){+.+.}-{3:3}:
	    __mutex_lock+0xb4/0xe40
	    mutex_lock_nested+0x1b/0x30
	    __iommu_probe_device+0x24c/0x4e0
	    probe_iommu_group+0x2b/0x50
	    bus_for_each_dev+0x7d/0xe0
	    iommu_device_register+0xe1/0x260
	    intel_iommu_init+0xda4/0x11f0
	    pci_iommu_init+0x13/0x70
	    do_one_initcall+0x62/0x3f0
	    kernel_init_freeable+0x3da/0x6a0
	    kernel_init+0x1b/0x200
	    ret_from_fork+0x44/0x70
	    ret_from_fork_asm+0x1a/0x30

     -> #0 (iommu_probe_device_lock){+.+.}-{3:3}:
	    __lock_acquire+0x1637/0x2810
	    lock_acquire+0xc9/0x300
	    __mutex_lock+0xb4/0xe40
	    mutex_lock_nested+0x1b/0x30
	    iommu_probe_device+0x1d/0x70
	    intel_iommu_init+0xe90/0x11f0
	    pci_iommu_init+0x13/0x70
	    do_one_initcall+0x62/0x3f0
	    kernel_init_freeable+0x3da/0x6a0
	    kernel_init+0x1b/0x200
	    ret_from_fork+0x44/0x70
	    ret_from_fork_asm+0x1a/0x30

     other info that might help us debug this:

     Chain exists of:
       iommu_probe_device_lock --> dmar_global_lock -->
	 &device->physical_node_lock

      Possible unsafe locking scenario:

	    CPU0                    CPU1
	    ----                    ----
       lock(&device->physical_node_lock);
				    lock(dmar_global_lock);
				    lock(&device->physical_node_lock);
       lock(iommu_probe_device_lock);

      *** DEADLOCK ***

    This driver uses a global lock to protect the list of enumerated DMA
    remapping units. It is necessary due to the driver's support for dynamic
    addition and removal of remapping units at runtime.

    Two distinct code paths require iteration over this remapping unit list:

    - Device registration and probing: the driver iterates the list to
      register each remapping unit with the upper layer IOMMU framework
      and subsequently probe the devices managed by that unit.
    - Global configuration: Upper layer components may also iterate the list
      to apply configuration changes.

    The lock acquisition order between these two code paths was reversed. This
    caused lockdep warnings, indicating a risk of deadlock. Fix this warning
    by releasing the global lock before invoking upper layer interfaces for
    device registration.

    Fixes: b150654 ("iommu/vt-d: Fix suspicious RCU usage")
    Closes: https://lore.kernel.org/linux-iommu/SJ1PR11MB612953431F94F18C954C4A9CB9D32@SJ1PR11MB6129.namprd11.prod.outlook.com/
    Tested-by: Chaitanya Kumar Borah <[email protected]>
    Cc: [email protected]
    Signed-off-by: Lu Baolu <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Joerg Roedel <[email protected]>

(cherry picked from commit 93ae6e6)
Signed-off-by: Jerry Snitselaar <[email protected]>
Upstream-Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
JIRA: https://issues.redhat.com/browse/RHEL-78704
github-actions bot pushed a commit that referenced this pull request May 16, 2025
…unload

Kernel panic occurs when a devmem TCP socket is closed after NIC module
is unloaded.

This is Devmem TCP unregistration scenarios. number is an order.
(a)netlink socket close    (b)pp destroy    (c)uninstall    result
1                          2                3               OK
1                          3                2               (d)Impossible
2                          1                3               OK
3                          1                2               (e)Kernel panic
2                          3                1               (d)Impossible
3                          2                1               (d)Impossible

(a) netdev_nl_sock_priv_destroy() is called when devmem TCP socket is
    closed.
(b) page_pool_destroy() is called when the interface is down.
(c) mp_ops->uninstall() is called when an interface is unregistered.
(d) There is no scenario in mp_ops->uninstall() is called before
    page_pool_destroy().
    Because unregister_netdevice_many_notify() closes interfaces first
    and then calls mp_ops->uninstall().
(e) netdev_nl_sock_priv_destroy() accesses struct net_device to acquire
    netdev_lock().
    But if the interface module has already been removed, net_device
    pointer is invalid, so it causes kernel panic.

In summary, there are only 3 possible scenarios.
 A. sk close -> pp destroy -> uninstall.
 B. pp destroy -> sk close -> uninstall.
 C. pp destroy -> uninstall -> sk close.

Case C is a kernel panic scenario.

In order to fix this problem, It makes mp_dmabuf_devmem_uninstall() set
binding->dev to NULL.
It indicates an bound net_device was unregistered.

It makes netdev_nl_sock_priv_destroy() do not acquire netdev_lock()
if binding->dev is NULL.

A new binding->lock is added to protect a dev of a binding.
So, lock ordering is like below.
 priv->lock
 netdev_lock(dev)
 binding->lock

Tests:
Scenario A:
    ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
        -v 7 -t 1 -q 1 &
    pid=$!
    sleep 10
    kill $pid
    ip link set $interface down
    modprobe -rv $module

Scenario B:
    ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
        -v 7 -t 1 -q 1 &
    pid=$!
    sleep 10
    ip link set $interface down
    kill $pid
    modprobe -rv $module

Scenario C:
    ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
        -v 7 -t 1 -q 1 &
    pid=$!
    sleep 10
    modprobe -rv $module
    sleep 5
    kill $pid

Splat looks like:
Oops: general protection fault, probably for non-canonical address 0xdffffc001fffa9f7: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
KASAN: probably user-memory-access in range [0x00000000fffd4fb8-0x00000000fffd4fbf]
CPU: 0 UID: 0 PID: 2041 Comm: ncdevmem Tainted: G    B   W           6.15.0-rc1+ #2 PREEMPT(undef)  0947ec89efa0fd68838b78e36aa1617e97ff5d7f
Tainted: [B]=BAD_PAGE, [W]=WARN
RIP: 0010:__mutex_lock (./include/linux/sched.h:2244 kernel/locking/mutex.c:400 kernel/locking/mutex.c:443 kernel/locking/mutex.c:605 kernel/locking/mutex.c:746)
Code: ea 03 80 3c 02 00 0f 85 4f 13 00 00 49 8b 1e 48 83 e3 f8 74 6a 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 34 48 89 fa 48 c1 ea 03 <0f> b6 f
RSP: 0018:ffff88826f7ef730 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: 00000000fffd4f88 RCX: ffffffffaa9bc811
RDX: 000000001fffa9f7 RSI: 0000000000000008 RDI: 00000000fffd4fbc
RBP: ffff88826f7ef8b0 R08: 0000000000000000 R09: ffffed103e6aa1a4
R10: 0000000000000007 R11: ffff88826f7ef442 R12: fffffbfff669f65e
R13: ffff88812a830040 R14: ffff8881f3550d20 R15: 00000000fffd4f88
FS:  0000000000000000(0000) GS:ffff888866c05000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563bed0cb288 CR3: 00000001a7c98000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
<TASK>
 ...
 netdev_nl_sock_priv_destroy (net/core/netdev-genl.c:953 (discriminator 3))
 genl_release (net/netlink/genetlink.c:653 net/netlink/genetlink.c:694 net/netlink/genetlink.c:705)
 ...
 netlink_release (net/netlink/af_netlink.c:737)
 ...
 __sock_release (net/socket.c:647)
 sock_close (net/socket.c:1393)

Fixes: 1d22d30 ("net: drop rtnl_lock for queue_mgmt operations")
Signed-off-by: Taehee Yoo <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
github-actions bot pushed a commit that referenced this pull request May 21, 2025
JIRA: https://issues.redhat.com/browse/RHEL-84571
Upstream Status: net.git commit 443041d
Conflicts: context mismatch as we don't have MPCapableSYNTXDrop _ upstream
  commit 6982826 ("mptcp: fallback to TCP after SYN+MPC drops") and
  MPCapableSYNTXDisabled _ upstream commit 27069e7 ("mptcp: disable
  active MPTCP in case of blackhole")

commit 3d04139
Author: Paolo Abeni <[email protected]>
Date:   Mon Oct 14 16:06:00 2024 +0200

    mptcp: prevent MPC handshake on port-based signal endpoints

    Syzkaller reported a lockdep splat:

      ============================================
      WARNING: possible recursive locking detected
      6.11.0-rc6-syzkaller-00019-g67784a74e258 #0 Not tainted
      --------------------------------------------
      syz-executor364/5113 is trying to acquire lock:
      ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
      ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328

      but task is already holding lock:
      ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
      ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328

      other info that might help us debug this:
       Possible unsafe locking scenario:

             CPU0
             ----
        lock(k-slock-AF_INET);
        lock(k-slock-AF_INET);

       *** DEADLOCK ***

       May be due to missing lock nesting notation

      7 locks held by syz-executor364/5113:
       #0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline]
       #0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0x153/0x1b10 net/mptcp/protocol.c:1806
       #1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline]
       #1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg_fastopen+0x11f/0x530 net/mptcp/protocol.c:1727
       #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline]
       #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
       #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5f/0x1b80 net/ipv4/ip_output.c:470
       #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline]
       #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
       #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x45f/0x1390 net/ipv4/ip_output.c:228
       #4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
       #4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x33b/0x15b0 net/core/dev.c:6104
       #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline]
       #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
       #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x230/0x5f0 net/ipv4/ip_input.c:232
       #6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
       #6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328

      stack backtrace:
      CPU: 0 UID: 0 PID: 5113 Comm: syz-executor364 Not tainted 6.11.0-rc6-syzkaller-00019-g67784a74e258 #0
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:93 [inline]
       dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
       check_deadlock kernel/locking/lockdep.c:3061 [inline]
       validate_chain+0x15d3/0x5900 kernel/locking/lockdep.c:3855
       __lock_acquire+0x137a/0x2040 kernel/locking/lockdep.c:5142
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5759
       __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
       _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
       spin_lock include/linux/spinlock.h:351 [inline]
       sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328
       mptcp_sk_clone_init+0x32/0x13c0 net/mptcp/protocol.c:3279
       subflow_syn_recv_sock+0x931/0x1920 net/mptcp/subflow.c:874
       tcp_check_req+0xfe4/0x1a20 net/ipv4/tcp_minisocks.c:853
       tcp_v4_rcv+0x1c3e/0x37f0 net/ipv4/tcp_ipv4.c:2267
       ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205
       ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233
       NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
       NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
       __netif_receive_skb_one_core net/core/dev.c:5661 [inline]
       __netif_receive_skb+0x2bf/0x650 net/core/dev.c:5775
       process_backlog+0x662/0x15b0 net/core/dev.c:6108
       __napi_poll+0xcb/0x490 net/core/dev.c:6772
       napi_poll net/core/dev.c:6841 [inline]
       net_rx_action+0x89b/0x1240 net/core/dev.c:6963
       handle_softirqs+0x2c4/0x970 kernel/softirq.c:554
       do_softirq+0x11b/0x1e0 kernel/softirq.c:455
       </IRQ>
       <TASK>
       __local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382
       local_bh_enable include/linux/bottom_half.h:33 [inline]
       rcu_read_unlock_bh include/linux/rcupdate.h:908 [inline]
       __dev_queue_xmit+0x1763/0x3e90 net/core/dev.c:4450
       dev_queue_xmit include/linux/netdevice.h:3105 [inline]
       neigh_hh_output include/net/neighbour.h:526 [inline]
       neigh_output include/net/neighbour.h:540 [inline]
       ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:235
       ip_local_out net/ipv4/ip_output.c:129 [inline]
       __ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:535
       __tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466
       tcp_rcv_synsent_state_process net/ipv4/tcp_input.c:6542 [inline]
       tcp_rcv_state_process+0x2c32/0x4570 net/ipv4/tcp_input.c:6729
       tcp_v4_do_rcv+0x77d/0xc70 net/ipv4/tcp_ipv4.c:1934
       sk_backlog_rcv include/net/sock.h:1111 [inline]
       __release_sock+0x214/0x350 net/core/sock.c:3004
       release_sock+0x61/0x1f0 net/core/sock.c:3558
       mptcp_sendmsg_fastopen+0x1ad/0x530 net/mptcp/protocol.c:1733
       mptcp_sendmsg+0x1884/0x1b10 net/mptcp/protocol.c:1812
       sock_sendmsg_nosec net/socket.c:730 [inline]
       __sock_sendmsg+0x1a6/0x270 net/socket.c:745
       ____sys_sendmsg+0x525/0x7d0 net/socket.c:2597
       ___sys_sendmsg net/socket.c:2651 [inline]
       __sys_sendmmsg+0x3b2/0x740 net/socket.c:2737
       __do_sys_sendmmsg net/socket.c:2766 [inline]
       __se_sys_sendmmsg net/socket.c:2763 [inline]
       __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2763
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      RIP: 0033:0x7f04fb13a6b9
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 01 1a 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007ffd651f42d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f04fb13a6b9
      RDX: 0000000000000001 RSI: 0000000020000d00 RDI: 0000000000000004
      RBP: 00007ffd651f4310 R08: 0000000000000001 R09: 0000000000000001
      R10: 0000000020000080 R11: 0000000000000246 R12: 00000000000f4240
      R13: 00007f04fb187449 R14: 00007ffd651f42f4 R15: 00007ffd651f4300
       </TASK>

    As noted by Cong Wang, the splat is false positive, but the code
    path leading to the report is an unexpected one: a client is
    attempting an MPC handshake towards the in-kernel listener created
    by the in-kernel PM for a port based signal endpoint.

    Such connection will be never accepted; many of them can make the
    listener queue full and preventing the creation of MPJ subflow via
    such listener - its intended role.

    Explicitly detect this scenario at initial-syn time and drop the
    incoming MPC request.

    Fixes: 1729cf1 ("mptcp: create the listening socket for new port")
    Cc: [email protected]
    Reported-by: [email protected]
    Closes: https://syzkaller.appspot.com/bug?extid=f4aacdfef2c6a6529c3e
    Cc: Cong Wang <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
    Reviewed-by: Mat Martineau <[email protected]>
    Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>

Signed-off-by: Davide Caratti <[email protected]>
github-actions bot pushed a commit that referenced this pull request May 27, 2025
…_USERCOPY=y crash

Borislav Petkov reported the following boot crash on x86-32,
with CONFIG_HARDENED_USERCOPY=y:

  |  usercopy: Kernel memory overwrite attempt detected to SLUB object 'task_struct' (offset 2112, size 160)!
  |  ...
  |  kernel BUG at mm/usercopy.c:102!

So the useroffset and usersize arguments are what control the allowed
window of copying in/out of the "task_struct" kmem cache:

        /* create a slab on which task_structs can be allocated */
        task_struct_whitelist(&useroffset, &usersize);
        task_struct_cachep = kmem_cache_create_usercopy("task_struct",
                        arch_task_struct_size, align,
                        SLAB_PANIC|SLAB_ACCOUNT,
                        useroffset, usersize, NULL);

task_struct_whitelist() positions this window based on the location of
the thread_struct within task_struct, and gets the arch-specific details
via arch_thread_struct_whitelist(offset, size):

	static void __init task_struct_whitelist(unsigned long *offset, unsigned long *size)
	{
		/* Fetch thread_struct whitelist for the architecture. */
		arch_thread_struct_whitelist(offset, size);

		/*
		 * Handle zero-sized whitelist or empty thread_struct, otherwise
		 * adjust offset to position of thread_struct in task_struct.
		 */
		if (unlikely(*size == 0))
			*offset = 0;
		else
			*offset += offsetof(struct task_struct, thread);
	}

Commit cb7ca40 ("x86/fpu: Make task_struct::thread constant size")
removed the logic for the window, leaving:

	static inline void
	arch_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
	{
		*offset = 0;
		*size = 0;
	}

So now there is no window that usercopy hardening will allow to be copied
in/out of task_struct.

But as reported above, there *is* a copy in copy_uabi_to_xstate(). (It
seems there are several, actually.)

	int copy_sigframe_from_user_to_xstate(struct task_struct *tsk,
					      const void __user *ubuf)
	{
		return copy_uabi_to_xstate(x86_task_fpu(tsk)->fpstate, NULL, ubuf, &tsk->thread.pkru);
	}

This appears to be writing into x86_task_fpu(tsk)->fpstate. With or
without CONFIG_X86_DEBUG_FPU, this resolves to:

	((struct fpu *)((void *)(task) + sizeof(*(task))))

i.e. the memory "after task_struct" is cast to "struct fpu", and the
uses the "fpstate" pointer. How that pointer gets set looks to be
variable, but I think the one we care about here is:

        fpu->fpstate = &fpu->__fpstate;

And struct fpu::__fpstate says:

        struct fpstate                  __fpstate;
        /*
         * WARNING: '__fpstate' is dynamically-sized.  Do not put
         * anything after it here.
         */

So we're still dealing with a dynamically sized thing, even if it's not
within the literal struct task_struct -- it's still in the kmem cache,
though.

Looking at the kmem cache size, it has allocated "arch_task_struct_size"
bytes, which is calculated in fpu__init_task_struct_size():

        int task_size = sizeof(struct task_struct);

        task_size += sizeof(struct fpu);

        /*
         * Subtract off the static size of the register state.
         * It potentially has a bunch of padding.
         */
        task_size -= sizeof(union fpregs_state);

        /*
         * Add back the dynamically-calculated register state
         * size.
         */
        task_size += fpu_kernel_cfg.default_size;

        /*
         * We dynamically size 'struct fpu', so we require that
         * 'state' be at the end of 'it:
         */
        CHECK_MEMBER_AT_END_OF(struct fpu, __fpstate);

        arch_task_struct_size = task_size;

So, this is still copying out of the kmem cache for task_struct, and the
window seems unchanged (still fpu regs). This is what the window was
before:

	void fpu_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
	{
		*offset = offsetof(struct thread_struct, fpu.__fpstate.regs);
		*size = fpu_kernel_cfg.default_size;
	}

And the same commit I mentioned above removed it.

I think the misunderstanding is here:

  | The fpu_thread_struct_whitelist() quirk to hardened usercopy can be removed,
  | now that the FPU structure is not embedded in the task struct anymore, which
  | reduces text footprint a bit.

Yes, FPU is no longer in task_struct, but it IS in the kmem cache named
"task_struct", since the fpstate is still being allocated there.

Partially revert the earlier mentioned commit, along with a
recalculation of the fpstate regs location.

Fixes: cb7ca40 ("x86/fpu: Make task_struct::thread constant size")
Reported-by: Borislav Petkov (AMD) <[email protected]>
Tested-by: Borislav Petkov (AMD) <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Chang S. Bae <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/all/[email protected]/ # Discussion #1
Link: https://lore.kernel.org/r/202505041418.F47130C4C8@keescook             # Discussion #2
github-actions bot pushed a commit that referenced this pull request May 27, 2025
Move hctx debugfs/sysfs register out of freezing queue in
__blk_mq_update_nr_hw_queues(), so that the following lockdep dependency
can be killed:

	#2 (&q->q_usage_counter(io)ctrliq#16){++++}-{0:0}:
	#1 (fs_reclaim){+.+.}-{0:0}:
	#0 (&sb->s_type->i_mutex_key#3){+.+.}-{4:4}: //debugfs

And registering/un-registering hctx debugfs/sysfs does not require queue to
be frozen:

- hctx sysfs attributes show() are drained when removing kobject, and
  there isn't store() implementation for hctx sysfs attributes

- debugfs entry read() is drained too when removing debugfs directory,
  and there isn't write() implementation for hctx debugfs too

- so it is safe to register/unregister hctx sysfs/debugfs without
  freezing queue because the cod paths changes nothing, and we just
  need to keep hctx live

Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Nilay Shroff <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
github-actions bot pushed a commit that referenced this pull request May 27, 2025
…xit()

scheduler's ->exit() is called with queue frozen and elevator lock is held, and
wbt_enable_default() can't be called with queue frozen, otherwise the
following lockdep warning is triggered:

	#6 (&q->rq_qos_mutex){+.+.}-{4:4}:
	#5 (&eq->sysfs_lock){+.+.}-{4:4}:
	#4 (&q->elevator_lock){+.+.}-{4:4}:
	#3 (&q->q_usage_counter(io)#3){++++}-{0:0}:
	#2 (fs_reclaim){+.+.}-{0:0}:
	#1 (&sb->s_type->i_mutex_key#3){+.+.}-{4:4}:
	#0 (&q->debugfs_mutex){+.+.}-{4:4}:

Fix the issue by moving wbt_enable_default() out of bfq's exit(), and
call it from elevator_change_done().

Meantime add disk->rqos_state_mutex for covering wbt state change, which
matches the purpose more than ->elevator_lock.

Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Nilay Shroff <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
github-actions bot pushed a commit that referenced this pull request May 27, 2025
Amir Goldstein <[email protected]> says:

This adds a test for fanotify mount ns notifications inside userns [1].

While working on the test I ended up making lots of cleanups to reduce
build dependency on make headers_install.

These patches got rid of the dependency for my kvm setup for the
affected filesystems tests.

Building with TOOLS_INCLUDES dir was recommended by John Hubbard [2].

NOTE #1: these patches are based on a merge of vfs-6.16.mount
(changes wrappers.h) into v6.15-rc5 (changes mount-notify_test.c),
so if this cleanup is acceptable, we should probably setup a selftests
branch for 6.16, so that it can be used to test the fanotify patches.

NOTE #2: some of the defines in wrappers.h are left for overlayfs and
mount_setattr tests, which were not converted to use TOOLS_INCLUDES.
I did not want to mess with those tests.

* patches from https://lore.kernel.org/[email protected]:
  selftests/fs/mount-notify: add a test variant running inside userns
  selftests/filesystems: create setup_userns() helper
  selftests/filesystems: create get_unique_mnt_id() helper
  selftests/fs/mount-notify: build with tools include dir
  selftests/mount_settattr: remove duplicate syscall definitions
  selftests/pidfd: move syscall definitions into wrappers.h
  selftests/fs/statmount: build with tools include dir
  selftests/filesystems: move wrapper.h out of overlayfs subdir

Link: https://lore.kernel.org/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
github-actions bot pushed a commit that referenced this pull request May 28, 2025
ACPICA commit 1c28da2242783579d59767617121035dafba18c3

This was originally done in NetBSD:
NetBSD/src@b69d1ac
and is the correct alternative to the smattering of `memcpy`s I
previously contributed to this repository.

This also sidesteps the newly strict checks added in UBSAN:
llvm/llvm-project@7926744

Before this change we see the following UBSAN stack trace in Fuchsia:

  #0    0x000021afcfdeca5e in acpi_rs_get_address_common(struct acpi_resource*, union aml_resource*) ../../third_party/acpica/source/components/resources/rsaddr.c:329 <platform-bus-x86.so>+0x6aca5e
  #1.2  0x000021982bc4af3c in ubsan_get_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:41 <libclang_rt.asan.so>+0x41f3c
  #1.1  0x000021982bc4af3c in maybe_print_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:51 <libclang_rt.asan.so>+0x41f3c
  #1    0x000021982bc4af3c in ~scoped_report() compiler-rt/lib/ubsan/ubsan_diag.cpp:395 <libclang_rt.asan.so>+0x41f3c
  #2    0x000021982bc4bb6f in handletype_mismatch_impl() compiler-rt/lib/ubsan/ubsan_handlers.cpp:137 <libclang_rt.asan.so>+0x42b6f
  #3    0x000021982bc4b723 in __ubsan_handle_type_mismatch_v1 compiler-rt/lib/ubsan/ubsan_handlers.cpp:142 <libclang_rt.asan.so>+0x42723
  #4    0x000021afcfdeca5e in acpi_rs_get_address_common(struct acpi_resource*, union aml_resource*) ../../third_party/acpica/source/components/resources/rsaddr.c:329 <platform-bus-x86.so>+0x6aca5e
  #5    0x000021afcfdf2089 in acpi_rs_convert_aml_to_resource(struct acpi_resource*, union aml_resource*, struct acpi_rsconvert_info*) ../../third_party/acpica/source/components/resources/rsmisc.c:355 <platform-bus-x86.so>+0x6b2089
  #6    0x000021afcfded169 in acpi_rs_convert_aml_to_resources(u8*, u32, u32, u8, void**) ../../third_party/acpica/source/components/resources/rslist.c:137 <platform-bus-x86.so>+0x6ad169
  ctrliq#7    0x000021afcfe2d24a in acpi_ut_walk_aml_resources(struct acpi_walk_state*, u8*, acpi_size, acpi_walk_aml_callback, void**) ../../third_party/acpica/source/components/utilities/utresrc.c:237 <platform-bus-x86.so>+0x6ed24a
  ctrliq#8    0x000021afcfde66b7 in acpi_rs_create_resource_list(union acpi_operand_object*, struct acpi_buffer*) ../../third_party/acpica/source/components/resources/rscreate.c:199 <platform-bus-x86.so>+0x6a66b7
  ctrliq#9    0x000021afcfdf6979 in acpi_rs_get_method_data(acpi_handle, const char*, struct acpi_buffer*) ../../third_party/acpica/source/components/resources/rsutils.c:770 <platform-bus-x86.so>+0x6b6979
  ctrliq#10   0x000021afcfdf708f in acpi_walk_resources(acpi_handle, char*, acpi_walk_resource_callback, void*) ../../third_party/acpica/source/components/resources/rsxface.c:731 <platform-bus-x86.so>+0x6b708f
  ctrliq#11   0x000021afcfa95dcf in acpi::acpi_impl::walk_resources(acpi::acpi_impl*, acpi_handle, const char*, acpi::Acpi::resources_callable) ../../src/devices/board/lib/acpi/acpi-impl.cc:41 <platform-bus-x86.so>+0x355dcf
  ctrliq#12   0x000021afcfaa8278 in acpi::device_builder::gather_resources(acpi::device_builder*, acpi::Acpi*, fidl::any_arena&, acpi::Manager*, acpi::device_builder::gather_resources_callback) ../../src/devices/board/lib/acpi/device-builder.cc:84 <platform-bus-x86.so>+0x368278
  ctrliq#13   0x000021afcfbddb87 in acpi::Manager::configure_discovered_devices(acpi::Manager*) ../../src/devices/board/lib/acpi/manager.cc:75 <platform-bus-x86.so>+0x49db87
  ctrliq#14   0x000021afcf99091d in publish_acpi_devices(acpi::Manager*, zx_device_t*, zx_device_t*) ../../src/devices/board/drivers/x86/acpi-nswalk.cc:95 <platform-bus-x86.so>+0x25091d
  ctrliq#15   0x000021afcf9c1d4e in x86::X86::do_init(x86::X86*) ../../src/devices/board/drivers/x86/x86.cc:60 <platform-bus-x86.so>+0x281d4e
  ctrliq#16   0x000021afcf9e33ad in λ(x86::X86::ddk_init::(anon class)*) ../../src/devices/board/drivers/x86/x86.cc:77 <platform-bus-x86.so>+0x2a33ad
  ctrliq#17   0x000021afcf9e313e in fit::internal::target<(lambda at../../src/devices/board/drivers/x86/x86.cc:76:19), false, false, std::__2::allocator<std::byte>, void>::invoke(void*) ../../sdk/lib/fit/include/lib/fit/internal/function.h:183 <platform-bus-x86.so>+0x2a313e
  ctrliq#18   0x000021afcfbab4c7 in fit::internal::function_base<16UL, false, void(), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<16UL, false, void (), std::__2::allocator<std::byte> >*) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <platform-bus-x86.so>+0x46b4c7
  ctrliq#19   0x000021afcfbab342 in fit::function_impl<16UL, false, void(), std::__2::allocator<std::byte>>::operator()(const fit::function_impl<16UL, false, void (), std::__2::allocator<std::byte> >*) ../../sdk/lib/fit/include/lib/fit/function.h:315 <platform-bus-x86.so>+0x46b342
  ctrliq#20   0x000021afcfcd98c3 in async::internal::retained_task::Handler(async_dispatcher_t*, async_task_t*, zx_status_t) ../../sdk/lib/async/task.cc:24 <platform-bus-x86.so>+0x5998c3
  ctrliq#21   0x00002290f9924616 in λ(const driver_runtime::Dispatcher::post_task::(anon class)*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, zx_status_t) ../../src/devices/bin/driver_runtime/dispatcher.cc:789 <libdriver_runtime.so>+0x10a616
  ctrliq#22   0x00002290f9924323 in fit::internal::target<(lambda at../../src/devices/bin/driver_runtime/dispatcher.cc:788:7), true, false, std::__2::allocator<std::byte>, void, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int>::invoke(void*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/internal/function.h:128 <libdriver_runtime.so>+0x10a323
  ctrliq#23   0x00002290f9904b76 in fit::internal::function_base<24UL, true, void(std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<24UL, true, void (std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <libdriver_runtime.so>+0xeab76
  ctrliq#24   0x00002290f9904831 in fit::callback_impl<24UL, true, void(std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int), std::__2::allocator<std::byte>>::operator()(fit::callback_impl<24UL, true, void (std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/function.h:471 <libdriver_runtime.so>+0xea831
  ctrliq#25   0x00002290f98d5adc in driver_runtime::callback_request::Call(driver_runtime::callback_request*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, zx_status_t) ../../src/devices/bin/driver_runtime/callback_request.h:74 <libdriver_runtime.so>+0xbbadc
  ctrliq#26   0x00002290f98e1e58 in driver_runtime::Dispatcher::dispatch_callback(driver_runtime::Dispatcher*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >) ../../src/devices/bin/driver_runtime/dispatcher.cc:1248 <libdriver_runtime.so>+0xc7e58
  ctrliq#27   0x00002290f98e4159 in driver_runtime::Dispatcher::dispatch_callbacks(driver_runtime::Dispatcher*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.cc:1308 <libdriver_runtime.so>+0xca159
  ctrliq#28   0x00002290f9918414 in λ(const driver_runtime::Dispatcher::create_with_adder::(anon class)*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.cc:353 <libdriver_runtime.so>+0xfe414
  ctrliq#29   0x00002290f991812d in fit::internal::target<(lambda at../../src/devices/bin/driver_runtime/dispatcher.cc:351:7), true, false, std::__2::allocator<std::byte>, void, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>>::invoke(void*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/internal/function.h:128 <libdriver_runtime.so>+0xfe12d
  ctrliq#30   0x00002290f9906fc7 in fit::internal::function_base<8UL, true, void(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<8UL, true, void (std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <libdriver_runtime.so>+0xecfc7
  ctrliq#31   0x00002290f9906c66 in fit::function_impl<8UL, true, void(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte>>::operator()(const fit::function_impl<8UL, true, void (std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/function.h:315 <libdriver_runtime.so>+0xecc66
  ctrliq#32   0x00002290f98e73d9 in driver_runtime::Dispatcher::event_waiter::invoke_callback(driver_runtime::Dispatcher::event_waiter*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.h:543 <libdriver_runtime.so>+0xcd3d9
  ctrliq#33   0x00002290f98e700d in driver_runtime::Dispatcher::event_waiter::handle_event(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, async_dispatcher_t*, async::wait_base*, zx_status_t, zx_packet_signal_t const*) ../../src/devices/bin/driver_runtime/dispatcher.cc:1442 <libdriver_runtime.so>+0xcd00d
  ctrliq#34   0x00002290f9918983 in async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>::handle_event(async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>*, async_dispatcher_t*, async::wait_base*, zx_status_t, zx_packet_signal_t const*) ../../src/devices/bin/driver_runtime/async_loop_owned_event_handler.h:59 <libdriver_runtime.so>+0xfe983
  ctrliq#35   0x00002290f9918b9e in async::wait_method<async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>, &async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>::handle_event>::call_handler(async_dispatcher_t*, async_wait_t*, zx_status_t, zx_packet_signal_t const*) ../../sdk/lib/async/include/lib/async/cpp/wait.h:201 <libdriver_runtime.so>+0xfeb9e
  ctrliq#36   0x00002290f99bf509 in async_loop_dispatch_wait(async_loop_t*, async_wait_t*, zx_status_t, zx_packet_signal_t const*) ../../sdk/lib/async-loop/loop.c:394 <libdriver_runtime.so>+0x1a5509
  ctrliq#37   0x00002290f99b9958 in async_loop_run_once(async_loop_t*, zx_time_t) ../../sdk/lib/async-loop/loop.c:343 <libdriver_runtime.so>+0x19f958
  ctrliq#38   0x00002290f99b9247 in async_loop_run(async_loop_t*, zx_time_t, _Bool) ../../sdk/lib/async-loop/loop.c:301 <libdriver_runtime.so>+0x19f247
  ctrliq#39   0x00002290f99ba962 in async_loop_run_thread(void*) ../../sdk/lib/async-loop/loop.c:860 <libdriver_runtime.so>+0x1a0962
  ctrliq#40   0x000041afd176ef30 in start_c11(void*) ../../zircon/third_party/ulib/musl/pthread/pthread_create.c:63 <libc.so>+0x84f30
  ctrliq#41   0x000041afd18a448d in thread_trampoline(uintptr_t, uintptr_t) ../../zircon/system/ulib/runtime/thread.cc:100 <libc.so>+0x1ba48d

Link: acpica/acpica@1c28da22
Signed-off-by: Rafael J. Wysocki <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Tamir Duberstein <[email protected]>
[ rjw: Pick up the tag from Tamir ]
Signed-off-by: Rafael J. Wysocki <[email protected]>
github-actions bot pushed a commit that referenced this pull request May 28, 2025
Lockdep reports a possible circular locking dependency [1] when
cpu_hotplug_lock is acquired inside store_local_boost(), after
policy->rwsem has already been taken by store().

However, the boost update is strictly per-policy and does not
access shared state or iterate over all policies.

Since policy->rwsem is already held, this is enough to serialize
against concurrent topology changes for the current policy.

Remove the cpus_read_lock() to resolve the lockdep warning and
avoid unnecessary locking.

 [1]
 ======================================================
 WARNING: possible circular locking dependency detected
 6.15.0-rc6-debug-gb01fc4eca73c #1 Not tainted
 ------------------------------------------------------
 power-profiles-/588 is trying to acquire lock:
 ffffffffb3a7d910 (cpu_hotplug_lock){++++}-{0:0}, at: store_local_boost+0x56/0xd0

 but task is already holding lock:
 ffff8b6e5a12c380 (&policy->rwsem){++++}-{4:4}, at: store+0x37/0x90

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #2 (&policy->rwsem){++++}-{4:4}:
        down_write+0x29/0xb0
        cpufreq_online+0x7e8/0xa40
        cpufreq_add_dev+0x82/0xa0
        subsys_interface_register+0x148/0x160
        cpufreq_register_driver+0x15d/0x260
        amd_pstate_register_driver+0x36/0x90
        amd_pstate_init+0x1e7/0x270
        do_one_initcall+0x68/0x2b0
        kernel_init_freeable+0x231/0x270
        kernel_init+0x15/0x130
        ret_from_fork+0x2c/0x50
        ret_from_fork_asm+0x11/0x20

 -> #1 (subsys mutex#3){+.+.}-{4:4}:
        __mutex_lock+0xc2/0x930
        subsys_interface_register+0x7f/0x160
        cpufreq_register_driver+0x15d/0x260
        amd_pstate_register_driver+0x36/0x90
        amd_pstate_init+0x1e7/0x270
        do_one_initcall+0x68/0x2b0
        kernel_init_freeable+0x231/0x270
        kernel_init+0x15/0x130
        ret_from_fork+0x2c/0x50
        ret_from_fork_asm+0x11/0x20

 -> #0 (cpu_hotplug_lock){++++}-{0:0}:
        __lock_acquire+0x10ed/0x1850
        lock_acquire.part.0+0x69/0x1b0
        cpus_read_lock+0x2a/0xc0
        store_local_boost+0x56/0xd0
        store+0x50/0x90
        kernfs_fop_write_iter+0x132/0x200
        vfs_write+0x2b3/0x590
        ksys_write+0x74/0xf0
        do_syscall_64+0xbb/0x1d0
        entry_SYSCALL_64_after_hwframe+0x56/0x5e

Signed-off-by: Seyediman Seyedarab <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Rafael J. Wysocki <[email protected]>
github-actions bot pushed a commit that referenced this pull request May 29, 2025
JIRA: https://issues.redhat.com/browse/RHEL-79791
CVE: CVE-2024-56607

commit 8fac326
Author: Kalle Valo <[email protected]>
Date:   Mon Oct 7 19:59:27 2024 +0300

    wifi: ath12k: fix atomic calls in ath12k_mac_op_set_bitrate_mask()
    
    When I try to manually set bitrates:
    
    iw wlan0 set bitrates legacy-2.4 1
    
    I get sleeping from invalid context error, see below. Fix that by switching to
    use recently introduced ieee80211_iterate_stations_mtx().
    
    Do note that WCN6855 firmware is still crashing, I'm not sure if that firmware
    even supports bitrate WMI commands and should we consider disabling
    ath12k_mac_op_set_bitrate_mask() for WCN6855? But that's for another patch.
    
    BUG: sleeping function called from invalid context at drivers/net/wireless/ath/ath12k/wmi.c:420
    in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 2236, name: iw
    preempt_count: 0, expected: 0
    RCU nest depth: 1, expected: 0
    3 locks held by iw/2236:
     #0: ffffffffabc6f1d8 (cb_lock){++++}-{3:3}, at: genl_rcv+0x14/0x40
     #1: ffff888138410810 (&rdev->wiphy.mtx){+.+.}-{3:3}, at: nl80211_pre_doit+0x54d/0x800 [cfg80211]
     #2: ffffffffab2cfaa0 (rcu_read_lock){....}-{1:2}, at: ieee80211_iterate_stations_atomic+0x2f/0x200 [mac80211]
    CPU: 3 UID: 0 PID: 2236 Comm: iw Not tainted 6.11.0-rc7-wt-ath+ #1772
    Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0067.2021.0528.1339 05/28/2021
    Call Trace:
     <TASK>
     dump_stack_lvl+0xa4/0xe0
     dump_stack+0x10/0x20
     __might_resched+0x363/0x5a0
     ? __alloc_skb+0x165/0x340
     __might_sleep+0xad/0x160
     ath12k_wmi_cmd_send+0xb1/0x3d0 [ath12k]
     ? ath12k_wmi_init_wcn7850+0xa40/0xa40 [ath12k]
     ? __netdev_alloc_skb+0x45/0x7b0
     ? __asan_memset+0x39/0x40
     ? ath12k_wmi_alloc_skb+0xf0/0x150 [ath12k]
     ? reacquire_held_locks+0x4d0/0x4d0
     ath12k_wmi_set_peer_param+0x340/0x5b0 [ath12k]
     ath12k_mac_disable_peer_fixed_rate+0xa3/0x110 [ath12k]
     ? ath12k_mac_vdev_stop+0x4f0/0x4f0 [ath12k]
     ieee80211_iterate_stations_atomic+0xd4/0x200 [mac80211]
     ath12k_mac_op_set_bitrate_mask+0x5d2/0x1080 [ath12k]
     ? ath12k_mac_vif_chan+0x320/0x320 [ath12k]
     drv_set_bitrate_mask+0x267/0x470 [mac80211]
     ieee80211_set_bitrate_mask+0x4cc/0x8a0 [mac80211]
     ? __this_cpu_preempt_check+0x13/0x20
     nl80211_set_tx_bitrate_mask+0x2bc/0x530 [cfg80211]
     ? nl80211_parse_tx_bitrate_mask+0x2320/0x2320 [cfg80211]
     ? trace_contention_end+0xef/0x140
     ? rtnl_unlock+0x9/0x10
     ? nl80211_pre_doit+0x557/0x800 [cfg80211]
     genl_family_rcv_msg_doit+0x1f0/0x2e0
     ? genl_family_rcv_msg_attrs_parse.isra.0+0x250/0x250
     ? ns_capable+0x57/0xd0
     genl_family_rcv_msg+0x34c/0x600
     ? genl_family_rcv_msg_dumpit+0x310/0x310
     ? __lock_acquire+0xc62/0x1de0
     ? he_set_mcs_mask.isra.0+0x8d0/0x8d0 [cfg80211]
     ? nl80211_parse_tx_bitrate_mask+0x2320/0x2320 [cfg80211]
     ? cfg80211_external_auth_request+0x690/0x690 [cfg80211]
     genl_rcv_msg+0xa0/0x130
     netlink_rcv_skb+0x14c/0x400
     ? genl_family_rcv_msg+0x600/0x600
     ? netlink_ack+0xd70/0xd70
     ? rwsem_optimistic_spin+0x4f0/0x4f0
     ? genl_rcv+0x14/0x40
     ? down_read_killable+0x580/0x580
     ? netlink_deliver_tap+0x13e/0x350
     ? __this_cpu_preempt_check+0x13/0x20
     genl_rcv+0x23/0x40
     netlink_unicast+0x45e/0x790
     ? netlink_attachskb+0x7f0/0x7f0
     netlink_sendmsg+0x7eb/0xdb0
     ? netlink_unicast+0x790/0x790
     ? __this_cpu_preempt_check+0x13/0x20
     ? selinux_socket_sendmsg+0x31/0x40
     ? netlink_unicast+0x790/0x790
     __sock_sendmsg+0xc9/0x160
     ____sys_sendmsg+0x620/0x990
     ? kernel_sendmsg+0x30/0x30
     ? __copy_msghdr+0x410/0x410
     ? __kasan_check_read+0x11/0x20
     ? mark_lock+0xe6/0x1470
     ___sys_sendmsg+0xe9/0x170
     ? copy_msghdr_from_user+0x120/0x120
     ? __lock_acquire+0xc62/0x1de0
     ? do_fault_around+0x2c6/0x4e0
     ? do_user_addr_fault+0x8c1/0xde0
     ? reacquire_held_locks+0x220/0x4d0
     ? do_user_addr_fault+0x8c1/0xde0
     ? __kasan_check_read+0x11/0x20
     ? __fdget+0x4e/0x1d0
     ? sockfd_lookup_light+0x1a/0x170
     __sys_sendmsg+0xd2/0x180
     ? __sys_sendmsg_sock+0x20/0x20
     ? reacquire_held_locks+0x4d0/0x4d0
     ? debug_smp_processor_id+0x17/0x20
     __x64_sys_sendmsg+0x72/0xb0
     ? lockdep_hardirqs_on+0x7d/0x100
     x64_sys_call+0x894/0x9f0
     do_syscall_64+0x64/0x130
     entry_SYSCALL_64_after_hwframe+0x4b/0x53
    RIP: 0033:0x7f230fe04807
    Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
    RSP: 002b:00007ffe996a7ea8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 0000556f9f9c3390 RCX: 00007f230fe04807
    RDX: 0000000000000000 RSI: 00007ffe996a7ee0 RDI: 0000000000000003
    RBP: 0000556f9f9c88c0 R08: 0000000000000002 R09: 0000000000000000
    R10: 0000556f965ca190 R11: 0000000000000246 R12: 0000556f9f9c8780
    R13: 00007ffe996a7ee0 R14: 0000556f9f9c87d0 R15: 0000556f9f9c88c0
     </TASK>
    
    Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
    
    Signed-off-by: Kalle Valo <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jeff Johnson <[email protected]>

Signed-off-by: Jose Ignacio Tornos Martinez <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 17, 2025
JIRA: https://issues.redhat.com/browse/RHEL-102648

When I run the NVME over TCP test in virtme-ng, I get the following
"suspicious RCU usage" warning in nvme_mpath_add_sysfs_link():

'''
[    5.024557][   T44] nvmet: Created nvm controller 1 for subsystem nqn.2025-06.org.nvmexpress.mptcp for NQN nqn.2014-08.org.nvmexpress:uuid:f7f6b5e0-ff97-4894-98ac-c85309e0bc77.
[    5.027401][  T183] nvme nvme0: creating 2 I/O queues.
[    5.029017][  T183] nvme nvme0: mapped 2/0/0 default/read/poll queues.
[    5.032587][  T183] nvme nvme0: new ctrl: NQN "nqn.2025-06.org.nvmexpress.mptcp", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:f7f6b5e0-ff97-4894-98ac-c85309e0bc77
[    5.042214][   T25]
[    5.042440][   T25] =============================
[    5.042579][   T25] WARNING: suspicious RCU usage
[    5.042705][   T25] 6.16.0-rc3+ ctrliq#23 Not tainted
[    5.042812][   T25] -----------------------------
[    5.042934][   T25] drivers/nvme/host/multipath.c:1203 RCU-list traversed in non-reader section!!
[    5.043111][   T25]
[    5.043111][   T25] other info that might help us debug this:
[    5.043111][   T25]
[    5.043341][   T25]
[    5.043341][   T25] rcu_scheduler_active = 2, debug_locks = 1
[    5.043502][   T25] 3 locks held by kworker/u9:0/25:
[    5.043615][   T25]  #0: ffff888008730948 ((wq_completion)async){+.+.}-{0:0}, at: process_one_work+0x7ed/0x1350
[    5.043830][   T25]  #1: ffffc900001afd40 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0xcf3/0x1350
[    5.044084][   T25]  #2: ffff888013ee0020 (&head->srcu){.+.+}-{0:0}, at: nvme_mpath_add_sysfs_link.part.0+0xb4/0x3a0
[    5.044300][   T25]
[    5.044300][   T25] stack backtrace:
[    5.044439][   T25] CPU: 0 UID: 0 PID: 25 Comm: kworker/u9:0 Not tainted 6.16.0-rc3+ ctrliq#23 PREEMPT(full)
[    5.044441][   T25] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    5.044442][   T25] Workqueue: async async_run_entry_fn
[    5.044445][   T25] Call Trace:
[    5.044446][   T25]  <TASK>
[    5.044449][   T25]  dump_stack_lvl+0x6f/0xb0
[    5.044453][   T25]  lockdep_rcu_suspicious.cold+0x4f/0xb1
[    5.044457][   T25]  nvme_mpath_add_sysfs_link.part.0+0x2fb/0x3a0
[    5.044459][   T25]  ? queue_work_on+0x90/0xf0
[    5.044461][   T25]  ? lockdep_hardirqs_on+0x78/0x110
[    5.044466][   T25]  nvme_mpath_set_live+0x1e9/0x4f0
[    5.044470][   T25]  nvme_mpath_add_disk+0x240/0x2f0
[    5.044472][   T25]  ? __pfx_nvme_mpath_add_disk+0x10/0x10
[    5.044475][   T25]  ? add_disk_fwnode+0x361/0x580
[    5.044480][   T25]  nvme_alloc_ns+0x81c/0x17c0
[    5.044483][   T25]  ? kasan_quarantine_put+0x104/0x240
[    5.044487][   T25]  ? __pfx_nvme_alloc_ns+0x10/0x10
[    5.044495][   T25]  ? __pfx_nvme_find_get_ns+0x10/0x10
[    5.044496][   T25]  ? rcu_read_lock_any_held+0x45/0xa0
[    5.044498][   T25]  ? validate_chain+0x232/0x4f0
[    5.044503][   T25]  nvme_scan_ns+0x4c8/0x810
[    5.044506][   T25]  ? __pfx_nvme_scan_ns+0x10/0x10
[    5.044508][   T25]  ? find_held_lock+0x2b/0x80
[    5.044512][   T25]  ? ktime_get+0x16d/0x220
[    5.044517][   T25]  ? kvm_clock_get_cycles+0x18/0x30
[    5.044520][   T25]  ? __pfx_nvme_scan_ns_async+0x10/0x10
[    5.044522][   T25]  async_run_entry_fn+0x97/0x560
[    5.044523][   T25]  ? rcu_is_watching+0x12/0xc0
[    5.044526][   T25]  process_one_work+0xd3c/0x1350
[    5.044532][   T25]  ? __pfx_process_one_work+0x10/0x10
[    5.044536][   T25]  ? assign_work+0x16c/0x240
[    5.044539][   T25]  worker_thread+0x4da/0xd50
[    5.044545][   T25]  ? __pfx_worker_thread+0x10/0x10
[    5.044546][   T25]  kthread+0x356/0x5c0
[    5.044548][   T25]  ? __pfx_kthread+0x10/0x10
[    5.044549][   T25]  ? ret_from_fork+0x1b/0x2e0
[    5.044552][   T25]  ? __lock_release.isra.0+0x5d/0x180
[    5.044553][   T25]  ? ret_from_fork+0x1b/0x2e0
[    5.044555][   T25]  ? rcu_is_watching+0x12/0xc0
[    5.044557][   T25]  ? __pfx_kthread+0x10/0x10
[    5.044559][   T25]  ret_from_fork+0x218/0x2e0
[    5.044561][   T25]  ? __pfx_kthread+0x10/0x10
[    5.044562][   T25]  ret_from_fork_asm+0x1a/0x30
[    5.044570][   T25]  </TASK>
'''

This patch uses sleepable RCU version of helper list_for_each_entry_srcu()
instead of list_for_each_entry_rcu() to fix it.

Fixes: 4dbd2b2 ("nvme-multipath: Add visibility for round-robin io-policy")
Signed-off-by: Geliang Tang <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Nilay Shroff <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
(cherry picked from commit d681107)
Signed-off-by: Chris Leech <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 18, 2025
…void Priority Inversion in SRIOV

commit dc0297f upstream.

RLCG Register Access is a way for virtual functions to safely access GPU
registers in a virtualized environment., including TLB flushes and
register reads. When multiple threads or VFs try to access the same
registers simultaneously, it can lead to race conditions. By using the
RLCG interface, the driver can serialize access to the registers. This
means that only one thread can access the registers at a time,
preventing conflicts and ensuring that operations are performed
correctly. Additionally, when a low-priority task holds a mutex that a
high-priority task needs, ie., If a thread holding a spinlock tries to
acquire a mutex, it can lead to priority inversion. register access in
amdgpu_virt_rlcg_reg_rw especially in a fast code path is critical.

The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being
called, which attempts to acquire the mutex. This function is invoked
from amdgpu_sriov_wreg, which in turn is called from
gmc_v11_0_flush_gpu_tlb.

The [ BUG: Invalid wait context ] indicates that a thread is trying to
acquire a mutex while it is in a context that does not allow it to sleep
(like holding a spinlock).

Fixes the below:

[  253.013423] =============================
[  253.013434] [ BUG: Invalid wait context ]
[  253.013446] 6.12.0-amdstaging-drm-next-lol-050225 ctrliq#14 Tainted: G     U     OE
[  253.013464] -----------------------------
[  253.013475] kworker/0:1/10 is trying to lock:
[  253.013487] ffff9f30542e3cf8 (&adev->virt.rlcg_reg_lock){+.+.}-{3:3}, at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.013815] other info that might help us debug this:
[  253.013827] context-{4:4}
[  253.013835] 3 locks held by kworker/0:1/10:
[  253.013847]  #0: ffff9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680
[  253.013877]  #1: ffffb789c008be40 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
[  253.013905]  #2: ffff9f3054281838 (&adev->gmc.invalidate_lock){+.+.}-{2:2}, at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu]
[  253.014154] stack backtrace:
[  253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G     U     OE      6.12.0-amdstaging-drm-next-lol-050225 ctrliq#14
[  253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[  253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024
[  253.014224] Workqueue: events work_for_cpu_fn
[  253.014241] Call Trace:
[  253.014250]  <TASK>
[  253.014260]  dump_stack_lvl+0x9b/0xf0
[  253.014275]  dump_stack+0x10/0x20
[  253.014287]  __lock_acquire+0xa47/0x2810
[  253.014303]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.014321]  lock_acquire+0xd1/0x300
[  253.014333]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.014562]  ? __lock_acquire+0xa6b/0x2810
[  253.014578]  __mutex_lock+0x85/0xe20
[  253.014591]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.014782]  ? sched_clock_noinstr+0x9/0x10
[  253.014795]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.014808]  ? local_clock_noinstr+0xe/0xc0
[  253.014822]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.015012]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.015029]  mutex_lock_nested+0x1b/0x30
[  253.015044]  ? mutex_lock_nested+0x1b/0x30
[  253.015057]  amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.015249]  amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu]
[  253.015435]  gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu]
[  253.015667]  gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu]
[  253.015901]  ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu]
[  253.016159]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.016173]  ? smu_hw_init+0x18d/0x300 [amdgpu]
[  253.016403]  amdgpu_device_init+0x29ad/0x36a0 [amdgpu]
[  253.016614]  amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu]
[  253.017057]  amdgpu_pci_probe+0x1c2/0x660 [amdgpu]
[  253.017493]  local_pci_probe+0x4b/0xb0
[  253.017746]  work_for_cpu_fn+0x1a/0x30
[  253.017995]  process_one_work+0x21e/0x680
[  253.018248]  worker_thread+0x190/0x330
[  253.018500]  ? __pfx_worker_thread+0x10/0x10
[  253.018746]  kthread+0xe7/0x120
[  253.018988]  ? __pfx_kthread+0x10/0x10
[  253.019231]  ret_from_fork+0x3c/0x60
[  253.019468]  ? __pfx_kthread+0x10/0x10
[  253.019701]  ret_from_fork_asm+0x1a/0x30
[  253.019939]  </TASK>

v2: s/spin_trylock/spin_lock_irqsave to be safe (Christian).

Fixes: e864180 ("drm/amdgpu: Add lock around VF RLCG interface")
Cc: lin cao <[email protected]>
Cc: Jingwen Chen <[email protected]>
Cc: Victor Skvortsov <[email protected]>
Cc: Zhigang Luo <[email protected]>
Cc: Christian König <[email protected]>
Cc: Alex Deucher <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Suggested-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
[ Minor context change fixed. ]
Signed-off-by: Wenshan Lan <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 18, 2025
…-flight

commit ecf371f upstream.

Reject migration of SEV{-ES} state if either the source or destination VM
is actively creating a vCPU, i.e. if kvm_vm_ioctl_create_vcpu() is in the
section between incrementing created_vcpus and online_vcpus.  The bulk of
vCPU creation runs _outside_ of kvm->lock to allow creating multiple vCPUs
in parallel, and so sev_info.es_active can get toggled from false=>true in
the destination VM after (or during) svm_vcpu_create(), resulting in an
SEV{-ES} VM effectively having a non-SEV{-ES} vCPU.

The issue manifests most visibly as a crash when trying to free a vCPU's
NULL VMSA page in an SEV-ES VM, but any number of things can go wrong.

  BUG: unable to handle page fault for address: ffffebde00000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: Oops: 0000 [#1] SMP KASAN NOPTI
  CPU: 227 UID: 0 PID: 64063 Comm: syz.5.60023 Tainted: G     U     O        6.15.0-smp-DEV #2 NONE
  Tainted: [U]=USER, [O]=OOT_MODULE
  Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 12.52.0-0 10/28/2024
  RIP: 0010:constant_test_bit arch/x86/include/asm/bitops.h:206 [inline]
  RIP: 0010:arch_test_bit arch/x86/include/asm/bitops.h:238 [inline]
  RIP: 0010:_test_bit include/asm-generic/bitops/instrumented-non-atomic.h:142 [inline]
  RIP: 0010:PageHead include/linux/page-flags.h:866 [inline]
  RIP: 0010:___free_pages+0x3e/0x120 mm/page_alloc.c:5067
  Code: <49> f7 06 40 00 00 00 75 05 45 31 ff eb 0c 66 90 4c 89 f0 4c 39 f0
  RSP: 0018:ffff8984551978d0 EFLAGS: 00010246
  RAX: 0000777f80000001 RBX: 0000000000000000 RCX: ffffffff918aeb98
  RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffebde00000000
  RBP: 0000000000000000 R08: ffffebde00000007 R09: 1ffffd7bc0000000
  R10: dffffc0000000000 R11: fffff97bc0000001 R12: dffffc0000000000
  R13: ffff8983e19751a8 R14: ffffebde00000000 R15: 1ffffd7bc0000000
  FS:  0000000000000000(0000) GS:ffff89ee661d3000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffebde00000000 CR3: 000000793ceaa000 CR4: 0000000000350ef0
  DR0: 0000000000000000 DR1: 0000000000000b5f DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   sev_free_vcpu+0x413/0x630 arch/x86/kvm/svm/sev.c:3169
   svm_vcpu_free+0x13a/0x2a0 arch/x86/kvm/svm/svm.c:1515
   kvm_arch_vcpu_destroy+0x6a/0x1d0 arch/x86/kvm/x86.c:12396
   kvm_vcpu_destroy virt/kvm/kvm_main.c:470 [inline]
   kvm_destroy_vcpus+0xd1/0x300 virt/kvm/kvm_main.c:490
   kvm_arch_destroy_vm+0x636/0x820 arch/x86/kvm/x86.c:12895
   kvm_put_kvm+0xb8e/0xfb0 virt/kvm/kvm_main.c:1310
   kvm_vm_release+0x48/0x60 virt/kvm/kvm_main.c:1369
   __fput+0x3e4/0x9e0 fs/file_table.c:465
   task_work_run+0x1a9/0x220 kernel/task_work.c:227
   exit_task_work include/linux/task_work.h:40 [inline]
   do_exit+0x7f0/0x25b0 kernel/exit.c:953
   do_group_exit+0x203/0x2d0 kernel/exit.c:1102
   get_signal+0x1357/0x1480 kernel/signal.c:3034
   arch_do_signal_or_restart+0x40/0x690 arch/x86/kernel/signal.c:337
   exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
   exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
   __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
   syscall_exit_to_user_mode+0x67/0xb0 kernel/entry/common.c:218
   do_syscall_64+0x7c/0x150 arch/x86/entry/syscall_64.c:100
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
  RIP: 0033:0x7f87a898e969
   </TASK>
  Modules linked in: gq(O)
  gsmi: Log Shutdown Reason 0x03
  CR2: ffffebde00000000
  ---[ end trace 0000000000000000 ]---

Deliberately don't check for a NULL VMSA when freeing the vCPU, as crashing
the host is likely desirable due to the VMSA being consumed by hardware.
E.g. if KVM manages to allow VMRUN on the vCPU, hardware may read/write a
bogus VMSA page.  Accessing PFN 0 is "fine"-ish now that it's sequestered
away thanks to L1TF, but panicking in this scenario is preferable to
potentially running with corrupted state.

Reported-by: Alexander Potapenko <[email protected]>
Tested-by: Alexander Potapenko <[email protected]>
Fixes: 0b020f5 ("KVM: SEV: Add support for SEV-ES intra host migration")
Fixes: b566393 ("KVM: SEV: Add support for SEV intra host migration")
Cc: [email protected]
Cc: James Houghton <[email protected]>
Cc: Peter Gonda <[email protected]>
Reviewed-by: Liam Merwick <[email protected]>
Tested-by: Liam Merwick <[email protected]>
Reviewed-by: James Houghton <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 19, 2025
If the PHY driver uses another PHY internally (e.g. in case of eUSB2,
repeaters are represented as PHYs), then it would trigger the following
lockdep splat because all PHYs use a single static lockdep key and thus
lockdep can not identify whether there is a dependency or not and
reports a false positive.

Make PHY subsystem use dynamic lockdep keys, assigning each driver a
separate key. This way lockdep can correctly identify dependency graph
between mutexes.

 ============================================
 WARNING: possible recursive locking detected
 6.15.0-rc7-next-20250522-12896-g3932f283970c #3455 Not tainted
 --------------------------------------------
 kworker/u51:0/78 is trying to acquire lock:
 ffff0008116554f0 (&phy->mutex){+.+.}-{4:4}, at: phy_init+0x4c/0x12c

 but task is already holding lock:
 ffff000813c10cf0 (&phy->mutex){+.+.}-{4:4}, at: phy_init+0x4c/0x12c

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&phy->mutex);
   lock(&phy->mutex);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 4 locks held by kworker/u51:0/78:
  #0: ffff000800010948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x18c/0x5ec
  #1: ffff80008036bdb0 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x1b4/0x5ec
  #2: ffff0008094ac8f8 (&dev->mutex){....}-{4:4}, at: __device_attach+0x38/0x188
  #3: ffff000813c10cf0 (&phy->mutex){+.+.}-{4:4}, at: phy_init+0x4c/0x12c

 stack backtrace:
 CPU: 0 UID: 0 PID: 78 Comm: kworker/u51:0 Not tainted 6.15.0-rc7-next-20250522-12896-g3932f283970c #3455 PREEMPT
 Hardware name: Qualcomm CRD, BIOS 6.0.240904.BOOT.MXF.2.4-00528.1-HAMOA-1 09/ 4/2024
 Workqueue: events_unbound deferred_probe_work_func
 Call trace:
  show_stack+0x18/0x24 (C)
  dump_stack_lvl+0x90/0xd0
  dump_stack+0x18/0x24
  print_deadlock_bug+0x258/0x348
  __lock_acquire+0x10fc/0x1f84
  lock_acquire+0x1c8/0x338
  __mutex_lock+0xb8/0x59c
  mutex_lock_nested+0x24/0x30
  phy_init+0x4c/0x12c
  snps_eusb2_hsphy_init+0x54/0x1a0
  phy_init+0xe0/0x12c
  dwc3_core_init+0x450/0x10b4
  dwc3_core_probe+0xce4/0x15fc
  dwc3_probe+0x64/0xb0
  platform_probe+0x68/0xc4
  really_probe+0xbc/0x298
  __driver_probe_device+0x78/0x12c
  driver_probe_device+0x3c/0x160
  __device_attach_driver+0xb8/0x138
  bus_for_each_drv+0x84/0xe0
  __device_attach+0x9c/0x188
  device_initial_probe+0x14/0x20
  bus_probe_device+0xac/0xb0
  deferred_probe_work_func+0x8c/0xc8
  process_one_work+0x208/0x5ec
  worker_thread+0x1c0/0x368
  kthread+0x14c/0x20c
  ret_from_fork+0x10/0x20

Fixes: 3584f63 ("phy: qcom: phy-qcom-snps-eusb2: Add support for eUSB2 repeater")
Fixes: e246355 ("phy: amlogic: Add Amlogic AXG PCIE PHY Driver")
Reviewed-by: Neil Armstrong <[email protected]>
Reviewed-by: Abel Vesa <[email protected]>
Reported-by: Johan Hovold <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/
Reviewed-by: Johan Hovold <[email protected]>
Tested-by: Johan Hovold <[email protected]>
Signed-off-by: Dmitry Baryshkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 19, 2025
This reverts commit 7796c97.

This patch broke Dragonboard 845c (sdm845). I see:

    Unexpected kernel BRK exception at EL1
    Internal error: BRK handler: 00000000f20003e8 [#1]  SMP
    pc : qcom_swrm_set_channel_map+0x7c/0x80 [soundwire_qcom]
    lr : snd_soc_dai_set_channel_map+0x34/0x78
    Call trace:
     qcom_swrm_set_channel_map+0x7c/0x80 [soundwire_qcom] (P)
     sdm845_dai_init+0x18c/0x2e0 [snd_soc_sdm845]
     snd_soc_link_init+0x28/0x6c
     snd_soc_bind_card+0x5f4/0xb0c
     snd_soc_register_card+0x148/0x1a4
     devm_snd_soc_register_card+0x50/0xb0
     sdm845_snd_platform_probe+0x124/0x148 [snd_soc_sdm845]
     platform_probe+0x6c/0xd0
     really_probe+0xc0/0x2a4
     __driver_probe_device+0x7c/0x130
     driver_probe_device+0x40/0x118
     __device_attach_driver+0xc4/0x108
     bus_for_each_drv+0x8c/0xf0
     __device_attach+0xa4/0x198
     device_initial_probe+0x18/0x28
     bus_probe_device+0xb8/0xbc
     deferred_probe_work_func+0xac/0xfc
     process_one_work+0x244/0x658
     worker_thread+0x1b4/0x360
     kthread+0x148/0x228
     ret_from_fork+0x10/0x20
    Kernel panic - not syncing: BRK handler: Fatal exception

Dan has also reported following issues with the original patch
https://lore.kernel.org/all/[email protected]/

Bug #1:
The zeroeth element of ctrl->pconfig[] is supposed to be unused.  We
start counting at 1.  However this code sets ctrl->pconfig[0].ch_mask = 128.

Bug #2:
There are SLIM_MAX_TX_PORTS (16) elements in tx_ch[] array but only
QCOM_SDW_MAX_PORTS + 1 (15) in the ctrl->pconfig[] array so it corrupts
memory like Yongqin Liu pointed out.

Bug 3:
Like Jie Gan pointed out, it erases all the tx information with the rx
information.

Cc: [email protected] # v6.15+
Signed-off-by: Amit Pundir <[email protected]>
Acked-by: Srinivas Kandagatla <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 19, 2025
 into HEAD

KVM/riscv fixes for 6.16, take #2

- Disable vstimecmp before exiting to user-space
- Move HGEI[E|P] CSR access to IMSIC virtualization
github-actions bot pushed a commit that referenced this pull request Jul 24, 2025
commit b1bf1a7 upstream.

If "try_verify_in_tasklet" is set for dm-verity, DM_BUFIO_CLIENT_NO_SLEEP
is enabled for dm-bufio. However, when bufio tries to evict buffers, there
is a chance to trigger scheduling in spin_lock_bh, the following warning
is hit:

BUG: sleeping function called from invalid context at drivers/md/dm-bufio.c:2745
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 123, name: kworker/2:2
preempt_count: 201, expected: 0
RCU nest depth: 0, expected: 0
4 locks held by kworker/2:2/123:
 #0: ffff88800a2d1548 ((wq_completion)dm_bufio_cache){....}-{0:0}, at: process_one_work+0xe46/0x1970
 #1: ffffc90000d97d20 ((work_completion)(&dm_bufio_replacement_work)){....}-{0:0}, at: process_one_work+0x763/0x1970
 #2: ffffffff8555b528 (dm_bufio_clients_lock){....}-{3:3}, at: do_global_cleanup+0x1ce/0x710
 #3: ffff88801d5820b8 (&c->spinlock){....}-{2:2}, at: do_global_cleanup+0x2a5/0x710
Preemption disabled at:
[<0000000000000000>] 0x0
CPU: 2 UID: 0 PID: 123 Comm: kworker/2:2 Not tainted 6.16.0-rc3-g90548c634bd0 ctrliq#305 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Workqueue: dm_bufio_cache do_global_cleanup
Call Trace:
 <TASK>
 dump_stack_lvl+0x53/0x70
 __might_resched+0x360/0x4e0
 do_global_cleanup+0x2f5/0x710
 process_one_work+0x7db/0x1970
 worker_thread+0x518/0xea0
 kthread+0x359/0x690
 ret_from_fork+0xf3/0x1b0
 ret_from_fork_asm+0x1a/0x30
 </TASK>

That can be reproduced by:

  veritysetup format --data-block-size=4096 --hash-block-size=4096 /dev/vda /dev/vdb
  SIZE=$(blockdev --getsz /dev/vda)
  dmsetup create myverity -r --table "0 $SIZE verity 1 /dev/vda /dev/vdb 4096 4096 <data_blocks> 1 sha256 <root_hash> <salt> 1 try_verify_in_tasklet"
  mount /dev/dm-0 /mnt -o ro
  echo 102400 > /sys/module/dm_bufio/parameters/max_cache_size_bytes
  [read files in /mnt]

Cc: [email protected]	# v6.4+
Fixes: 450e8de ("dm bufio: improve concurrent IO performance")
Signed-off-by: Wang Shuai <[email protected]>
Signed-off-by: Sheng Yong <[email protected]>
Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
github-actions bot pushed a commit that referenced this pull request Jul 30, 2025
JIRA: https://issues.redhat.com/browse/RHEL-104401

commit 711741f
Author: Paulo Alcantara <[email protected]>
Date:   Wed Jun 25 12:22:38 2025 -0300

    smb: client: fix potential deadlock when reconnecting channels
    
    Fix cifs_signal_cifsd_for_reconnect() to take the correct lock order
    and prevent the following deadlock from happening
    
    ======================================================
    WARNING: possible circular locking dependency detected
    6.16.0-rc3-build2+ #1301 Tainted: G S      W
    ------------------------------------------------------
    cifsd/6055 is trying to acquire lock:
    ffff88810ad56038 (&tcp_ses->srv_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x134/0x200
    
    but task is already holding lock:
    ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200
    
    which lock already depends on the new lock.
    
    the existing dependency chain (in reverse order) is:
    
    -> #2 (&ret_buf->chan_lock){+.+.}-{3:3}:
           validate_chain+0x1cf/0x270
           __lock_acquire+0x60e/0x780
           lock_acquire.part.0+0xb4/0x1f0
           _raw_spin_lock+0x2f/0x40
           cifs_setup_session+0x81/0x4b0
           cifs_get_smb_ses+0x771/0x900
           cifs_mount_get_session+0x7e/0x170
           cifs_mount+0x92/0x2d0
           cifs_smb3_do_mount+0x161/0x460
           smb3_get_tree+0x55/0x90
           vfs_get_tree+0x46/0x180
           do_new_mount+0x1b0/0x2e0
           path_mount+0x6ee/0x740
           do_mount+0x98/0xe0
           __do_sys_mount+0x148/0x180
           do_syscall_64+0xa4/0x260
           entry_SYSCALL_64_after_hwframe+0x76/0x7e
    
    -> #1 (&ret_buf->ses_lock){+.+.}-{3:3}:
           validate_chain+0x1cf/0x270
           __lock_acquire+0x60e/0x780
           lock_acquire.part.0+0xb4/0x1f0
           _raw_spin_lock+0x2f/0x40
           cifs_match_super+0x101/0x320
           sget+0xab/0x270
           cifs_smb3_do_mount+0x1e0/0x460
           smb3_get_tree+0x55/0x90
           vfs_get_tree+0x46/0x180
           do_new_mount+0x1b0/0x2e0
           path_mount+0x6ee/0x740
           do_mount+0x98/0xe0
           __do_sys_mount+0x148/0x180
           do_syscall_64+0xa4/0x260
           entry_SYSCALL_64_after_hwframe+0x76/0x7e
    
    -> #0 (&tcp_ses->srv_lock){+.+.}-{3:3}:
           check_noncircular+0x95/0xc0
           check_prev_add+0x115/0x2f0
           validate_chain+0x1cf/0x270
           __lock_acquire+0x60e/0x780
           lock_acquire.part.0+0xb4/0x1f0
           _raw_spin_lock+0x2f/0x40
           cifs_signal_cifsd_for_reconnect+0x134/0x200
           __cifs_reconnect+0x8f/0x500
           cifs_handle_standard+0x112/0x280
           cifs_demultiplex_thread+0x64d/0xbc0
           kthread+0x2f7/0x310
           ret_from_fork+0x2a/0x230
           ret_from_fork_asm+0x1a/0x30
    
    other info that might help us debug this:
    
    Chain exists of:
      &tcp_ses->srv_lock --> &ret_buf->ses_lock --> &ret_buf->chan_lock
    
     Possible unsafe locking scenario:
    
           CPU0                    CPU1
           ----                    ----
      lock(&ret_buf->chan_lock);
                                   lock(&ret_buf->ses_lock);
                                   lock(&ret_buf->chan_lock);
      lock(&tcp_ses->srv_lock);
    
     *** DEADLOCK ***
    
    3 locks held by cifsd/6055:
     #0: ffffffff857de398 (&cifs_tcp_ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x7b/0x200
     #1: ffff888119c64060 (&ret_buf->ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x9c/0x200
     #2: ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200
    
    Cc: [email protected]
    Reported-by: David Howells <[email protected]>
    Fixes: d7d7a66 ("cifs: avoid use of global locks for high contention data")
    Reviewed-by: David Howells <[email protected]>
    Tested-by: David Howells <[email protected]>
    Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]>
    Signed-off-by: David Howells <[email protected]>
    Signed-off-by: Steve French <[email protected]>

Signed-off-by: Paulo Alcantara <[email protected]>
github-actions bot pushed a commit that referenced this pull request Aug 1, 2025
JIRA: https://issues.redhat.com/browse/RHEL-75594

commit 62531a1
Author: Amit Cohen <[email protected]>
Date:   Wed Mar 5 14:15:09 2025 +0200

    net: switchdev: Convert blocking notification chain to a raw one

    A blocking notification chain uses a read-write semaphore to protect the
    integrity of the chain. The semaphore is acquired for writing when
    adding / removing notifiers to / from the chain and acquired for reading
    when traversing the chain and informing notifiers about an event.

    In case of the blocking switchdev notification chain, recursive
    notifications are possible which leads to the semaphore being acquired
    twice for reading and to lockdep warnings being generated [1].

    Specifically, this can happen when the bridge driver processes a
    SWITCHDEV_BRPORT_UNOFFLOADED event which causes it to emit notifications
    about deferred events when calling switchdev_deferred_process().

    Fix this by converting the notification chain to a raw notification
    chain in a similar fashion to the netdev notification chain. Protect
    the chain using the RTNL mutex by acquiring it when modifying the chain.
    Events are always informed under the RTNL mutex, but add an assertion in
    call_switchdev_blocking_notifiers() to make sure this is not violated in
    the future.

    Maintain the "blocking" prefix as events are always emitted from process
    context and listeners are allowed to block.

    [1]:
    WARNING: possible recursive locking detected
    6.14.0-rc4-custom-g079270089484 #1 Not tainted
    --------------------------------------------
    ip/52731 is trying to acquire lock:
    ffffffff850918d8 ((switchdev_blocking_notif_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain+0x58/0xa0

    but task is already holding lock:
    ffffffff850918d8 ((switchdev_blocking_notif_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain+0x58/0xa0

    other info that might help us debug this:
    Possible unsafe locking scenario:
    CPU0
    ----
    lock((switchdev_blocking_notif_chain).rwsem);
    lock((switchdev_blocking_notif_chain).rwsem);

    *** DEADLOCK ***
    May be due to missing lock nesting notation
    3 locks held by ip/52731:
     #0: ffffffff84f795b0 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x727/0x1dc0
     #1: ffffffff8731f628 (&net->rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x790/0x1dc0
     #2: ffffffff850918d8 ((switchdev_blocking_notif_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain+0x58/0xa0

    stack backtrace:
    ...
    ? __pfx_down_read+0x10/0x10
    ? __pfx_mark_lock+0x10/0x10
    ? __pfx_switchdev_port_attr_set_deferred+0x10/0x10
    blocking_notifier_call_chain+0x58/0xa0
    switchdev_port_attr_notify.constprop.0+0xb3/0x1b0
    ? __pfx_switchdev_port_attr_notify.constprop.0+0x10/0x10
    ? mark_held_locks+0x94/0xe0
    ? switchdev_deferred_process+0x11a/0x340
    switchdev_port_attr_set_deferred+0x27/0xd0
    switchdev_deferred_process+0x164/0x340
    br_switchdev_port_unoffload+0xc8/0x100 [bridge]
    br_switchdev_blocking_event+0x29f/0x580 [bridge]
    notifier_call_chain+0xa2/0x440
    blocking_notifier_call_chain+0x6e/0xa0
    switchdev_bridge_port_unoffload+0xde/0x1a0
    ...

    Fixes: f7a70d6 ("net: bridge: switchdev: Ensure deferred event delivery on unoffload")
    Signed-off-by: Amit Cohen <[email protected]>
    Reviewed-by: Ido Schimmel <[email protected]>
    Reviewed-by: Simon Horman <[email protected]>
    Reviewed-by: Vladimir Oltean <[email protected]>
    Tested-by: Vladimir Oltean <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Paolo Abeni <[email protected]>

Signed-off-by: Mohammad Heib <[email protected]>
bmastbergen pushed a commit that referenced this pull request Aug 1, 2025
jira VULN-69389
cve CVE-2022-50066
commit-author Chia-Lin Kao (AceLan) <[email protected]>
commit 2ba5e47

The final update statement of the for loop exceeds the array range, the
dereference of self->aq_vec[i] is not checked and then leads to the
index out of range error.
Also fixed this kind of coding style in other for loop.

[   97.937604] UBSAN: array-index-out-of-bounds in drivers/net/ethernet/aquantia/atlantic/aq_nic.c:1404:48
[   97.937607] index 8 is out of range for type 'aq_vec_s *[8]'
[   97.937608] CPU: 38 PID: 3767 Comm: kworker/u256:18 Not tainted 5.19.0+ #2
[   97.937610] Hardware name: Dell Inc. Precision 7865 Tower/, BIOS 1.0.0 06/12/2022
[   97.937611] Workqueue: events_unbound async_run_entry_fn
[   97.937616] Call Trace:
[   97.937617]  <TASK>
[   97.937619]  dump_stack_lvl+0x49/0x63
[   97.937624]  dump_stack+0x10/0x16
[   97.937626]  ubsan_epilogue+0x9/0x3f
[   97.937627]  __ubsan_handle_out_of_bounds.cold+0x44/0x49
[   97.937629]  ? __scm_send+0x348/0x440
[   97.937632]  ? aq_vec_stop+0x72/0x80 [atlantic]
[   97.937639]  aq_nic_stop+0x1b6/0x1c0 [atlantic]
[   97.937644]  aq_suspend_common+0x88/0x90 [atlantic]
[   97.937648]  aq_pm_suspend_poweroff+0xe/0x20 [atlantic]
[   97.937653]  pci_pm_suspend+0x7e/0x1a0
[   97.937655]  ? pci_pm_suspend_noirq+0x2b0/0x2b0
[   97.937657]  dpm_run_callback+0x54/0x190
[   97.937660]  __device_suspend+0x14c/0x4d0
[   97.937661]  async_suspend+0x23/0x70
[   97.937663]  async_run_entry_fn+0x33/0x120
[   97.937664]  process_one_work+0x21f/0x3f0
[   97.937666]  worker_thread+0x4a/0x3c0
[   97.937668]  ? process_one_work+0x3f0/0x3f0
[   97.937669]  kthread+0xf0/0x120
[   97.937671]  ? kthread_complete_and_exit+0x20/0x20
[   97.937672]  ret_from_fork+0x22/0x30
[   97.937676]  </TASK>

v2. fixed "warning: variable 'aq_vec' set but not used"

v3. simplified a for loop

Fixes: 97bde5c ("net: ethernet: aquantia: Support for NIC-specific code")
	Signed-off-by: Chia-Lin Kao (AceLan) <[email protected]>
	Acked-by: Sudarsana Reddy Kalluru <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit 2ba5e47)
	Signed-off-by: Anmol Jain <[email protected]>
github-actions bot pushed a commit that referenced this pull request Aug 2, 2025
pert script tests fails with segmentation fault as below:

  92: perf script tests:
  --- start ---
  test child forked, pid 103769
  DB test
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.012 MB /tmp/perf-test-script.7rbftEpOzX/perf.data (9 samples) ]
  /usr/libexec/perf-core/tests/shell/script.sh: line 35:
  103780 Segmentation fault      (core dumped)
  perf script -i "${perfdatafile}" -s "${db_test}"
  --- Cleaning up ---
  ---- end(-1) ----
  92: perf script tests                                               : FAILED!

Backtrace pointed to :
	#0  0x0000000010247dd0 in maps.machine ()
	#1  0x00000000101d178c in db_export.sample ()
	#2  0x00000000103412c8 in python_process_event ()
	#3  0x000000001004eb28 in process_sample_event ()
	#4  0x000000001024fcd0 in machines.deliver_event ()
	#5  0x000000001025005c in perf_session.deliver_event ()
	#6  0x00000000102568b0 in __ordered_events__flush.part.0 ()
	ctrliq#7  0x0000000010251618 in perf_session.process_events ()
	ctrliq#8  0x0000000010053620 in cmd_script ()
	ctrliq#9  0x00000000100b5a28 in run_builtin ()
	ctrliq#10 0x00000000100b5f94 in handle_internal_command ()
	ctrliq#11 0x0000000010011114 in main ()

Further investigation reveals that this occurs in the `perf script tests`,
because it uses `db_test.py` script. This script sets `perf_db_export_mode = True`.

With `perf_db_export_mode` enabled, if a sample originates from a hypervisor,
perf doesn't set maps for "[H]" sample in the code. Consequently, `al->maps` remains NULL
when `maps__machine(al->maps)` is called from `db_export__sample`.

As al->maps can be NULL in case of Hypervisor samples , use thread->maps
because even for Hypervisor sample, machine should exist.
If we don't have machine for some reason, return -1 to avoid segmentation fault.

Reported-by: Disha Goel <[email protected]>
Signed-off-by: Aditya Bodkhe <[email protected]>
Reviewed-by: Adrian Hunter <[email protected]>
Tested-by: Disha Goel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Suggested-by: Adrian Hunter <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
github-actions bot pushed a commit that referenced this pull request Aug 2, 2025
Without the change `perf `hangs up on charaster devices. On my system
it's enough to run system-wide sampler for a few seconds to get the
hangup:

    $ perf record -a -g --call-graph=dwarf
    $ perf report
    # hung

`strace` shows that hangup happens on reading on a character device
`/dev/dri/renderD128`

    $ strace -y -f -p 2780484
    strace: Process 2780484 attached
    pread64(101</dev/dri/renderD128>, strace: Process 2780484 detached

It's call trace descends into `elfutils`:

    $ gdb -p 2780484
    (gdb) bt
    #0  0x00007f5e508f04b7 in __libc_pread64 (fd=101, buf=0x7fff9df7edb0, count=0, offset=0)
        at ../sysdeps/unix/sysv/linux/pread64.c:25
    #1  0x00007f5e52b79515 in read_file () from /<<NIX>>/elfutils-0.192/lib/libelf.so.1
    #2  0x00007f5e52b25666 in libdw_open_elf () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #3  0x00007f5e52b25907 in __libdw_open_file () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #4  0x00007f5e52b120a9 in dwfl_report_elf@@ELFUTILS_0.156 ()
       from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #5  0x000000000068bf20 in __report_module (al=al@entry=0x7fff9df80010, ip=ip@entry=139803237033216, ui=ui@entry=0x5369b5e0)
        at util/dso.h:537
    #6  0x000000000068c3d1 in report_module (ip=139803237033216, ui=0x5369b5e0) at util/unwind-libdw.c:114
    ctrliq#7  frame_callback (state=0x535aef10, arg=0x5369b5e0) at util/unwind-libdw.c:242
    ctrliq#8  0x00007f5e52b261d3 in dwfl_thread_getframes () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    ctrliq#9  0x00007f5e52b25bdb in get_one_thread_cb () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    ctrliq#10 0x00007f5e52b25faa in dwfl_getthreads () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    ctrliq#11 0x00007f5e52b26514 in dwfl_getthread_frames () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    ctrliq#12 0x000000000068c6ce in unwind__get_entries (cb=cb@entry=0x5d4620 <unwind_entry>, arg=arg@entry=0x10cd5fa0,
        thread=thread@entry=0x1076a290, data=data@entry=0x7fff9df80540, max_stack=max_stack@entry=127,
        best_effort=best_effort@entry=false) at util/thread.h:152
    ctrliq#13 0x00000000005dae95 in thread__resolve_callchain_unwind (evsel=0x106006d0, thread=0x1076a290, cursor=0x10cd5fa0,
        sample=0x7fff9df80540, max_stack=127, symbols=true) at util/machine.c:2939
    ctrliq#14 thread__resolve_callchain_unwind (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, sample=0x7fff9df80540,
        max_stack=127, symbols=true) at util/machine.c:2920
    ctrliq#15 __thread__resolve_callchain (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, evsel@entry=0x7fff9df80440,
        sample=0x7fff9df80540, parent=parent@entry=0x7fff9df804a0, root_al=root_al@entry=0x7fff9df80440, max_stack=127, symbols=true)
        at util/machine.c:2970
    ctrliq#16 0x00000000005d0cb2 in thread__resolve_callchain (thread=<optimized out>, cursor=<optimized out>, evsel=0x7fff9df80440,
        sample=<optimized out>, parent=0x7fff9df804a0, root_al=0x7fff9df80440, max_stack=127) at util/machine.h:198
    ctrliq#17 sample__resolve_callchain (sample=<optimized out>, cursor=<optimized out>, parent=parent@entry=0x7fff9df804a0,
        evsel=evsel@entry=0x106006d0, al=al@entry=0x7fff9df80440, max_stack=max_stack@entry=127) at util/callchain.c:1127
    ctrliq#18 0x0000000000617e08 in hist_entry_iter__add (iter=iter@entry=0x7fff9df80480, al=al@entry=0x7fff9df80440, max_stack_depth=127,
        arg=arg@entry=0x7fff9df81ae0) at util/hist.c:1255
    ctrliq#19 0x000000000045d2d0 in process_sample_event (tool=0x7fff9df81ae0, event=<optimized out>, sample=0x7fff9df80540,
        evsel=0x106006d0, machine=<optimized out>) at builtin-report.c:334
    ctrliq#20 0x00000000005e3bb1 in perf_session__deliver_event (session=0x105ff2c0, event=0x7f5c7d735ca0, tool=0x7fff9df81ae0,
        file_offset=2914716832, file_path=0x105ffbf0 "perf.data") at util/session.c:1367
    ctrliq#21 0x00000000005e8d93 in do_flush (oe=0x105ffa50, show_progress=false) at util/ordered-events.c:245
    ctrliq#22 __ordered_events__flush (oe=0x105ffa50, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:324
    ctrliq#23 0x00000000005e1f64 in perf_session__process_user_event (session=0x105ff2c0, event=0x7f5c7d752b18, file_offset=2914835224,
        file_path=0x105ffbf0 "perf.data") at util/session.c:1419
    ctrliq#24 0x00000000005e47c7 in reader__read_event (rd=rd@entry=0x7fff9df81260, session=session@entry=0x105ff2c0,
    --Type <RET> for more, q to quit, c to continue without paging--
    quit
        prog=prog@entry=0x7fff9df81220) at util/session.c:2132
    ctrliq#25 0x00000000005e4b37 in reader__process_events (rd=0x7fff9df81260, session=0x105ff2c0, prog=0x7fff9df81220)
        at util/session.c:2181
    ctrliq#26 __perf_session__process_events (session=0x105ff2c0) at util/session.c:2226
    ctrliq#27 perf_session__process_events (session=session@entry=0x105ff2c0) at util/session.c:2390
    ctrliq#28 0x0000000000460add in __cmd_report (rep=0x7fff9df81ae0) at builtin-report.c:1076
    ctrliq#29 cmd_report (argc=<optimized out>, argv=<optimized out>) at builtin-report.c:1827
    ctrliq#30 0x00000000004c5a40 in run_builtin (p=p@entry=0xd8f7f8 <commands+312>, argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0)
        at perf.c:351
    ctrliq#31 0x00000000004c5d63 in handle_internal_command (argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0) at perf.c:404
    ctrliq#32 0x0000000000442de3 in run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:448
    ctrliq#33 main (argc=<optimized out>, argv=0x7fff9df844b0) at perf.c:556

The hangup happens because nothing in` perf` or `elfutils` checks if a
mapped file is easily readable.

The change conservatively skips all non-regular files.

Signed-off-by: Sergei Trofimovich <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
github-actions bot pushed a commit that referenced this pull request Aug 2, 2025
Symbolize stack traces by creating a live machine. Add this
functionality to dump_stack and switch dump_stack users to use
it. Switch TUI to use it. Add stack traces to the child test function
which can be useful to diagnose blocked code.

Example output:
```
$ perf test -vv PERF_RECORD_
...
  7: PERF_RECORD_* events & perf_sample fields:
  7: PERF_RECORD_* events & perf_sample fields                       : Running (1 active)
^C
Signal (2) while running tests.
Terminating tests with the same signal
Internal test harness failure. Completing any started tests:
:  7: PERF_RECORD_* events & perf_sample fields:

---- unexpected signal (2) ----
    #0 0x55788c6210a3 in child_test_sig_handler builtin-test.c:0
    #1 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
    #2 0x7fc12fe99687 in __internal_syscall_cancel cancellation.c:64
    #3 0x7fc12fee5f7a in clock_nanosleep@GLIBC_2.2.5 clock_nanosleep.c:72
    #4 0x7fc12fef1393 in __nanosleep nanosleep.c:26
    #5 0x7fc12ff02d68 in __sleep sleep.c:55
    #6 0x55788c63196b in test__PERF_RECORD perf-record.c:0
    ctrliq#7 0x55788c620fb0 in run_test_child builtin-test.c:0
    ctrliq#8 0x55788c5bd18d in start_command run-command.c:127
    ctrliq#9 0x55788c621ef3 in __cmd_test builtin-test.c:0
    ctrliq#10 0x55788c6225bf in cmd_test ??:0
    ctrliq#11 0x55788c5afbd0 in run_builtin perf.c:0
    ctrliq#12 0x55788c5afeeb in handle_internal_command perf.c:0
    ctrliq#13 0x55788c52b383 in main ??:0
    ctrliq#14 0x7fc12fe33ca8 in __libc_start_call_main libc_start_call_main.h:74
    ctrliq#15 0x7fc12fe33d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
    ctrliq#16 0x55788c52b9d1 in _start ??:0

---- unexpected signal (2) ----
    #0 0x55788c6210a3 in child_test_sig_handler builtin-test.c:0
    #1 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
    #2 0x7fc12fea3a14 in pthread_sigmask@GLIBC_2.2.5 pthread_sigmask.c:45
    #3 0x7fc12fe49fd9 in __GI___sigprocmask sigprocmask.c:26
    #4 0x7fc12ff2601b in __longjmp_chk longjmp.c:36
    #5 0x55788c6210c0 in print_test_result.isra.0 builtin-test.c:0
    #6 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
    ctrliq#7 0x7fc12fe99687 in __internal_syscall_cancel cancellation.c:64
    ctrliq#8 0x7fc12fee5f7a in clock_nanosleep@GLIBC_2.2.5 clock_nanosleep.c:72
    ctrliq#9 0x7fc12fef1393 in __nanosleep nanosleep.c:26
    ctrliq#10 0x7fc12ff02d68 in __sleep sleep.c:55
    ctrliq#11 0x55788c63196b in test__PERF_RECORD perf-record.c:0
    ctrliq#12 0x55788c620fb0 in run_test_child builtin-test.c:0
    ctrliq#13 0x55788c5bd18d in start_command run-command.c:127
    ctrliq#14 0x55788c621ef3 in __cmd_test builtin-test.c:0
    ctrliq#15 0x55788c6225bf in cmd_test ??:0
    ctrliq#16 0x55788c5afbd0 in run_builtin perf.c:0
    ctrliq#17 0x55788c5afeeb in handle_internal_command perf.c:0
    ctrliq#18 0x55788c52b383 in main ??:0
    ctrliq#19 0x7fc12fe33ca8 in __libc_start_call_main libc_start_call_main.h:74
    ctrliq#20 0x7fc12fe33d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
    ctrliq#21 0x55788c52b9d1 in _start ??:0
  7: PERF_RECORD_* events & perf_sample fields                       : Skip (permissions)
```

Signed-off-by: Ian Rogers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
github-actions bot pushed a commit that referenced this pull request Aug 2, 2025
Calling perf top with branch filters enabled on Intel CPU's
with branch counters logging (A.K.A LBR event logging [1]) support
results in a segfault.

$ perf top  -e '{cpu_core/cpu-cycles/,cpu_core/event=0xc6,umask=0x3,frontend=0x11,name=frontend_retired_dsb_miss/}' -j any,counter
...
Thread 27 "perf" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffafff76c0 (LWP 949003)]
perf_env__find_br_cntr_info (env=0xf66dc0 <perf_env>, nr=0x0, width=0x7fffafff62c0) at util/env.c:653
653			*width = env->cpu_pmu_caps ? env->br_cntr_width :
(gdb) bt
 #0  perf_env__find_br_cntr_info (env=0xf66dc0 <perf_env>, nr=0x0, width=0x7fffafff62c0) at util/env.c:653
 #1  0x00000000005b1599 in symbol__account_br_cntr (branch=0x7fffcc3db580, evsel=0xfea2d0, offset=12, br_cntr=8) at util/annotate.c:345
 #2  0x00000000005b17fb in symbol__account_cycles (addr=5658172, start=5658160, sym=0x7fffcc0ee420, cycles=539, evsel=0xfea2d0, br_cntr=8) at util/annotate.c:389
 #3  0x00000000005b1976 in addr_map_symbol__account_cycles (ams=0x7fffcd7b01d0, start=0x7fffcd7b02b0, cycles=539, evsel=0xfea2d0, br_cntr=8) at util/annotate.c:422
 #4  0x000000000068d57f in hist__account_cycles (bs=0x110d288, al=0x7fffafff6540, sample=0x7fffafff6760, nonany_branch_mode=false, total_cycles=0x0, evsel=0xfea2d0) at util/hist.c:2850
 #5  0x0000000000446216 in hist_iter__top_callback (iter=0x7fffafff6590, al=0x7fffafff6540, single=true, arg=0x7fffffff9e00) at builtin-top.c:737
 #6  0x0000000000689787 in hist_entry_iter__add (iter=0x7fffafff6590, al=0x7fffafff6540, max_stack_depth=127, arg=0x7fffffff9e00) at util/hist.c:1359
 ctrliq#7  0x0000000000446710 in perf_event__process_sample (tool=0x7fffffff9e00, event=0x110d250, evsel=0xfea2d0, sample=0x7fffafff6760, machine=0x108c968) at builtin-top.c:845
 ctrliq#8  0x0000000000447735 in deliver_event (qe=0x7fffffffa120, qevent=0x10fc200) at builtin-top.c:1211
 ctrliq#9  0x000000000064ccae in do_flush (oe=0x7fffffffa120, show_progress=false) at util/ordered-events.c:245
 ctrliq#10 0x000000000064d005 in __ordered_events__flush (oe=0x7fffffffa120, how=OE_FLUSH__TOP, timestamp=0) at util/ordered-events.c:324
 ctrliq#11 0x000000000064d0ef in ordered_events__flush (oe=0x7fffffffa120, how=OE_FLUSH__TOP) at util/ordered-events.c:342
 ctrliq#12 0x00000000004472a9 in process_thread (arg=0x7fffffff9e00) at builtin-top.c:1120
 ctrliq#13 0x00007ffff6e7dba8 in start_thread (arg=<optimized out>) at pthread_create.c:448
 ctrliq#14 0x00007ffff6f01b8c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

The cause is that perf_env__find_br_cntr_info tries to access a
null pointer pmu_caps in the perf_env struct. A similar issue exists
for homogeneous core systems which use the cpu_pmu_caps structure.

Fix this by populating cpu_pmu_caps and pmu_caps structures with
values from sysfs when calling perf top with branch stack sampling
enabled.

[1], LBR event logging introduced here:
https://lore.kernel.org/all/[email protected]/

Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Thomas Falcon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
github-actions bot pushed a commit that referenced this pull request Aug 2, 2025
Some architectures (e.g. arm64) only support memory hotplug operations on
a restricted set of physical addresses. This applies even when we are
faking some CXL fixed memory windows for the purposes of cxl_test.
That range can be queried with mhp_get_pluggable_range(true). Use the
minimum of that the top of that range and iomem_resource.end to establish
the 64GiB region used by cxl_test.

From thread #2 which was related to the issue in #1.

[ dj: Add CONFIG_MEMORY_HOTPLUG config check, from Alison ]

Link: https://lore.kernel.org/linux-cxl/[email protected]/ #2
Reported-by: Itaru Kitayama <[email protected]>
Closes: pmem/ndctl#278 #1
Reviewed-by: Dan Williams <[email protected]>
Tested-by: Itaru Kitayama <[email protected] <mailto:[email protected]>
Tested-by: Marc Herbert <[email protected]>
Signed-off-by: Jonathan Cameron <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Dave Jiang <[email protected]>
github-actions bot pushed a commit that referenced this pull request Aug 4, 2025
When the root of a nested PCIe bridge configuration is unplugged, the
pnv_php driver leaked the allocated IRQ resources for the child bridges'
hotplug event notifications, resulting in a panic.

Fix this by walking all child buses and deallocating all its IRQ resources
before calling pci_hp_remove_devices().

Also modify the lifetime of the workqueue at struct pnv_php_slot::wq so
that it is only destroyed in pnv_php_free_slot(), instead of
pnv_php_disable_irq(). This is required since pnv_php_disable_irq() will
now be called by workers triggered by hot unplug interrupts, so the
workqueue needs to stay allocated.

The abridged kernel panic that occurs without this patch is as follows:

  WARNING: CPU: 0 PID: 687 at kernel/irq/msi.c:292 msi_device_data_release+0x6c/0x9c
  CPU: 0 UID: 0 PID: 687 Comm: bash Not tainted 6.14.0-rc5+ #2
  Call Trace:
   msi_device_data_release+0x34/0x9c (unreliable)
   release_nodes+0x64/0x13c
   devres_release_all+0xc0/0x140
   device_del+0x2d4/0x46c
   pci_destroy_dev+0x5c/0x194
   pci_hp_remove_devices+0x90/0x128
   pci_hp_remove_devices+0x44/0x128
   pnv_php_disable_slot+0x54/0xd4
   power_write_file+0xf8/0x18c
   pci_slot_attr_store+0x40/0x5c
   sysfs_kf_write+0x64/0x78
   kernfs_fop_write_iter+0x1b0/0x290
   vfs_write+0x3bc/0x50c
   ksys_write+0x84/0x140
   system_call_exception+0x124/0x230
   system_call_vectored_common+0x15c/0x2ec

Signed-off-by: Shawn Anastasio <[email protected]>
Signed-off-by: Timothy Pearson <[email protected]>
[bhelgaas: tidy comments]
Signed-off-by: Bjorn Helgaas <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
Link: https://patch.msgid.link/2013845045.1359852.1752615367790.JavaMail.zimbra@raptorengineeringinc.com
github-actions bot pushed a commit that referenced this pull request Aug 7, 2025
JIRA: https://issues.redhat.com/browse/RHEL-79685

commit 396f016
Author: Emil Tantilov <[email protected]>
Date:   Thu Nov 21 20:40:59 2024 -0800

    idpf: add read memory barrier when checking descriptor done bit

    Add read memory barrier to ensure the order of operations when accessing
    control queue descriptors. Specifically, we want to avoid cases where loads
    can be reordered:

    1. Load #1 is dispatched to read descriptor flags.
    2. Load #2 is dispatched to read some other field from the descriptor.
    3. Load #2 completes, accessing memory/cache at a point in time when the DD
       flag is zero.
    4. NIC DMA overwrites the descriptor, now the DD flag is one.
    5. Any fields loaded before step 4 are now inconsistent with the actual
       descriptor state.

    Add read memory barrier between steps 1 and 2, so that load #2 is not
    executed until load #1 has completed.

    Fixes: 8077c72 ("idpf: add controlq init and reset checks")
    Reviewed-by: Przemek Kitszel <[email protected]>
    Reviewed-by: Sridhar Samudrala <[email protected]>
    Suggested-by: Lance Richardson <[email protected]>
    Signed-off-by: Emil Tantilov <[email protected]>
    Tested-by: Krishneil Singh <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>

Signed-off-by: Michal Schmidt <[email protected]>
bmastbergen pushed a commit that referenced this pull request Aug 12, 2025
jira VULN-71367
cve CVE-2022-50066
commit-author Chia-Lin Kao (AceLan) <[email protected]>
commit 2ba5e47
upstream-diff There was a merge conflict due to the CentOS 7 tree being old but it was minor.

The final update statement of the for loop exceeds the array range, the dereference of self->aq_vec[i] is not checked and then leads to the index out of range error.
Also fixed this kind of coding style in other for loop.

[   97.937604] UBSAN: array-index-out-of-bounds in drivers/net/ethernet/aquantia/atlantic/aq_nic.c:1404:48
[   97.937607] index 8 is out of range for type 'aq_vec_s *[8]'
[   97.937608] CPU: 38 PID: 3767 Comm: kworker/u256:18 Not tainted 5.19.0+ #2
[   97.937610] Hardware name: Dell Inc. Precision 7865 Tower/, BIOS 1.0.0 06/12/2022
[   97.937611] Workqueue: events_unbound async_run_entry_fn
[   97.937616] Call Trace:
[   97.937617]  <TASK>
[   97.937619]  dump_stack_lvl+0x49/0x63
[   97.937624]  dump_stack+0x10/0x16
[   97.937626]  ubsan_epilogue+0x9/0x3f
[   97.937627]  __ubsan_handle_out_of_bounds.cold+0x44/0x49
[   97.937629]  ? __scm_send+0x348/0x440
[   97.937632]  ? aq_vec_stop+0x72/0x80 [atlantic]
[   97.937639]  aq_nic_stop+0x1b6/0x1c0 [atlantic]
[   97.937644]  aq_suspend_common+0x88/0x90 [atlantic]
[   97.937648]  aq_pm_suspend_poweroff+0xe/0x20 [atlantic]
[   97.937653]  pci_pm_suspend+0x7e/0x1a0
[   97.937655]  ? pci_pm_suspend_noirq+0x2b0/0x2b0
[   97.937657]  dpm_run_callback+0x54/0x190
[   97.937660]  __device_suspend+0x14c/0x4d0
[   97.937661]  async_suspend+0x23/0x70
[   97.937663]  async_run_entry_fn+0x33/0x120
[   97.937664]  process_one_work+0x21f/0x3f0
[   97.937666]  worker_thread+0x4a/0x3c0
[   97.937668]  ? process_one_work+0x3f0/0x3f0
[   97.937669]  kthread+0xf0/0x120
[   97.937671]  ? kthread_complete_and_exit+0x20/0x20
[   97.937672]  ret_from_fork+0x22/0x30
[   97.937676]  </TASK>

v2. fixed "warning: variable 'aq_vec' set but not used"

v3. simplified a for loop

Fixes: 97bde5c ("net: ethernet: aquantia: Support for NIC-specific code")
	Signed-off-by: Chia-Lin Kao (AceLan) <[email protected]>
	Acked-by: Sudarsana Reddy Kalluru <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
	Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit 2ba5e47)
	Signed-off-by: Anmol Jain <[email protected]>
	Resolve merge conflicts
        Kept upstream logic and adjusted local error accordingly

Update aq_nic.c

Update aq_nic.c
github-actions bot pushed a commit that referenced this pull request Aug 13, 2025
JIRA: https://issues.redhat.com/browse/RHEL-15711
Upstream status: https://git.kernel.org/pub/scm/virt/kvm/kvm.git

Intel TDX protects guest VM's from malicious host and certain physical
attacks.  TDX introduces a new operation mode, Secure Arbitration Mode
(SEAM) to isolate and protect guest VM's.  A TDX guest VM runs in SEAM and,
unlike VMX, direct control and interaction with the guest by the host VMM
is not possible.  Instead, Intel TDX Module, which also runs in SEAM,
provides a SEAMCALL API.

The SEAMCALL that provides the ability to enter a guest is TDH.VP.ENTER.
The TDX Module processes TDH.VP.ENTER, and enters the guest via VMX
VMLAUNCH/VMRESUME instructions.  When a guest VM-exit requires host VMM
interaction, the TDH.VP.ENTER SEAMCALL returns to the host VMM (KVM).

Add tdh_vp_enter() to wrap the SEAMCALL invocation of TDH.VP.ENTER;
tdh_vp_enter() needs to be noinstr because VM entry in KVM is noinstr
as well, which is for two reasons:
* marking the area as CT_STATE_GUEST via guest_state_enter_irqoff() and
  guest_state_exit_irqoff()
* IRET must be avoided between VM-exit and NMI handling, in order to
  avoid prematurely releasing the NMI inhibit.

TDH.VP.ENTER is different from other SEAMCALLs in several ways: it
uses more arguments, and after it returns some host state may need to be
restored.  Therefore tdh_vp_enter() uses __seamcall_saved_ret() instead of
__seamcall_ret(); since it is the only caller of __seamcall_saved_ret(),
it can be made noinstr also.

TDH.VP.ENTER arguments are passed through General Purpose Registers (GPRs).
For the special case of the TD guest invoking TDG.VP.VMCALL, nearly any GPR
can be used, as well as XMM0 to XMM15. Notably, RBP is not used, and Linux
mandates the TDX Module feature NO_RBP_MOD, which is enforced elsewhere.
Additionally, XMM registers are not required for the existing Guest
Hypervisor Communication Interface and are handled by existing KVM code
should they be modified by the guest.

There are 2 input formats and 5 output formats for TDH.VP.ENTER arguments.
Input #1 : Initial entry or following a previous async. TD Exit
Input #2 : Following a previous TDCALL(TDG.VP.VMCALL)
Output #1 : On Error (No TD Entry)
Output #2 : Async. Exits with a VMX Architectural Exit Reason
Output #3 : Async. Exits with a non-VMX TD Exit Status
Output #4 : Async. Exits with Cross-TD Exit Details
Output #5 : On TDCALL(TDG.VP.VMCALL)

Currently, to keep things simple, the wrapper function does not attempt
to support different formats, and just passes all the GPRs that could be
used.  The GPR values are held by KVM in the area set aside for guest
GPRs.  KVM code uses the guest GPR area (vcpu->arch.regs[]) to set up for
or process results of tdh_vp_enter().

Therefore changing tdh_vp_enter() to use more complex argument formats
would also alter the way KVM code interacts with tdh_vp_enter().

Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
Message-ID: <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
(cherry picked from commit 69e23fa)
Signed-off-by: Paolo Bonzini <[email protected]>
github-actions bot pushed a commit that referenced this pull request Aug 13, 2025
JIRA: https://issues.redhat.com/browse/RHEL-15711

Restore KVM's handling of a NULL kvm_x86_ops.mem_enc_ioctl, as the hook is
NULL on SVM when CONFIG_KVM_AMD_SEV=n, and TDX will soon follow suit.

  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 1 at arch/x86/include/asm/kvm-x86-ops.h:130 kvm_x86_vendor_init+0x178b/0x18e0
  Modules linked in:
  CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.15.0-rc2-dc1aead1a985-sink-vm #2 NONE
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:kvm_x86_vendor_init+0x178b/0x18e0
  Call Trace:
   <TASK>
   svm_init+0x2e/0x60
   do_one_initcall+0x56/0x290
   kernel_init_freeable+0x192/0x1e0
   kernel_init+0x16/0x130
   ret_from_fork+0x30/0x50
   ret_from_fork_asm+0x1a/0x30
   </TASK>
  ---[ end trace 0000000000000000 ]---

Opportunistically drop the superfluous curly braces.

Link: https://lore.kernel.org/all/[email protected]
Fixes: b2aaf38 ("KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
(cherry picked from commit f2d7993)
Signed-off-by: Paolo Bonzini <[email protected]>
github-actions bot pushed a commit that referenced this pull request Aug 13, 2025
JIRA: https://issues.redhat.com/browse/RHEL-95318

commit c5b6aba
Author: Maxim Levitsky <[email protected]>
Date:   Mon May 12 14:04:02 2025 -0400

    locking/mutex: implement mutex_trylock_nested

    Despite the fact that several lockdep-related checks are skipped when
    calling trylock* versions of the locking primitives, for example
    mutex_trylock, each time the mutex is acquired, a held_lock is still
    placed onto the lockdep stack by __lock_acquire() which is called
    regardless of whether the trylock* or regular locking API was used.

    This means that if the caller successfully acquires more than
    MAX_LOCK_DEPTH locks of the same class, even when using mutex_trylock,
    lockdep will still complain that the maximum depth of the held lock stack
    has been reached and disable itself.

    For example, the following error currently occurs in the ARM version
    of KVM, once the code tries to lock all vCPUs of a VM configured with more
    than MAX_LOCK_DEPTH vCPUs, a situation that can easily happen on modern
    systems, where having more than 48 CPUs is common, and it's also common to
    run VMs that have vCPU counts approaching that number:

    [  328.171264] BUG: MAX_LOCK_DEPTH too low!
    [  328.175227] turning off the locking correctness validator.
    [  328.180726] Please attach the output of /proc/lock_stat to the bug report
    [  328.187531] depth: 48  max: 48!
    [  328.190678] 48 locks held by qemu-kvm/11664:
    [  328.194957]  #0: ffff800086de5ba0 (&kvm->lock){+.+.}-{3:3}, at: kvm_ioctl_create_device+0x174/0x5b0
    [  328.204048]  #1: ffff0800e78800b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.212521]  #2: ffff07ffeee51e98 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.220991]  #3: ffff0800dc7d80b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.229463]  #4: ffff07ffe0c980b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.237934]  #5: ffff0800a3883c78 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.246405]  #6: ffff07fffbe480b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0

    Luckily, in all instances that require locking all vCPUs, the
    'kvm->lock' is taken a priori, and that fact makes it possible to use
    the little known feature of lockdep, called a 'nest_lock', to avoid this
    warning and subsequent lockdep self-disablement.

    The action of 'nested lock' being provided to lockdep's lock_acquire(),
    causes the lockdep to detect that the top of the held lock stack contains
    a lock of the same class and then increment its reference counter instead
    of pushing a new held_lock item onto that stack.

    See __lock_acquire for more information.

    Signed-off-by: Maxim Levitsky <[email protected]>
    Acked-by: Peter Zijlstra (Intel) <[email protected]>
    Message-ID: <[email protected]>
    Signed-off-by: Paolo Bonzini <[email protected]>

Signed-off-by: Maxim Levitsky <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
github-actions bot pushed a commit that referenced this pull request Aug 13, 2025
JIRA: https://issues.redhat.com/browse/RHEL-95318

commit b586c5d
Author: Maxim Levitsky <[email protected]>
Date:   Mon May 12 14:04:06 2025 -0400

    KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs

    Use kvm_trylock_all_vcpus instead of a custom implementation when locking
    all vCPUs of a VM, to avoid triggering a lockdep warning, in the case in
    which the VM is configured to have more than MAX_LOCK_DEPTH vCPUs.

    This fixes the following false lockdep warning:

    [  328.171264] BUG: MAX_LOCK_DEPTH too low!
    [  328.175227] turning off the locking correctness validator.
    [  328.180726] Please attach the output of /proc/lock_stat to the bug report
    [  328.187531] depth: 48  max: 48!
    [  328.190678] 48 locks held by qemu-kvm/11664:
    [  328.194957]  #0: ffff800086de5ba0 (&kvm->lock){+.+.}-{3:3}, at: kvm_ioctl_create_device+0x174/0x5b0
    [  328.204048]  #1: ffff0800e78800b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.212521]  #2: ffff07ffeee51e98 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.220991]  #3: ffff0800dc7d80b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.229463]  #4: ffff07ffe0c980b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.237934]  #5: ffff0800a3883c78 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
    [  328.246405]  #6: ffff07fffbe480b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0

    Suggested-by: Paolo Bonzini <[email protected]>
    Signed-off-by: Maxim Levitsky <[email protected]>
    Acked-by: Marc Zyngier <[email protected]>
    Acked-by: Peter Zijlstra (Intel) <[email protected]>
    Message-ID: <[email protected]>
    Signed-off-by: Paolo Bonzini <[email protected]>

Signed-off-by: Maxim Levitsky <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
github-actions bot pushed a commit that referenced this pull request Aug 13, 2025
JIRA: https://issues.redhat.com/browse/RHEL-95318

commit 8752e2b
Author: Sean Christopherson <[email protected]>
Date:   Fri Jan 17 15:42:00 2025 -0800

    KVM: selftests: Only validate counts for hardware-supported arch events

    In the Intel PMU counters test, only validate the counts for architectural
    events that are supported in hardware.  If an arch event isn't supported,
    the event selector may enable a completely different event, and thus the
    logic for the expected count is bogus.

    This fixes test failures on pre-Icelake systems due to the encoding for
    the architectural Top-Down Slots event corresponding to something else
    (at least on the Skylake family of CPUs).

    Note, validation relies on *hardware* support, not KVM support and not
    guest support.  Architectural events are all about enumerating the event
    selector encoding; lack of enumeration for an architectural event doesn't
    mean the event itself is unsupported, i.e. the event should still count as
    expected even if KVM and/or guest CPUID doesn't enumerate the event as
    being "architectural".

    Note #2, it's desirable to _program_ the architectural event encoding even
    if hardware doesn't support the event.  The count can't be validated when
    the event is fully enabled, but KVM should still let the guest program the
    event selector, and the PMC shouldn't count if the event is disabled.

    Fixes: 4f1bd6b ("KVM: selftests: Test Intel PMU architectural events on gp counters")
    Reported-by: kernel test robot <[email protected]>
    Closes: https://lore.kernel.org/oe-lkp/[email protected]
    Debugged-by: Dapeng Mi <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Sean Christopherson <[email protected]>

Signed-off-by: Maxim Levitsky <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
github-actions bot pushed a commit that referenced this pull request Aug 13, 2025
JIRA: https://issues.redhat.com/browse/RHEL-95318

commit 17ec2f9
Author: Sean Christopherson <[email protected]>
Date:   Tue Jun 10 16:20:06 2025 -0700

    KVM: VMX: Allow guest to set DEBUGCTL.RTM_DEBUG if RTM is supported

    Let the guest set DEBUGCTL.RTM_DEBUG if RTM is supported according to the
    guest CPUID model, as debug support is supposed to be available if RTM is
    supported, and there are no known downsides to letting the guest debug RTM
    aborts.

    Note, there are no known bug reports related to RTM_DEBUG, the primary
    motivation is to reduce the probability of breaking existing guests when a
    future change adds a missing consistency check on vmcs12.GUEST_DEBUGCTL
    (KVM currently lets L2 run with whatever hardware supports; whoops).

    Note #2, KVM already emulates DR6.RTM, and doesn't restrict access to
    DR7.RTM.

    Fixes: 83c5291 ("KVM: x86: expose Intel cpu new features (HLE, RTM) to guest")
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Sean Christopherson <[email protected]>

Conflicts:
    - no CPUID rework patch series

Signed-off-by: Maxim Levitsky <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
github-actions bot pushed a commit that referenced this pull request Aug 13, 2025
JIRA: https://issues.redhat.com/browse/RHEL-104899

commit 7d54c15
Author: Song Liu <[email protected]>
Date:   Tue May 24 10:08:39 2022 -0700

    ftrace: Clean up hash direct_functions on register failures

    We see the following GPF when register_ftrace_direct fails:

    [ ] general protection fault, probably for non-canonical address \
      0x200000000000010: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
    [...]
    [ ] RIP: 0010:ftrace_find_rec_direct+0x53/0x70
    [ ] Code: 48 c1 e0 03 48 03 42 08 48 8b 10 31 c0 48 85 d2 74 [...]
    [ ] RSP: 0018:ffffc9000138bc10 EFLAGS: 00010206
    [ ] RAX: 0000000000000000 RBX: ffffffff813e0df0 RCX: 000000000000003b
    [ ] RDX: 0200000000000000 RSI: 000000000000000c RDI: ffffffff813e0df0
    [ ] RBP: ffffffffa00a3000 R08: ffffffff81180ce0 R09: 0000000000000001
    [ ] R10: ffffc9000138bc18 R11: 0000000000000001 R12: ffffffff813e0df0
    [ ] R13: ffffffff813e0df0 R14: ffff888171b56400 R15: 0000000000000000
    [ ] FS:  00007fa9420c7780(0000) GS:ffff888ff6a00000(0000) knlGS:000000000
    [ ] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ ] CR2: 000000000770d000 CR3: 0000000107d50003 CR4: 0000000000370ee0
    [ ] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ ] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ ] Call Trace:
    [ ]  <TASK>
    [ ]  register_ftrace_direct+0x54/0x290
    [ ]  ? render_sigset_t+0xa0/0xa0
    [ ]  bpf_trampoline_update+0x3f5/0x4a0
    [ ]  ? 0xffffffffa00a3000
    [ ]  bpf_trampoline_link_prog+0xa9/0x140
    [ ]  bpf_tracing_prog_attach+0x1dc/0x450
    [ ]  bpf_raw_tracepoint_open+0x9a/0x1e0
    [ ]  ? find_held_lock+0x2d/0x90
    [ ]  ? lock_release+0x150/0x430
    [ ]  __sys_bpf+0xbd6/0x2700
    [ ]  ? lock_is_held_type+0xd8/0x130
    [ ]  __x64_sys_bpf+0x1c/0x20
    [ ]  do_syscall_64+0x3a/0x80
    [ ]  entry_SYSCALL_64_after_hwframe+0x44/0xae
    [ ] RIP: 0033:0x7fa9421defa9
    [ ] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 9 f8 [...]
    [ ] RSP: 002b:00007ffed743bd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
    [ ] RAX: ffffffffffffffda RBX: 00000000069d2480 RCX: 00007fa9421defa9
    [ ] RDX: 0000000000000078 RSI: 00007ffed743bd80 RDI: 0000000000000011
    [ ] RBP: 00007ffed743be00 R08: 0000000000bb7270 R09: 0000000000000000
    [ ] R10: 00000000069da210 R11: 0000000000000246 R12: 0000000000000001
    [ ] R13: 00007ffed743c4b0 R14: 00000000069d2480 R15: 0000000000000001
    [ ]  </TASK>
    [ ] Modules linked in: klp_vm(OK)
    [ ] ---[ end trace 0000000000000000 ]---

    One way to trigger this is:
      1. load a livepatch that patches kernel function xxx;
      2. run bpftrace -e 'kfunc:xxx {}', this will fail (expected for now);
      3. repeat #2 => gpf.

    This is because the entry is added to direct_functions, but not removed.
    Fix this by remove the entry from direct_functions when
    register_ftrace_direct fails.

    Also remove the last trailing space from ftrace.c, so we don't have to
    worry about it anymore.

    Link: https://lkml.kernel.org/r/[email protected]

    Cc: [email protected]
    Fixes: 763e34e ("ftrace: Add register_ftrace_direct()")
    Signed-off-by: Song Liu <[email protected]>
    Signed-off-by: Steven Rostedt (Google) <[email protected]>

Signed-off-by: Gregory Bell <[email protected]>
github-actions bot pushed a commit that referenced this pull request Aug 16, 2025
JIRA: https://issues.redhat.com/browse/RHEL-101627

CVE: CVE-2025-38486

commit 834bce6
Author: Amit Pundir <[email protected]>
Date: Wed Jul 9 23:19:49 2025 +0530

    soundwire: Revert "soundwire: qcom: Add set_channel_map api support"

    This reverts commit 7796c97.

    This patch broke Dragonboard 845c (sdm845). I see:

        Unexpected kernel BRK exception at EL1
        Internal error: BRK handler: 00000000f20003e8 [#1]  SMP
        pc : qcom_swrm_set_channel_map+0x7c/0x80 [soundwire_qcom]
        lr : snd_soc_dai_set_channel_map+0x34/0x78
        Call trace:
         qcom_swrm_set_channel_map+0x7c/0x80 [soundwire_qcom] (P)
         sdm845_dai_init+0x18c/0x2e0 [snd_soc_sdm845]
         snd_soc_link_init+0x28/0x6c
         snd_soc_bind_card+0x5f4/0xb0c
         snd_soc_register_card+0x148/0x1a4
         devm_snd_soc_register_card+0x50/0xb0
         sdm845_snd_platform_probe+0x124/0x148 [snd_soc_sdm845]
         platform_probe+0x6c/0xd0
         really_probe+0xc0/0x2a4
         __driver_probe_device+0x7c/0x130
         driver_probe_device+0x40/0x118
         __device_attach_driver+0xc4/0x108
         bus_for_each_drv+0x8c/0xf0
         __device_attach+0xa4/0x198
         device_initial_probe+0x18/0x28
         bus_probe_device+0xb8/0xbc
         deferred_probe_work_func+0xac/0xfc
         process_one_work+0x244/0x658
         worker_thread+0x1b4/0x360
         kthread+0x148/0x228
         ret_from_fork+0x10/0x20
        Kernel panic - not syncing: BRK handler: Fatal exception

    Dan has also reported following issues with the original patch
    https://lore.kernel.org/all/[email protected]/

    Bug #1:
    The zeroeth element of ctrl->pconfig[] is supposed to be unused.  We
    start counting at 1.  However this code sets ctrl->pconfig[0].ch_mask = 128.

    Bug #2:
    There are SLIM_MAX_TX_PORTS (16) elements in tx_ch[] array but only
    QCOM_SDW_MAX_PORTS + 1 (15) in the ctrl->pconfig[] array so it corrupts
    memory like Yongqin Liu pointed out.

    Bug 3:
    Like Jie Gan pointed out, it erases all the tx information with the rx
    information.

    Cc: [email protected] # v6.15+
    Signed-off-by: Amit Pundir <[email protected]>
    Acked-by: Srinivas Kandagatla <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Vinod Koul <[email protected]>

Signed-off-by: Jaroslav Kysela <[email protected]>
github-actions bot pushed a commit that referenced this pull request Aug 16, 2025
With KASAN enabled, it is possible to get a slab out of bounds
during mount to ksmbd due to missing check in parse_server_interfaces()
(see below):

 BUG: KASAN: slab-out-of-bounds in
 parse_server_interfaces+0x14ee/0x1880 [cifs]
 Read of size 4 at addr ffff8881433dba98 by task mount/9827

 CPU: 5 UID: 0 PID: 9827 Comm: mount Tainted: G
 OE       6.16.0-rc2-kasan #2 PREEMPT(voluntary)
 Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
 Hardware name: Dell Inc. Precision Tower 3620/0MWYPT,
 BIOS 2.13.1 06/14/2019
 Call Trace:
  <TASK>
 dump_stack_lvl+0x9f/0xf0
 print_report+0xd1/0x670
 __virt_addr_valid+0x22c/0x430
 ? parse_server_interfaces+0x14ee/0x1880 [cifs]
 ? kasan_complete_mode_report_info+0x2a/0x1f0
 ? parse_server_interfaces+0x14ee/0x1880 [cifs]
   kasan_report+0xd6/0x110
   parse_server_interfaces+0x14ee/0x1880 [cifs]
   __asan_report_load_n_noabort+0x13/0x20
   parse_server_interfaces+0x14ee/0x1880 [cifs]
 ? __pfx_parse_server_interfaces+0x10/0x10 [cifs]
 ? trace_hardirqs_on+0x51/0x60
 SMB3_request_interfaces+0x1ad/0x3f0 [cifs]
 ? __pfx_SMB3_request_interfaces+0x10/0x10 [cifs]
 ? SMB2_tcon+0x23c/0x15d0 [cifs]
 smb3_qfs_tcon+0x173/0x2b0 [cifs]
 ? __pfx_smb3_qfs_tcon+0x10/0x10 [cifs]
 ? cifs_get_tcon+0x105d/0x2120 [cifs]
 ? do_raw_spin_unlock+0x5d/0x200
 ? cifs_get_tcon+0x105d/0x2120 [cifs]
 ? __pfx_smb3_qfs_tcon+0x10/0x10 [cifs]
 cifs_mount_get_tcon+0x369/0xb90 [cifs]
 ? dfs_cache_find+0xe7/0x150 [cifs]
 dfs_mount_share+0x985/0x2970 [cifs]
 ? check_path.constprop.0+0x28/0x50
 ? save_trace+0x54/0x370
 ? __pfx_dfs_mount_share+0x10/0x10 [cifs]
 ? __lock_acquire+0xb82/0x2ba0
 ? __kasan_check_write+0x18/0x20
 cifs_mount+0xbc/0x9e0 [cifs]
 ? __pfx_cifs_mount+0x10/0x10 [cifs]
 ? do_raw_spin_unlock+0x5d/0x200
 ? cifs_setup_cifs_sb+0x29d/0x810 [cifs]
 cifs_smb3_do_mount+0x263/0x1990 [cifs]

Reported-by: Namjae Jeon <[email protected]>
Tested-by: Namjae Jeon <[email protected]>
Cc: [email protected]
Signed-off-by: Steve French <[email protected]>
github-actions bot pushed a commit that referenced this pull request Aug 16, 2025
[ Upstream commit 17ce3e5 ]

syzbot reported that the netfilter bpf prog can be called without
migration disabled in xmit path.

Then the assertion in __bpf_prog_run() fails, triggering the splat
below. [0]

Let's use bpf_prog_run_pin_on_cpu() in nf_hook_run_bpf().

[0]:
BUG: assuming non migratable context at ./include/linux/filter.h:703
in_atomic(): 0, irqs_disabled(): 0, migration_disabled() 0 pid: 5829, name: sshd-session
3 locks held by sshd-session/5829:
 #0: ffff88807b4e4218 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1667 [inline]
 #0: ffff88807b4e4218 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x20/0x50 net/ipv4/tcp.c:1395
 #1: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
 #1: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:841 [inline]
 #1: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: __ip_queue_xmit+0x69/0x26c0 net/ipv4/ip_output.c:470
 #2: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
 #2: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:841 [inline]
 #2: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: nf_hook+0xb2/0x680 include/linux/netfilter.h:241
CPU: 0 UID: 0 PID: 5829 Comm: sshd-session Not tainted 6.16.0-rc6-syzkaller-00002-g155a3c003e55 #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x16c/0x1f0 lib/dump_stack.c:120
 __cant_migrate kernel/sched/core.c:8860 [inline]
 __cant_migrate+0x1c7/0x250 kernel/sched/core.c:8834
 __bpf_prog_run include/linux/filter.h:703 [inline]
 bpf_prog_run include/linux/filter.h:725 [inline]
 nf_hook_run_bpf+0x83/0x1e0 net/netfilter/nf_bpf_link.c:20
 nf_hook_entry_hookfn include/linux/netfilter.h:157 [inline]
 nf_hook_slow+0xbb/0x200 net/netfilter/core.c:623
 nf_hook+0x370/0x680 include/linux/netfilter.h:272
 NF_HOOK_COND include/linux/netfilter.h:305 [inline]
 ip_output+0x1bc/0x2a0 net/ipv4/ip_output.c:433
 dst_output include/net/dst.h:459 [inline]
 ip_local_out net/ipv4/ip_output.c:129 [inline]
 __ip_queue_xmit+0x1d7d/0x26c0 net/ipv4/ip_output.c:527
 __tcp_transmit_skb+0x2686/0x3e90 net/ipv4/tcp_output.c:1479
 tcp_transmit_skb net/ipv4/tcp_output.c:1497 [inline]
 tcp_write_xmit+0x1274/0x84e0 net/ipv4/tcp_output.c:2838
 __tcp_push_pending_frames+0xaf/0x390 net/ipv4/tcp_output.c:3021
 tcp_push+0x225/0x700 net/ipv4/tcp.c:759
 tcp_sendmsg_locked+0x1870/0x42b0 net/ipv4/tcp.c:1359
 tcp_sendmsg+0x2e/0x50 net/ipv4/tcp.c:1396
 inet_sendmsg+0xb9/0x140 net/ipv4/af_inet.c:851
 sock_sendmsg_nosec net/socket.c:712 [inline]
 __sock_sendmsg net/socket.c:727 [inline]
 sock_write_iter+0x4aa/0x5b0 net/socket.c:1131
 new_sync_write fs/read_write.c:593 [inline]
 vfs_write+0x6c7/0x1150 fs/read_write.c:686
 ksys_write+0x1f8/0x250 fs/read_write.c:738
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fe7d365d407
Code: 48 89 fa 4c 89 df e8 38 aa 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
RSP:

Fixes: fd9c663 ("bpf: minimal support for programs hooked into netfilter framework")
Reported-by: [email protected]
Closes: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Martin KaFai Lau <[email protected]>
Tested-by: [email protected]
Acked-by: Florian Westphal <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
github-actions bot pushed a commit that referenced this pull request Aug 16, 2025
[ Upstream commit 4668619 ]

When the root of a nested PCIe bridge configuration is unplugged, the
pnv_php driver leaked the allocated IRQ resources for the child bridges'
hotplug event notifications, resulting in a panic.

Fix this by walking all child buses and deallocating all its IRQ resources
before calling pci_hp_remove_devices().

Also modify the lifetime of the workqueue at struct pnv_php_slot::wq so
that it is only destroyed in pnv_php_free_slot(), instead of
pnv_php_disable_irq(). This is required since pnv_php_disable_irq() will
now be called by workers triggered by hot unplug interrupts, so the
workqueue needs to stay allocated.

The abridged kernel panic that occurs without this patch is as follows:

  WARNING: CPU: 0 PID: 687 at kernel/irq/msi.c:292 msi_device_data_release+0x6c/0x9c
  CPU: 0 UID: 0 PID: 687 Comm: bash Not tainted 6.14.0-rc5+ #2
  Call Trace:
   msi_device_data_release+0x34/0x9c (unreliable)
   release_nodes+0x64/0x13c
   devres_release_all+0xc0/0x140
   device_del+0x2d4/0x46c
   pci_destroy_dev+0x5c/0x194
   pci_hp_remove_devices+0x90/0x128
   pci_hp_remove_devices+0x44/0x128
   pnv_php_disable_slot+0x54/0xd4
   power_write_file+0xf8/0x18c
   pci_slot_attr_store+0x40/0x5c
   sysfs_kf_write+0x64/0x78
   kernfs_fop_write_iter+0x1b0/0x290
   vfs_write+0x3bc/0x50c
   ksys_write+0x84/0x140
   system_call_exception+0x124/0x230
   system_call_vectored_common+0x15c/0x2ec

Signed-off-by: Shawn Anastasio <[email protected]>
Signed-off-by: Timothy Pearson <[email protected]>
[bhelgaas: tidy comments]
Signed-off-by: Bjorn Helgaas <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
Link: https://patch.msgid.link/2013845045.1359852.1752615367790.JavaMail.zimbra@raptorengineeringinc.com
Signed-off-by: Sasha Levin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant