Tinkered with the README. #8
Please note that pull requests are not the proper procedure for submitting patches to the Linux kernel. (Linus put the kernel up here because kernel.org's master mirror is down; he does not seem to like the pull request system[1], but GitHub does not allow him to disable it.) Please read Documentation/SubmittingPatches: you must write a proper commit message (one that actually describes what changed, not just "tinkered with"), add a Signed-off-by line, and submit the patch to the Linux kernel mailing list.
That's a bit too much work for the usual GitHub stuff. Perhaps I'll just leave it alone and let the usual kernel.org hackers help out.
The following command sequence triggers an oops.

 # mount /dev/sdb1 /mnt
 # echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
 # umount /mnt

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
 RIP: 0010:[<ffffffff810d0879>] [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60
 ...
 Call Trace:
  [<ffffffff810d2845>] lock_acquire+0x95/0x140
  [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
  [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70
  [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
  [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
  [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0
  [<ffffffff811c40df>] blkdev_put+0x5f/0x190
  [<ffffffff8118f18d>] kill_block_super+0x4d/0x80
  [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
  [<ffffffff8119003a>] deactivate_super+0x4a/0x70
  [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
  [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0
  [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b

This is because bdev holds on to disk but disk doesn't pin the associated queue. If a SCSI device is removed while the device is still open, the sdev puts the base reference to the queue on release. When the bdev is finally released, the associated queue is already gone along with the bdi and bdev_inode_switch_bdi() ends up dereferencing already freed bdi.

Even if it were not for this bug, disk not holding onto the associated queue is very unusual and error-prone.

Fix it by making add_disk() take an extra reference to its queue and put it on disk_release() and ensuring that disk and its fops owner are put in that order after all accesses to the disk and queue are complete.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: [email protected]
Signed-off-by: Jens Axboe <[email protected]>
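The ownership rule behind this fix (the disk pins its queue, and that queue reference is dropped only when the disk itself is released) can be modeled in a few lines of userspace C. This is a minimal sketch under stated assumptions: `struct queue`, `struct disk`, `model_add_disk()` and `model_disk_put()` are hypothetical stand-ins, not the block layer's types or functions, and the model ignores locking and the fops-owner ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are NOT the kernel's structures. */
struct queue {
	int refcnt;
	bool freed;	/* set instead of calling free() so the model can be inspected */
};

struct disk {
	int refcnt;
	struct queue *queue;
};

static void queue_get(struct queue *q)
{
	assert(!q->freed);	/* taking a reference on a dead queue is the bug */
	q->refcnt++;
}

static void queue_put(struct queue *q)
{
	assert(q->refcnt > 0);
	if (--q->refcnt == 0)
		q->freed = true;	/* last reference gone: queue (and its bdi) released */
}

/* Models add_disk(): the disk takes its own extra reference on the queue. */
static void model_add_disk(struct disk *d, struct queue *q)
{
	queue_get(q);
	d->queue = q;
	d->refcnt = 1;
}

/* Models the disk release path: the queue reference is dropped only when
 * the last disk reference goes away, after all users of the disk are done. */
static void model_disk_put(struct disk *d)
{
	assert(d->refcnt > 0);
	if (--d->refcnt == 0) {
		queue_put(d->queue);
		d->queue = NULL;
	}
}
```

Calling order mirrors the oops above: the SCSI device drops its base reference first (`queue_put()`), yet the queue stays alive until `model_disk_put()` releases the disk.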
This patch validates the sdev pointer in scsi_dh_activate() before proceeding further. Without this check we might see the panic below. I have seen this panic multiple times.

Call trace:
 #0 [ffff88007d647b50] machine_kexec at ffffffff81020902
 #1 [ffff88007d647ba0] crash_kexec at ffffffff810875b0
 #2 [ffff88007d647c70] oops_end at ffffffff8139c650
 #3 [ffff88007d647c90] __bad_area_nosemaphore at ffffffff8102dd15
 #4 [ffff88007d647d50] page_fault at ffffffff8139b8cf
    [exception RIP: scsi_dh_activate+0x82]
    RIP: ffffffffa0041922  RSP: ffff88007d647e00  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 00000000000093c5
    RDX: 00000000000093c5  RSI: ffffffffa02e6640  RDI: ffff88007cc88988
    RBP: 000000000000000f   R8: ffff88007d646000   R9: 0000000000000000
    R10: ffff880082293790  R11: 00000000ffffffff  R12: ffff88007cc88988
    R13: 0000000000000000  R14: 0000000000000286  R15: ffff880037b845e0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #5 [ffff88007d647e38] run_workqueue at ffffffff81060268
 #6 [ffff88007d647e78] worker_thread at ffffffff81060386
 #7 [ffff88007d647ee8] kthread at ffffffff81064436
 #8 [ffff88007d647f48] kernel_thread at ffffffff81003fba

Signed-off-by: Babu Moger <[email protected]>
Cc: [email protected]
Signed-off-by: James Bottomley <[email protected]>
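A sketch of the defensive pattern the patch describes: fetch the device pointer and bail out before dereferencing it. It assumes, for illustration only, that the device is reached through a queue's `queuedata` field; `struct queue_model`, `struct sdev_model`, `model_dh_activate()` and the `DH_*` result codes are hypothetical and are not the SCSI device-handler API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes; the real device-handler codes differ. */
enum dh_result { DH_OK = 0, DH_NOSYS = 1 };

struct sdev_model {
	int activations;
};

struct queue_model {
	struct sdev_model *queuedata;	/* may already be NULL if the device went away */
};

static enum dh_result model_dh_activate(struct queue_model *q)
{
	struct sdev_model *sdev = q->queuedata;

	/* Validate the pointer first instead of dereferencing NULL. */
	if (sdev == NULL)
		return DH_NOSYS;

	sdev->activations++;	/* stands in for the real activation work */
	return DH_OK;
}
```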
The following command sequence triggers an oops.

 # mount /dev/sdb1 /mnt
 # echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
 # umount /mnt

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
 RIP: 0010:[<ffffffff810d0879>] [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60
 ...
 Call Trace:
  [<ffffffff810d2845>] lock_acquire+0x95/0x140
  [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
  [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70
  [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
  [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
  [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0
  [<ffffffff811c40df>] blkdev_put+0x5f/0x190
  [<ffffffff8118f18d>] kill_block_super+0x4d/0x80
  [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
  [<ffffffff8119003a>] deactivate_super+0x4a/0x70
  [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
  [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0
  [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b

This is because bdev holds on to disk but disk doesn't pin the associated queue. If a SCSI device is removed while the device is still open, the sdev puts the base reference to the queue on release. When the bdev is finally released, the associated queue is already gone along with the bdi and bdev_inode_switch_bdi() ends up dereferencing already freed bdi.

Even if it were not for this bug, disk not holding onto the associated queue is very unusual and error-prone.

Fix it by making add_disk() take an extra reference to its queue and put it on disk_release() and ensuring that disk and its fops owner are put in that order after all accesses to the disk and queue are complete.

Signed-off-by: Tejun Heo <[email protected]>
Cc: [email protected]
Signed-off-by: Jens Axboe <[email protected]>
commit a18a920 upstream.

This patch validates the sdev pointer in scsi_dh_activate() before proceeding further. Without this check we might see the panic below. I have seen this panic multiple times.

Call trace:
 #0 [ffff88007d647b50] machine_kexec at ffffffff81020902
 #1 [ffff88007d647ba0] crash_kexec at ffffffff810875b0
 #2 [ffff88007d647c70] oops_end at ffffffff8139c650
 #3 [ffff88007d647c90] __bad_area_nosemaphore at ffffffff8102dd15
 #4 [ffff88007d647d50] page_fault at ffffffff8139b8cf
    [exception RIP: scsi_dh_activate+0x82]
    RIP: ffffffffa0041922  RSP: ffff88007d647e00  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 00000000000093c5
    RDX: 00000000000093c5  RSI: ffffffffa02e6640  RDI: ffff88007cc88988
    RBP: 000000000000000f   R8: ffff88007d646000   R9: 0000000000000000
    R10: ffff880082293790  R11: 00000000ffffffff  R12: ffff88007cc88988
    R13: 0000000000000000  R14: 0000000000000286  R15: ffff880037b845e0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #5 [ffff88007d647e38] run_workqueue at ffffffff81060268
 #6 [ffff88007d647e78] worker_thread at ffffffff81060386
 #7 [ffff88007d647ee8] kthread at ffffffff81064436
 #8 [ffff88007d647f48] kernel_thread at ffffffff81003fba

Signed-off-by: Babu Moger <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit f992ae8 upstream.

The following command sequence triggers an oops.

 # mount /dev/sdb1 /mnt
 # echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
 # umount /mnt

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
 RIP: 0010:[<ffffffff810d0879>] [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60
 ...
 Call Trace:
  [<ffffffff810d2845>] lock_acquire+0x95/0x140
  [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
  [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70
  [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
  [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
  [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0
  [<ffffffff811c40df>] blkdev_put+0x5f/0x190
  [<ffffffff8118f18d>] kill_block_super+0x4d/0x80
  [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
  [<ffffffff8119003a>] deactivate_super+0x4a/0x70
  [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
  [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0
  [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b

This is because bdev holds on to disk but disk doesn't pin the associated queue. If a SCSI device is removed while the device is still open, the sdev puts the base reference to the queue on release. When the bdev is finally released, the associated queue is already gone along with the bdi and bdev_inode_switch_bdi() ends up dereferencing already freed bdi.

Even if it were not for this bug, disk not holding onto the associated queue is very unusual and error-prone.

Fix it by making add_disk() take an extra reference to its queue and put it on disk_release() and ensuring that disk and its fops owner are put in that order after all accesses to the disk and queue are complete.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
If the pte mapping in generic_perform_write() is unmapped between iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the "copied" parameter to ->end_write can be zero. ext4 couldn't cope with it with delayed allocations enabled. This skips the i_disksize enlargement logic if copied is zero and no new data was appended to the inode.

 gdb> bt
 #0  0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 #2  0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
 #3  generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
 #4  0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0xffff88001e26be40) at mm/filemap.c:2600
 #5  0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimized out>, pos=<value optimized out>) at mm/filemap.c:2632
 #6  0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) at fs/ext4/file.c:136
 #7  0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, ppos=0xffff88001e26bf48) at fs/read_write.c:406
 #8  0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4000, pos=0xffff88001e26bf48) at fs/read_write.c:435
 #9  0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4000) at fs/read_write.c:487
 #10 <signal handler called>
 #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
 #12 0x0000000000000000 in ?? ()
 gdb> print offset
 $22 = 0xffffffffffffffff
 gdb> print idx
 $23 = 0xffffffff
 gdb> print inode->i_blkbits
 $24 = 0xc
 gdb> up
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 2512                    if (ext4_da_should_update_i_disksize(page, end)) {
 gdb> print start
 $25 = 0x0
 gdb> print end
 $26 = 0xffffffffffffffff
 gdb> print pos
 $27 = 0x108000
 gdb> print new_i_size
 $28 = 0x108000
 gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
 $29 = 0xd9000
 gdb> down
 2467            for (i = 0; i < idx; i++)
 gdb> print i
 $30 = 0xd44acbee

This is 100% reproducible with some autonuma development code tuned in a very aggressive manner (not normal way even for knumad) which does "exotic" changes to the ptes. It wouldn't normally trigger but I don't see why it can't happen normally if the page is added to swap cache in between the two faults leading to "copied" being zero (which then hangs in ext4). So it should be fixed. Especially possible with lumpy reclaim (albeit disabled if compaction is enabled) as that would ignore the young bits in the ptes.

Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Cc: [email protected]
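The gdb session shows how a zero `copied` poisons the arithmetic: with unsigned offsets, `end = start + copied - 1` wraps to `0xffffffffffffffff`, and the loop bound `idx` derived from it becomes `0xffffffff`. The sketch below models only the guard; the function name, its parameters and the 4 KiB page mask are assumptions for illustration, not ext4 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the i_disksize decision in a ->write_end hook.
 * Offsets are unsigned, so start + copied - 1 would wrap to UINT64_MAX
 * when copied == 0, as seen in the gdb output above. */
static bool should_grow_disksize(uint64_t pos, uint64_t copied,
				 uint64_t disksize, uint64_t *end_out)
{
	uint64_t start = pos & 0xfff;	/* offset within a 4 KiB page (assumed size) */
	uint64_t new_size = pos + copied;

	/* Guard: nothing was copied, so there is nothing to extend. */
	if (copied == 0 || new_size <= disksize)
		return false;

	*end_out = start + copied - 1;	/* safe: copied >= 1 on this path */
	return true;
}
```

With the values from the report (`pos = 0x108000`, `i_disksize = 0xd9000`), a zero-length copy now returns early without ever computing `end`.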
Nothing requires that we lock the filesystem until the root inode is provided.

Also, iget5_locked() triggers a warning because we are holding the filesystem lock while allocating the inode, which results in a lockdep suspicion that we have a lock inversion against the reclaim path:

[ 1986.896979] =================================
[ 1986.896990] [ INFO: inconsistent lock state ]
[ 1986.896997] 3.1.1-main #8
[ 1986.897001] ---------------------------------
[ 1986.897007] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[ 1986.897016] kswapd0/16 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 1986.897023]  (&REISERFS_SB(s)->lock){+.+.?.}, at: [<c01f8bd4>] reiserfs_write_lock+0x20/0x2a
[ 1986.897044] {RECLAIM_FS-ON-W} state was registered at:
[ 1986.897050]   [<c014a5b9>] mark_held_locks+0xae/0xd0
[ 1986.897060]   [<c014aab3>] lockdep_trace_alloc+0x7d/0x91
[ 1986.897068]   [<c0190ee0>] kmem_cache_alloc+0x1a/0x93
[ 1986.897078]   [<c01e7728>] reiserfs_alloc_inode+0x13/0x3d
[ 1986.897088]   [<c01a5b06>] alloc_inode+0x14/0x5f
[ 1986.897097]   [<c01a5cb9>] iget5_locked+0x62/0x13a
[ 1986.897106]   [<c01e99e0>] reiserfs_fill_super+0x410/0x8b9
[ 1986.897114]   [<c01953da>] mount_bdev+0x10b/0x159
[ 1986.897123]   [<c01e764d>] get_super_block+0x10/0x12
[ 1986.897131]   [<c0195b38>] mount_fs+0x59/0x12d
[ 1986.897138]   [<c01a80d1>] vfs_kern_mount+0x45/0x7a
[ 1986.897147]   [<c01a83e3>] do_kern_mount+0x2f/0xb0
[ 1986.897155]   [<c01a987a>] do_mount+0x5c2/0x612
[ 1986.897163]   [<c01a9a72>] sys_mount+0x61/0x8f
[ 1986.897170]   [<c044060c>] sysenter_do_call+0x12/0x32
[ 1986.897181] irq event stamp: 7509691
[ 1986.897186] hardirqs last  enabled at (7509691): [<c0190f34>] kmem_cache_alloc+0x6e/0x93
[ 1986.897197] hardirqs last disabled at (7509690): [<c0190eea>] kmem_cache_alloc+0x24/0x93
[ 1986.897209] softirqs last  enabled at (7508896): [<c01294bd>] __do_softirq+0xee/0xfd
[ 1986.897222] softirqs last disabled at (7508859): [<c01030ed>] do_softirq+0x50/0x9d
[ 1986.897234]
[ 1986.897235] other info that might help us debug this:
[ 1986.897242]  Possible unsafe locking scenario:
[ 1986.897244]
[ 1986.897250]        CPU0
[ 1986.897254]        ----
[ 1986.897257]   lock(&REISERFS_SB(s)->lock);
[ 1986.897265]   <Interrupt>
[ 1986.897269]     lock(&REISERFS_SB(s)->lock);
[ 1986.897276]
[ 1986.897277]  *** DEADLOCK ***
[ 1986.897278]
[ 1986.897286] no locks held by kswapd0/16.
[ 1986.897291]
[ 1986.897292] stack backtrace:
[ 1986.897299] Pid: 16, comm: kswapd0 Not tainted 3.1.1-main #8
[ 1986.897306] Call Trace:
[ 1986.897314]  [<c0439e76>] ? printk+0xf/0x11
[ 1986.897324]  [<c01482d1>] print_usage_bug+0x20e/0x21a
[ 1986.897332]  [<c01479b8>] ? print_irq_inversion_bug+0x172/0x172
[ 1986.897341]  [<c014855c>] mark_lock+0x27f/0x483
[ 1986.897349]  [<c0148d88>] __lock_acquire+0x628/0x1472
[ 1986.897358]  [<c0149fae>] lock_acquire+0x47/0x5e
[ 1986.897366]  [<c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a
[ 1986.897384]  [<c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a
[ 1986.897397]  [<c043b5ef>] mutex_lock_nested+0x35/0x26f
[ 1986.897409]  [<c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a
[ 1986.897421]  [<c01f8bd4>] reiserfs_write_lock+0x20/0x2a
[ 1986.897433]  [<c01e2edd>] map_block_for_writepage+0xc9/0x590
[ 1986.897448]  [<c01b1706>] ? create_empty_buffers+0x33/0x8f
[ 1986.897461]  [<c0121124>] ? get_parent_ip+0xb/0x31
[ 1986.897472]  [<c043ef7f>] ? sub_preempt_count+0x81/0x8e
[ 1986.897485]  [<c043cae0>] ? _raw_spin_unlock+0x27/0x3d
[ 1986.897496]  [<c0121124>] ? get_parent_ip+0xb/0x31
[ 1986.897508]  [<c01e355d>] reiserfs_writepage+0x1b9/0x3e7
[ 1986.897521]  [<c0173b40>] ? clear_page_dirty_for_io+0xcb/0xde
[ 1986.897533]  [<c014a6e3>] ? trace_hardirqs_on_caller+0x108/0x138
[ 1986.897546]  [<c014a71e>] ? trace_hardirqs_on+0xb/0xd
[ 1986.897559]  [<c0177b38>] shrink_page_list+0x34f/0x5e2
[ 1986.897572]  [<c01780a7>] shrink_inactive_list+0x172/0x22c
[ 1986.897585]  [<c0178464>] shrink_zone+0x303/0x3b1
[ 1986.897597]  [<c043cae0>] ? _raw_spin_unlock+0x27/0x3d
[ 1986.897611]  [<c01788c9>] kswapd+0x3b7/0x5f2

The deadlock shouldn't happen: since we are doing that allocation in the mount path, the filesystem is not available for any reclaim. Still, the warning is annoying.

To solve this, acquire the lock later, only where we need it: right before calling reiserfs_read_locked_inode(), which needs the lock to walk the tree.

Reported-by: Knut Petersen <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Jeff Mahoney <[email protected]>
Cc: Jan Kara <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
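The fix narrows the lock's scope so that an allocation which may enter reclaim happens before the lock is taken. A minimal single-threaded model, assuming a boolean flag in place of the real per-superblock lock and hypothetical helpers (`fs_write_lock()`, `model_alloc_inode()`, `model_read_root_inode()`), counts any allocation made while the lock is held:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: fs_locked stands in for the superblock write lock;
 * no real locking primitive is used. */
static bool fs_locked;
static int alloc_while_locked;	/* the pattern lockdep flagged above */

static void fs_write_lock(void)   { assert(!fs_locked); fs_locked = true; }
static void fs_write_unlock(void) { assert(fs_locked);  fs_locked = false; }

/* In the kernel this kind of allocation may recurse into reclaim. */
static void *model_alloc_inode(size_t size)
{
	if (fs_locked)
		alloc_while_locked++;
	return calloc(1, size);
}

/* Fixed ordering: allocate first, then hold the lock only around the
 * part that walks the tree. */
static long *model_read_root_inode(void)
{
	long *inode = model_alloc_inode(sizeof(*inode));

	if (inode == NULL)
		return NULL;
	fs_write_lock();
	*inode = 1;	/* placeholder for "the tree walk filled in the inode" */
	fs_write_unlock();
	return inode;
}
```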
$ wget "http://pkgs.fedoraproject.org/gitweb/?p=kernel.git;a=blob_plain;f=mac80211_offchannel_rework_revert.patch;h=859799714cd85a58450ecde4a1dabc5adffd5100;hb=refs/heads/f16" -O mac80211_offchannel_rework_revert.patch
$ patch -p1 --dry-run < mac80211_offchannel_rework_revert.patch
patching file net/mac80211/ieee80211_i.h
Hunk #1 succeeded at 702 (offset 8 lines).
Hunk #2 succeeded at 712 (offset 8 lines).
Hunk #3 succeeded at 1143 (offset -57 lines).
patching file net/mac80211/main.c
patching file net/mac80211/offchannel.c
Hunk #1 succeeded at 18 (offset 1 line).
Hunk #2 succeeded at 42 (offset 1 line).
Hunk #3 succeeded at 78 (offset 1 line).
Hunk #4 succeeded at 96 (offset 1 line).
Hunk #5 succeeded at 162 (offset 1 line).
Hunk #6 succeeded at 182 (offset 1 line).
patching file net/mac80211/rx.c
Hunk #1 succeeded at 421 (offset 4 lines).
Hunk #2 succeeded at 2864 (offset 87 lines).
patching file net/mac80211/scan.c
Hunk #1 succeeded at 213 (offset 1 line).
Hunk #2 succeeded at 256 (offset 2 lines).
Hunk #3 succeeded at 288 (offset 2 lines).
Hunk #4 succeeded at 333 (offset 2 lines).
Hunk #5 succeeded at 482 (offset 2 lines).
Hunk #6 succeeded at 498 (offset 2 lines).
Hunk #7 succeeded at 516 (offset 2 lines).
Hunk #8 succeeded at 530 (offset 2 lines).
Hunk #9 succeeded at 555 (offset 2 lines).
patching file net/mac80211/tx.c
Hunk #1 succeeded at 259 (offset 1 line).
patching file net/mac80211/work.c
Hunk #1 succeeded at 899 (offset -2 lines).
Hunk #2 succeeded at 949 (offset -2 lines).
Hunk #3 succeeded at 1046 (offset -2 lines).
Hunk #4 succeeded at 1054 (offset -2 lines).
If the netdev is already in NETREG_UNREGISTERING/_UNREGISTERED state, do not update the real num tx queues. netdev_queue_update_kobjects() is already called via remove_queue_kobjects() at NETREG_UNREGISTERING time. So, when an upper layer driver, e.g., the FCoE protocol stack, is monitoring the netdev event of NETDEV_UNREGISTER and calls back into the LLD's ndo_fcoe_disable() to remove extra queues allocated for FCoE, the associated txq sysfs kobjects are already removed, and trying to update the real num queues would cause something like below:

...
PID: 25138  TASK: ffff88021e64c440  CPU: 3  COMMAND: "kworker/3:3"
 #0 [ffff88021f007760] machine_kexec at ffffffff810226d9
 #1 [ffff88021f0077d0] crash_kexec at ffffffff81089d2d
 #2 [ffff88021f0078a0] oops_end at ffffffff813bca78
 #3 [ffff88021f0078d0] no_context at ffffffff81029e72
 #4 [ffff88021f007920] __bad_area_nosemaphore at ffffffff8102a155
 #5 [ffff88021f0079f0] bad_area_nosemaphore at ffffffff8102a23e
 #6 [ffff88021f007a00] do_page_fault at ffffffff813bf32e
 #7 [ffff88021f007b10] page_fault at ffffffff813bc045
    [exception RIP: sysfs_find_dirent+17]
    RIP: ffffffff81178611  RSP: ffff88021f007bc0  RFLAGS: 00010246
    RAX: ffff88021e64c440  RBX: ffffffff8156cc63  RCX: 0000000000000004
    RDX: ffffffff8156cc63  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88021f007be0   R8: 0000000000000004   R9: 0000000000000008
    R10: ffffffff816fed00  R11: 0000000000000004  R12: 0000000000000000
    R13: ffffffff8156cc63  R14: 0000000000000000  R15: ffff8802222a0000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff88021f007be8] sysfs_get_dirent at ffffffff81178c07
 #9 [ffff88021f007c18] sysfs_remove_group at ffffffff8117ac27
 #10 [ffff88021f007c48] netdev_queue_update_kobjects at ffffffff813178f9
 #11 [ffff88021f007c88] netif_set_real_num_tx_queues at ffffffff81303e38
 #12 [ffff88021f007cc8] ixgbe_set_num_queues at ffffffffa0249763 [ixgbe]
 #13 [ffff88021f007cf8] ixgbe_init_interrupt_scheme at ffffffffa024ea89 [ixgbe]
 #14 [ffff88021f007d48] ixgbe_fcoe_disable at ffffffffa0267113 [ixgbe]
 #15 [ffff88021f007d68] vlan_dev_fcoe_disable at ffffffffa014fef5 [8021q]
 #16 [ffff88021f007d78] fcoe_interface_cleanup at ffffffffa02b7dfd [fcoe]
 #17 [ffff88021f007df8] fcoe_destroy_work at ffffffffa02b7f08 [fcoe]
 #18 [ffff88021f007e18] process_one_work at ffffffff8105d7ca
 #19 [ffff88021f007e68] worker_thread at ffffffff81060513
 #20 [ffff88021f007ee8] kthread at ffffffff810648b6
 #21 [ffff88021f007f48] kernel_thread_helper at ffffffff813c40f4

Signed-off-by: Yi Zou <[email protected]>
Tested-by: Ross Brattain <[email protected]>
Tested-by: Stephen Ko <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
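The rule in the first paragraph amounts to a registration-state check before touching the queue kobjects. In this sketch the `REG_*` states mirror the `NETREG_*` names used above, but the struct, `model_set_real_num_tx_queues()`, and the choice to return 0 when skipping are assumptions for illustration, not the networking core's code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical mirror of the netdev registration states named above. */
enum reg_state {
	REG_UNINITIALIZED,
	REG_REGISTERED,
	REG_UNREGISTERING,
	REG_UNREGISTERED,
};

struct netdev_model {
	enum reg_state reg_state;
	unsigned int num_tx_queues;		/* allocated */
	unsigned int real_num_tx_queues;	/* in use */
	int kobject_updates;			/* times the txq sysfs objects were touched */
};

static int model_set_real_num_tx_queues(struct netdev_model *dev, unsigned int txq)
{
	if (txq < 1 || txq > dev->num_tx_queues)
		return -EINVAL;

	/* Queue kobjects are already gone once unregistration starts:
	 * skip the update instead of touching removed sysfs state. */
	if (dev->reg_state == REG_UNREGISTERING ||
	    dev->reg_state == REG_UNREGISTERED)
		return 0;

	if (dev->reg_state == REG_REGISTERED)
		dev->kobject_updates++;	/* stands in for netdev_queue_update_kobjects() */
	dev->real_num_tx_queues = txq;
	return 0;
}
```

Under these assumptions, an FCoE-style cleanup that runs during NETDEV_UNREGISTER leaves the queue count untouched instead of walking into removed sysfs entries.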
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72e
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d1] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8d
 #9 [d72d3d68] compact_zone_order at c030bba1
 #10 [d72d3db4] try_to_compact_pages at c030bc84
 #11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
 #12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
 #13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
 #14 [d72d3eb8] alloc_pages_vma at c030a845
 #15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
 #16 [d72d3f00] handle_mm_fault at c02f36c6
 #17 [d72d3f30] do_page_fault at c05c70ed
 #18 [d72d3fb] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against a 3.1-based kernel with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Let's say we have a case like this:

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
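The diagram can be turned into a small simulation of the fix: re-check validity whenever the scan crosses into a new MAX_ORDER_NR_PAGES-aligned block, and skip that whole block if it is a hole. The tiny geometry (8-page blocks, a hole at PFNs 24..31) and all `model_*` names are hypothetical; the aligned check only works if validity is uniform within each aligned block, which is why the modeled hole is block-aligned.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, tiny geometry so the example is easy to trace by hand. */
#define MODEL_MAX_ORDER_NR_PAGES 8UL
#define MODEL_NR_PFNS            48UL

/* One whole MAX_ORDER block is a memory hole: PFNs 24..31. */
static bool model_pfn_valid(unsigned long pfn)
{
	return pfn < MODEL_NR_PFNS && !(pfn >= 24 && pfn < 32);
}

static unsigned long touched_invalid;	/* "pfn_to_page()" calls on a hole */

/* Scan [start, end): check the first PFN, then re-check pfn_valid()
 * whenever the scan enters a new MAX_ORDER_NR_PAGES-aligned block. */
static unsigned long model_isolate(unsigned long start, unsigned long end)
{
	unsigned long pfn, scanned = 0;

	if (!model_pfn_valid(start))
		return 0;

	for (pfn = start; pfn < end; pfn++) {
		if ((pfn & (MODEL_MAX_ORDER_NR_PAGES - 1)) == 0 &&
		    !model_pfn_valid(pfn)) {
			/* Skip the hole; the loop increment lands on the next block. */
			pfn += MODEL_MAX_ORDER_NR_PAGES - 1;
			continue;
		}
		if (!model_pfn_valid(pfn))
			touched_invalid++;	/* would be an invalid struct page */
		scanned++;
	}
	return scanned;
}
```

Scanning from PFN 21 to 37 (just below the hole, like `m` in the diagram) visits PFNs 21 to 23, skips the hole in one step, and resumes at PFN 32 without ever treating a hole PFN as a valid page.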
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb #1 [d72d3b24] oops_end at c05c5322 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60 #3 [d72d3bec] bad_area at c0227fb6 #4 [d72d3c00] do_page_fault at c05c72e #5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 #6 [d72d3cb4] isolate_migratepages at c030b15a #7 [d72d3d1] zone_watermark_ok at c02d26cb #8 [d72d3d2c] compact_zone at c030b8d #9 [d72d3d68] compact_zone_order at c030bba1 torvalds#10 [d72d3db4] try_to_compact_pages at c030bc84 torvalds#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 torvalds#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 torvalds#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 torvalds#14 [d72d3eb8] alloc_pages_vma at c030a845 torvalds#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb torvalds#16 [d72d3f00] handle_mm_fault at c02f36c6 torvalds#17 [d72d3f30] do_page_fault at c05c70ed torvalds#18 [d72d3fb] error_code (via page_fault) at c05c47a4 EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. 

  BUG: unable to handle kernel paging request at 01c00008
  IP: [<c0522399>] isolate_migratepages+0x119/0x390
  *pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being checked, and that PFN is not necessarily aligned. Let's say we have a case like this:

  H = MAX_ORDER_NR_PAGES boundary
  | = pageblock boundary
  m = cc->migrate_pfn
  f = cc->free_pfn
  o = memory hole

  H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages starts, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages, which now extends into a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole, where there are not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
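The scenario in the diagram can be reproduced with a small user-space model. This is a hypothetical sketch, not the kernel code: `toy_zone`, `toy_pfn_valid()` and `toy_scan()` are invented names, the page counts are toy values, and the hole is assumed to be MAX_ORDER_NR_PAGES aligned. With `recheck == false` only the first PFN is validated, as the message describes; with `recheck == true` validity is re-checked whenever the scan enters a new MAX_ORDER_NR_PAGES block, which is our reading of "calls pfn_valid when necessary".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy sizes chosen for illustration; real values depend on arch/config. */
#define TOY_MAX_ORDER_NR_PAGES 8UL
#define TOY_PAGEBLOCK_NR_PAGES 4UL

/* Hypothetical zone: PFNs in [hole_start, hole_end) have no struct page. */
struct toy_zone {
	unsigned long hole_start;
	unsigned long hole_end;
};

/* Stand-in for pfn_valid(); the hole is assumed MAX_ORDER-aligned. */
static bool toy_pfn_valid(const struct toy_zone *z, unsigned long pfn)
{
	return pfn < z->hole_start || pfn >= z->hole_end;
}

/*
 * Walk [low_pfn, end_pfn) and count PFNs that would be handed to
 * pfn_to_page(). *touched_invalid counts those that lie in the hole.
 */
static unsigned long toy_scan(const struct toy_zone *z, unsigned long low_pfn,
			      unsigned long end_pfn, bool recheck,
			      unsigned long *touched_invalid)
{
	unsigned long touched = 0;

	*touched_invalid = 0;
	/* Both variants validate the first PFN. */
	if (!toy_pfn_valid(z, low_pfn))
		return 0;

	for (; low_pfn < end_pfn; low_pfn++) {
		if (recheck && (low_pfn & (TOY_MAX_ORDER_NR_PAGES - 1)) == 0 &&
		    !toy_pfn_valid(z, low_pfn)) {
			/* Jump to the block's last PFN; the loop's ++ moves past it. */
			low_pfn += TOY_MAX_ORDER_NR_PAGES - 1;
			continue;
		}
		touched++;
		if (!toy_pfn_valid(z, low_pfn))
			(*touched_invalid)++;
	}
	return touched;
}
```

With a hole at PFNs 16..31 and m = 14, a one-pageblock scan (14..17) touches PFNs 16 and 17 inside the hole when only the first PFN is checked, while the re-checking variant stops at the block boundary and skips the hole entirely.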
ata_port lifetime in libata follows the host. In libsas it follows the scsi_target. Once scsi_remove_device() has caused all commands to be completed, it allows scsi_remove_target() to immediately proceed to freeing the ata_port, causing bug reports like:

[ 848.393333] BUG: spinlock bad magic on CPU#4, kworker/u:2/5107
[ 848.400262] general protection fault: 0000 [#1] SMP
[ 848.406244] CPU 4
[ 848.408310] Modules linked in: nls_utf8 ipv6 uinput i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma dca sg sd_mod sr_mod cdrom ahci libahci isci libsas libata scsi_transport_sas [last unloaded: scsi_wait_scan]
[ 848.432060]
[ 848.434137] Pid: 5107, comm: kworker/u:2 Not tainted 3.2.0-isci+ #8 Intel Corporation S2600CP/S2600CP
[ 848.445310] RIP: 0010:[<ffffffff8126a68c>]  [<ffffffff8126a68c>] spin_dump+0x5e/0x8c
[ 848.454787] RSP: 0018:ffff8807f868dca0  EFLAGS: 00010002
[ 848.461137] RAX: 0000000000000048 RBX: ffff8807fe86a630 RCX: ffffffff817d0be0
[ 848.469520] RDX: 0000000000000000 RSI: ffffffff814af1cf RDI: 0000000000000002
[ 848.477959] RBP: ffff8807f868dcb0 R08: 00000000ffffffff R09: 000000006b6b6b6b
[ 848.486327] R10: 000000000003fb8c R11: ffffffff81a19448 R12: 6b6b6b6b6b6b6b6b
[ 848.494699] R13: ffff8808027dc520 R14: 0000000000000000 R15: 000000000000001e
[ 848.503067] FS:  0000000000000000(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000000
[ 848.512899] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 848.519710] CR2: 00007ff77d001000 CR3: 00000007f7a5d000 CR4: 00000000000406e0
[ 848.528072] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 848.536446] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 848.544831] Process kworker/u:2 (pid: 5107, threadinfo ffff8807f868c000, task ffff8807ff348000)
[ 848.555327] Stack:
[ 848.557959]  ffff8807fe86a630 ffff8807fe86a630 ffff8807f868dcd0 ffffffff8126a6e0
[ 848.567072]  ffffffff817c142f ffff8807fe86a630 ffff8807f868dcf0 ffffffff8126a703
[ 848.576190]  ffff8808027dc520 0000000000000286 ffff8807f868dd10 ffffffff814af1bb
[ 848.585281] Call Trace:
[ 848.588409]  [<ffffffff8126a6e0>] spin_bug+0x26/0x28
[ 848.594357]  [<ffffffff8126a703>] do_raw_spin_unlock+0x21/0x88
[ 848.601283]  [<ffffffff814af1bb>] _raw_spin_unlock_irqrestore+0x2c/0x65
[ 848.609089]  [<ffffffffa001c103>] ata_scsi_port_error_handler+0x548/0x557 [libata]
[ 848.618331]  [<ffffffff81061813>] ? async_schedule+0x17/0x17
[ 848.625060]  [<ffffffffa004f30f>] async_sas_ata_eh+0x45/0x69 [libsas]
[ 848.632655]  [<ffffffff810618aa>] async_run_entry_fn+0x97/0x125
[ 848.639670]  [<ffffffff81057439>] process_one_work+0x207/0x38d
[ 848.646577]  [<ffffffff8105738c>] ? process_one_work+0x15a/0x38d
[ 848.653681]  [<ffffffff810576f7>] worker_thread+0x138/0x21c
[ 848.660305]  [<ffffffff810575bf>] ? process_one_work+0x38d/0x38d
[ 848.667493]  [<ffffffff8105b098>] kthread+0x9d/0xa5
[ 848.673382]  [<ffffffff8106e1bd>] ? trace_hardirqs_on_caller+0x12f/0x166
[ 848.681304]  [<ffffffff814b7704>] kernel_thread_helper+0x4/0x10
[ 848.688324]  [<ffffffff814af534>] ? retint_restore_args+0x13/0x13
[ 848.695530]  [<ffffffff8105affb>] ? __init_kthread_worker+0x5b/0x5b
[ 848.702929]  [<ffffffff814b7700>] ? gs_change+0x13/0x13
[ 848.709155] Code: 00 00 48 8d 88 38 04 00 00 44 8b 80 84 02 00 00 31 c0 e8 cf 1b 24 00 41 83 c8 ff 44 8b 4b 08 48 c7 c1 e0 0b 7d 81 4d 85 e4 74 10 <45> 8b 84 24 84 02 00 00 49 8d 8c 24 38 04 00 00 8b 53 04 48 89
[ 848.732467] RIP  [<ffffffff8126a68c>] spin_dump+0x5e/0x8c
[ 848.738905]  RSP <ffff8807f868dca0>
[ 848.743743] ---[ end trace 143161646eee8caa ]---

...so arrange for the ata_port to have the same end of life as the domain device.

Reported-by: Marcin Tomczak <[email protected]>
Acked-by: Jeff Garzik <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
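"Same end of life as the domain device" can be illustrated with a plain reference count. The sketch below is hypothetical and single-threaded: `toy_domain_dev`, `toy_ata_port` and the get/put helpers are invented for illustration (kernel code would typically use `struct kref`), and it only shows the ownership rule, not the libsas change itself. The port is freed in the domain device's release path, so removing the SCSI target cannot free it while the error handler still holds a reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, single-threaded stand-ins; not libsas structures. */
struct toy_ata_port {
	bool alive;
};

struct toy_domain_dev {
	int refcount;
	struct toy_ata_port *ap;	/* owned by, and freed with, the domain device */
};

static int toy_ports_freed;

static struct toy_domain_dev *toy_domain_dev_alloc(void)
{
	struct toy_domain_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->ap = calloc(1, sizeof(*dev->ap));
	if (!dev->ap) {
		free(dev);
		return NULL;
	}
	dev->ap->alive = true;
	dev->refcount = 1;		/* initial reference */
	return dev;
}

static void toy_domain_dev_get(struct toy_domain_dev *dev)
{
	dev->refcount++;
}

/* The ata_port is released only when the last domain-device reference drops. */
static void toy_domain_dev_put(struct toy_domain_dev *dev)
{
	if (--dev->refcount == 0) {
		dev->ap->alive = false;
		free(dev->ap);
		toy_ports_freed++;
		free(dev);
	}
}
```

In use, the error-handler work takes a reference before running; target removal then drops only the initial reference, and the port survives until the handler's final put.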
BugLink: http://bugs.launchpad.net/bugs/890952

commit a18a920 upstream.

This patch validates the sdev pointer in scsi_dh_activate before proceeding further. Without this check we might see the panic below. I have seen this panic multiple times.

Call trace:
 #0 [ffff88007d647b50] machine_kexec at ffffffff81020902
 #1 [ffff88007d647ba0] crash_kexec at ffffffff810875b0
 #2 [ffff88007d647c70] oops_end at ffffffff8139c650
 #3 [ffff88007d647c90] __bad_area_nosemaphore at ffffffff8102dd15
 #4 [ffff88007d647d50] page_fault at ffffffff8139b8cf
    [exception RIP: scsi_dh_activate+0x82]
    RIP: ffffffffa0041922  RSP: ffff88007d647e00  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 00000000000093c5
    RDX: 00000000000093c5  RSI: ffffffffa02e6640  RDI: ffff88007cc88988
    RBP: 000000000000000f  R8: ffff88007d646000  R9: 0000000000000000
    R10: ffff880082293790  R11: 00000000ffffffff  R12: ffff88007cc88988
    R13: 0000000000000000  R14: 0000000000000286  R15: ffff880037b845e0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #5 [ffff88007d647e38] run_workqueue at ffffffff81060268
 #6 [ffff88007d647e78] worker_thread at ffffffff81060386
 #7 [ffff88007d647ee8] kthread at ffffffff81064436
 #8 [ffff88007d647f48] kernel_thread at ffffffff81003fba

Signed-off-by: Babu Moger <[email protected]>
Signed-off-by: James Bottomley <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Tim Gardner <[email protected]>
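The shape of the check can be sketched in isolation. This is a hypothetical model, not the scsi_dh code: `toy_queue`, `toy_sdev`, `toy_dh_activate()` and the `TOY_DH_*` result codes are invented, and the assumption that the sdev is reached through a per-queue pointer that is cleared when the device goes away is ours. The point is that the pointer is tested before any dereference, and the caller's completion callback still runs with an error instead of the worker faulting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes (the kernel has its own SCSI_DH_* values). */
enum { TOY_DH_OK = 0, TOY_DH_NOSYS = 1 };

struct toy_sdev {
	int activations;
};

/* Assumption: the sdev is looked up through a pointer cleared on removal. */
struct toy_queue {
	struct toy_sdev *queuedata;
};

typedef void (*toy_activate_complete)(void *data, int err);

static int toy_dh_activate(struct toy_queue *q, toy_activate_complete fn,
			   void *data)
{
	struct toy_sdev *sdev = q->queuedata;
	int err = TOY_DH_OK;

	/* Validate sdev before proceeding further. */
	if (!sdev) {
		err = TOY_DH_NOSYS;
		if (fn)
			fn(data, err);	/* still complete the caller's request */
		return err;
	}

	sdev->activations++;		/* placeholder for the real activation work */
	if (fn)
		fn(data, err);
	return err;
}

/* Example completion callback that records the last result. */
static int toy_last_err = -1;

static void toy_record_err(void *data, int err)
{
	(void)data;
	toy_last_err = err;
}
```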
BugLink: http://bugs.launchpad.net/bugs/890952

commit f992ae8 upstream.

The following command sequence triggers an oops.

# mount /dev/sdb1 /mnt
# echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
# umount /mnt

general protection fault: 0000 [#1] PREEMPT SMP
CPU 2
Modules linked in:

Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
RIP: 0010:[<ffffffff810d0879>]  [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60
...
Call Trace:
 [<ffffffff810d2845>] lock_acquire+0x95/0x140
 [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
 [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70
 [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
 [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
 [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0
 [<ffffffff811c40df>] blkdev_put+0x5f/0x190
 [<ffffffff8118f18d>] kill_block_super+0x4d/0x80
 [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
 [<ffffffff8119003a>] deactivate_super+0x4a/0x70
 [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
 [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0
 [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b

This is because bdev holds on to disk but disk doesn't pin the associated queue. If a SCSI device is removed while the device is still open, the sdev puts the base reference to the queue on release. When the bdev is finally released, the associated queue is already gone along with the bdi, and bdev_inode_switch_bdi() ends up dereferencing an already-freed bdi.

Even if it were not for this bug, a disk not holding onto its associated queue is very unusual and error-prone.

Fix it by making add_disk() take an extra reference to its queue and put it on disk_release(), and by ensuring that the disk and its fops owner are put, in that order, after all accesses to the disk and queue are complete.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Tim Gardner <[email protected]>
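The ownership rule described in the fix (add_disk() takes a queue reference, disk_release() drops it) can be modelled with two reference counts. This is a hypothetical, single-threaded sketch: `toy_queue`, `toy_disk` and the helpers are invented names standing in for request_queue and gendisk, and it does not model the fops-owner ordering. It shows that when the SCSI device drops its base queue reference while the disk is still open, the queue survives until the disk itself is released.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, single-threaded stand-ins for request_queue and gendisk. */
struct toy_queue {
	int refcount;
};

struct toy_disk {
	int refcount;
	struct toy_queue *queue;
};

static int toy_queues_freed;

static struct toy_queue *toy_queue_alloc(void)
{
	struct toy_queue *q = calloc(1, sizeof(*q));

	if (q)
		q->refcount = 1;	/* base reference, owned by the device */
	return q;
}

static void toy_queue_get(struct toy_queue *q)
{
	q->refcount++;
}

static void toy_queue_put(struct toy_queue *q)
{
	if (--q->refcount == 0) {
		free(q);
		toy_queues_freed++;
	}
}

/* Like add_disk() after the fix: the disk takes its own queue reference. */
static void toy_add_disk(struct toy_disk *disk, struct toy_queue *q)
{
	disk->refcount = 1;
	disk->queue = q;
	toy_queue_get(q);
}

/* Like disk_release(): the queue reference is dropped with the disk. */
static void toy_disk_put(struct toy_disk *disk)
{
	if (--disk->refcount == 0) {
		toy_queue_put(disk->queue);
		disk->queue = NULL;
	}
}
```

In the command sequence above, the sysfs delete corresponds to the device's `toy_queue_put()` and the umount to the final `toy_disk_put()`; only the latter frees the queue.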
BugLink: http://bugs.launchpad.net/bugs/907778

commit ea51d13 upstream.

If the pte mapping in generic_perform_write() is unmapped between iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the "copied" parameter to ->end_write can be zero. ext4 couldn't cope with it with delayed allocations enabled. This skips the i_disksize enlargement logic if copied is zero and no new data was appended to the inode.

gdb> bt
#0 0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
#1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
#2 0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
#3 generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
#4 0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0xffff88001e26be40) at mm/filemap.c:2600
#5 0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimized out>, pos=<value optimized out>) at mm/filemap.c:2632
#6 0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) at fs/ext4/file.c:136
#7 0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, ppos=0xffff88001e26bf48) at fs/read_write.c:406
#8 0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4000, pos=0xffff88001e26bf48) at fs/read_write.c:435
#9 0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4000) at fs/read_write.c:487
#10 <signal handler called>
#11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
#12 0x0000000000000000 in ?? ()
gdb> print offset
$22 = 0xffffffffffffffff
gdb> print idx
$23 = 0xffffffff
gdb> print inode->i_blkbits
$24 = 0xc
gdb> up
#1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
2512 if (ext4_da_should_update_i_disksize(page, end)) {
gdb> print start
$25 = 0x0
gdb> print end
$26 = 0xffffffffffffffff
gdb> print pos
$27 = 0x108000
gdb> print new_i_size
$28 = 0x108000
gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
$29 = 0xd9000
gdb> down
2467 for (i = 0; i < idx; i++)
gdb> print i
$30 = 0xd44acbee

This is 100% reproducible with some autonuma development code tuned in a very aggressive manner (not the normal way even for knumad) which does "exotic" changes to the ptes. It wouldn't normally trigger, but I don't see why it can't happen normally if the page is added to swap cache in between the two faults, leading to "copied" being zero (which then hangs in ext4). So it should be fixed. It is especially possible with lumpy reclaim (albeit disabled if compaction is enabled), as that would ignore the young bits in the ptes.

Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Tim Gardner <[email protected]>
Signed-off-by: Brad Figg <[email protected]>
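The gdb values above (`end` = 0xffffffffffffffff, `idx` = 0xffffffff) follow from unsigned wrap-around when `copied` is zero. The sketch below reproduces that arithmetic in user space, assuming a 64-bit `unsigned long` and 4 KiB blocks (`i_blkbits` = 0xc). `last_block_index()` and `should_check_disksize()` are hypothetical helpers; the guard only illustrates the idea of skipping the i_disksize update when nothing was copied, and is not the exact ext4 patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Index of the block covering the last byte written within a page,
 * following the arithmetic visible in the gdb session: end = start +
 * copied - 1, shifted by the block-size bits and truncated to 32 bits.
 * uint64_t stands in for a 64-bit unsigned long.
 */
static uint32_t last_block_index(uint64_t start, uint64_t copied,
				 unsigned int blkbits)
{
	uint64_t end = start + copied - 1; /* wraps to UINT64_MAX if copied == 0 */

	return (uint32_t)(end >> blkbits);
}

/*
 * Hypothetical guard in the spirit of the fix: only look at enlarging
 * i_disksize when something was actually copied past the old size.
 */
static bool should_check_disksize(uint64_t copied, uint64_t new_i_size,
				  uint64_t i_disksize)
{
	return copied != 0 && new_i_size > i_disksize;
}
```

With `copied` = 0 the computed index is 0xffffffff, which is consistent with the loop `for (i = 0; i < idx; i++)` having reached `i` = 0xd44acbee in the session above. In that session `new_i_size` (0x108000) is already larger than `i_disksize` (0xd9000), so a size comparison alone would not have skipped the update.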
…S block during isolation for migration

BugLink: http://bugs.launchpad.net/bugs/931719

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump.

  PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s"
   #0 [d72d3ad0] crash_kexec at c028cfdb
   #1 [d72d3b24] oops_end at c05c5322
   #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
   #3 [d72d3bec] bad_area at c0227fb6
   #4 [d72d3c00] do_page_fault at c05c72e
   #5 [d72d3c80] error_code (via page_fault) at c05c47a4
      EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000
      DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50
      CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002
   #6 [d72d3cb4] isolate_migratepages at c030b15a
   #7 [d72d3d1] zone_watermark_ok at c02d26cb
   #8 [d72d3d2c] compact_zone at c030b8d
   #9 [d72d3d68] compact_zone_order at c030bba1
  #10 [d72d3db4] try_to_compact_pages at c030bc84
  #11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
  #12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
  #13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
  #14 [d72d3eb8] alloc_pages_vma at c030a845
  #15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
  #16 [d72d3f00] handle_mm_fault at c02f36c6
  #17 [d72d3f30] do_page_fault at c05c70ed
  #18 [d72d3fb] error_code (via page_fault) at c05c47a4
      EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431
      DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788
      SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50
      CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202

It was also reported by Herbert van den Bergh against a 3.1-based kernel with the following snippet from the console log.

  BUG: unable to handle kernel paging request at 01c00008
  IP: [<c0522399>] isolate_migratepages+0x119/0x390
  *pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Let's say we have a case like this:

  H = MAX_ORDER_NR_PAGES boundary
  | = pageblock boundary
  m = cc->migrate_pfn
  f = cc->free_pfn
  o = memory hole

  H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages, which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when necessary.

Reported-by: Herbert van den Bergh <[email protected]>
Tested-by: Herbert van den Bergh <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Tim Gardner <[email protected]>
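The failure mode and the fix can be simulated in user space. Everything below is a hypothetical model: `sim_pfn_valid()` stands in for `pfn_valid()`, the sizes are tiny, and the hole is assumed to be MAX_ORDER_NR_PAGES aligned as in the diagram. It contrasts checking only the first PFN with re-checking validity whenever the scan enters a new MAX_ORDER_NR_PAGES block, which is the approach the message describes; it is not the kernel's isolate_migratepages() code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sizes chosen small for the simulation. */
#define MAX_ORDER_NR_PAGES 16UL
#define PAGEBLOCK_NR_PAGES 8UL

/*
 * Hypothetical memory map: PFNs in [hole_start, hole_end) have no valid
 * struct page. The hole is assumed to be MAX_ORDER_NR_PAGES aligned.
 */
struct memmap {
	unsigned long hole_start;
	unsigned long hole_end;
};

static bool sim_pfn_valid(const struct memmap *m, unsigned long pfn)
{
	return pfn < m->hole_start || pfn >= m->hole_end;
}

/*
 * Scan [start, end). The first PFN is always checked. With recheck set,
 * validity is checked again each time the scan enters a new
 * MAX_ORDER_NR_PAGES-aligned block, and an invalid block is skipped whole.
 * Returns how many PFNs were "touched" (where struct page would be read);
 * *touched_invalid reports whether any of them lay inside the hole.
 */
static unsigned long scan_range(const struct memmap *m, unsigned long start,
				unsigned long end, bool recheck,
				bool *touched_invalid)
{
	unsigned long pfn, touched = 0;

	*touched_invalid = false;
	if (!sim_pfn_valid(m, start))
		return 0;

	for (pfn = start; pfn < end; pfn++) {
		if (recheck && (pfn & (MAX_ORDER_NR_PAGES - 1)) == 0 &&
		    !sim_pfn_valid(m, pfn)) {
			/* jump to the last PFN of this block; pfn++ moves past it */
			pfn += MAX_ORDER_NR_PAGES - 1;
			continue;
		}
		if (!sim_pfn_valid(m, pfn))
			*touched_invalid = true;
		touched++;
	}
	return touched;
}
```

With a hole at PFNs 32-47 and migrate_pfn at 29, scanning one pageblock without the re-check touches PFNs 32-36 inside the hole, while the re-checking version stops after PFNs 29-31; a longer scan to PFN 50 skips the hole and resumes at PFN 48.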
Update current JIT offsets to enable and set up BPF line info for better introspection and debugging. Offsets map each xlated insn to the start of its JITed code, as well as the epilogue. This also allows simplifying some code and dropping unneeded JIT ctx variables.

Taking bpf_iter_udp4.bpf.o as an example using bpftool's JIT disassembly, before this change we see:

  48: ldr lr, [fp, #-44] @ 0xffffffd4
  4c: strd r2, [lr, #-40] @ 0xffffffd8
  50: ldr r8, [fp, #-44] @ 0xffffffd4
  54: ldr r8, [r8, #-40] @ 0xffffffd8
  58: ldr r2, [r8]
  5c: mov r7, r2
  60: ldr r2, [r7]
  64: mov r3, #0
  68: ldr r8, [fp, #-44] @ 0xffffffd4
  6c: ldr r8, [r8, #-40] @ 0xffffffd8
  70: ldr r6, [r8, #8]
  74: ldr r9, [fp, #-44] @ 0xffffffd4
  78: strd r6, [r9, #-48] @ 0xffffffd0

While afterwards we have:

  48: ldr lr, [fp, #-44] @ 0xffffffd4
  4c: strd r2, [lr, #-40] @ 0xffffffd8
; struct seq_file *seq = ctx->meta->seq;
  50: ldr r8, [fp, #-44] @ 0xffffffd4
  54: ldr r8, [r8, #-40] @ 0xffffffd8
  58: ldr r2, [r8]
; struct seq_file *seq = ctx->meta->seq;
  5c: mov r7, r2
  60: ldr r2, [r7]
  64: mov r3, #0
; struct udp_sock *udp_sk = ctx->udp_sk;
  68: ldr r8, [fp, #-44] @ 0xffffffd4
  6c: ldr r8, [r8, #-40] @ 0xffffffd8
  70: ldr r6, [r8, #8]
  74: ldr r9, [fp, #-44] @ 0xffffffd4
  78: strd r6, [r9, #-48] @ 0xffffffd0

which aligns with the original source code.

Signed-off-by: Tony Ambardar <[email protected]>
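A minimal sketch of the offsets table the message describes: one entry per translated instruction giving the start of its JITed code, plus a final entry for the epilogue. The function names and the `insn_bytes` input are hypothetical, and this is not the ARM JIT's code; it only shows how such a table lets a tool map a native address back to the BPF instruction, and thus to a line-info record, as in the annotated disassembly above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical JIT bookkeeping: offsets[i] holds the byte offset at which
 * the native code for BPF instruction i starts, and offsets[n_insns] holds
 * the start of the epilogue, so the array has n_insns + 1 entries.
 */
static void fill_offsets(const unsigned int *insn_bytes, size_t n_insns,
			 unsigned int prologue_bytes, unsigned int *offsets)
{
	unsigned int pos = prologue_bytes;

	for (size_t i = 0; i < n_insns; i++) {
		offsets[i] = pos;
		pos += insn_bytes[i]; /* bytes emitted for instruction i */
	}
	offsets[n_insns] = pos; /* the epilogue follows the last instruction */
}

/*
 * Map a native-code offset back to the BPF instruction that produced it.
 * Returns n_insns for the epilogue and (size_t)-1 for the prologue.
 * Binary search for the last i with offsets[i] <= off.
 */
static size_t insn_for_offset(const unsigned int *offsets, size_t n_insns,
			      unsigned int off)
{
	size_t lo = 0, hi = n_insns;

	if (off < offsets[0])
		return (size_t)-1;
	while (lo < hi) {
		size_t mid = lo + (hi - lo + 1) / 2;

		if (offsets[mid] <= off)
			lo = mid;
		else
			hi = mid - 1;
	}
	return lo;
}
```

If an instruction emits no native code, two consecutive entries are equal and this lookup attributes the address to the later instruction; a real consumer would need a policy for that case.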
It appears that the xe_res_cursor also assumes 4K alignment. Current code uses `PAGE_SIZE' as an assumed alignment reference, but a 4K kernel page size is by no means guaranteed. On 16K-paged kernels, this causes driver failures during boot up:

[ 23.242757] ------------[ cut here ]------------
[ 23.247363] WARNING: CPU: 0 PID: 2036 at drivers/gpu/drm/xe/xe_res_cursor.h:182 emit_pte+0x394/0x3b0 [xe]
[ 23.256962] Modules linked in: nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) rfkill(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) snd_hda_intel(E) snd_intel_dspcfg(E) snd_hda_codec(E) nls_iso8859_1(E) qrtr(E) nls_cp437(E) snd_hda_core(E) loongson3_cpufreq(E) rtc_efi(E) snd_hwdep(E) snd_pcm(E) spi_loongson_pci(E) snd_timer(E) snd(E) spi_loongson_core(E) soundcore(E) gpio_loongson_64bit(E) rtc_loongson(E) i2c_ls2x(E) mousedev(E) input_leds(E) sch_fq_codel(E) fuse(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) xe(E) drm_gpuvm(E) drm_buddy(E) gpu_sched(E)
[ 23.257034] drm_exec(E) drm_suballoc_helper(E) drm_display_helper(E) cec(E) rc_core(E) hid_generic(E) tpm_tis_spi(E) r8169(E) loongson(E) i2c_algo_bit(E) realtek(E) drm_ttm_helper(E) led_class(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) i2c_dev(E)
[ 23.369697] CPU: 0 UID: 1000 PID: 2036 Comm: QSGRenderThread Tainted: G E 6.14.0-rc4-aosc-main-g7cc07e6e50b0-dirty #8
[ 23.381640] Tainted: [E]=UNSIGNED_MODULE
[ 23.385534] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[ 23.399319] pc ffff80000251efc0 ra ffff80000251eddc tp 900000011fe3c000 sp 900000011fe3f7e0
[ 23.407632] a0 0000000000000001 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000
[ 23.415938] a4 0000000000000000 a5 0000000000000000 a6 0000000000060000 a7 900000010c947b00
[ 23.424240] t0 0000000000000000 t1 0000000000000000 t2 0000000000000000 t3 900000012e456230
[ 23.432543] t4 0000000000000035 t5 0000000000004000 t6 00000001fbc40403 t7 0000000000004000
[ 23.440845] t8 9000000100e688a8 u0 5cc06cee8ef0edee s9 9000000100024420 s0 0000000000000047
[ 23.449147] s1 0000000000004000 s2 0000000000000001 s3 900000012adba000 s4 ffffffffffffc000
[ 23.457450] s5 9000000108939428 s6 0000000000000000 s7 0000000000000000 s8 900000011fe3f8e0
[ 23.465851] ra: ffff80000251eddc emit_pte+0x1b0/0x3b0 [xe]
[ 23.471761] ERA: ffff80000251efc0 emit_pte+0x394/0x3b0 [xe]
[ 23.477557] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[ 23.483732] PRMD: 00000004 (PPLV0 +PIE -PWE)
[ 23.488068] EUEN: 00000003 (+FPE +SXE -ASXE -BTE)
[ 23.492832] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
[ 23.497594] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
[ 23.503133] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
[ 23.509164] CPU: 0 UID: 1000 PID: 2036 Comm: QSGRenderThread Tainted: G E 6.14.0-rc4-aosc-main-g7cc07e6e50b0-dirty #8
[ 23.509168] Tainted: [E]=UNSIGNED_MODULE
[ 23.509168] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[ 23.509170] Stack : ffffffffffffffff ffffffffffffffff 900000000023eb34 900000011fe3c000
[ 23.509176] 900000011fe3f440 0000000000000000 900000011fe3f448 9000000001c31c70
[ 23.509181] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 23.509185] 0000000000000000 5cc06cee8ef0edee 0000000000000000 0000000000000000
[ 23.509190] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 23.509193] 0000000000000000 0000000000000000 00000000066b4000 9000000100024420
[ 23.509197] 9000000001eb8000 0000000000000000 9000000001c31c70 0000000000000004
[ 23.509202] 0000000000000004 0000000000000000 0000000000000000 0000000000000000
[ 23.509206] 900000011fe3f8e0 9000000001c31c70 9000000000244174 00007fffac097534
[ 23.509211] 00000000000000b0 0000000000000004 0000000000000003 0000000000071c1d
[ 23.509216] ...
[ 23.509218] Call Trace:
[ 23.509220] [<9000000000244174>] show_stack+0x3c/0x16c
[ 23.509226] [<900000000023eb30>] dump_stack_lvl+0x84/0xe0
[ 23.509230] [<9000000000288208>] __warn+0x8c/0x174
[ 23.509234] [<90000000017c1918>] report_bug+0x1c0/0x22c
[ 23.509238] [<90000000017f66e8>] do_bp+0x280/0x344
[ 23.509243] [<90000000002428a0>] handle_bp+0x120/0x1c0
[ 23.509247] [<ffff80000251efc0>] emit_pte+0x394/0x3b0 [xe]
[ 23.509295] [<ffff800002520d38>] xe_migrate_clear+0x2d8/0xa54 [xe]
[ 23.509341] [<ffff8000024e6c38>] xe_bo_move+0x324/0x930 [xe]
[ 23.509387] [<ffff800002209468>] ttm_bo_handle_move_mem+0xd0/0x194 [ttm]
[ 23.509392] [<ffff800002209ebc>] ttm_bo_validate+0xd4/0x1cc [ttm]
[ 23.509396] [<ffff80000220a138>] ttm_bo_init_reserved+0x184/0x1dc [ttm]
[ 23.509399] [<ffff8000024e7840>] ___xe_bo_create_locked+0x1e8/0x3d4 [xe]
[ 23.509445] [<ffff8000024e7cf8>] __xe_bo_create_locked+0x2cc/0x390 [xe]
[ 23.509489] [<ffff8000024e7e98>] xe_bo_create_user+0x34/0xe4 [xe]
[ 23.509533] [<ffff8000024e875c>] xe_gem_create_ioctl+0x154/0x4d8 [xe]
[ 23.509578] [<9000000001062784>] drm_ioctl_kernel+0xe0/0x14c
[ 23.509582] [<9000000001062c10>] drm_ioctl+0x420/0x5f4
[ 23.509585] [<ffff8000024ea778>] xe_drm_ioctl+0x64/0xac [xe]
[ 23.509630] [<9000000000653504>] sys_ioctl+0x2b8/0xf98
[ 23.509634] [<90000000017f684c>] do_syscall+0xa0/0x140
[ 23.509637] [<9000000000241e38>] handle_syscall+0xb8/0x158
[ 23.509640]
[ 23.509644] ---[ end trace 0000000000000000 ]---

Revise calls to `xe_res_dma()' and `xe_res_cursor()' to use `XE_PTE_MASK' (12) and `SZ_4K' to fix this potentially confused use of `PAGE_SIZE' in relevant code.

Cc: [email protected]
Fixes: e89b384 ("drm/xe/migrate: Update emit_pte to cope with a size level than 4k")
Tested-by: Mingcong Bai <[email protected]>
Tested-by: Haien Liang <[email protected]>
Tested-by: Shirong Liu <[email protected]>
Tested-by: Haofeng Wu <[email protected]>
Link: FanFansfan@22c55ab
Co-developed-by: Shang Yatsen <[email protected]>
Signed-off-by: Shang Yatsen <[email protected]>
Signed-off-by: Mingcong Bai <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Kexy Biscuit <[email protected]>
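The arithmetic behind this class of failure is easy to see in isolation. In this sketch the GPU page-table granule is fixed at 4 KiB (the commit's `SZ_4K` and 12-bit shift), while the CPU page size is taken as 16 KiB. The macro and function names are hypothetical and do not reproduce `xe_res_cursor` or `emit_pte()`; the point is only that stepping by the CPU page size under-counts the entries needed and spaces the programmed addresses 16 KiB apart instead of 4 KiB.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical constants for illustration: the GPU PTE granule is fixed
 * at 4 KiB, while the CPU page size depends on the kernel build (here a
 * 16 KiB-page kernel).
 */
#define GPU_PTE_SHIFT 12u
#define GPU_PTE_SIZE (1u << GPU_PTE_SHIFT)   /* 4 KiB */
#define CPU_PAGE_SHIFT 14u
#define CPU_PAGE_SIZE (1u << CPU_PAGE_SHIFT) /* 16 KiB */

/* Number of page-table entries needed to map `size` bytes when each
 * entry covers `step` bytes. */
static uint64_t ptes_needed(uint64_t size, uint64_t step)
{
	return (size + step - 1) / step;
}

/* Address written into entry `i` of a contiguous range starting at
 * `base`, when the cursor advances by (1 << shift) bytes per entry. */
static uint64_t pte_addr(uint64_t base, uint64_t i, unsigned int shift)
{
	return base + (i << shift);
}
```

For a 64 KiB range, a 4 KiB step needs 16 entries while a 16 KiB step yields only 4, and entry 1 points at base + 16 KiB rather than base + 4 KiB.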
Driver for the BCM2835 ISP hardware block. This driver uses the MMAL component to program the ISP hardware through the VC firmware. The ISP component can produce two video stream outputs, and Bayer image statistics. This can't be encompassed in a simple V4L2 M2M device, so create a new device that registers 4 video nodes. This patch squashes all the development patches from the earlier rpi-5.4.y branch into one Signed-off-by: Naushir Patuck <[email protected]> staging/bcm2835-isp: Add the unpacked (16bpp) raw formats Now that the firmware supports the unpacked (16bpp) variants of the MIPI raw formats, add the mappings. Signed-off-by: Dave Stevenson <[email protected]> staging/bcm2835-isp: Log the number of excess supported formats When logging that the firmware has provided more supported formats than we had allocated storage for, log the number allocated and returned. Signed-off-by: Dave Stevenson <[email protected]> staging: vc04_services: ISP: Add colour denoise control Add colour denoise control to the bcm2835 driver through a new v4l2 control: V4L2_CID_USER_BCM2835_ISP_CDN. Add the accompanying MMAL configuration structure definitions as well. Signed-off-by: Naushir Patuck <[email protected]> bcm2835-isp: Allow formats with different colour spaces. Each supported format now includes a mask showing the allowed colour spaces, as well as a default colour space for when one was not specified. Additionally we translate the colour space to mmal format and pass it over to the VideoCore. Signed-off-by: David Plowman <[email protected]> media: i2c: add ov9281 driver. Change-Id: I7b77250bbc56d2f861450cf77271ad15f9b88ab1 Signed-off-by: Zefa Chen <[email protected]> media: i2c: ov9281: fix mclk issue when probe multiple camera. Takes the ov9281 part only from the Rockchip's patch. 
Change-Id: I30e833baf2c1bb07d6d87ddb3b00759ab45a90e4 Signed-off-by: Zefa Chen <[email protected]> media: i2c: ov9281: add enum_frame_interval function for iq tool 2.2 and hal3 Adds the ov9281 parts of the Rockchip patch adding enum_frame_interval to a large number of drivers. Change-Id: I03344cd6cf278dd7c18fce8e97479089ef185a5c Signed-off-by: Zefa Chen <[email protected]> media: i2c: ov9281: Fixup for recent kernel releases, and remove custom code The Rockchip driver was based on a 4.4 kernel, and had several custom Rockchip parts. Update to 5.4 kernel APIs, with the relevant controls required by libcamera, and remove custom Rockchip parts. Signed-off-by: Dave Stevenson <[email protected]> media: i2c: ov9281: Read chip ID via 2 reads Vision Components have made an OV9281 module which blocks reading back the majority of registers to comply with NDAs, and in doing so doesn't allow auto-increment register reading as used when reading the chip ID. Use two reads and manually combine the results. Signed-off-by: Dave Stevenson <[email protected]> media: i2c: ov9281: Add support for 8 bit readout The sensor supports 8 bit mode as well as 10bit, so add the relevant code to allow selection of this. Signed-off-by: Dave Stevenson <[email protected]> media: ov9281: Add 1280x720 and 640x480 modes Breaks out common register set and adds the different registers for 1280x720 (cropped) and 640x480 (skipped) modes Signed-off-by: Dave Stevenson <[email protected]> Fixed picture line bug in all ov9281 modes Signed-off-by: Mathias Anhalt <[email protected]> Added hflip and vflip controls to ov9281 Signed-off-by: Mathias Anhalt <[email protected]> media: i2c: ov9281: Remove override of subdev name From the original Rockchip driver, the subdev was renamed from the default to being "mov9281 <dev_name>" whereas the default would have been "ov9281 <dev_name>". Remove the override to drop back to the default rather than a vendor custom string. 
Signed-off-by: Dave Stevenson <[email protected]> media: v4l2-subdev: add subdev-wide state struct Signed-off-by: Dom Cobley <[email protected]> media: i2c: ov9281: Add fwnode properties controls Add call to v4l2_ctrl_new_fwnode_properties to read and create the fwnode based controls. Signed-off-by: Dave Stevenson <[email protected]> media: i2c: ov9281: Sensor should report RAW color space Tested on Raspberry Pi running libcamera. Signed-off-by: David Plowman <[email protected]> Partial revert "media: i2c: add ov9281 driver." This partially reverts commit 84e98e3a4f3eecb168ceb80231c3e8252929892e. The commit had merged some changes to other drivers with adding the ov9281 driver. Only the ov9281 parts have been reverted. staging/bcm2835-isp: Fix compiler warning The result of dividing a u32 by a size_t is an unsigned int on arm32 and a long unsigned int on arm64. Use "%zu" (the size_t format) to remove the build warning for 64-bit builds. Signed-off-by: Phil Elwell <[email protected]> staging: vc04_services: isp: Set the YUV420/YVU420 format stride to 64 bytes The bcm2835 ISP requires the base address of all input/output planes to have 32 byte alignment. Using a Y stride of 32 bytes would not guarantee that the V plane would fulfil this, e.g. a height of 650 lines would mean the V plane buffer is not 32 byte aligned for YUV420 formats. Having a Y stride of 64 bytes would ensure both U and V planes have a 32 byte alignment, as the luma height will always be an even number of lines. Signed-off-by: Naushir Patuck <[email protected]> vc04_services: isp: Report input node as wanting full range RAW color space RAW color spaces are more usually reported as having full range quantization. Tested using libcamera. Signed-off-by: David Plowman <[email protected]> drivers: bcm2835_isp: Allow multiple users for the ISP driver. Add a second (identical) set of device nodes to allow concurrent use of the ISP hardware by another user. 
This change effectively creates a second state structure (struct bcm2835_isp_dev) to maintain independent state for the second user. Node and media entity names are appended with the instance index. Further users can be added by changing the BCM2835_ISP_NUM_INSTANCES define.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: bcm2835_isp: Fix div by 0 bug.

Fix a possible division by 0 bug when setting up the mmal port for the stats port.

Signed-off-by: Naushir Patuck <[email protected]>

staging/bcm2835-isp: Fix cleanup after init fail

bcm2835_isp_remove is called on an initialisation failure, but at that point the drvdata hasn't been set. This causes a crash when e.g. using the cutdown firmware (gpu_mem=16). Move platform_set_drvdata before the instance probing loop to avoid the problem.

See: raspberrypi/linux#4774

Signed-off-by: Phil Elwell <[email protected]>

bcm2835-v4l2-isp: Add missing lock initialization

ISP device allocation is dynamic, and hence so are the locks. struct mutex queue_lock is not initialized, which results in the bug below. Fix this by initializing it.

[ 29.847138] INFO: trying to register non-static key.
[ 29.847156] The code is fine but needs lockdep annotation, or maybe
[ 29.847159] you didn't initialize this object before use?
[ 29.847161] turning off the locking correctness validator.
[ 29.847167] CPU: 1 PID: 343 Comm: v4l_id Tainted: G C 5.15.11-rt24-v8+ #8
[ 29.847187] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[ 29.847194] Call trace:
[ 29.847197] dump_backtrace+0x0/0x1b8
[ 29.847227] show_stack+0x20/0x30
[ 29.847240] dump_stack_lvl+0x8c/0xb8
[ 29.847254] dump_stack+0x18/0x34
[ 29.847263] register_lock_class+0x494/0x4a0
[ 29.847278] __lock_acquire+0x80/0x1680
[ 29.847289] lock_acquire+0x214/0x3a0
[ 29.847300] mutex_lock_nested+0x70/0xc8
[ 29.847312] _vb2_fop_release+0x3c/0xa8 [videobuf2_v4l2]
[ 29.847346] vb2_fop_release+0x34/0x60 [videobuf2_v4l2]
[ 29.847367] v4l2_release+0xc8/0x108 [videodev]
[ 29.847453] __fput+0x8c/0x258
[ 29.847476] ____fput+0x18/0x28
[ 29.847487] task_work_run+0x98/0x180
[ 29.847502] do_notify_resume+0x228/0x3f8
[ 29.847515] el0_svc+0xec/0xf0
[ 29.847523] el0t_64_sync_handler+0x90/0xb8
[ 29.847531] el0t_64_sync+0x180/0x184

Signed-off-by: Padmanabha Srinivasaiah <[email protected]>

staging: vc04_services: isp: Permit all sRGB colour spaces on ISP outputs

ISP outputs actually support all colour spaces that are fundamentally sRGB underneath, regardless of whether an RGB or YUV output format is actually requested.

Signed-off-by: David Plowman <[email protected]>

drivers: staging: bcm2835-isp: Do not cleanup mmal vcsm buffer on stop_streaming

On stop_streaming() the vcsm buffer handle gets released by the buffer cleanup code. This subsequently causes an error if userland re-queues the same buffer on the next start_streaming() call. Remove this cleanup code and rely on the vb2_ops->buf_cleanup() call to do the cleanups instead.

Signed-off-by: Naushir Patuck <[email protected]>

drivers: staging: bcm2835-isp: Clear LS table handle in the firmware

When all nodes have stopped streaming, ensure the firmware has released its handle on the LS table dmabuf. This is done by passing a null handle in the LS params.
Signed-off-by: Naushir Patuck <[email protected]>

drivers: staging: bcm2835-isp: Respect caller's stride value

The stride value reported for output image buffers should be at least as large as any value that was passed in by the caller (subject to correct alignment for the pixel format). If the value is zero (meaning no value was passed), or is too small, the minimum acceptable value will be substituted.

Signed-off-by: David Plowman <[email protected]>

staging: vc04_services: bcm2835-isp: Drop include Makefile directive

Drop the include directive. It can break the build when only a subdirectory is being built. Replace it with "../" for the includes in bcm2835-isp instead.

The fix is equivalent to the four patches between 29d49a7 ("staging: vc04_services: bcm2835-audio: Drop include Makefile directive")...2529ca2 ("staging: vc04_services: interface: Drop include Makefile directive")

Fixes: c8f89c9551c1 ("staging: vc04_services: ISP: Add a more complex ISP processing component")
Suggested-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Kieran Bingham <[email protected]>

staging: vc04_services: bcm2835-v4l2-isp: Register with vchiq_bus_type

Register the bcm2835-v4l2-isp driver with the vchiq_bus_type instead of using the platform driver/device.

Signed-off-by: Kieran Bingham <[email protected]>

staging: vc04_services: bcm2835-v4l2-isp: Explicitly set DMA mask

The platform model originally handled the DMA mask. Now that we are on the vchiq_bus, we need to set it explicitly.

Signed-off-by: Kieran Bingham <[email protected]>
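The YUV420 stride reasoning in the bcm2835-isp commits above can be checked with a little arithmetic. The following is a user-space sketch, not driver code: it assumes a planar Y, U, V layout with the chroma planes at half the luma stride and half the luma height, and every `sketch_*` name is hypothetical. With a 96-byte Y stride (a multiple of 32 but not of 64) and 650 lines, the V plane lands 16 bytes off a 32-byte boundary; rounding the stride up to 128 (a multiple of 64) fixes it. `sketch_choose_stride()` models the "respect caller's stride" rule under the same assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of align; align must be a power of two. */
static uint32_t sketch_align_up(uint32_t x, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Keep the caller's stride if it is large enough, otherwise substitute the
 * minimum (this also covers requested == 0), then apply format alignment. */
static uint32_t sketch_choose_stride(uint32_t requested, uint32_t min_stride,
                                     uint32_t align)
{
    uint32_t stride = requested > min_stride ? requested : min_stride;
    return sketch_align_up(stride, align);
}

/* Planar YUV420 offsets: Y plane first, then U, then V. The chroma planes are
 * assumed to use half the luma stride and half the luma height. */
static uint32_t sketch_yuv420_u_offset(uint32_t y_stride, uint32_t height)
{
    return y_stride * height;
}

static uint32_t sketch_yuv420_v_offset(uint32_t y_stride, uint32_t height)
{
    assert(height % 2 == 0);
    return y_stride * height + (y_stride / 2) * (height / 2);
}
```

For example, `sketch_yuv420_v_offset(96, 650)` evaluates to 78000, which is 16 modulo 32, while `sketch_yuv420_v_offset(128, 650)` evaluates to 104000, a multiple of 32.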
It appears that the xe_res_cursor also assumes 4K alignment. Current code uses `PAGE_SIZE' as an assumed alignment reference, but a 4K kernel page size is by no means guaranteed. On 16K-paged kernels, this causes driver failures during boot up:

[ 23.242757] ------------[ cut here ]------------
[ 23.247363] WARNING: CPU: 0 PID: 2036 at drivers/gpu/drm/xe/xe_res_cursor.h:182 emit_pte+0x394/0x3b0 [xe]
[ 23.256962] Modules linked in: nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) rfkill(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) snd_hda_intel(E) snd_intel_dspcfg(E) snd_hda_codec(E) nls_iso8859_1(E) qrtr(E) nls_cp437(E) snd_hda_core(E) loongson3_cpufreq(E) rtc_efi(E) snd_hwdep(E) snd_pcm(E) spi_loongson_pci(E) snd_timer(E) snd(E) spi_loongson_core(E) soundcore(E) gpio_loongson_64bit(E) rtc_loongson(E) i2c_ls2x(E) mousedev(E) input_leds(E) sch_fq_codel(E) fuse(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) xe(E) drm_gpuvm(E) drm_buddy(E) gpu_sched(E)
[ 23.257034] drm_exec(E) drm_suballoc_helper(E) drm_display_helper(E) cec(E) rc_core(E) hid_generic(E) tpm_tis_spi(E) r8169(E) loongson(E) i2c_algo_bit(E) realtek(E) drm_ttm_helper(E) led_class(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) i2c_dev(E)
[ 23.369697] CPU: 0 UID: 1000 PID: 2036 Comm: QSGRenderThread Tainted: G E 6.14.0-rc4-aosc-main-g7cc07e6e50b0-dirty #8
[ 23.381640] Tainted: [E]=UNSIGNED_MODULE
[ 23.385534] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[ 23.399319] pc ffff80000251efc0 ra ffff80000251eddc tp 900000011fe3c000 sp 900000011fe3f7e0
[ 23.407632] a0 0000000000000001 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000
[ 23.415938] a4 0000000000000000 a5 0000000000000000 a6 0000000000060000 a7 900000010c947b00
[ 23.424240] t0 0000000000000000 t1 0000000000000000 t2 0000000000000000 t3 900000012e456230
[ 23.432543] t4 0000000000000035 t5 0000000000004000 t6 00000001fbc40403 t7 0000000000004000
[ 23.440845] t8 9000000100e688a8 u0 5cc06cee8ef0edee s9 9000000100024420 s0 0000000000000047
[ 23.449147] s1 0000000000004000 s2 0000000000000001 s3 900000012adba000 s4 ffffffffffffc000
[ 23.457450] s5 9000000108939428 s6 0000000000000000 s7 0000000000000000 s8 900000011fe3f8e0
[ 23.465851] ra: ffff80000251eddc emit_pte+0x1b0/0x3b0 [xe]
[ 23.471761] ERA: ffff80000251efc0 emit_pte+0x394/0x3b0 [xe]
[ 23.477557] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[ 23.483732] PRMD: 00000004 (PPLV0 +PIE -PWE)
[ 23.488068] EUEN: 00000003 (+FPE +SXE -ASXE -BTE)
[ 23.492832] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
[ 23.497594] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
[ 23.503133] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
[ 23.509164] CPU: 0 UID: 1000 PID: 2036 Comm: QSGRenderThread Tainted: G E 6.14.0-rc4-aosc-main-g7cc07e6e50b0-dirty #8
[ 23.509168] Tainted: [E]=UNSIGNED_MODULE
[ 23.509168] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[ 23.509170] Stack : ffffffffffffffff ffffffffffffffff 900000000023eb34 900000011fe3c000
[ 23.509176] 900000011fe3f440 0000000000000000 900000011fe3f448 9000000001c31c70
[ 23.509181] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 23.509185] 0000000000000000 5cc06cee8ef0edee 0000000000000000 0000000000000000
[ 23.509190] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 23.509193] 0000000000000000 0000000000000000 00000000066b4000 9000000100024420
[ 23.509197] 9000000001eb8000 0000000000000000 9000000001c31c70 0000000000000004
[ 23.509202] 0000000000000004 0000000000000000 0000000000000000 0000000000000000
[ 23.509206] 900000011fe3f8e0 9000000001c31c70 9000000000244174 00007fffac097534
[ 23.509211] 00000000000000b0 0000000000000004 0000000000000003 0000000000071c1d
[ 23.509216] ...
[ 23.509218] Call Trace:
[ 23.509220] [<9000000000244174>] show_stack+0x3c/0x16c
[ 23.509226] [<900000000023eb30>] dump_stack_lvl+0x84/0xe0
[ 23.509230] [<9000000000288208>] __warn+0x8c/0x174
[ 23.509234] [<90000000017c1918>] report_bug+0x1c0/0x22c
[ 23.509238] [<90000000017f66e8>] do_bp+0x280/0x344
[ 23.509243] [<90000000002428a0>] handle_bp+0x120/0x1c0
[ 23.509247] [<ffff80000251efc0>] emit_pte+0x394/0x3b0 [xe]
[ 23.509295] [<ffff800002520d38>] xe_migrate_clear+0x2d8/0xa54 [xe]
[ 23.509341] [<ffff8000024e6c38>] xe_bo_move+0x324/0x930 [xe]
[ 23.509387] [<ffff800002209468>] ttm_bo_handle_move_mem+0xd0/0x194 [ttm]
[ 23.509392] [<ffff800002209ebc>] ttm_bo_validate+0xd4/0x1cc [ttm]
[ 23.509396] [<ffff80000220a138>] ttm_bo_init_reserved+0x184/0x1dc [ttm]
[ 23.509399] [<ffff8000024e7840>] ___xe_bo_create_locked+0x1e8/0x3d4 [xe]
[ 23.509445] [<ffff8000024e7cf8>] __xe_bo_create_locked+0x2cc/0x390 [xe]
[ 23.509489] [<ffff8000024e7e98>] xe_bo_create_user+0x34/0xe4 [xe]
[ 23.509533] [<ffff8000024e875c>] xe_gem_create_ioctl+0x154/0x4d8 [xe]
[ 23.509578] [<9000000001062784>] drm_ioctl_kernel+0xe0/0x14c
[ 23.509582] [<9000000001062c10>] drm_ioctl+0x420/0x5f4
[ 23.509585] [<ffff8000024ea778>] xe_drm_ioctl+0x64/0xac [xe]
[ 23.509630] [<9000000000653504>] sys_ioctl+0x2b8/0xf98
[ 23.509634] [<90000000017f684c>] do_syscall+0xa0/0x140
[ 23.509637] [<9000000000241e38>] handle_syscall+0xb8/0x158
[ 23.509640]
[ 23.509644] ---[ end trace 0000000000000000 ]---

Revise calls to `xe_res_dma()' and `xe_res_cursor()' to use `XE_PTE_MASK' (12) and `SZ_4K' to fix this potentially confused use of `PAGE_SIZE' in relevant code.

Cc: [email protected]
Fixes: e89b384 ("drm/xe/migrate: Update emit_pte to cope with a size level than 4k")
Tested-by: Mingcong Bai <[email protected]>
Tested-by: Haien Liang <[email protected]>
Tested-by: Shirong Liu <[email protected]>
Tested-by: Haofeng Wu <[email protected]>
Link: FanFansfan@22c55ab
Co-developed-by: Shang Yatsen <[email protected]>
Signed-off-by: Shang Yatsen <[email protected]>
Signed-off-by: Mingcong Bai <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Kexy Biscuit <[email protected]>
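To illustrate why the host `PAGE_SIZE` is the wrong granule for a walk the GPU performs in 4 KiB steps, the following user-space sketch compares alignment checks and entry counts at a fixed 4 KiB granule versus a 16 KiB host page. The `SKETCH_*` and `sketch_*` names are hypothetical stand-ins (the 12-bit shift mirrors the value quoted above); this is not the xe cursor code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: the GPU maps memory in 4 KiB PTEs, independent of
 * the host kernel's page size (16 KiB on the LoongArch kernel in the log). */
#define SKETCH_GPU_PTE_SHIFT 12u
#define SKETCH_GPU_PAGE_SIZE (UINT64_C(1) << SKETCH_GPU_PTE_SHIFT)

/* Nonzero if off is a multiple of granule; granule must be a power of two. */
static int sketch_is_aligned(uint64_t off, uint64_t granule)
{
    assert(granule != 0 && (granule & (granule - 1)) == 0);
    return (off & (granule - 1)) == 0;
}

/* Number of entries needed to cover len bytes when each covers granule bytes. */
static uint64_t sketch_entries_needed(uint64_t len, uint64_t granule)
{
    assert(granule != 0 && (granule & (granule - 1)) == 0);
    return (len + granule - 1) / granule;
}
```

A 4 KiB-aligned offset such as 0x1000 passes the 4 KiB check but fails a check against a 16 KiB `PAGE_SIZE`, and a 24 KiB range needs six 4 KiB PTEs but would be counted as only two 16 KiB steps.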
- treat tailcall count as 32-bit for access and update
- change out_offset scope from file to function
- minor format/structure changes for consistency

Testing: (skipping fentry, fexit, freplace)
========
root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls
test_bpf: #0 Tail call leaf jited:1 967 PASS
test_bpf: #1 Tail call 2 jited:1 1427 PASS
test_bpf: #2 Tail call 3 jited:1 2373 PASS
test_bpf: #3 Tail call 4 jited:1 2304 PASS
test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS
test_bpf: #5 Tail call load/store jited:1 2249 PASS
test_bpf: #6 Tail call error path, max count reached jited:1 22538 PASS
test_bpf: #7 Tail call count preserved across function calls jited:1 1055668 PASS
test_bpf: #8 Tail call error path, NULL target jited:1 513 PASS
test_bpf: #9 Tail call error path, index out of range jited:1 392 PASS
test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]

root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31
397/1 tailcalls/tailcall_1:OK
397/2 tailcalls/tailcall_2:OK
397/3 tailcalls/tailcall_3:OK
397/4 tailcalls/tailcall_4:OK
397/5 tailcalls/tailcall_5:OK
397/6 tailcalls/tailcall_6:OK
397/7 tailcalls/tailcall_bpf2bpf_1:OK
397/8 tailcalls/tailcall_bpf2bpf_2:OK
397/9 tailcalls/tailcall_bpf2bpf_3:OK
397/10 tailcalls/tailcall_bpf2bpf_4:OK
397/11 tailcalls/tailcall_bpf2bpf_5:OK
397/12 tailcalls/tailcall_bpf2bpf_6:OK
397/17 tailcalls/tailcall_poke:OK
397/18 tailcalls/tailcall_bpf2bpf_hierarchy_1:OK
397/23 tailcalls/tailcall_bpf2bpf_hierarchy_2:OK
397/24 tailcalls/tailcall_bpf2bpf_hierarchy_3:OK
397/27 tailcalls/tailcall_failure:OK
397/28 tailcalls/reject_tail_call_spin_lock:OK
397/29 tailcalls/reject_tail_call_rcu_lock:OK
397/30 tailcalls/reject_tail_call_preempt_lock:OK
397/31 tailcalls/reject_tail_call_ref:OK
397 tailcalls:OK
Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Tony Ambardar <[email protected]>
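The tail call limit exercised by the "max count reached" test above can be modelled in a few lines of C. This sketch keeps the count in a `uint32_t`, in line with treating it as a 32-bit value for access and update; the limit of 33 is assumed from `MAX_TAIL_CALL_CNT` in recent kernels, and the `sketch_*` names are hypothetical. It models only the count check, not the index-bounds or NULL-program checks that the JIT also emits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of MAX_TAIL_CALL_CNT in recent kernels. */
#define SKETCH_MAX_TAIL_CALL_CNT 33u

/* Models the JITed count check:
 *   if (tail_call_cnt >= MAX_TAIL_CALL_CNT) goto out;
 *   tail_call_cnt++;
 * Returns 1 if the tail call would be taken, 0 if it falls through. */
static int sketch_tail_call_allowed(uint32_t *tail_call_cnt)
{
    assert(tail_call_cnt);
    if (*tail_call_cnt >= SKETCH_MAX_TAIL_CALL_CNT)
        return 0;
    (*tail_call_cnt)++;
    return 1;
}
```

Starting from zero, repeated attempts succeed exactly 33 times and then keep falling through, which is the behaviour the error-path test checks for.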
Update current JIT offsets to enable and set up BPF line info for better introspection and debugging. Offsets map each xlated insn to the start of its JITed code, as well as the epilogue. This also allows simplifying some code and dropping unneeded JIT ctx variables.

Taking bpf_iter_udp4.bpf.o as an example using bpftool's JIT disassembly, before this change we see:

  48: ldr lr, [fp, #-44] @ 0xffffffd4
  4c: strd r2, [lr, #-40] @ 0xffffffd8
  50: ldr r8, [fp, #-44] @ 0xffffffd4
  54: ldr r8, [r8, #-40] @ 0xffffffd8
  58: ldr r2, [r8]
  5c: mov r7, r2
  60: ldr r2, [r7]
  64: mov r3, #0
  68: ldr r8, [fp, #-44] @ 0xffffffd4
  6c: ldr r8, [r8, #-40] @ 0xffffffd8
  70: ldr r6, [r8, #8]
  74: ldr r9, [fp, #-44] @ 0xffffffd4
  78: strd r6, [r9, #-48] @ 0xffffffd0

while afterwards we have:

  48: ldr lr, [fp, #-44] @ 0xffffffd4
  4c: strd r2, [lr, #-40] @ 0xffffffd8
  ; struct seq_file *seq = ctx->meta->seq;
  50: ldr r8, [fp, #-44] @ 0xffffffd4
  54: ldr r8, [r8, #-40] @ 0xffffffd8
  58: ldr r2, [r8]
  ; struct seq_file *seq = ctx->meta->seq;
  5c: mov r7, r2
  60: ldr r2, [r7]
  64: mov r3, #0
  ; struct udp_sock *udp_sk = ctx->udp_sk;
  68: ldr r8, [fp, #-44] @ 0xffffffd4
  6c: ldr r8, [r8, #-40] @ 0xffffffd8
  70: ldr r6, [r8, #8]
  74: ldr r9, [fp, #-44] @ 0xffffffd4
  78: strd r6, [r9, #-48] @ 0xffffffd0

which aligns with the original source code.

Signed-off-by: Tony Ambardar <[email protected]>
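The offsets table described above (one entry per xlated insn, plus one for the epilogue) is what lets a JITed address be mapped back to an instruction, and from there to its line info. A minimal user-space sketch of that reverse lookup follows, assuming the offsets are non-decreasing byte offsets into the JIT image; the function name is hypothetical and this is not the arm32 JIT's code.

```c
#include <assert.h>
#include <stdint.h>

/* offsets[i] is the byte offset in the JIT image where xlated insn i starts;
 * offsets[n] is where the epilogue starts (layout assumed for this sketch).
 * Returns the insn whose JITed code contains jit_off, or -1 if jit_off lies
 * before the first insn or inside the epilogue. */
static int sketch_insn_for_jit_offset(const uint32_t *offsets, int n,
                                      uint32_t jit_off)
{
    assert(offsets && n > 0);
    if (jit_off < offsets[0] || jit_off >= offsets[n])
        return -1;
    /* Binary search for the largest i in [0, n) with offsets[i] <= jit_off.
     * Insns that emit no code share an offset with the next insn, so taking
     * the largest match returns the insn that actually owns the bytes. */
    int lo = 0, hi = n - 1;
    while (lo < hi) {
        int mid = lo + (hi - lo + 1) / 2;
        if (offsets[mid] <= jit_off)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}
```

With offsets `{0, 8, 8, 20, 32}` for four insns, byte 8 maps to insn 2 (insn 1 emitted nothing), byte 31 maps to insn 3, and byte 32 is the epilogue.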
Biju Das <[email protected]> says:

The CAN-FD module on RZ/G3E is very similar to the one on both R-Car V4H and RZ/G2L, but differs in some hardware parameters:
* No external clock; it has a ram clock instead.
* Supports up to 6 channels.
* 20 interrupts.

v8->v9:
* Collected tags.
* Added missing header bitfield.h.
* Fixed logical error ch->BIT(ch) in rcar_canfd_global_error().
* Removed unneeded double space in rcar_canfd_setrnc().
* Updated commit description in patch#15.
v7->v8:
* Collected tags.
* Updated commit description for patch#{5,9,15,16,17}.
* Replaced the macro RCANFD_GERFL_EEF0_7->RCANFD_GERFL_EEF.
* Dropped the redundant macro RCANFD_GERFL_EEF(ch).
* Added patch for dropping the mask operation in RCANFD_GAFLCFG_SETRNC macro.
* Converted RCANFD_GAFLCFG_SETRNC->rcar_canfd_setrnc().
* Updated RCANFD_GAFLCFG macro by replacing the parameter ch->w, where w is the GAFLCFG index used in the hardware manual.
* Renamed the parameter x->page_num in RCANFD_GAFLECTR_AFLPN macro to make it clear.
* Renamed the parameter x->cftml in RCANFD_CFCC_CFTML macro to make it clear.
* Updated {rzg2l,rcar_gen3}_hw_info with ch_interface_mode = 0.
* Updated {rzg2l,rcar_gen3}_hw_info with shared_can_regs = 0.
* Started using struct rcanfd_regs instead of LUT for reg offsets.
* Started using struct rcar_canfd_shift_data instead of LUT for shift data.
* Renamed only_internal_clks->external_clk to avoid negation.
* Updated rcar_canfd_hw_info tables with external_clk entries.
* Replaced 10->sizeof(name) in scnprintf().
v6->v7:
* Collected tags.
* Replaced 'aswell'->'as well' in patch#11 commit description.
v5->v6:
* Replaced RCANFD_RNC_PER_REG macro with rnc_stride variable.
* Updated commit description for patch#7 and #8.
* Dropped mask_table:
  AFLPN_MASK is replaced by the max_aflpn variable.
  CFTML_MASK is replaced by the max_cftml variable.
  BITTIMING masks are replaced by {nom,data}_bittiming variables.
* Collected tag from Geert.
v4->v5:
* Collected tag from Geert.
* The rules for R-Car Gen3/4 could be kept together, reducing the number of lines. Similar change for rzg2l-canfd as well.
* Keeping interrupts and resets together allows keeping a clear separation between RZ/G2L and RZ/G3E, at the expense of only a single line.
* Retained the tags for binding patches, as these are trivial changes.
* Dropped the unused macro RCANFD_GAFLCFG_GETRNC.
* Updated macro RCANFD_GERFL_ERR by using gpriv->channels_mask and dropped unused macro RCANFD_GERFL_EEF0_7.
* Replaced RNC mask in RCANFD_GAFLCFG_SETRNC macro by using info->num_supported_rules variable.
* Updated the macro RCANFD_GAFLCFG by using info->rnc_field_width variable.
* Updated shift value in RCANFD_GAFLCFG_SETRNC macro by using a formula (32 - (n % rnc_per_reg + 1) * field_width).
* Replaced the variable name shared_can_reg->shared_can_regs.
* Improved commit description for patch{#11,#12} by replacing has->have.
* Dropped RCANFD_EEF_MASK and RCANFD_RNC_MASK, as they are taken care of by gpriv->channels_mask and info->num_supported_rules.
* Dropped RCANFD_FIRST_RNC_SH and RCANFD_SECOND_RNC_SH by using a formula (32 - (n % rnc_per_reg + 1) * rnc_field_width).
* Improved commit description by "All SoCs supports extenal clock"-> "All existing SoCs support an external clock".
* Updated error description in probe as "cannot get enabled ram clock".
* Updated r9a09g047_hw_info table.
v3->v4:
* Added Rb tag from Rob for patch#2.
* Added prefix RCANFD_* to enum rcar_canfd_reg_offset_id.
* Added prefix RCANFD_* to enum rcar_canfd_mask_id.
* Added prefix RCANFD_* to enum rcar_canfd_shift_id.
v2->v3:
* Collected tags.
* Dropped reg_gen4() and is_gen4() by adding mask_table, shift_table, regs, ch_interface_mode and shared_can_reg variables to struct rcar_canfd_hw_info.
v1->v2:
* Split the series with fixes patch separately.
* Added the patch "Simplify rcar_canfd_probe() using of_get_available_child_by_name()", as the dependency patch hit can-next.
* Added Rb tag from Vincent Mailhol.
* Dropped redundant comment from commit description for patch#3.

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
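The RNC shift formula quoted in the v4->v5 changelog, `32 - (n % rnc_per_reg + 1) * rnc_field_width`, packs several rule-number fields into each 32-bit register, starting from the most significant end. The following user-space sketch assumes `rnc_per_reg = 32 / field_width` and a register index of `n / rnc_per_reg`; the helper names are hypothetical, and this is not the driver's `rcar_canfd_setrnc()`. The real field width per SoC comes from `info->rnc_field_width`, which the sketch takes as a parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Number of RNC fields per 32-bit register (assumed layout). */
static unsigned int sketch_rnc_per_reg(unsigned int field_width)
{
    assert(field_width > 0 && field_width <= 32 && 32 % field_width == 0);
    return 32 / field_width;
}

/* Shift of field n inside its register, per the changelog formula. */
static unsigned int sketch_rnc_shift(unsigned int n, unsigned int field_width)
{
    return 32 - (n % sketch_rnc_per_reg(field_width) + 1) * field_width;
}

/* Which register (by index) holds field n. */
static unsigned int sketch_rnc_reg_index(unsigned int n, unsigned int field_width)
{
    return n / sketch_rnc_per_reg(field_width);
}

/* Read-modify-write of field n within an already loaded register value. */
static uint32_t sketch_rnc_set(uint32_t reg, unsigned int n,
                               unsigned int field_width, uint32_t value)
{
    uint32_t mask = field_width == 32 ? UINT32_MAX
                                      : (UINT32_C(1) << field_width) - 1;
    unsigned int shift = sketch_rnc_shift(n, field_width);

    return (reg & ~(mask << shift)) | ((value & mask) << shift);
}
```

With 8-bit fields, channel 0 sits at bits 31..24, channel 3 at bits 7..0, and channel 4 wraps to bits 31..24 of the next register.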
[ Upstream commit 88f7f56 ]

When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush() generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC, which causes the flush_bio to be throttled by wbt_wait().

An example from v5.4 (a similar problem also exists upstream):

crash> bt 2091206
PID: 2091206 TASK: ffff2050df92a300 CPU: 109 COMMAND: "kworker/u260:0"
#0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8
#1 [ffff800084a2f820] __schedule at ffff800040bfa0c4
#2 [ffff800084a2f880] schedule at ffff800040bfa4b4
#3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4
#4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc
#5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0
#6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254
#7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38
#8 [ffff800084a2fa60] generic_make_request at ffff800040570138
#9 [ffff800084a2fae0] submit_bio at ffff8000405703b4
#10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs]
#11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs]
#12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs]
#13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs]
#14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs]
#15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs]
#16 [ffff800084a2fdb0] process_one_work at ffff800040111d08
#17 [ffff800084a2fe00] worker_thread at ffff8000401121cc
#18 [ffff800084a2fe70] kthread at ffff800040118de4

After commit 2def284 ("xfs: don't allow log IO to be throttled"), the metadata submitted by xlog_write_iclog() should not be throttled. But because of the dm layer in between, throttling flush_bio indirectly causes the metadata bio to be throttled.

Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes wbt_should_throttle() return false and so avoids wbt_wait().

Signed-off-by: Jinliang Zheng <[email protected]>
Reviewed-by: Tianxiang Peng <[email protected]>
Reviewed-by: Hao Peng <[email protected]>
Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
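The effect of adding REQ_IDLE can be shown with a small user-space model of the throttling decision. The sketch is modelled on the general shape of wbt_should_throttle() (writes carrying both SYNC and IDLE are exempt); the operation codes and flag bits are hypothetical and are not the kernel's REQ_* values. The message above does not spell out the exact condition under which the patch adds REQ_IDLE, so the assumption here, labelled in the code, is that it is propagated from the original bio.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operation codes and flag bits for this model only;
 * they are not the kernel's REQ_OP_* / REQ_* values. */
enum sketch_op { SKETCH_OP_READ, SKETCH_OP_WRITE, SKETCH_OP_DISCARD };
#define SKETCH_REQ_SYNC     (1u << 0)
#define SKETCH_REQ_IDLE     (1u << 1)
#define SKETCH_REQ_PREFLUSH (1u << 2)

/* Writes are throttled unless they carry both SYNC and IDLE; discards are
 * throttled; everything else is not. */
static bool sketch_should_throttle(enum sketch_op op, unsigned int flags)
{
    assert(op <= SKETCH_OP_DISCARD);
    switch (op) {
    case SKETCH_OP_WRITE:
        if ((flags & (SKETCH_REQ_SYNC | SKETCH_REQ_IDLE)) ==
            (SKETCH_REQ_SYNC | SKETCH_REQ_IDLE))
            return false;
        return true;
    case SKETCH_OP_DISCARD:
        return true;
    default:
        return false;
    }
}

/* Flags for the empty flush write built by dm. Assumption of this sketch:
 * IDLE is added only when the original bio already carried it. */
static unsigned int sketch_flush_flags(unsigned int orig_flags)
{
    unsigned int flags = SKETCH_REQ_PREFLUSH | SKETCH_REQ_SYNC;

    if (orig_flags & SKETCH_REQ_IDLE)
        flags |= SKETCH_REQ_IDLE;
    return flags;
}
```

In this model, a flush built from a plain SYNC bio is still throttled, while one built from a SYNC|IDLE bio (like the XFS log writes in the trace) is not.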
[ Upstream commit bf3624c ]

The netdevsim driver was experiencing NOHZ tick-stop errors during packet transmission due to pending softirq work when calling napi_schedule(). This issue was observed when running the netconsole selftest, which triggered the following error message:

NOHZ tick-stop error: local softirq work is pending, handler #8!!!

To fix this issue, introduce a timer that calls napi_schedule() from timer context instead of calling it directly from the TX path.

Create an hrtimer for each queue and kick it from the TX path; the timer then calls napi_schedule() from timer context.

Suggested-by: Jakub Kicinski <[email protected]>
Signed-off-by: Breno Leitao <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
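The deferral pattern described above, where the TX path only kicks a per-queue timer and the timer callback is what requests the NAPI poll, can be modelled without any kernel APIs. This is a single-threaded user-space model with hypothetical names; it does not use `hrtimer` or `napi_schedule()`, and the rule that kicking an already armed timer does not queue a second callback is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-queue state for a single-threaded model. */
struct sketch_queue {
    bool timer_armed;        /* stands in for a pending per-queue timer */
    unsigned int napi_polls; /* times the model "requested" a NAPI poll */
};

/* TX path: only kick the timer; never request the poll directly. */
static void sketch_tx_kick(struct sketch_queue *q)
{
    assert(q);
    q->timer_armed = true; /* kicking an already armed timer adds nothing */
}

/* Timer callback: runs later, outside the TX path, and requests the poll. */
static void sketch_timer_fire(struct sketch_queue *q)
{
    assert(q);
    if (!q->timer_armed)
        return;
    q->timer_armed = false;
    q->napi_polls++;
}
```

In this model, two back-to-back transmits produce no poll request until the timer fires, and then exactly one.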
Symbolize stack traces by creating a live machine. Add this functionality to dump_stack and switch dump_stack users to use it. Switch the TUI to use it as well. Add stack traces to the child test function, which can be useful for diagnosing blocked code.

Example output:
```
8: PERF_RECORD_* events & perf_sample fields : Running (1 active)
^C
Signal (2) while running tests.
Terminating tests with the same signal
Internal test harness failure. Completing any started tests:
: 8: PERF_RECORD_* events & perf_sample fields:
____ unexpected signal (2) ____
#0 0x5590fb6209b6 in child_test_sig_handler builtin-test.c:243
#1 0x7f4a91e49e20 in __restore_rt libc_sigaction.c:0
#2 0x7f4a91ee4f33 in clock_nanosleep@GLIBC_2.2.5 clock_nanosleep.c:71
#3 0x7f4a91ef0333 in __nanosleep nanosleep.c:26
#4 0x7f4a91f01f68 in __sleep sleep.c:55
#5 0x5590fb638c63 in test__PERF_RECORD perf-record.c:295
#6 0x5590fb620b43 in run_test_child builtin-test.c:269
#7 0x5590fb5b83ab in start_command run-command.c:127
#8 0x5590fb621572 in start_test builtin-test.c:467
#9 0x5590fb621a47 in __cmd_test builtin-test.c:573
#10 0x5590fb6225ea in cmd_test builtin-test.c:775
#11 0x5590fb5a9099 in run_builtin perf.c:351
#12 0x5590fb5a9340 in handle_internal_command perf.c:404
#13 0x5590fb5a9499 in run_argv perf.c:451
#14 0x5590fb5a97e2 in main perf.c:558
#15 0x7f4a91e33d68 in __libc_start_call_main libc_start_call_main.h:74
#16 0x7f4a91e33e25 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
#17 0x5590fb4fd6d1 in _start perf[436d1]
```

Signed-off-by: Ian Rogers <[email protected]>
- treat tailcall count as 32-bit for access and update
- change out_offset scope from file to function
- minor format/structure changes for consistency

Testing: (skipping fentry, fexit, freplace)
========

root@qemu-armhf:/usr/libexec/kselftests-bpf# modprobe test_bpf test_suite=test_tail_calls
test_bpf: #0 Tail call leaf jited:1 967 PASS
test_bpf: #1 Tail call 2 jited:1 1427 PASS
test_bpf: #2 Tail call 3 jited:1 2373 PASS
test_bpf: #3 Tail call 4 jited:1 2304 PASS
test_bpf: #4 Tail call load/store leaf jited:1 1684 PASS
test_bpf: #5 Tail call load/store jited:1 2249 PASS
test_bpf: #6 Tail call error path, max count reached jited:1 22538 PASS
test_bpf: #7 Tail call count preserved across function calls jited:1 1055668 PASS
test_bpf: #8 Tail call error path, NULL target jited:1 513 PASS
test_bpf: #9 Tail call error path, index out of range jited:1 392 PASS
test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]

root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -n 397/1-12,17-18,23-24,27-31
397/1 tailcalls/tailcall_1:OK
397/2 tailcalls/tailcall_2:OK
397/3 tailcalls/tailcall_3:OK
397/4 tailcalls/tailcall_4:OK
397/5 tailcalls/tailcall_5:OK
397/6 tailcalls/tailcall_6:OK
397/7 tailcalls/tailcall_bpf2bpf_1:OK
397/8 tailcalls/tailcall_bpf2bpf_2:OK
397/9 tailcalls/tailcall_bpf2bpf_3:OK
397/10 tailcalls/tailcall_bpf2bpf_4:OK
397/11 tailcalls/tailcall_bpf2bpf_5:OK
397/12 tailcalls/tailcall_bpf2bpf_6:OK
397/17 tailcalls/tailcall_poke:OK
397/18 tailcalls/tailcall_bpf2bpf_hierarchy_1:OK
397/23 tailcalls/tailcall_bpf2bpf_hierarchy_2:OK
397/24 tailcalls/tailcall_bpf2bpf_hierarchy_3:OK
397/27 tailcalls/tailcall_failure:OK
397/28 tailcalls/reject_tail_call_spin_lock:OK
397/29 tailcalls/reject_tail_call_rcu_lock:OK
397/30 tailcalls/reject_tail_call_preempt_lock:OK
397/31 tailcalls/reject_tail_call_ref:OK
397 tailcalls:OK
Summary: 1/21 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Tony Ambardar <[email protected]>
Update the current JIT offsets to enable and set up BPF line info for better introspection and debugging. The offsets map each xlated insn to the start of its JITed code, as well as to the epilogue. This also allows simplifying some code and dropping unneeded JIT ctx variables.

Taking bpf_iter_udp4.bpf.o as an example and using bpftool's JIT disassembly, before this change we see:

48: ldr lr, [fp, #-44] @ 0xffffffd4
4c: strd r2, [lr, #-40] @ 0xffffffd8
50: ldr r8, [fp, #-44] @ 0xffffffd4
54: ldr r8, [r8, #-40] @ 0xffffffd8
58: ldr r2, [r8]
5c: mov r7, r2
60: ldr r2, [r7]
64: mov r3, #0
68: ldr r8, [fp, #-44] @ 0xffffffd4
6c: ldr r8, [r8, #-40] @ 0xffffffd8
70: ldr r6, [r8, #8]
74: ldr r9, [fp, #-44] @ 0xffffffd4
78: strd r6, [r9, #-48] @ 0xffffffd0

whereas afterwards we have:

48: ldr lr, [fp, #-44] @ 0xffffffd4
4c: strd r2, [lr, #-40] @ 0xffffffd8
; struct seq_file *seq = ctx->meta->seq;
50: ldr r8, [fp, #-44] @ 0xffffffd4
54: ldr r8, [r8, #-40] @ 0xffffffd8
58: ldr r2, [r8]
; struct seq_file *seq = ctx->meta->seq;
5c: mov r7, r2
60: ldr r2, [r7]
64: mov r3, #0
; struct udp_sock *udp_sk = ctx->udp_sk;
68: ldr r8, [fp, #-44] @ 0xffffffd4
6c: ldr r8, [r8, #-40] @ 0xffffffd8
70: ldr r6, [r8, #8]
74: ldr r9, [fp, #-44] @ 0xffffffd4
78: strd r6, [r9, #-48] @ 0xffffffd0

which aligns with the original source code.

Signed-off-by: Tony Ambardar <[email protected]>
[ Upstream commit 88f7f56 ]

When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush() generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC, which causes the flush_bio to be throttled by wbt_wait().

An example from v5.4 (a similar problem also exists upstream):

crash> bt 2091206
PID: 2091206 TASK: ffff2050df92a300 CPU: 109 COMMAND: "kworker/u260:0"
#0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8
#1 [ffff800084a2f820] __schedule at ffff800040bfa0c4
#2 [ffff800084a2f880] schedule at ffff800040bfa4b4
#3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4
#4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc
#5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0
#6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254
#7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38
#8 [ffff800084a2fa60] generic_make_request at ffff800040570138
#9 [ffff800084a2fae0] submit_bio at ffff8000405703b4
#10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs]
#11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs]
#12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs]
#13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs]
#14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs]
#15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs]
#16 [ffff800084a2fdb0] process_one_work at ffff800040111d08
#17 [ffff800084a2fe00] worker_thread at ffff8000401121cc
#18 [ffff800084a2fe70] kthread at ffff800040118de4

After commit 2def284 ("xfs: don't allow log IO to be throttled"), the metadata submitted by xlog_write_iclog() should not be throttled. However, because of the dm layer, throttling flush_bio indirectly throttles the metadata bio as well.

Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes wbt_should_throttle() return false and so avoids wbt_wait().

Signed-off-by: Jinliang Zheng <[email protected]>
Reviewed-by: Tianxiang Peng <[email protected]>
Reviewed-by: Hao Peng <[email protected]>
Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit bf3624c ]

The netdevsim driver was hitting NOHZ tick-stop errors during packet transmission, due to softirq work left pending when napi_schedule() was called. This was observed when running the netconsole selftest, which triggered the following error message:

NOHZ tick-stop error: local softirq work is pending, handler #8!!!

To fix this, call napi_schedule() from timer context instead of directly from the TX path: create an hrtimer for each queue and kick it from the TX path, and let the timer callback call napi_schedule().

Suggested-by: Jakub Kicinski <[email protected]>
Signed-off-by: Breno Leitao <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 88f7f56 ] When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush() generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC, which causes the flush_bio to be throttled by wbt_wait(). An example from v5.4, similar problem also exists in upstream: crash> bt 2091206 PID: 2091206 TASK: ffff2050df92a300 CPU: 109 COMMAND: "kworker/u260:0" #0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8 #1 [ffff800084a2f820] __schedule at ffff800040bfa0c4 #2 [ffff800084a2f880] schedule at ffff800040bfa4b4 #3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4 #4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc #5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0 torvalds#6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254 torvalds#7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38 torvalds#8 [ffff800084a2fa60] generic_make_request at ffff800040570138 torvalds#9 [ffff800084a2fae0] submit_bio at ffff8000405703b4 torvalds#10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs] torvalds#11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs] torvalds#12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs] torvalds#13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs] torvalds#14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs] torvalds#15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs] torvalds#16 [ffff800084a2fdb0] process_one_work at ffff800040111d08 torvalds#17 [ffff800084a2fe00] worker_thread at ffff8000401121cc torvalds#18 [ffff800084a2fe70] kthread at ffff800040118de4 After commit 2def284 ("xfs: don't allow log IO to be throttled"), the metadata submitted by xlog_write_iclog() should not be throttled. But due to the existence of the dm layer, throttling flush_bio indirectly causes the metadata bio to be throttled. Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes wbt_should_throttle() return false to avoid wbt_wait(). 
Signed-off-by: Jinliang Zheng <[email protected]> Reviewed-by: Tianxiang Peng <[email protected]> Reviewed-by: Hao Peng <[email protected]> Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
It appears that the xe_res_cursor also assumes 4K alignment. Current code uses `PAGE_SIZE' as an assumed alignment reference but 4K kernel page sizes is by no means a guarantee. On 16K-paged kernels, this causes driver failures during boot up: [ 23.242757] ------------[ cut here ]------------ [ 23.247363] WARNING: CPU: 0 PID: 2036 at drivers/gpu/drm/xe/xe_res_cursor.h:182 emit_pte+0x394/0x3b0 [xe] [ 23.256962] Modules linked in: nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) rfkill(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) snd_hda_intel(E) snd_intel_dspcfg(E) snd_hda_codec(E) nls_iso8859_1(E) qrtr(E) nls_cp437(E) snd_hda_core(E) loongson3_cpufreq(E) rtc_efi(E) snd_hwdep(E) snd_pcm(E) spi_loongson_pci(E) snd_timer(E) snd(E) spi_loongson_core(E) soundcore(E) gpio_loongson_64bit(E) rtc_loongson(E) i2c_ls2x(E) mousedev(E) input_leds(E) sch_fq_codel(E) fuse(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) xe(E) drm_gpuvm(E) drm_buddy(E) gpu_sched(E) [ 23.257034] drm_exec(E) drm_suballoc_helper(E) drm_display_helper(E) cec(E) rc_core(E) hid_generic(E) tpm_tis_spi(E) r8169(E) loongson(E) i2c_algo_bit(E) realtek(E) drm_ttm_helper(E) led_class(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) i2c_dev(E) [ 23.369697] CPU: 0 UID: 1000 PID: 2036 Comm: QSGRenderThread Tainted: G E 6.14.0-rc4-aosc-main-g7cc07e6e50b0-dirty torvalds#8 [ 23.381640] Tainted: [E]=UNSIGNED_MODULE [ 23.385534] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS 
Loongson-UDK2018-V4.0.05756-prestab [ 23.399319] pc ffff80000251efc0 ra ffff80000251eddc tp 900000011fe3c000 sp 900000011fe3f7e0 [ 23.407632] a0 0000000000000001 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000 [ 23.415938] a4 0000000000000000 a5 0000000000000000 a6 0000000000060000 a7 900000010c947b00 [ 23.424240] t0 0000000000000000 t1 0000000000000000 t2 0000000000000000 t3 900000012e456230 [ 23.432543] t4 0000000000000035 t5 0000000000004000 t6 00000001fbc40403 t7 0000000000004000 [ 23.440845] t8 9000000100e688a8 u0 5cc06cee8ef0edee s9 9000000100024420 s0 0000000000000047 [ 23.449147] s1 0000000000004000 s2 0000000000000001 s3 900000012adba000 s4 ffffffffffffc000 [ 23.457450] s5 9000000108939428 s6 0000000000000000 s7 0000000000000000 s8 900000011fe3f8e0 [ 23.465851] ra: ffff80000251eddc emit_pte+0x1b0/0x3b0 [xe] [ 23.471761] ERA: ffff80000251efc0 emit_pte+0x394/0x3b0 [xe] [ 23.477557] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 23.483732] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 23.488068] EUEN: 00000003 (+FPE +SXE -ASXE -BTE) [ 23.492832] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 23.497594] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) [ 23.503133] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV) [ 23.509164] CPU: 0 UID: 1000 PID: 2036 Comm: QSGRenderThread Tainted: G E 6.14.0-rc4-aosc-main-g7cc07e6e50b0-dirty torvalds#8 [ 23.509168] Tainted: [E]=UNSIGNED_MODULE [ 23.509168] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 23.509170] Stack : ffffffffffffffff ffffffffffffffff 900000000023eb34 900000011fe3c000 [ 23.509176] 900000011fe3f440 0000000000000000 900000011fe3f448 9000000001c31c70 [ 23.509181] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 23.509185] 0000000000000000 5cc06cee8ef0edee 0000000000000000 0000000000000000 [ 23.509190] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 23.509193] 
0000000000000000 0000000000000000 00000000066b4000 9000000100024420 [ 23.509197] 9000000001eb8000 0000000000000000 9000000001c31c70 0000000000000004 [ 23.509202] 0000000000000004 0000000000000000 0000000000000000 0000000000000000 [ 23.509206] 900000011fe3f8e0 9000000001c31c70 9000000000244174 00007fffac097534 [ 23.509211] 00000000000000b0 0000000000000004 0000000000000003 0000000000071c1d [ 23.509216] ... [ 23.509218] Call Trace: [ 23.509220] [<9000000000244174>] show_stack+0x3c/0x16c [ 23.509226] [<900000000023eb30>] dump_stack_lvl+0x84/0xe0 [ 23.509230] [<9000000000288208>] __warn+0x8c/0x174 [ 23.509234] [<90000000017c1918>] report_bug+0x1c0/0x22c [ 23.509238] [<90000000017f66e8>] do_bp+0x280/0x344 [ 23.509243] [<90000000002428a0>] handle_bp+0x120/0x1c0 [ 23.509247] [<ffff80000251efc0>] emit_pte+0x394/0x3b0 [xe] [ 23.509295] [<ffff800002520d38>] xe_migrate_clear+0x2d8/0xa54 [xe] [ 23.509341] [<ffff8000024e6c38>] xe_bo_move+0x324/0x930 [xe] [ 23.509387] [<ffff800002209468>] ttm_bo_handle_move_mem+0xd0/0x194 [ttm] [ 23.509392] [<ffff800002209ebc>] ttm_bo_validate+0xd4/0x1cc [ttm] [ 23.509396] [<ffff80000220a138>] ttm_bo_init_reserved+0x184/0x1dc [ttm] [ 23.509399] [<ffff8000024e7840>] ___xe_bo_create_locked+0x1e8/0x3d4 [xe] [ 23.509445] [<ffff8000024e7cf8>] __xe_bo_create_locked+0x2cc/0x390 [xe] [ 23.509489] [<ffff8000024e7e98>] xe_bo_create_user+0x34/0xe4 [xe] [ 23.509533] [<ffff8000024e875c>] xe_gem_create_ioctl+0x154/0x4d8 [xe] [ 23.509578] [<9000000001062784>] drm_ioctl_kernel+0xe0/0x14c [ 23.509582] [<9000000001062c10>] drm_ioctl+0x420/0x5f4 [ 23.509585] [<ffff8000024ea778>] xe_drm_ioctl+0x64/0xac [xe] [ 23.509630] [<9000000000653504>] sys_ioctl+0x2b8/0xf98 [ 23.509634] [<90000000017f684c>] do_syscall+0xa0/0x140 [ 23.509637] [<9000000000241e38>] handle_syscall+0xb8/0x158 [ 23.509640] [ 23.509644] ---[ end trace 0000000000000000 ]--- Revise calls to `xe_res_dma()' and `xe_res_cursor()' to use `XE_PTE_MASK' (12) and `SZ_4K' to fix this potentially confused use 
of `PAGE_SIZE' in relevant code. Cc: [email protected] Fixes: e89b384 ("drm/xe/migrate: Update emit_pte to cope with a size level than 4k") Tested-by: Mingcong Bai <[email protected]> Tested-by: Haien Liang <[email protected]> Tested-by: Shirong Liu <[email protected]> Tested-by: Haofeng Wu <[email protected]> Link: FanFansfan@22c55ab Co-developed-by: Shang Yatsen <[email protected]> Signed-off-by: Shang Yatsen <[email protected]> Signed-off-by: Mingcong Bai <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Kexy Biscuit <[email protected]>
It appears that the xe_res_cursor also assumes 4K alignment. Current code uses `PAGE_SIZE' as an assumed alignment reference but 4K kernel page sizes is by no means a guarantee. On 16K-paged kernels, this causes driver failures during boot up: [ 23.242757] ------------[ cut here ]------------ [ 23.247363] WARNING: CPU: 0 PID: 2036 at drivers/gpu/drm/xe/xe_res_cursor.h:182 emit_pte+0x394/0x3b0 [xe] [ 23.256962] Modules linked in: nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) rfkill(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) snd_hda_intel(E) snd_intel_dspcfg(E) snd_hda_codec(E) nls_iso8859_1(E) qrtr(E) nls_cp437(E) snd_hda_core(E) loongson3_cpufreq(E) rtc_efi(E) snd_hwdep(E) snd_pcm(E) spi_loongson_pci(E) snd_timer(E) snd(E) spi_loongson_core(E) soundcore(E) gpio_loongson_64bit(E) rtc_loongson(E) i2c_ls2x(E) mousedev(E) input_leds(E) sch_fq_codel(E) fuse(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) xe(E) drm_gpuvm(E) drm_buddy(E) gpu_sched(E) [ 23.257034] drm_exec(E) drm_suballoc_helper(E) drm_display_helper(E) cec(E) rc_core(E) hid_generic(E) tpm_tis_spi(E) r8169(E) loongson(E) i2c_algo_bit(E) realtek(E) drm_ttm_helper(E) led_class(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) i2c_dev(E) [ 23.369697] CPU: 0 UID: 1000 PID: 2036 Comm: QSGRenderThread Tainted: G E 6.14.0-rc4-aosc-main-g7cc07e6e50b0-dirty torvalds#8 [ 23.381640] Tainted: [E]=UNSIGNED_MODULE [ 23.385534] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS 
Loongson-UDK2018-V4.0.05756-prestab [ 23.399319] pc ffff80000251efc0 ra ffff80000251eddc tp 900000011fe3c000 sp 900000011fe3f7e0 [ 23.407632] a0 0000000000000001 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000 [ 23.415938] a4 0000000000000000 a5 0000000000000000 a6 0000000000060000 a7 900000010c947b00 [ 23.424240] t0 0000000000000000 t1 0000000000000000 t2 0000000000000000 t3 900000012e456230 [ 23.432543] t4 0000000000000035 t5 0000000000004000 t6 00000001fbc40403 t7 0000000000004000 [ 23.440845] t8 9000000100e688a8 u0 5cc06cee8ef0edee s9 9000000100024420 s0 0000000000000047 [ 23.449147] s1 0000000000004000 s2 0000000000000001 s3 900000012adba000 s4 ffffffffffffc000 [ 23.457450] s5 9000000108939428 s6 0000000000000000 s7 0000000000000000 s8 900000011fe3f8e0 [ 23.465851] ra: ffff80000251eddc emit_pte+0x1b0/0x3b0 [xe] [ 23.471761] ERA: ffff80000251efc0 emit_pte+0x394/0x3b0 [xe] [ 23.477557] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 23.483732] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 23.488068] EUEN: 00000003 (+FPE +SXE -ASXE -BTE) [ 23.492832] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 23.497594] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) [ 23.503133] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV) [ 23.509164] CPU: 0 UID: 1000 PID: 2036 Comm: QSGRenderThread Tainted: G E 6.14.0-rc4-aosc-main-g7cc07e6e50b0-dirty torvalds#8 [ 23.509168] Tainted: [E]=UNSIGNED_MODULE [ 23.509168] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 23.509170] Stack : ffffffffffffffff ffffffffffffffff 900000000023eb34 900000011fe3c000 [ 23.509176] 900000011fe3f440 0000000000000000 900000011fe3f448 9000000001c31c70 [ 23.509181] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 23.509185] 0000000000000000 5cc06cee8ef0edee 0000000000000000 0000000000000000 [ 23.509190] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 23.509193] 
0000000000000000 0000000000000000 00000000066b4000 9000000100024420 [ 23.509197] 9000000001eb8000 0000000000000000 9000000001c31c70 0000000000000004 [ 23.509202] 0000000000000004 0000000000000000 0000000000000000 0000000000000000 [ 23.509206] 900000011fe3f8e0 9000000001c31c70 9000000000244174 00007fffac097534 [ 23.509211] 00000000000000b0 0000000000000004 0000000000000003 0000000000071c1d [ 23.509216] ... [ 23.509218] Call Trace: [ 23.509220] [<9000000000244174>] show_stack+0x3c/0x16c [ 23.509226] [<900000000023eb30>] dump_stack_lvl+0x84/0xe0 [ 23.509230] [<9000000000288208>] __warn+0x8c/0x174 [ 23.509234] [<90000000017c1918>] report_bug+0x1c0/0x22c [ 23.509238] [<90000000017f66e8>] do_bp+0x280/0x344 [ 23.509243] [<90000000002428a0>] handle_bp+0x120/0x1c0 [ 23.509247] [<ffff80000251efc0>] emit_pte+0x394/0x3b0 [xe] [ 23.509295] [<ffff800002520d38>] xe_migrate_clear+0x2d8/0xa54 [xe] [ 23.509341] [<ffff8000024e6c38>] xe_bo_move+0x324/0x930 [xe] [ 23.509387] [<ffff800002209468>] ttm_bo_handle_move_mem+0xd0/0x194 [ttm] [ 23.509392] [<ffff800002209ebc>] ttm_bo_validate+0xd4/0x1cc [ttm] [ 23.509396] [<ffff80000220a138>] ttm_bo_init_reserved+0x184/0x1dc [ttm] [ 23.509399] [<ffff8000024e7840>] ___xe_bo_create_locked+0x1e8/0x3d4 [xe] [ 23.509445] [<ffff8000024e7cf8>] __xe_bo_create_locked+0x2cc/0x390 [xe] [ 23.509489] [<ffff8000024e7e98>] xe_bo_create_user+0x34/0xe4 [xe] [ 23.509533] [<ffff8000024e875c>] xe_gem_create_ioctl+0x154/0x4d8 [xe] [ 23.509578] [<9000000001062784>] drm_ioctl_kernel+0xe0/0x14c [ 23.509582] [<9000000001062c10>] drm_ioctl+0x420/0x5f4 [ 23.509585] [<ffff8000024ea778>] xe_drm_ioctl+0x64/0xac [xe] [ 23.509630] [<9000000000653504>] sys_ioctl+0x2b8/0xf98 [ 23.509634] [<90000000017f684c>] do_syscall+0xa0/0x140 [ 23.509637] [<9000000000241e38>] handle_syscall+0xb8/0x158 [ 23.509640] [ 23.509644] ---[ end trace 0000000000000000 ]--- Revise calls to `xe_res_dma()' and `xe_res_cursor()' to use `XE_PTE_MASK' (12) and `SZ_4K' to fix this potentially confused use 
of `PAGE_SIZE' in relevant code. Cc: [email protected] Fixes: e89b384 ("drm/xe/migrate: Update emit_pte to cope with a size level than 4k") Tested-by: Mingcong Bai <[email protected]> Tested-by: Haien Liang <[email protected]> Tested-by: Shirong Liu <[email protected]> Tested-by: Haofeng Wu <[email protected]> Link: FanFansfan@22c55ab Co-developed-by: Shang Yatsen <[email protected]> Signed-off-by: Shang Yatsen <[email protected]> Signed-off-by: Mingcong Bai <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Kexy Biscuit <[email protected]>
[ Upstream commit 88f7f56 ] When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush() generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC, which causes the flush_bio to be throttled by wbt_wait(). An example from v5.4, similar problem also exists in upstream: crash> bt 2091206 PID: 2091206 TASK: ffff2050df92a300 CPU: 109 COMMAND: "kworker/u260:0" #0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8 #1 [ffff800084a2f820] __schedule at ffff800040bfa0c4 #2 [ffff800084a2f880] schedule at ffff800040bfa4b4 #3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4 #4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc #5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0 torvalds#6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254 torvalds#7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38 torvalds#8 [ffff800084a2fa60] generic_make_request at ffff800040570138 torvalds#9 [ffff800084a2fae0] submit_bio at ffff8000405703b4 torvalds#10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs] torvalds#11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs] torvalds#12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs] torvalds#13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs] torvalds#14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs] torvalds#15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs] torvalds#16 [ffff800084a2fdb0] process_one_work at ffff800040111d08 torvalds#17 [ffff800084a2fe00] worker_thread at ffff8000401121cc torvalds#18 [ffff800084a2fe70] kthread at ffff800040118de4 After commit 2def284 ("xfs: don't allow log IO to be throttled"), the metadata submitted by xlog_write_iclog() should not be throttled. But due to the existence of the dm layer, throttling flush_bio indirectly causes the metadata bio to be throttled. Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes wbt_should_throttle() return false to avoid wbt_wait(). 
Signed-off-by: Jinliang Zheng <[email protected]> Reviewed-by: Tianxiang Peng <[email protected]> Reviewed-by: Hao Peng <[email protected]> Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
It appears that the xe_res_cursor code also assumes 4K alignment. The current code uses `PAGE_SIZE' as the assumed alignment reference, but a 4K kernel page size is by no means guaranteed. On 16K-paged kernels, this causes driver failures during boot-up:

[ 23.242757] ------------[ cut here ]------------
[ 23.247363] WARNING: CPU: 0 PID: 2036 at drivers/gpu/drm/xe/xe_res_cursor.h:182 emit_pte+0x394/0x3b0 [xe]
[ 23.256962] Modules linked in: nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) rfkill(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) snd_hda_intel(E) snd_intel_dspcfg(E) snd_hda_codec(E) nls_iso8859_1(E) qrtr(E) nls_cp437(E) snd_hda_core(E) loongson3_cpufreq(E) rtc_efi(E) snd_hwdep(E) snd_pcm(E) spi_loongson_pci(E) snd_timer(E) snd(E) spi_loongson_core(E) soundcore(E) gpio_loongson_64bit(E) rtc_loongson(E) i2c_ls2x(E) mousedev(E) input_leds(E) sch_fq_codel(E) fuse(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) xe(E) drm_gpuvm(E) drm_buddy(E) gpu_sched(E)
[ 23.257034] drm_exec(E) drm_suballoc_helper(E) drm_display_helper(E) cec(E) rc_core(E) hid_generic(E) tpm_tis_spi(E) r8169(E) loongson(E) i2c_algo_bit(E) realtek(E) drm_ttm_helper(E) led_class(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) i2c_dev(E)
[ 23.369697] CPU: 0 UID: 1000 PID: 2036 Comm: QSGRenderThread Tainted: G E 6.14.0-rc4-aosc-main-g7cc07e6e50b0-dirty #8
[ 23.381640] Tainted: [E]=UNSIGNED_MODULE
[ 23.385534] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[ 23.399319] pc ffff80000251efc0 ra ffff80000251eddc tp 900000011fe3c000 sp 900000011fe3f7e0
[ 23.407632] a0 0000000000000001 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000
[ 23.415938] a4 0000000000000000 a5 0000000000000000 a6 0000000000060000 a7 900000010c947b00
[ 23.424240] t0 0000000000000000 t1 0000000000000000 t2 0000000000000000 t3 900000012e456230
[ 23.432543] t4 0000000000000035 t5 0000000000004000 t6 00000001fbc40403 t7 0000000000004000
[ 23.440845] t8 9000000100e688a8 u0 5cc06cee8ef0edee s9 9000000100024420 s0 0000000000000047
[ 23.449147] s1 0000000000004000 s2 0000000000000001 s3 900000012adba000 s4 ffffffffffffc000
[ 23.457450] s5 9000000108939428 s6 0000000000000000 s7 0000000000000000 s8 900000011fe3f8e0
[ 23.465851] ra: ffff80000251eddc emit_pte+0x1b0/0x3b0 [xe]
[ 23.471761] ERA: ffff80000251efc0 emit_pte+0x394/0x3b0 [xe]
[ 23.477557] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[ 23.483732] PRMD: 00000004 (PPLV0 +PIE -PWE)
[ 23.488068] EUEN: 00000003 (+FPE +SXE -ASXE -BTE)
[ 23.492832] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
[ 23.497594] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
[ 23.503133] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
[ 23.509164] CPU: 0 UID: 1000 PID: 2036 Comm: QSGRenderThread Tainted: G E 6.14.0-rc4-aosc-main-g7cc07e6e50b0-dirty #8
[ 23.509168] Tainted: [E]=UNSIGNED_MODULE
[ 23.509168] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[ 23.509170] Stack : ffffffffffffffff ffffffffffffffff 900000000023eb34 900000011fe3c000
[ 23.509176] 900000011fe3f440 0000000000000000 900000011fe3f448 9000000001c31c70
[ 23.509181] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 23.509185] 0000000000000000 5cc06cee8ef0edee 0000000000000000 0000000000000000
[ 23.509190] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 23.509193] 0000000000000000 0000000000000000 00000000066b4000 9000000100024420
[ 23.509197] 9000000001eb8000 0000000000000000 9000000001c31c70 0000000000000004
[ 23.509202] 0000000000000004 0000000000000000 0000000000000000 0000000000000000
[ 23.509206] 900000011fe3f8e0 9000000001c31c70 9000000000244174 00007fffac097534
[ 23.509211] 00000000000000b0 0000000000000004 0000000000000003 0000000000071c1d
[ 23.509216] ...
[ 23.509218] Call Trace:
[ 23.509220] [<9000000000244174>] show_stack+0x3c/0x16c
[ 23.509226] [<900000000023eb30>] dump_stack_lvl+0x84/0xe0
[ 23.509230] [<9000000000288208>] __warn+0x8c/0x174
[ 23.509234] [<90000000017c1918>] report_bug+0x1c0/0x22c
[ 23.509238] [<90000000017f66e8>] do_bp+0x280/0x344
[ 23.509243] [<90000000002428a0>] handle_bp+0x120/0x1c0
[ 23.509247] [<ffff80000251efc0>] emit_pte+0x394/0x3b0 [xe]
[ 23.509295] [<ffff800002520d38>] xe_migrate_clear+0x2d8/0xa54 [xe]
[ 23.509341] [<ffff8000024e6c38>] xe_bo_move+0x324/0x930 [xe]
[ 23.509387] [<ffff800002209468>] ttm_bo_handle_move_mem+0xd0/0x194 [ttm]
[ 23.509392] [<ffff800002209ebc>] ttm_bo_validate+0xd4/0x1cc [ttm]
[ 23.509396] [<ffff80000220a138>] ttm_bo_init_reserved+0x184/0x1dc [ttm]
[ 23.509399] [<ffff8000024e7840>] ___xe_bo_create_locked+0x1e8/0x3d4 [xe]
[ 23.509445] [<ffff8000024e7cf8>] __xe_bo_create_locked+0x2cc/0x390 [xe]
[ 23.509489] [<ffff8000024e7e98>] xe_bo_create_user+0x34/0xe4 [xe]
[ 23.509533] [<ffff8000024e875c>] xe_gem_create_ioctl+0x154/0x4d8 [xe]
[ 23.509578] [<9000000001062784>] drm_ioctl_kernel+0xe0/0x14c
[ 23.509582] [<9000000001062c10>] drm_ioctl+0x420/0x5f4
[ 23.509585] [<ffff8000024ea778>] xe_drm_ioctl+0x64/0xac [xe]
[ 23.509630] [<9000000000653504>] sys_ioctl+0x2b8/0xf98
[ 23.509634] [<90000000017f684c>] do_syscall+0xa0/0x140
[ 23.509637] [<9000000000241e38>] handle_syscall+0xb8/0x158
[ 23.509640]
[ 23.509644] ---[ end trace 0000000000000000 ]---

Revise calls to `xe_res_dma()' and `xe_res_cursor()' to use `XE_PTE_MASK' (12) and `SZ_4K' to fix this potentially confused use of `PAGE_SIZE' in the relevant code.

Cc: [email protected]
Fixes: e89b384 ("drm/xe/migrate: Update emit_pte to cope with a size level than 4k")
Tested-by: Mingcong Bai <[email protected]>
Tested-by: Haien Liang <[email protected]>
Tested-by: Shirong Liu <[email protected]>
Tested-by: Haofeng Wu <[email protected]>
Link: FanFansfan@22c55ab
Co-developed-by: Shang Yatsen <[email protected]>
Signed-off-by: Shang Yatsen <[email protected]>
Signed-off-by: Mingcong Bai <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Kexy Biscuit <[email protected]>
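To make the alignment mismatch concrete, here is a small user-space sketch; it is not the xe driver's code. It assumes, as the message implies with `SZ_4K', that one GPU PTE maps 4 KiB, and it models `PAGE_SIZE' on a 16 KiB-paged kernel with a local constant. `GPU_PTE_SIZE`, `KERNEL_PAGE_SIZE_16K`, `is_aligned()` and `num_gpu_ptes()` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for illustration; not taken from the xe driver. */
#define GPU_PTE_SIZE         UINT64_C(0x1000) /* 4 KiB: assumed size mapped by one GPU PTE */
#define KERNEL_PAGE_SIZE_16K UINT64_C(0x4000) /* PAGE_SIZE on a 16 KiB-paged kernel */

/* Returns true if addr is a multiple of align; align must be a power of two. */
static bool is_aligned(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & (align - 1)) == 0;
}

/* Number of 4 KiB GPU PTEs needed to cover size bytes (rounded up). */
static uint64_t num_gpu_ptes(uint64_t size)
{
	return (size + GPU_PTE_SIZE - 1) / GPU_PTE_SIZE;
}
```

For example, the address 0x66b5000 passes `is_aligned(addr, GPU_PTE_SIZE)` but fails `is_aligned(addr, KERNEL_PAGE_SIZE_16K)`, and one 16 KiB kernel page spans `num_gpu_ptes(KERNEL_PAGE_SIZE_16K) == 4` GPU PTEs. A check or cursor step derived from `PAGE_SIZE' therefore rejects or skips ranges that are perfectly valid at 4 KiB PTE granularity.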
There is a global spinlock (g_cru_lock) shared between the reset and clk drivers. If the reset driver prints debug information while holding it, printk() may call into the UART console driver, which disables its clock and therefore needs the clk framework's enable_lock. If another thread already holds enable_lock and is waiting for g_cru_lock (here, ccu_mix_disable() on the SDHCI runtime-suspend path), the two threads deadlock:

Backtrace stopped: frame did not save the PC
(gdb) thread 4
[Switching to thread 4 (Thread 4)]
#0 cpu_relax () at ./arch/riscv/include/asm/vdso/processor.h:22
22 ./arch/riscv/include/asm/vdso/processor.h: No such file or directory.
(gdb) bt
#0 cpu_relax () at ./arch/riscv/include/asm/vdso/processor.h:22
#1 arch_spin_lock (lock=lock@entry=0xffffffff81a57cd0 <enable_lock>) at ./include/asm-generic/spinlock.h:49
#2 do_raw_spin_lock (lock=lock@entry=0xffffffff81a57cd0 <enable_lock>) at ./include/linux/spinlock.h:186
#3 0xffffffff80aa21ce in __raw_spin_lock_irqsave (lock=0xffffffff81a57cd0 <enable_lock>) at ./include/linux/spinlock_api_smp.h:111
#4 _raw_spin_lock_irqsave (lock=lock@entry=0xffffffff81a57cd0 <enable_lock>) at kernel/locking/spinlock.c:162
#5 0xffffffff80563416 in clk_enable_lock () at ./include/linux/spinlock.h:325
#6 0xffffffff805648de in clk_core_disable_lock (core=0xffffffd900512500) at drivers/clk/clk.c:1062
#7 0xffffffff8056527e in clk_disable (clk=<optimized out>) at drivers/clk/clk.c:1084
#8 clk_disable (clk=0xffffffd9048b5100) at drivers/clk/clk.c:1079
#9 0xffffffff8059e5d4 in serial_pxa_console_write (co=<optimized out>, s=0xffffffff81a68250 <text> "[ 14.708612] [RESET][spacemit_reset_set][373]:assert = 1, id = 59 \n", count=<optimized out>) at drivers/tty/serial/pxa_k1x.c:1724
#10 0xffffffff8004a34c in call_console_driver (dropped_text=0xffffffff81a68650 <dropped_text> "", len=69, text=0xffffffff81a68250 <text> "[ 14.708612] [RESET][spacemit_reset_set][373]:assert = 1, id = 59 \n", con=0xffffffff81964c10 <serial_pxa_console>) at kernel/printk/printk.c:1942
#11 console_emit_next_record (con=con@entry=0xffffffff81964c10 <serial_pxa_console>, ext_text=<optimized out>, dropped_text=0xffffffff81a68650 <dropped_text> "", handover=0xffffffc80578baa7, text=0xffffffff81a68250 <text> "[ 14.708612] [RESET][spacemit_reset_set][373]:assert = 1, id = 59 \n") at kernel/printk/printk.c:2731
#12 0xffffffff8004a49a in console_flush_all (handover=0xffffffc80578baa7, next_seq=<synthetic pointer>, do_cond_resched=false) at kernel/printk/printk.c:2793
#13 console_unlock () at kernel/printk/printk.c:2860
#14 0xffffffff8004b388 in vprintk_emit (facility=facility@entry=0, level=<optimized out>, level@entry=-1, dev_info=dev_info@entry=0x0, fmt=<optimized out>, args=<optimized out>) at kernel/printk/printk.c:2268
#15 0xffffffff8004b3ae in vprintk_default (fmt=<optimized out>, args=<optimized out>) at kernel/printk/printk.c:2279
#16 0xffffffff8004b646 in vprintk (fmt=fmt@entry=0xffffffff813be470 "\001\066[RESET][%s][%d]:assert = %d, id = %d \n", args=args@entry=0xffffffc80578bbd8) at kernel/printk/printk_safe.c:50
#17 0xffffffff80a880d6 in _printk (fmt=fmt@entry=0xffffffff813be470 "\001\066[RESET][%s][%d]:assert = %d, id = %d \n") at kernel/printk/printk.c:2289
#18 0xffffffff80a90bb6 in spacemit_reset_set (rcdev=rcdev@entry=0xffffffff81f563a8 <k1x_reset_controller+8>, id=id@entry=59, assert=assert@entry=true) at drivers/reset/reset-spacemit-k1x.c:373
#19 0xffffffff805823b6 in spacemit_reset_update (assert=true, id=59, rcdev=0xffffffff81f563a8 <k1x_reset_controller+8>) at drivers/reset/reset-spacemit-k1x.c:401
#20 spacemit_reset_update (assert=true, id=59, rcdev=0xffffffff81f563a8 <k1x_reset_controller+8>) at drivers/reset/reset-spacemit-k1x.c:387
#21 spacemit_reset_assert (rcdev=0xffffffff81f563a8 <k1x_reset_controller+8>, id=59) at drivers/reset/reset-spacemit-k1x.c:413
#22 0xffffffff8058158e in reset_control_assert (rstc=0xffffffd902b2f280) at drivers/reset/core.c:485
#23 0xffffffff807ccf96 in cpp_disable_clocks (cpp_dev=cpp_dev@entry=0xffffffd904cc9040) at drivers/media/platform/spacemit/camera/cam_cpp/k1x_cpp.c:960
#24 0xffffffff807cd0b2 in cpp_release_hardware (cpp_dev=cpp_dev@entry=0xffffffd904cc9040) at drivers/media/platform/spacemit/camera/cam_cpp/k1x_cpp.c:1038
#25 0xffffffff807cd990 in cpp_close_node (sd=<optimized out>, fh=<optimized out>) at drivers/media/platform/spacemit/camera/cam_cpp/k1x_cpp.c:1135
#26 0xffffffff8079525e in subdev_close (file=0xffffffd906645d00) at drivers/media/v4l2-core/v4l2-subdev.c:105
#27 0xffffffff8078e49e in v4l2_release (inode=<optimized out>, filp=0xffffffd906645d00) at drivers/media/v4l2-core/v4l2-dev.c:459
#28 0xffffffff80154974 in __fput (file=0xffffffd906645d00) at fs/file_table.c:320
#29 0xffffffff80154aa2 in ____fput (work=<optimized out>) at fs/file_table.c:348
#30 0xffffffff8002677e in task_work_run () at kernel/task_work.c:179
#31 0xffffffff800053b4 in resume_user_mode_work (regs=0xffffffc80578bee0) at ./include/linux/resume_user_mode.h:49
#32 do_work_pending (regs=0xffffffc80578bee0, thread_info_flags=<optimized out>) at arch/riscv/kernel/signal.c:478
#33 0xffffffff800039c6 in handle_exception () at arch/riscv/kernel/entry.S:374
Backtrace stopped: frame did not save the PC
(gdb) thread 1
[Switching to thread 1 (Thread 1)]
#0 0xffffffff80047e9c in arch_spin_lock (lock=lock@entry=0xffffffff81a57cd8 <g_cru_lock>) at ./include/asm-generic/spinlock.h:49
49 ./include/asm-generic/spinlock.h: No such file or directory.
(gdb) bt
#0 0xffffffff80047e9c in arch_spin_lock (lock=lock@entry=0xffffffff81a57cd8 <g_cru_lock>) at ./include/asm-generic/spinlock.h:49
#1 do_raw_spin_lock (lock=lock@entry=0xffffffff81a57cd8 <g_cru_lock>) at ./include/linux/spinlock.h:186
#2 0xffffffff80aa21ce in __raw_spin_lock_irqsave (lock=0xffffffff81a57cd8 <g_cru_lock>) at ./include/linux/spinlock_api_smp.h:111
#3 _raw_spin_lock_irqsave (lock=0xffffffff81a57cd8 <g_cru_lock>) at kernel/locking/spinlock.c:162
#4 0xffffffff8056c4cc in ccu_mix_disable (hw=0xffffffff81956858 <sdh2_clk+120>) at ./include/linux/spinlock.h:325
#5 0xffffffff80564832 in clk_core_disable (core=0xffffffd900529900) at drivers/clk/clk.c:1051
#6 clk_core_disable (core=0xffffffd900529900) at drivers/clk/clk.c:1031
#7 0xffffffff805648e6 in clk_core_disable_lock (core=0xffffffd900529900) at drivers/clk/clk.c:1063
#8 0xffffffff8056527e in clk_disable (clk=<optimized out>) at drivers/clk/clk.c:1084
#9 clk_disable (clk=clk@entry=0xffffffd904fafa80) at drivers/clk/clk.c:1079
#10 0xffffffff808bb898 in clk_disable_unprepare (clk=0xffffffd904fafa80) at ./include/linux/clk.h:1085
#11 0xffffffff808bb916 in spacemit_sdhci_runtime_suspend (dev=<optimized out>) at drivers/mmc/host/sdhci-of-k1x.c:1469
#12 0xffffffff8066e8e2 in pm_generic_runtime_suspend (dev=<optimized out>) at drivers/base/power/generic_ops.c:25
#13 0xffffffff80670398 in __rpm_callback (cb=cb@entry=0xffffffff8066e8ca <pm_generic_runtime_suspend>, dev=dev@entry=0xffffffd9018a2810) at drivers/base/power/runtime.c:395
#14 0xffffffff806704b8 in rpm_callback (cb=cb@entry=0xffffffff8066e8ca <pm_generic_runtime_suspend>, dev=dev@entry=0xffffffd9018a2810) at drivers/base/power/runtime.c:529
#15 0xffffffff80670bdc in rpm_suspend (dev=0xffffffd9018a2810, rpmflags=<optimized out>) at drivers/base/power/runtime.c:672
#16 0xffffffff806716de in pm_runtime_work (work=0xffffffd9018a2948) at drivers/base/power/runtime.c:974
#17 0xffffffff800236f4 in process_one_work (worker=worker@entry=0xffffffd9013ee9c0, work=0xffffffd9018a2948) at kernel/workqueue.c:2289
#18 0xffffffff80023ba6 in worker_thread (__worker=0xffffffd9013ee9c0) at kernel/workqueue.c:2436
#19 0xffffffff80028bb2 in kthread (_create=0xffffffd9017de840) at kernel/kthread.c:376
#20 0xffffffff80003934 in handle_exception () at arch/riscv/kernel/entry.S:249
Backtrace stopped: frame did not save the PC
(gdb)

Change-Id: Ia95b41ffd6c1893c9c5e9c1c9fc0c155ea902d2c
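The two backtraces describe an ABBA lock-order inversion: the reset path takes g_cru_lock and then (through printk and the console driver) wants enable_lock, while the clk path holds enable_lock and wants g_cru_lock. One common way to break such a cycle is to avoid calling printk() while the shared lock is held. The message does not show the actual change, so the following is only a user-space sketch under that assumption: two pthread mutexes stand in for g_cru_lock and enable_lock, and every function and variable name is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins for the two kernel spinlocks. */
static pthread_mutex_t cru_lock = PTHREAD_MUTEX_INITIALIZER;    /* plays g_cru_lock */
static pthread_mutex_t enable_lock = PTHREAD_MUTEX_INITIALIZER; /* plays the clk enable_lock */

#define ITERATIONS 20000

static char console_buf[64];
static int console_lines; /* protected by enable_lock */

/* Console path: like serial_pxa_console_write() -> clk_disable(), it needs enable_lock. */
static void console_write(const char *msg)
{
	pthread_mutex_lock(&enable_lock);
	snprintf(console_buf, sizeof(console_buf), "%s", msg);
	console_lines++;
	pthread_mutex_unlock(&enable_lock);
}

/* Reset path: format the message under cru_lock, but emit it only after
 * unlocking, so enable_lock is never taken while cru_lock is held. */
static void reset_assert(int id)
{
	char msg[64];

	assert(id >= 0);
	pthread_mutex_lock(&cru_lock);
	/* ... the reset register update would happen here ... */
	snprintf(msg, sizeof(msg), "reset assert id=%d", id);
	pthread_mutex_unlock(&cru_lock);

	console_write(msg);
}

/* Clock path: enable_lock, then cru_lock, like clk_core_disable_lock() -> ccu_mix_disable(). */
static void clk_gate_off(void)
{
	pthread_mutex_lock(&enable_lock);
	pthread_mutex_lock(&cru_lock);
	/* ... the clock gate register update would happen here ... */
	pthread_mutex_unlock(&cru_lock);
	pthread_mutex_unlock(&enable_lock);
}

static void *reset_worker(void *arg)
{
	for (int i = 0; i < ITERATIONS; i++)
		reset_assert(i % 64);
	return arg;
}

static void *clk_worker(void *arg)
{
	for (int i = 0; i < ITERATIONS; i++)
		clk_gate_off();
	return arg;
}

/* Runs both paths concurrently; returns 0 once both threads have finished. */
static int run_both_paths(void)
{
	pthread_t reset_thr, clk_thr;

	if (pthread_create(&reset_thr, NULL, reset_worker, NULL) != 0)
		return -1;
	if (pthread_create(&clk_thr, NULL, clk_worker, NULL) != 0) {
		pthread_join(reset_thr, NULL);
		return -1;
	}
	pthread_join(reset_thr, NULL);
	pthread_join(clk_thr, NULL);
	return 0;
}
```

If `console_write(msg)` were moved back inside the cru_lock critical section, the two workers could each hold one mutex while waiting forever for the other, mirroring threads 4 and 1 in the backtraces. Build with `-pthread`.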
It appears that the xe_res_cursor code also assumes 4KiB alignment. The current implementation uses `PAGE_SIZE' as the assumed alignment reference, but a 4KiB kernel page size is by no means guaranteed. On 16KiB-paged kernels, this causes driver failures during boot-up:

[ 23.242757] ------------[ cut here ]------------
[ 23.247363] WARNING: CPU: 0 PID: 2036 at drivers/gpu/drm/xe/xe_res_cursor.h:182 emit_pte+0x394/0x3b0 [xe]
[ 23.256962] Modules linked in: nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) rfkill(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) snd_hda_intel(E) snd_intel_dspcfg(E) snd_hda_codec(E) nls_iso8859_1(E) qrtr(E) nls_cp437(E) snd_hda_core(E) loongson3_cpufreq(E) rtc_efi(E) snd_hwdep(E) snd_pcm(E) spi_loongson_pci(E) snd_timer(E) snd(E) spi_loongson_core(E) soundcore(E) gpio_loongson_64bit(E) rtc_loongson(E) i2c_ls2x(E) mousedev(E) input_leds(E) sch_fq_codel(E) fuse(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) xe(E) drm_gpuvm(E) drm_buddy(E) gpu_sched(E)
[ 23.257034] drm_exec(E) drm_suballoc_helper(E) drm_display_helper(E) cec(E) rc_core(E) hid_generic(E) tpm_tis_spi(E) r8169(E) loongson(E) i2c_algo_bit(E) realtek(E) drm_ttm_helper(E) led_class(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) i2c_dev(E)
[ 23.369697] CPU: 0 UID: 1000 PID: 2036 Comm: QSGRenderThread Tainted: G E 6.14.0-rc4-aosc-main-g7cc07e6e50b0-dirty #8
[ 23.381640] Tainted: [E]=UNSIGNED_MODULE
[ 23.385534] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[ 23.399319] pc ffff80000251efc0 ra ffff80000251eddc tp 900000011fe3c000 sp 900000011fe3f7e0
[ 23.407632] a0 0000000000000001 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000
[ 23.415938] a4 0000000000000000 a5 0000000000000000 a6 0000000000060000 a7 900000010c947b00
[ 23.424240] t0 0000000000000000 t1 0000000000000000 t2 0000000000000000 t3 900000012e456230
[ 23.432543] t4 0000000000000035 t5 0000000000004000 t6 00000001fbc40403 t7 0000000000004000
[ 23.440845] t8 9000000100e688a8 u0 5cc06cee8ef0edee s9 9000000100024420 s0 0000000000000047
[ 23.449147] s1 0000000000004000 s2 0000000000000001 s3 900000012adba000 s4 ffffffffffffc000
[ 23.457450] s5 9000000108939428 s6 0000000000000000 s7 0000000000000000 s8 900000011fe3f8e0
[ 23.465851] ra: ffff80000251eddc emit_pte+0x1b0/0x3b0 [xe]
[ 23.471761] ERA: ffff80000251efc0 emit_pte+0x394/0x3b0 [xe]
[ 23.477557] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[ 23.483732] PRMD: 00000004 (PPLV0 +PIE -PWE)
[ 23.488068] EUEN: 00000003 (+FPE +SXE -ASXE -BTE)
[ 23.492832] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
[ 23.497594] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0)
[ 23.503133] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV)
[ 23.509164] CPU: 0 UID: 1000 PID: 2036 Comm: QSGRenderThread Tainted: G E 6.14.0-rc4-aosc-main-g7cc07e6e50b0-dirty #8
[ 23.509168] Tainted: [E]=UNSIGNED_MODULE
[ 23.509168] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab
[ 23.509170] Stack : ffffffffffffffff ffffffffffffffff 900000000023eb34 900000011fe3c000
[ 23.509176] 900000011fe3f440 0000000000000000 900000011fe3f448 9000000001c31c70
[ 23.509181] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 23.509185] 0000000000000000 5cc06cee8ef0edee 0000000000000000 0000000000000000
[ 23.509190] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 23.509193] 0000000000000000 0000000000000000 00000000066b4000 9000000100024420
[ 23.509197] 9000000001eb8000 0000000000000000 9000000001c31c70 0000000000000004
[ 23.509202] 0000000000000004 0000000000000000 0000000000000000 0000000000000000
[ 23.509206] 900000011fe3f8e0 9000000001c31c70 9000000000244174 00007fffac097534
[ 23.509211] 00000000000000b0 0000000000000004 0000000000000003 0000000000071c1d
[ 23.509216] ...
[ 23.509218] Call Trace:
[ 23.509220] [<9000000000244174>] show_stack+0x3c/0x16c
[ 23.509226] [<900000000023eb30>] dump_stack_lvl+0x84/0xe0
[ 23.509230] [<9000000000288208>] __warn+0x8c/0x174
[ 23.509234] [<90000000017c1918>] report_bug+0x1c0/0x22c
[ 23.509238] [<90000000017f66e8>] do_bp+0x280/0x344
[ 23.509243] [<90000000002428a0>] handle_bp+0x120/0x1c0
[ 23.509247] [<ffff80000251efc0>] emit_pte+0x394/0x3b0 [xe]
[ 23.509295] [<ffff800002520d38>] xe_migrate_clear+0x2d8/0xa54 [xe]
[ 23.509341] [<ffff8000024e6c38>] xe_bo_move+0x324/0x930 [xe]
[ 23.509387] [<ffff800002209468>] ttm_bo_handle_move_mem+0xd0/0x194 [ttm]
[ 23.509392] [<ffff800002209ebc>] ttm_bo_validate+0xd4/0x1cc [ttm]
[ 23.509396] [<ffff80000220a138>] ttm_bo_init_reserved+0x184/0x1dc [ttm]
[ 23.509399] [<ffff8000024e7840>] ___xe_bo_create_locked+0x1e8/0x3d4 [xe]
[ 23.509445] [<ffff8000024e7cf8>] __xe_bo_create_locked+0x2cc/0x390 [xe]
[ 23.509489] [<ffff8000024e7e98>] xe_bo_create_user+0x34/0xe4 [xe]
[ 23.509533] [<ffff8000024e875c>] xe_gem_create_ioctl+0x154/0x4d8 [xe]
[ 23.509578] [<9000000001062784>] drm_ioctl_kernel+0xe0/0x14c
[ 23.509582] [<9000000001062c10>] drm_ioctl+0x420/0x5f4
[ 23.509585] [<ffff8000024ea778>] xe_drm_ioctl+0x64/0xac [xe]
[ 23.509630] [<9000000000653504>] sys_ioctl+0x2b8/0xf98
[ 23.509634] [<90000000017f684c>] do_syscall+0xa0/0x140
[ 23.509637] [<9000000000241e38>] handle_syscall+0xb8/0x158
[ 23.509640]
[ 23.509644] ---[ end trace 0000000000000000 ]---

Revise calls to `xe_res_dma()' and `xe_res_cursor()' to use `XE_PTE_MASK' (12) and `SZ_4K' to fix this potentially confused use of `PAGE_SIZE' in the relevant code.

Cc: [email protected]
Fixes: e89b384 ("drm/xe/migrate: Update emit_pte to cope with a size level than 4k")
Tested-by: Mingcong Bai <[email protected]>
Tested-by: Wenbin Fang <[email protected]>
Tested-by: Haien Liang <[email protected]>
Tested-by: Jianfeng Liu <[email protected]>
Tested-by: Shirong Liu <[email protected]>
Tested-by: Haofeng Wu <[email protected]>
Link: FanFansfan@22c55ab
Link: https://t.me/c/1109254909/768552
Co-developed-by: Shang Yatsen <[email protected]>
Signed-off-by: Shang Yatsen <[email protected]>
Signed-off-by: Mingcong Bai <[email protected]>
[ Upstream commit 88f7f56 ]

When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush() generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC, which causes the flush_bio to be throttled by wbt_wait().

An example from v5.4 (a similar problem also exists upstream):

crash> bt 2091206
PID: 2091206 TASK: ffff2050df92a300 CPU: 109 COMMAND: "kworker/u260:0"
#0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8
#1 [ffff800084a2f820] __schedule at ffff800040bfa0c4
#2 [ffff800084a2f880] schedule at ffff800040bfa4b4
#3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4
#4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc
#5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0
#6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254
#7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38
#8 [ffff800084a2fa60] generic_make_request at ffff800040570138
#9 [ffff800084a2fae0] submit_bio at ffff8000405703b4
#10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs]
#11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs]
#12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs]
#13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs]
#14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs]
#15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs]
#16 [ffff800084a2fdb0] process_one_work at ffff800040111d08
#17 [ffff800084a2fe00] worker_thread at ffff8000401121cc
#18 [ffff800084a2fe70] kthread at ffff800040118de4

After commit 2def284 ("xfs: don't allow log IO to be throttled"), the metadata submitted by xlog_write_iclog() should not be throttled. However, when a dm device sits in between, throttling flush_bio indirectly causes the metadata bio to be throttled as well.

Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes wbt_should_throttle() return false so that wbt_wait() is skipped.

Signed-off-by: Jinliang Zheng <[email protected]>
Reviewed-by: Tianxiang Peng <[email protected]>
Reviewed-by: Hao Peng <[email protected]>
Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
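Based on my reading of block/blk-wbt.c, wbt_should_throttle() throttles REQ_OP_WRITE bios unless both REQ_SYNC and REQ_IDLE are set, always throttles discards, and leaves other operations alone. The sketch below models that decision in user space to show why adding REQ_IDLE to a REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC flush bio avoids wbt_wait(). The types, the flag bit values and `should_throttle()` are hypothetical stand-ins, not the kernel's definitions, and the "conditionally" part of the fix is not modelled; check the real function in the tree being patched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; the kernel's real REQ_* bit layout differs. */
enum fake_req_op { FAKE_OP_READ, FAKE_OP_WRITE, FAKE_OP_DISCARD };

#define FAKE_REQ_SYNC     (1u << 0)
#define FAKE_REQ_IDLE     (1u << 1)
#define FAKE_REQ_PREFLUSH (1u << 2)

struct fake_bio {
	enum fake_req_op op;
	uint32_t flags;
};

/* Models the decision in wbt_should_throttle(): writes are throttled unless
 * both SYNC and IDLE are set; discards are always throttled; other ops never are. */
static bool should_throttle(const struct fake_bio *bio)
{
	assert(bio != NULL);
	switch (bio->op) {
	case FAKE_OP_WRITE:
		if ((bio->flags & (FAKE_REQ_SYNC | FAKE_REQ_IDLE)) ==
		    (FAKE_REQ_SYNC | FAKE_REQ_IDLE))
			return false;
		return true;
	case FAKE_OP_DISCARD:
		return true;
	default:
		return false;
	}
}
```

A flush bio built as `{ FAKE_OP_WRITE, FAKE_REQ_PREFLUSH | FAKE_REQ_SYNC }` is throttled; setting `FAKE_REQ_IDLE` on the same bio makes `should_throttle()` return false. REQ_PREFLUSH plays no part in the decision, which is why the dm flush_bio was treated like any other buffered write.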
I was going through the readme, checking it out, and a few things kept distracting me. So I fixed 'em.