raspbian: kernel OOPS following page allocation failures under USB/network load #189

Closed
P33M opened this issue Jan 8, 2013 · 18 comments

Comments

@P33M
Contributor

P33M commented Jan 8, 2013

Hello.

I have a stock raspbian + kernel 3.6.11+ #350, and can reliably oops the kernel by doing the following:

  1. Mount an external USB HDD with ntfs-3g
  2. Share this HDD with samba
  3. Stream a large, high-bandwidth video (standard resolution eventually produces a similar result)
  4. Seek repeatedly within the file on Windows

Over time multiple page allocation failures occur, all with the typical stack trace:

[ 2099.187501] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[ 2099.187665] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[ 2099.187843] smbd: page allocation failure: order:0, mode:0x20
[ 2099.187905] [<c0013a7c>] (unwind_backtrace+0x0/0xf0) from [<c0092760>] (warn_alloc_failed+0xc4/0x11c)
[ 2099.187937] [<c0092760>] (warn_alloc_failed+0xc4/0x11c) from [<c0094a70>] (__alloc_pages_nodemask+0x3e0/0x63c)
[ 2099.187967] [<c0094a70>] (__alloc_pages_nodemask+0x3e0/0x63c) from [<c02e4598>] (__netdev_alloc_frag+0x90/0x118)
[ 2099.187997] [<c02e4598>] (__netdev_alloc_frag+0x90/0x118) from [<c02e8708>] (__netdev_alloc_skb+0x40/0xd0)
[ 2099.188031] [<c02e8708>] (__netdev_alloc_skb+0x40/0xd0) from [<c026a870>] (rx_submit+0x1c/0x1f8)
[ 2099.188056] [<c026a870>] (rx_submit+0x1c/0x1f8) from [<c026aa88>] (rx_alloc_submit+0x3c/0x98)
[ 2099.188079] [<c026aa88>] (rx_alloc_submit+0x3c/0x98) from [<c026b83c>] (usbnet_bh+0x1e4/0x254)
[ 2099.188116] [<c026b83c>] (usbnet_bh+0x1e4/0x254) from [<c0025724>] (tasklet_action+0x60/0xb4)
[ 2099.188142] [<c0025724>] (tasklet_action+0x60/0xb4) from [<c0025854>] (__do_softirq+0xa0/0x154)
[ 2099.188168] [<c0025854>] (__do_softirq+0xa0/0x154) from [<c0025d1c>] (irq_exit+0x8c/0x94)
[ 2099.188201] [<c0025d1c>] (irq_exit+0x8c/0x94) from [<c000e920>] (handle_IRQ+0x34/0x84)
[ 2099.188227] [<c000e920>] (handle_IRQ+0x34/0x84) from [<c03997d4>] (__irq_svc+0x34/0xc8)
[ 2099.188252] [<c03997d4>] (__irq_svc+0x34/0xc8) from [<c02e7528>] (__alloc_skb+0xac/0x158)
[ 2099.188292] [<c02e7528>] (__alloc_skb+0xac/0x158) from [<c03272b4>] (sk_stream_alloc_skb+0x2c/0xe8)
[ 2099.188319] [<c03272b4>] (sk_stream_alloc_skb+0x2c/0xe8) from [<c0327cec>] (tcp_sendmsg+0x29c/0xe68)
[ 2099.188346] [<c0327cec>] (tcp_sendmsg+0x29c/0xe68) from [<c0349870>] (inet_sendmsg+0x40/0x78)
[ 2099.188379] [<c0349870>] (inet_sendmsg+0x40/0x78) from [<c02dd1e0>] (sock_aio_write+0x110/0x144)
[ 2099.188421] [<c02dd1e0>] (sock_aio_write+0x110/0x144) from [<c00c1b58>] (do_sync_readv_writev+0x94/0xdc)
[ 2099.188451] [<c00c1b58>] (do_sync_readv_writev+0x94/0xdc) from [<c00c1dfc>] (do_readv_writev+0xa8/0x178)
[ 2099.188476] [<c00c1dfc>] (do_readv_writev+0xa8/0x178) from [<c00c1f18>] (vfs_writev+0x4c/0x70)
[ 2099.188500] [<c00c1f18>] (vfs_writev+0x4c/0x70) from [<c00c20a4>] (sys_writev+0x38/0xc8)
[ 2099.188527] [<c00c20a4>] (sys_writev+0x38/0xc8) from [<c000da60>] (ret_fast_syscall+0x0/0x30)

These failures eventually lead to an OOPS that always originates in the same place, inside memcpy.

This behaviour can also be replicated by eliminating the network stack entirely from proceedings. Copying from one USB device to another results in similar failures:

[  599.883068] ntfs-3g: page allocation failure: order:0, mode:0x20
[  599.883121] [<c0014a70>] (unwind_backtrace+0x0/0x130) from [<c00907b0>] (warn_alloc_failed+0xc8/0x10c)
[  599.883149] [<c00907b0>] (warn_alloc_failed+0xc8/0x10c) from [<c0092ad4>] (__alloc_pages_nodemask+0x3e4/0x64c)
[  599.883175] [<c0092ad4>] (__alloc_pages_nodemask+0x3e4/0x64c) from [<c02e785c>] (__netdev_alloc_frag+0x94/0x128)
[  599.883201] [<c02e785c>] (__netdev_alloc_frag+0x94/0x128) from [<c02ebb90>] (__netdev_alloc_skb+0x84/0xe0)
[  599.883230] [<c02ebb90>] (__netdev_alloc_skb+0x84/0xe0) from [<c02683d8>] (rx_submit+0x1c/0x2c8)
[  599.883252] [<c02683d8>] (rx_submit+0x1c/0x2c8) from [<c02686c0>] (rx_alloc_submit+0x3c/0x94)
[  599.883272] [<c02686c0>] (rx_alloc_submit+0x3c/0x94) from [<c0269adc>] (usbnet_bh+0x244/0x334)
[  599.883304] [<c0269adc>] (usbnet_bh+0x244/0x334) from [<c00260f8>] (tasklet_action+0x60/0xb8)
[  599.883327] [<c00260f8>] (tasklet_action+0x60/0xb8) from [<c0026230>] (__do_softirq+0xa4/0x14c)
[  599.883349] [<c0026230>] (__do_softirq+0xa4/0x14c) from [<c0026700>] (irq_exit+0x8c/0x94)
[  599.883378] [<c0026700>] (irq_exit+0x8c/0x94) from [<c000f280>] (handle_IRQ+0x34/0x84)
[  599.883402] [<c000f280>] (handle_IRQ+0x34/0x84) from [<c039c454>] (__irq_svc+0x34/0xc8)
[  599.883429] [<c039c454>] (__irq_svc+0x34/0xc8) from [<c01ea470>] (memcpy+0x50/0x330)
[  599.883437] Mem-info:
[  599.883444] Normal per-cpu:
[  599.883453] CPU    0: hi:   90, btch:  15 usd:  26
[  599.883476] active_anon:1516 inactive_anon:1526 isolated_anon:0
[  599.883476]  active_file:5640 inactive_file:32134 isolated_file:0
[  599.883476]  unevictable:0 dirty:1987 writeback:0 unstable:0
[  599.883476]  free:8903 slab_reclaimable:1010 slab_unreclaimable:999
[  599.883476]  mapped:2354 shmem:98 pagetables:186 bounce:0
[  599.883517] Normal free:35612kB min:16384kB low:20480kB high:24576kB active_anon:6064kB inactive_anon:6104kB active_file:22560kB inactive_file:128536kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:223392kB mlocked:0kB dirty:7948kB writeback:0kB mapped:9416kB shmem:392kB slab_reclaimable:4040kB slab_unreclaimable:3996kB kernel_stack:1152kB pagetables:744kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[  599.883528] lowmem_reserve[]: 0 0
[  599.883544] Normal: 5309*4kB 1753*8kB 22*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 35612kB
[  599.883590] 37872 total pagecache pages
[  599.883598] 0 pages in swap cache
[  599.883606] Swap cache stats: add 0, delete 0, find 0/0
[  599.883612] Free swap  = 102396kB
[  599.883617] Total swap = 102396kB
[  599.894071] 56320 pages of RAM
[  599.894080] 9169 free pages
[  599.894086] 2077 reserved pages
[  599.894092] 2009 slab pages
[  599.894098] 29265 pages shared
[  599.894104] 0 pages swap cached

In this case, though, there was also network activity, and that appears to be what triggered the failure (the trace again goes through usbnet_bh).
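As a sanity check on the Mem-info dump above, the per-order free list on the "Normal:" line and the "free:8903" page count can be converted to kB and compared with the reported "free:35612kB". This is only a minimal sketch: the struct and function names are made up for illustration, and the 4kB page size is an assumption taken from the smallest block size on the free-list line.

```c
#include <stddef.h>

/* One entry of the "Normal:" free-list line, e.g. 5309*4kB. */
struct free_run {
    unsigned long count;    /* number of free blocks of this size */
    unsigned long block_kb; /* size of each block in kB */
};

/* Figures copied from the log; the all-zero larger orders are omitted. */
static const struct free_run normal_zone[] = {
    { 5309, 4 }, { 1753, 8 }, { 22, 16 },
};

/* Sum count * block size over every entry of the free list. */
static unsigned long free_list_total_kb(const struct free_run *runs, size_t n)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++)
        total += runs[i].count * runs[i].block_kb;
    return total;
}

/* Convert a page count to kB, assuming 4kB pages. */
static unsigned long pages_to_kb(unsigned long pages)
{
    return pages * 4;
}
```

Both free_list_total_kb(normal_zone, 3) and pages_to_kb(8903) come to 35612, matching the "free:35612kB" figure. The list also shows that every free block is 16kB or smaller.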

I compiled a debugging kernel from the latest pull of this repository with slab allocation debugging enabled plus kmemleak debugging.

On boot I get this:

<..snip normal boot messages..>
[    2.636678] usb 1-1: new high-speed USB device number 2 using dwc_otg
[    2.647828] Indeed it is in host mode hprt0 = 00001101
[    2.857062] usb 1-1: New USB device found, idVendor=0424, idProduct=9512
[    2.868349] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    2.881209] hub 1-1:1.0: USB hub found
[    2.889757] hub 1-1:1.0: 3 ports detected
[    3.177021] usb 1-1.1: new high-speed USB device number 3 using dwc_otg
[    3.287464] usb 1-1.1: New USB device found, idVendor=0424, idProduct=ec00
[    3.300575] usb 1-1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    3.321545] smsc95xx v1.0.4
[    3.395812] smsc95xx 1-1.1:1.0: eth0: register 'smsc95xx' at usb-bcm2708_usb-1.1, smsc95xx USB 2.0 Ethernet, b8:27:eb:16:5c:e9
[    3.527102] usb 1-1.2: new high-speed USB device number 4 using dwc_otg
[    3.667890] usb 1-1.2: New USB device found, idVendor=1a40, idProduct=0201
[    3.679714] usb 1-1.2: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[    3.691535] usb 1-1.2: Product: USB 2.0 Hub [MTT]
[    3.702452] hub 1-1.2:1.0: USB hub found
[    3.711453] hub 1-1.2:1.0: 7 ports detected
[    3.996938] usb 1-1.2.5: new high-speed USB device number 5 using dwc_otg
[    4.108456] usb 1-1.2.5: New USB device found, idVendor=07ab, idProduct=fc88
[    4.129423] usb 1-1.2.5: New USB device strings: Mfr=3, Product=11, SerialNumber=5
[    4.155406] usb 1-1.2.5: Product: Freecom Mobile Drive XXS
[    4.166334] usb 1-1.2.5: Manufacturer: Freecom
[    4.175256] usb 1-1.2.5: SerialNumber: 287011C82395
[    4.197863] scsi0 : usb-storage 1-1.2.5:1.0
[    4.547357] udevd[145]: starting version 175
[    5.208224] scsi 0:0:0:0: Direct-Access     Freecom  Mobile Drive XXS      PQ: 0 ANSI: 2 CCS
[    5.249382] sd 0:0:0:0: [sda] 312581808 512-byte logical blocks: (160 GB/149 GiB)
[    5.287762] sd 0:0:0:0: [sda] Write Protect is off
[    5.316683] sd 0:0:0:0: [sda] Mode Sense: 34 00 00 00
[    5.317761] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[    5.368362] Slab corruption (Not tainted): size-64 start=c7a5b8e0, len=64
[    5.406670] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a0 6a a9 c7  kkkkkkkkkkkk.j..
[    5.441301] Prev obj: start=c7a5b8a0, len=64
[    5.476652] 000: 20 b3 a5 c7 a8 22 80 c7 00 00 00 00 00 10 af c7   ...."..........
[    5.502686] 010: 04 00 00 00 ff ff ff ff 00 00 5a 5a fe ff ff ff  ..........ZZ....
[    5.546631] Next obj: start=c7a5b920, len=64
[    5.555677] 000: 01 00 00 00 5a 5a 5a 5a 01 00 00 00 ad 4e ad de  ....ZZZZ.....N..
[    5.606636] 010: ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00  ................
[    5.644104] Slab corruption (Not tainted): size-64 start=c7b3d3a0, len=64
[    5.683449] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a0 6a a9 c7  kkkkkkkkkkkk.j..
[    5.719188] Prev obj: start=c7b3d360, len=64
[    5.749983] 000: 01 00 00 00 5a 5a 5a 5a 01 00 00 00 ad 4e ad de  ....ZZZZ.....N..
[    5.794431] 010: ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00  ................
[    5.826668] Next obj: start=c7b3d3e0, len=64
[    5.835765] 000: 01 00 00 00 5a 5a 5a 5a 01 00 00 00 ad 4e ad de  ....ZZZZ.....N..
[    5.890040] 010: ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00  ................
[    7.959533]  sda: sda1
[    7.977742] sd 0:0:0:0: [sda] Attached SCSI disk
[    8.003154] Registered led device: led0
[    9.390477] Slab corruption (Not tainted): size-64 start=c7a5b720, len=64
[    9.402192] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 00 6b a9 c7  kkkkkkkkkkkk.k..
[    9.414768] Prev obj: start=c7a5b6e0, len=64
[    9.424032] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[    9.436680] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[    9.449214] Next obj: start=c7a5b760, len=64
[    9.458488] 000: 01 00 00 00 5a 5a 5a 5a 01 00 00 00 ad 4e ad de  ....ZZZZ.....N..
[    9.471242] 010: ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00  ................
[    9.523315] Slab corruption (Not tainted): size-64 start=c7a5b6e0, len=64
[    9.551462] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a0 6a a9 c7  kkkkkkkkkkkk.j..
[    9.564332] Prev obj: start=c7a5b6a0, len=64
[    9.573690] 000: a0 ed a8 c7 e0 06 9c c7 e0 06 9c c7 d0 40 aa c7  .............@..
[    9.586287] 010: e0 06 9c c7 d4 3e 55 c0 d0 4a ab c7 02 00 00 00  .....>U..J......
[    9.599022] Next obj: start=c7a5b720, len=64
[    9.608257] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[    9.620858] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[   14.089511] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[   14.685230] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[   15.849799] bcm2835 ALSA card created!
[   15.868583] bcm2835 ALSA chip created!
[   15.886912] bcm2835 ALSA chip created!
[   15.905486] bcm2835 ALSA chip created!
[   15.917167] bcm2835 ALSA chip created!
[   15.932963] bcm2835 ALSA chip created!
[   15.944510] bcm2835 ALSA chip created!
[   15.955946] bcm2835 ALSA chip created!

Something is corrupting memory on boot. Given that it's on USB data access during enumeration of a device, my feeling is that it's the USB driver.
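To make the "Slab corruption" lines easier to read: 0x6b is the byte the slab debugging code fills freed objects with (POISON_FREE in include/linux/poison.h), so any other value inside a freed object was written after the free. The sketch below locates the clobbered bytes in the "010:" row of the first report and reads them as a little-endian word. The helper names are hypothetical, not kernel functions.

```c
#include <stddef.h>
#include <stdint.h>

#define POISON_FREE 0x6b /* fill byte for freed objects (include/linux/poison.h) */

/* Row "010:" of the first corrupted size-64 object, copied from the log. */
static const uint8_t row_010[16] = {
    0x6b, 0x6b, 0x6b, 0x6b, 0x6b, 0x6b, 0x6b, 0x6b,
    0x6b, 0x6b, 0x6b, 0x6b, 0xa0, 0x6a, 0xa9, 0xc7,
};

/* Index of the first byte that is no longer POISON_FREE, or -1 if intact. */
static int first_clobbered(const uint8_t *row, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (row[i] != POISON_FREE)
            return (int)i;
    return -1;
}

/* Assemble a little-endian 32-bit word, as the ARM core on the Pi reads it. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
```

first_clobbered(row_010, 16) returns 12, i.e. offset 0x1c within the object, and le32(row_010 + 12) gives 0xc7a96aa0. That value lies in the same 0xc7...... range as the slab addresses in the log, which would be consistent with a pointer being stored into the object after it was freed.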

@popcornmix
Collaborator

Can you quote your traces, or link to pastebin?
GitHub has removed much of the content.

@P33M
Contributor Author

P33M commented Jan 8, 2013

Fixed with code blocks added

@popcornmix
Collaborator

Possibly a duplicate of:
#153
Can you try again with:
smsc95xx.turbo_mode=N
in cmdline.txt

@P33M
Contributor Author

P33M commented Jan 8, 2013

All of the above traces were captured with the following command line:

coherent_pool=6M smsc95xx.turbo_mode=N dwc_otg.lpm_enable=0 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline rootwait

@P33M
Contributor Author

P33M commented Jan 8, 2013

I went one better and recompiled with smsc95xx built as a module, then blacklisted it. The only things plugged into the system at boot were two mass storage devices. Same slab corruption, though with different contents in the corrupt entries.

I have gotten kmemleak working (required larger kmemleak early buffer size of 1500 entries) so will attempt to probe further.

@P33M
Contributor Author

P33M commented Jan 8, 2013

After boot, kmemleak reports

  comm "swapper", pid 1, jiffies 4294937312 (age 248.680s)
  hex dump (first 32 bytes):
    00 80 00 00 00 10 00 00 00 ae 8c c7 5a 5a 5a 5a  ............ZZZZ
    5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a a5  ZZZZZZZZZZZZZZZ.
  backtrace:
    [<c00bc11c>] kmem_cache_alloc+0x104/0x180
    [<c050b320>] cma_init_reserved_areas+0x84/0x27c
    [<c0008548>] do_one_initcall+0x34/0x174
    [<c04f8868>] kernel_init+0xe8/0x1a8
    [<c000f35c>] kernel_thread_exit+0x0/0x8
    [<ffffffff>] 0xffffffff
unreferenced object 0xc78cae00 (size 512):
  comm "swapper", pid 1, jiffies 4294937312 (age 248.680s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<c00bbf80>] __kmalloc+0x150/0x1e8
    [<c050b344>] cma_init_reserved_areas+0xa8/0x27c
    [<c0008548>] do_one_initcall+0x34/0x174
    [<c04f8868>] kernel_init+0xe8/0x1a8
    [<c000f35c>] kernel_thread_exit+0x0/0x8
    [<ffffffff>] 0xffffffff

Can these init calls be ignored safely?

But subsequently, kmemleak breaks when I try to run a scan again during the "critical phase" i.e. video playback on remote computer.

Note the time differences are due to a reboot in between.

[  240.092992] kmemleak: Cannot allocate a kmemleak_object structure
[  240.093025] kmemleak: Kernel memory leak detector disabled
[  240.146271] kmemleak: Automatic memory scanning thread ended

@P33M
Contributor Author

P33M commented Jan 9, 2013

Found it (I think).

I changed the kernel allocator to SLUB which allowed for extensive tracing.

http://pastebin.com/wFnv2qUA

Every single corrupted object is the result of a handle_hc_xfercomp_intr call from a completed USB transaction.

Additionally I get smbd page allocation failures but without any associated SLUB backtraces.

@popcornmix
Collaborator

I assume you are running with CMA enabled? Do you get the same failure with it disabled?
I'm not very familiar with the kmemleak tracing. Do you believe handle_hc_xfercomp_intr is writing beyond the end of a successfully kmalloc'ed buffer (rather than, for example, an alloc failure without a null check)?

@P33M
Contributor Author

P33M commented Jan 9, 2013

Crashed even sooner with CMA disabled.

Found the problem

diff --git a/drivers/usb/host/dwc_otg/dwc_otg_hcd.c b/drivers/usb/host/dwc_otg/dwc_otg_hcd.c
index 2b7945a..969ecb7 100644
--- a/drivers/usb/host/dwc_otg/dwc_otg_hcd.c
+++ b/drivers/usb/host/dwc_otg/dwc_otg_hcd.c
@@ -501,7 +501,8 @@ int dwc_otg_hcd_urb_enqueue(dwc_otg_hcd_t * hcd,
                          "Error status %d\n", retval);
                dwc_otg_hcd_qtd_free(qtd);
        } else {
-               qtd->qh = *ep_handle;
+       // Move this inside the atomic section in qtd_add
+       //      qtd->qh = *ep_handle;
        }
        intr_mask.d32 = DWC_READ_REG32(&hcd->core_if->core_global_regs->gintmsk);
        if (!intr_mask.b.sofintr && retval == 0) {
diff --git a/drivers/usb/host/dwc_otg/dwc_otg_hcd_queue.c b/drivers/usb/host/dwc_otg/dwc_otg_hcd_queue.c
index e6b2a7b..b337e1b 100644
--- a/drivers/usb/host/dwc_otg/dwc_otg_hcd_queue.c
+++ b/drivers/usb/host/dwc_otg/dwc_otg_hcd_queue.c
@@ -946,6 +946,7 @@ int dwc_otg_hcd_qtd_add(dwc_otg_qtd_t * qtd,
        if (retval == 0) {
                DWC_CIRCLEQ_INSERT_TAIL(&((*qh)->qtd_list), qtd,
                                        qtd_list_entry);
+               qtd->qh = *qh;
        }
        DWC_SPINUNLOCK_IRQRESTORE(hcd->lock, flags);

On adding a QTD to a QH, qtd->qh (the back-pointer taken from the endpoint handle) was being assigned after the atomic section had ended. Under heavy USB activity, a transfer complete interrupt could occur during the QTD insertion and would then be serviced immediately after the spinlock was released. A transfer complete interrupt removes the QTD from the list, which leaves the pointer used by dwc_otg_hcd_urb_enqueue pointing at something entirely random, usually freed memory.

Zero crashes or SLUB failures with this patch applied.

Edit: as an added bonus, smsc95xx.turbo_mode=Y now results in zero "kevent 2 may have been dropped" messages.
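The ordering problem described above can be modelled without real interrupts or threads. The following is a rough, single-threaded sketch and not the driver code: the struct layouts, the pending_irq_qh variable and the enqueue() helper are all made up for illustration, the "interrupt" is simply run from the unlock helper, and a freed flag stands in for the real free so the model itself never touches freed memory.

```c
#include <stdbool.h>
#include <stddef.h>

struct qh;

struct qtd {
    struct qh *qh; /* back-pointer that the patch moves under the lock */
    bool freed;    /* set instead of calling free(), so the model has no real UB */
};

struct qh {
    struct qtd *head; /* one-entry stand-in for the QTD list */
};

static struct qh *pending_irq_qh; /* non-NULL: a transfer-complete IRQ is waiting */

/* Model of the transfer-complete handler: unlink and "free" the QTD. */
static void xfer_complete(struct qh *qh)
{
    if (qh->head) {
        qh->head->freed = true;
        qh->head = NULL;
    }
}

/* Model of the unlock: a pending IRQ runs as soon as interrupts are back on. */
static void unlock_irqrestore(void)
{
    if (pending_irq_qh) {
        struct qh *qh = pending_irq_qh;

        pending_irq_qh = NULL;
        xfer_complete(qh);
    }
}

/* Returns true if qtd->qh was written after the QTD had already been freed. */
static bool enqueue(struct qh *qh, struct qtd *qtd, bool assign_under_lock)
{
    bool wrote_after_free = false;

    /* Lock is taken here; interrupts are off. */
    qh->head = qtd;
    pending_irq_qh = qh;     /* the transfer completes while the lock is held */
    if (assign_under_lock)
        qtd->qh = qh;        /* patched ordering: assignment inside the lock */
    unlock_irqrestore();     /* the IRQ is serviced here */

    if (!assign_under_lock) {
        wrote_after_free = qtd->freed;
        qtd->qh = qh;        /* original ordering: may land in freed memory */
    }
    return wrote_after_free;
}
```

Calling enqueue() with assign_under_lock set to false reports a write after free, while the patched ordering (true) does not. In the real driver the stray write lands in whatever now occupies the freed slab object, which matches the stray bytes seen in the slab corruption reports.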

@ghollingworth

Good catch, can you push the patch?

Gordon

@P33M
Contributor Author

P33M commented Jan 9, 2013

Addendum to previous: kevent 2 messages are still being generated sporadically with turbo_mode=Y, though the trigger appears to be a much higher level of network activity than previously. Before the bugfix I was getting them during video playback at about 1MB/s; afterwards it takes a straight file copy over the network (i.e. max speed). Throughput for a transfer from USB to network is typically 3.5MB/s with turbo_mode=Y and, weirdly, 5.1MB/s with turbo_mode=N.

@popcornmix
Collaborator

Presumably the 5.1MB/s with turbo_mode=N has no kevent 2 messages, and the 3.5MB/s with turbo_mode=Y has kevent 2 messages?
Presumably the dropped packets are requiring retransmission and so reducing bandwidth?

@P33M
Contributor Author

P33M commented Jan 9, 2013

Yes, correct. The copy operation does not fail, presumably due to TCP retries but performance suffers.

@popcornmix
Collaborator

Last time I investigated, the dropped messages were much more common with CMA enabled (I was even getting them when just browsing sites like engadget). I wasn't seeing them at all with CMA disabled.

Are you seeing this, or is CMA not causing any greater problems?

@P33M
Contributor Author

P33M commented Jan 9, 2013

It seems this bug may not be fully squashed. With slub_debug turned off, I have seen occasional page allocation failures of a near-identical kind after transferring about 4GB of a 6GB file. Frustratingly, turning on slub_debug, which would give me a proper call stack, makes the bug even rarer because of the slowdown it adds.

I will play with various CMA/turbo/USB usage options to see if I can reliably replicate it.

@P33M
Contributor Author

P33M commented Jan 10, 2013

Definitely not squashed. Tested with stock kernel pulled from rpi-update with the patch incorporated.

I dug out my old zd1211b wireless adapter and attempted the same test (network copy). I can ping flood the device OK, but as soon as I start other USB activity alongside it, either video viewing or a file copy, the transfer grinds to a pitifully slow rate with associated allocation failures.

I think the inherent lag and variability of a wireless network interface are suddenly enough to tickle a similar bug.

@P33M
Contributor Author

P33M commented Jan 16, 2013

Progress, of a sort:
Using the stock kernel with a USB pendrive + zd1211b adapter, I can reliably create memory allocation failures by either seeking rapidly within a video (as before) or letting a video play for a while. If I replug the adapter and let it reassociate, the symptom is "reset" until the next threshold of transfer is reached. If I do not replug it, the transfers (and pings) carry on in fits and starts but remain unusable.

I need to eliminate the zd1211 driver as a cause. I've ordered a different stick on eBay to cross-compare.

@P33M
Contributor Author

P33M commented Jan 19, 2013

RTL5370 chipset produces no comparable slowdown or allocation failures. Bug most likely within the zd1211rw driver.

@P33M P33M closed this as completed Jan 19, 2013
popcornmix pushed a commit that referenced this issue Apr 24, 2014
commit 3a41c5d upstream.

Following commits:

50e244c fb: rework locking to fix lock ordering on takeover
e93a9a8 fb: Yet another band-aid for fixing lockdep mess
054430e fbcon: fix locking harder

reworked locking to fix related lock ordering on takeover, and introduced console_lock
into fbmem, but it seems that the new lock sequence(fb_info->lock ---> console_lock)
is against with the one in console_callback(console_lock ---> fb_info->lock), and leads to
a potential dead lock as following:

[  601.079000] ======================================================
[  601.079000] [ INFO: possible circular locking dependency detected ]
[  601.079000] 3.11.0 #189 Not tainted
[  601.079000] -------------------------------------------------------
[  601.079000] kworker/0:3/619 is trying to acquire lock:
[  601.079000]  (&fb_info->lock){+.+.+.}, at: [<ffffffff81397566>] lock_fb_info+0x26/0x60
[  601.079000]
but task is already holding lock:
[  601.079000]  (console_lock){+.+.+.}, at: [<ffffffff8141aae3>] console_callback+0x13/0x160
[  601.079000]
which lock already depends on the new lock.

[  601.079000]
the existing dependency chain (in reverse order) is:
[  601.079000]
-> #1 (console_lock){+.+.+.}:
[  601.079000]        [<ffffffff810dc971>] lock_acquire+0xa1/0x140
[  601.079000]        [<ffffffff810c6267>] console_lock+0x77/0x80
[  601.079000]        [<ffffffff81399448>] register_framebuffer+0x1d8/0x320
[  601.079000]        [<ffffffff81cfb4c8>] efifb_probe+0x408/0x48f
[  601.079000]        [<ffffffff8144a963>] platform_drv_probe+0x43/0x80
[  601.079000]        [<ffffffff8144853b>] driver_probe_device+0x8b/0x390
[  601.079000]        [<ffffffff814488eb>] __driver_attach+0xab/0xb0
[  601.079000]        [<ffffffff814463bd>] bus_for_each_dev+0x5d/0xa0
[  601.079000]        [<ffffffff81447e6e>] driver_attach+0x1e/0x20
[  601.079000]        [<ffffffff81447a07>] bus_add_driver+0x117/0x290
[  601.079000]        [<ffffffff81448fea>] driver_register+0x7a/0x170
[  601.079000]        [<ffffffff8144a10a>] __platform_driver_register+0x4a/0x50
[  601.079000]        [<ffffffff8144a12d>] platform_driver_probe+0x1d/0xb0
[  601.079000]        [<ffffffff81cfb0a1>] efifb_init+0x273/0x292
[  601.079000]        [<ffffffff81002132>] do_one_initcall+0x102/0x1c0
[  601.079000]        [<ffffffff81cb80a6>] kernel_init_freeable+0x15d/0x1ef
[  601.079000]        [<ffffffff8166d2de>] kernel_init+0xe/0xf0
[  601.079000]        [<ffffffff816914ec>] ret_from_fork+0x7c/0xb0
[  601.079000]
-> #0 (&fb_info->lock){+.+.+.}:
[  601.079000]        [<ffffffff810dc1d8>] __lock_acquire+0x1e18/0x1f10
[  601.079000]        [<ffffffff810dc971>] lock_acquire+0xa1/0x140
[  601.079000]        [<ffffffff816835ca>] mutex_lock_nested+0x7a/0x3b0
[  601.079000]        [<ffffffff81397566>] lock_fb_info+0x26/0x60
[  601.079000]        [<ffffffff813a4aeb>] fbcon_blank+0x29b/0x2e0
[  601.079000]        [<ffffffff81418658>] do_blank_screen+0x1d8/0x280
[  601.079000]        [<ffffffff8141ab34>] console_callback+0x64/0x160
[  601.079000]        [<ffffffff8108d855>] process_one_work+0x1f5/0x540
[  601.079000]        [<ffffffff8108e04c>] worker_thread+0x11c/0x370
[  601.079000]        [<ffffffff81095fbd>] kthread+0xed/0x100
[  601.079000]        [<ffffffff816914ec>] ret_from_fork+0x7c/0xb0
[  601.079000]
other info that might help us debug this:

[  601.079000]  Possible unsafe locking scenario:

[  601.079000]        CPU0                    CPU1
[  601.079000]        ----                    ----
[  601.079000]   lock(console_lock);
[  601.079000]                                lock(&fb_info->lock);
[  601.079000]                                lock(console_lock);
[  601.079000]   lock(&fb_info->lock);
[  601.079000]
 *** DEADLOCK ***

so we reorder the lock sequence the same as it in console_callback() to
avoid this issue. And following Tomi's suggestion, fix these similar
issues all in fb subsystem.

Signed-off-by: Gu Zheng <[email protected]>
Signed-off-by: Tomi Valkeinen <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
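The lock inversion in the report above can be illustrated with a toy order checker. This is a deliberately simplified sketch, not lockdep's actual algorithm (lockdep follows whole dependency chains; this only compares pairs), and the enum, array and function names are made up for illustration.

```c
#include <stdbool.h>
#include <string.h>

enum lock_id { FB_INFO_LOCK, CONSOLE_LOCK, NUM_LOCKS };

/* order_seen[a][b] is true once b has been taken while a was held. */
static bool order_seen[NUM_LOCKS][NUM_LOCKS];

/* Forget every recorded ordering. */
static void reset_orders(void)
{
    memset(order_seen, 0, sizeof(order_seen));
}

/*
 * Record that `next` is being acquired while `held` is held.
 * Returns false if the opposite order was recorded earlier,
 * i.e. two paths disagree and could deadlock against each other.
 */
static bool acquire_while_holding(enum lock_id held, enum lock_id next)
{
    if (order_seen[next][held])
        return false;
    order_seen[held][next] = true;
    return true;
}
```

Recording fb_info->lock then console_lock, followed by console_lock then fb_info->lock as in console_callback(), makes the second call return false. When every path takes console_lock first, which is the ordering the commit settles on, no call fails.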
popcornmix pushed a commit that referenced this issue Sep 14, 2016
I got this:

    divide error: 0000 [#1] PREEMPT SMP KASAN
    CPU: 1 PID: 1327 Comm: a.out Not tainted 4.8.0-rc2+ #189
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
    task: ffff8801120a9580 task.stack: ffff8801120b0000
    RIP: 0010:[<ffffffff82c8bd9a>]  [<ffffffff82c8bd9a>] snd_hrtimer_callback+0x1da/0x3f0
    RSP: 0018:ffff88011aa87da8  EFLAGS: 00010006
    RAX: 0000000000004f76 RBX: ffff880112655e88 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: ffff880112655ea0 RDI: 0000000000000001
    RBP: ffff88011aa87e00 R08: ffff88013fff905c R09: ffff88013fff9048
    R10: ffff88013fff9050 R11: 00000001050a7b8c R12: ffff880114778a00
    R13: ffff880114778ab4 R14: ffff880114778b30 R15: 0000000000000000
    FS:  00007f071647c700(0000) GS:ffff88011aa80000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000603001 CR3: 0000000112021000 CR4: 00000000000006e0
    Stack:
     0000000000000000 ffff880114778ab8 ffff880112655ea0 0000000000004f76
     ffff880112655ec8 ffff880112655e80 ffff880112655e88 ffff88011aa98fc0
     00000000b97ccf2b dffffc0000000000 ffff88011aa98fc0 ffff88011aa87ef0
    Call Trace:
     <IRQ>
     [<ffffffff813abce7>] __hrtimer_run_queues+0x347/0xa00
     [<ffffffff82c8bbc0>] ? snd_hrtimer_close+0x130/0x130
     [<ffffffff813ab9a0>] ? retrigger_next_event+0x1b0/0x1b0
     [<ffffffff813ae1a6>] ? hrtimer_interrupt+0x136/0x4b0
     [<ffffffff813ae220>] hrtimer_interrupt+0x1b0/0x4b0
     [<ffffffff8120f91e>] local_apic_timer_interrupt+0x6e/0xf0
     [<ffffffff81227ad3>] ? kvm_guest_apic_eoi_write+0x13/0xc0
     [<ffffffff83c35086>] smp_apic_timer_interrupt+0x76/0xa0
     [<ffffffff83c3416c>] apic_timer_interrupt+0x8c/0xa0
     <EOI>
     [<ffffffff83c3239c>] ? _raw_spin_unlock_irqrestore+0x2c/0x60
     [<ffffffff82c8185d>] snd_timer_start1+0xdd/0x670
     [<ffffffff82c87015>] snd_timer_continue+0x45/0x80
     [<ffffffff82c88100>] snd_timer_user_ioctl+0x1030/0x2830
     [<ffffffff8159f3a0>] ? __follow_pte.isra.49+0x430/0x430
     [<ffffffff82c870d0>] ? snd_timer_pause+0x80/0x80
     [<ffffffff815a26fa>] ? do_wp_page+0x3aa/0x1c90
     [<ffffffff815aa4f8>] ? handle_mm_fault+0xbc8/0x27f0
     [<ffffffff815a9930>] ? __pmd_alloc+0x370/0x370
     [<ffffffff82c870d0>] ? snd_timer_pause+0x80/0x80
     [<ffffffff816b0733>] do_vfs_ioctl+0x193/0x1050
     [<ffffffff816b05a0>] ? ioctl_preallocate+0x200/0x200
     [<ffffffff81002f2f>] ? syscall_trace_enter+0x3cf/0xdb0
     [<ffffffff815045ba>] ? __context_tracking_exit.part.4+0x9a/0x1e0
     [<ffffffff81002b60>] ? exit_to_usermode_loop+0x190/0x190
     [<ffffffff82001a97>] ? check_preemption_disabled+0x37/0x1e0
     [<ffffffff81d93889>] ? security_file_ioctl+0x89/0xb0
     [<ffffffff816b167f>] SyS_ioctl+0x8f/0xc0
     [<ffffffff816b15f0>] ? do_vfs_ioctl+0x1050/0x1050
     [<ffffffff81005524>] do_syscall_64+0x1c4/0x4e0
     [<ffffffff83c32b2a>] entry_SYSCALL64_slow_path+0x25/0x25
    Code: e8 fc 42 7b fe 8b 0d 06 8a 50 03 49 0f af cf 48 85 c9 0f 88 7c 01 00 00 48 89 4d a8 e8 e0 42 7b fe 48 8b 45 c0 48 8b 4d a8 48 99 <48> f7 f9 49 01 c7 e8 cb 42 7b fe 48 8b 55 d0 48 b8 00 00 00 00
    RIP  [<ffffffff82c8bd9a>] snd_hrtimer_callback+0x1da/0x3f0
     RSP <ffff88011aa87da8>
    ---[ end trace 6aa380f756a21074 ]---

The problem happens when you call ioctl(SNDRV_TIMER_IOCTL_CONTINUE) on a
completely new/unused timer -- it will have ->sticks == 0, which causes a
divide by 0 in snd_hrtimer_callback().

Signed-off-by: Vegard Nossum <[email protected]>
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
popcornmix pushed a commit that referenced this issue Sep 15, 2016
commit 6b760bb upstream.

I got this:

    divide error: 0000 [#1] PREEMPT SMP KASAN
    CPU: 1 PID: 1327 Comm: a.out Not tainted 4.8.0-rc2+ #189
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
    task: ffff8801120a9580 task.stack: ffff8801120b0000
    RIP: 0010:[<ffffffff82c8bd9a>]  [<ffffffff82c8bd9a>] snd_hrtimer_callback+0x1da/0x3f0
    RSP: 0018:ffff88011aa87da8  EFLAGS: 00010006
    RAX: 0000000000004f76 RBX: ffff880112655e88 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: ffff880112655ea0 RDI: 0000000000000001
    RBP: ffff88011aa87e00 R08: ffff88013fff905c R09: ffff88013fff9048
    R10: ffff88013fff9050 R11: 00000001050a7b8c R12: ffff880114778a00
    R13: ffff880114778ab4 R14: ffff880114778b30 R15: 0000000000000000
    FS:  00007f071647c700(0000) GS:ffff88011aa80000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000603001 CR3: 0000000112021000 CR4: 00000000000006e0
    Stack:
     0000000000000000 ffff880114778ab8 ffff880112655ea0 0000000000004f76
     ffff880112655ec8 ffff880112655e80 ffff880112655e88 ffff88011aa98fc0
     00000000b97ccf2b dffffc0000000000 ffff88011aa98fc0 ffff88011aa87ef0
    Call Trace:
     <IRQ>
     [<ffffffff813abce7>] __hrtimer_run_queues+0x347/0xa00
     [<ffffffff82c8bbc0>] ? snd_hrtimer_close+0x130/0x130
     [<ffffffff813ab9a0>] ? retrigger_next_event+0x1b0/0x1b0
     [<ffffffff813ae1a6>] ? hrtimer_interrupt+0x136/0x4b0
     [<ffffffff813ae220>] hrtimer_interrupt+0x1b0/0x4b0
     [<ffffffff8120f91e>] local_apic_timer_interrupt+0x6e/0xf0
     [<ffffffff81227ad3>] ? kvm_guest_apic_eoi_write+0x13/0xc0
     [<ffffffff83c35086>] smp_apic_timer_interrupt+0x76/0xa0
     [<ffffffff83c3416c>] apic_timer_interrupt+0x8c/0xa0
     <EOI>
     [<ffffffff83c3239c>] ? _raw_spin_unlock_irqrestore+0x2c/0x60
     [<ffffffff82c8185d>] snd_timer_start1+0xdd/0x670
     [<ffffffff82c87015>] snd_timer_continue+0x45/0x80
     [<ffffffff82c88100>] snd_timer_user_ioctl+0x1030/0x2830
     [<ffffffff8159f3a0>] ? __follow_pte.isra.49+0x430/0x430
     [<ffffffff82c870d0>] ? snd_timer_pause+0x80/0x80
     [<ffffffff815a26fa>] ? do_wp_page+0x3aa/0x1c90
     [<ffffffff815aa4f8>] ? handle_mm_fault+0xbc8/0x27f0
     [<ffffffff815a9930>] ? __pmd_alloc+0x370/0x370
     [<ffffffff82c870d0>] ? snd_timer_pause+0x80/0x80
     [<ffffffff816b0733>] do_vfs_ioctl+0x193/0x1050
     [<ffffffff816b05a0>] ? ioctl_preallocate+0x200/0x200
     [<ffffffff81002f2f>] ? syscall_trace_enter+0x3cf/0xdb0
     [<ffffffff815045ba>] ? __context_tracking_exit.part.4+0x9a/0x1e0
     [<ffffffff81002b60>] ? exit_to_usermode_loop+0x190/0x190
     [<ffffffff82001a97>] ? check_preemption_disabled+0x37/0x1e0
     [<ffffffff81d93889>] ? security_file_ioctl+0x89/0xb0
     [<ffffffff816b167f>] SyS_ioctl+0x8f/0xc0
     [<ffffffff816b15f0>] ? do_vfs_ioctl+0x1050/0x1050
     [<ffffffff81005524>] do_syscall_64+0x1c4/0x4e0
     [<ffffffff83c32b2a>] entry_SYSCALL64_slow_path+0x25/0x25
    Code: e8 fc 42 7b fe 8b 0d 06 8a 50 03 49 0f af cf 48 85 c9 0f 88 7c 01 00 00 48 89 4d a8 e8 e0 42 7b fe 48 8b 45 c0 48 8b 4d a8 48 99 <48> f7 f9 49 01 c7 e8 cb 42 7b fe 48 8b 55 d0 48 b8 00 00 00 00
    RIP  [<ffffffff82c8bd9a>] snd_hrtimer_callback+0x1da/0x3f0
     RSP <ffff88011aa87da8>
    ---[ end trace 6aa380f756a21074 ]---

The problem happens when you call ioctl(SNDRV_TIMER_IOCTL_CONTINUE) on a
completely new/unused timer -- it will have ->sticks == 0, which causes a
divide by 0 in snd_hrtimer_callback().

Signed-off-by: Vegard Nossum <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
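
The failure mode described in the commit message above (a period derived from ->sticks == 0 being used as a divisor) can be modelled in user space. This is only a minimal sketch under stated assumptions: `struct timer_model` and `periods_elapsed()` are hypothetical names, not ALSA code, and the guard shown is one possible defensive check, not necessarily what commit 6b760bb does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a timer whose period is sticks * resolution_ns. */
struct timer_model {
	uint64_t sticks;        /* ticks per period; 0 on a never-started timer */
	uint64_t resolution_ns; /* nanoseconds per tick */
};

/*
 * Computes how many whole periods fit into elapsed_ns.
 * Returns false, leaving *periods untouched, when the period is zero,
 * which is the state of a completely new/unused timer.
 */
static bool periods_elapsed(const struct timer_model *t, uint64_t elapsed_ns,
			    uint64_t *periods)
{
	uint64_t period_ns = t->sticks * t->resolution_ns;

	if (period_ns == 0)
		return false; /* dividing here would be the "divide error" */

	*periods = elapsed_ns / period_ns;
	return true;
}
```

With this model, a caller that "continues" a timer that was never started gets a refusal rather than a division by zero; a timer with sticks set to a non-zero value divides normally.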
anholt pushed a commit to anholt/linux that referenced this issue Apr 23, 2018
syzbot reported a use-after-free of shm_file_data(file)->file->f_op in
shm_get_unmapped_area(), called via sys_remap_file_pages().

Unfortunately it couldn't generate a reproducer, but I found a bug which
I think caused it.  When remap_file_pages() is passed a full System V
shared memory segment, the memory is first unmapped, then a new map is
created using the ->vm_file.  Between these steps, the shm ID can be
removed and reused for a new shm segment.  But, shm_mmap() only checks
whether the ID is currently valid before calling the underlying file's
->mmap(); it doesn't check whether it was reused.  Thus it can use the
wrong underlying file, one that was already freed.

Fix this by making the "outer" shm file (the one that gets put in
->vm_file) hold a reference to the real shm file, and by making
__shm_open() require that the file associated with the shm ID matches
the one associated with the "outer" file.

Taking the reference to the real shm file is needed to fully solve the
problem, since otherwise sfd->file could point to a freed file, which
then could be reallocated for the reused shm ID, causing the wrong shm
segment to be mapped (and without the required permission checks).

Commit 1ac0b6d ("ipc/shm: handle removed segments gracefully in
shm_mmap()") almost fixed this bug, but it didn't go far enough because
it didn't consider the case where the shm ID is reused.

The following program usually reproduces this bug:

	#include <stdlib.h>
	#include <sys/shm.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	int main()
	{
		int is_parent = (fork() != 0);
		srand(getpid());
		for (;;) {
			int id = shmget(0xF00F, 4096, IPC_CREAT|0700);
			if (is_parent) {
				void *addr = shmat(id, NULL, 0);
				usleep(rand() % 50);
				while (!syscall(__NR_remap_file_pages, addr, 4096, 0, 0, 0));
			} else {
				usleep(rand() % 50);
				shmctl(id, IPC_RMID, NULL);
			}
		}
	}

It causes the following NULL pointer dereference due to a 'struct file'
being used while it's being freed.  (I couldn't actually get a KASAN
use-after-free splat like in the syzbot report.  But I think it's
possible with this bug; it would just take a more extraordinary race...)

	BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
	PGD 0 P4D 0
	Oops: 0000 [#1] SMP NOPTI
	CPU: 9 PID: 258 Comm: syz_ipc Not tainted 4.16.0-05140-gf8cf2f16a7c95 #189
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
	RIP: 0010:d_inode include/linux/dcache.h:519 [inline]
	RIP: 0010:touch_atime+0x25/0xd0 fs/inode.c:1724
	[...]
	Call Trace:
	 file_accessed include/linux/fs.h:2063 [inline]
	 shmem_mmap+0x25/0x40 mm/shmem.c:2149
	 call_mmap include/linux/fs.h:1789 [inline]
	 shm_mmap+0x34/0x80 ipc/shm.c:465
	 call_mmap include/linux/fs.h:1789 [inline]
	 mmap_region+0x309/0x5b0 mm/mmap.c:1712
	 do_mmap+0x294/0x4a0 mm/mmap.c:1483
	 do_mmap_pgoff include/linux/mm.h:2235 [inline]
	 SYSC_remap_file_pages mm/mmap.c:2853 [inline]
	 SyS_remap_file_pages+0x232/0x310 mm/mmap.c:2769
	 do_syscall_64+0x64/0x1a0 arch/x86/entry/common.c:287
	 entry_SYSCALL_64_after_hwframe+0x42/0xb7

[[email protected]: add comment]
  Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Reported-by: syzbot+d11f321e7f1923157eac80aa990b446596f46439@syzkaller.appspotmail.com
Fixes: c8d78c1 ("mm: replace remap_file_pages() syscall with emulation")
Signed-off-by: Eric Biggers <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Acked-by: Davidlohr Bueso <[email protected]>
Cc: Manfred Spraul <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
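
The two parts of the fix described above (the "outer" file pins the real file, and reopening requires that the ID still maps to that same file) can be illustrated with a small user-space model. This is a sketch under assumptions, not the ipc/shm.c code: `id_table`, `struct backing_file`, `struct outer_handle`, `handle_attach()` and `handle_reopen()` are all hypothetical names, and the plain -1 return stands in for whatever error code the kernel uses.

```c
#include <stddef.h>

#define MAX_SEGMENTS 4

/* Hypothetical stand-in for the real backing file of a segment. */
struct backing_file {
	int refcount;
};

/* Hypothetical ID table: slot i holds the file that currently owns ID i. */
static struct backing_file *id_table[MAX_SEGMENTS];

/* Hypothetical "outer" handle: remembers the ID and pins the file. */
struct outer_handle {
	int id;
	struct backing_file *file;
};

/* Bind the handle to whatever file owns the ID right now and pin it. */
static void handle_attach(struct outer_handle *h, int id)
{
	h->id = id;
	h->file = id_table[id];
	h->file->refcount++; /* the pinned pointer cannot dangle */
}

/* Returns 0 only if the ID still refers to the file the handle pinned. */
static int handle_reopen(const struct outer_handle *h)
{
	struct backing_file *owner = id_table[h->id];

	if (owner == NULL)
		return -1; /* the ID was removed */
	if (owner != h->file)
		return -1; /* the ID was removed and reused for another segment */
	return 0;
}
```

Checking only that the ID is currently valid would accept the reused case; comparing against the pinned pointer rejects it, and the held reference is what makes that pointer comparison safe.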
nathanchance pushed a commit to nathanchance/pi-kernel that referenced this issue Apr 24, 2018
commit 3f05317 upstream.

Signed-off-by: Greg Kroah-Hartman <[email protected]>
popcornmix pushed a commit that referenced this issue Apr 26, 2018
commit 3f05317 upstream.

Signed-off-by: Greg Kroah-Hartman <[email protected]>
popcornmix pushed a commit that referenced this issue Jun 4, 2019
[ Upstream commit a4b7013 ]

BUG: KASAN: slab-out-of-bounds in rxe_mem_init_user+0x6c1/0x740 [rdma_rxe]
Read of size 8 at addr ffff88805c01a608 by task ib_send_bw/573

CPU: 24 PID: 573 Comm: ib_send_bw Not tainted 5.0.0-rc5+ #189
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 rxe_mem_init_user+0x6c1/0x740 [rdma_rxe]
 rxe_reg_user_mr+0x9b/0x110 [rdma_rxe]
 ib_uverbs_reg_mr+0x428/0x9c0 [ib_uverbs]
 ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x2b0/0x410 [ib_uverbs]
 ib_uverbs_run_method+0x79c/0x1da0 [ib_uverbs]
 rxe_mem_init_user+0x6c1/0x740 [rdma_rxe]
 rxe_reg_user_mr+0x9b/0x110 [rdma_rxe]
 ib_uverbs_reg_mr+0x428/0x9c0 [ib_uverbs]
 ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x2b0/0x410 [ib_uverbs]
 ib_uverbs_run_method+0x79c/0x1da0 [ib_uverbs]
 ib_uverbs_cmd_verbs+0x5f2/0xf20 [ib_uverbs]
 ib_uverbs_ioctl+0x202/0x310 [ib_uverbs]
 do_vfs_ioctl+0x193/0x1440
 ksys_ioctl+0x3a/0x70
 __x64_sys_ioctl+0x6f/0xb0
 do_syscall_64+0x13f/0x570
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Allocated by task 573:
 __kasan_kmalloc.constprop.5+0xc1/0xd0
 __kmalloc+0x161/0x310
 rxe_mem_alloc+0x52/0x470 [rdma_rxe]
 rxe_mem_init_user+0x113/0x740 [rdma_rxe]
 rxe_reg_user_mr+0x9b/0x110 [rdma_rxe]
 ib_uverbs_reg_mr+0x428/0x9c0 [ib_uverbs]
 ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x2b0/0x410 [ib_uverbs]
 ib_uverbs_run_method+0x79c/0x1da0 [ib_uverbs]
 ib_uverbs_cmd_verbs+0x5f2/0xf20 [ib_uverbs]
 ib_uverbs_ioctl+0x202/0x310 [ib_uverbs]
 do_vfs_ioctl+0x193/0x1440
 ksys_ioctl+0x3a/0x70
 __x64_sys_ioctl+0x6f/0xb0
 do_syscall_64+0x13f/0x570
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 0:
 __kasan_slab_free+0x12e/0x180
 kfree+0x10a/0x2c0
 rcu_process_callbacks+0xa77/0x1260
 __do_softirq+0x2ad/0xacb

Test scenario:
 ib_send_bw -x 1 -d rxe0 -a &
 ib_send_bw -x 1 -d rxe0 -a localhost

Fixes: 8700e3e ("Soft RoCE driver")
Reported-by: Parav Pandit <[email protected]>
Reviewed-by: Zhu Yanjun <[email protected]>
Tested-by: Zhu Yanjun <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
herrnst pushed a commit to herrnst/linux-raspberrypi that referenced this issue Jan 2, 2023
…add()

[ Upstream commit 78316e9 ]

In mpt3sas_transport_port_add(), if sas_rphy_add() returns an error,
sas_rphy_free() needs to be called to free the resource allocated in
sas_end_device_alloc(). Otherwise a kernel crash will happen:

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000108
CPU: 45 PID: 37020 Comm: bash Kdump: loaded Tainted: G        W          6.1.0-rc1+ #189
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : device_del+0x54/0x3d0
lr : device_del+0x37c/0x3d0
Call trace:
 device_del+0x54/0x3d0
 attribute_container_class_device_del+0x28/0x38
 transport_remove_classdev+0x6c/0x80
 attribute_container_device_trigger+0x108/0x110
 transport_remove_device+0x28/0x38
 sas_rphy_remove+0x50/0x78 [scsi_transport_sas]
 sas_port_delete+0x30/0x148 [scsi_transport_sas]
 do_sas_phy_delete+0x78/0x80 [scsi_transport_sas]
 device_for_each_child+0x68/0xb0
 sas_remove_children+0x30/0x50 [scsi_transport_sas]
 sas_rphy_remove+0x38/0x78 [scsi_transport_sas]
 sas_port_delete+0x30/0x148 [scsi_transport_sas]
 do_sas_phy_delete+0x78/0x80 [scsi_transport_sas]
 device_for_each_child+0x68/0xb0
 sas_remove_children+0x30/0x50 [scsi_transport_sas]
 sas_remove_host+0x20/0x38 [scsi_transport_sas]
 scsih_remove+0xd8/0x420 [mpt3sas]

Because transport_add_device() is not called when sas_rphy_add() fails, the
device is not added. When sas_rphy_remove() is subsequently called to
remove the device in the remove() path, a NULL pointer dereference happens.

Fixes: f92363d ("[SCSI] mpt3sas: add new driver supporting 12GB SAS")
Signed-off-by: Yang Yingliang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
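
The error-handling pattern the commit message above describes (free the allocated object when the "add" step fails, so the later remove path never sees a half-registered device) can be sketched in user space. All names here are hypothetical stand-ins, not the scsi_transport_sas API: `rphy_alloc()`, `rphy_add()`, `rphy_free()` and `port_add()` only model the shape of the fix, and `live_objects` exists purely to make leaks observable.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a remote PHY object. */
struct remote_phy {
	int registered;
};

static int live_objects; /* allocations that have not been freed yet */

static struct remote_phy *rphy_alloc(void)
{
	struct remote_phy *r = calloc(1, sizeof(*r));

	if (r)
		live_objects++;
	return r;
}

static void rphy_free(struct remote_phy *r)
{
	if (r) {
		free(r);
		live_objects--;
	}
}

/* Registration step; simulate_failure forces the error path. */
static int rphy_add(struct remote_phy *r, int simulate_failure)
{
	if (simulate_failure)
		return -1;
	r->registered = 1;
	return 0;
}

/* Allocate and register; on a failed add, release the object again. */
static struct remote_phy *port_add(int simulate_failure)
{
	struct remote_phy *r = rphy_alloc();

	if (!r)
		return NULL;
	if (rphy_add(r, simulate_failure)) {
		rphy_free(r); /* nothing unregistered is left for remove() */
		return NULL;
	}
	return r;
}
```

Without the `rphy_free()` call on the failure branch, the object would stay allocated but unregistered, which is the state that the later removal path trips over.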
popcornmix pushed a commit that referenced this issue Jan 3, 2023
…add()

[ Upstream commit 78316e9 ]

popcornmix pushed a commit that referenced this issue Jan 13, 2023
…add()

[ Upstream commit 78316e9 ]

popcornmix pushed a commit that referenced this issue Apr 10, 2025
[ Upstream commit 5ed3b0c ]

When cur_qp isn't NULL, in order to avoid fetching the QP from
the radix tree again, we check whether the next CQE's QP is identical
to the one we already have.

The bug however is that we are checking if the QP is identical by
checking the QP number inside the CQE against the QP number inside the
mlx5_ib_qp, but that's wrong since the QP number from the CQE is from
FW so it should be matched against mlx5_core_qp which is our FW QP
number.

Otherwise we could use the wrong QP when handling a CQE which could
cause the kernel trace below.

This issue is mainly noticeable over QPs 0 & 1, since for now they are
the only QPs in our driver where the QP number inside mlx5_ib_qp
doesn't match the QP number inside mlx5_core_qp.

BUG: kernel NULL pointer dereference, address: 0000000000000012
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP
 CPU: 0 UID: 0 PID: 7927 Comm: kworker/u62:1 Not tainted 6.14.0-rc3+ #189
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
 Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
 RIP: 0010:mlx5_ib_poll_cq+0x4c7/0xd90 [mlx5_ib]
 Code: 03 00 00 8d 58 ff 21 cb 66 39 d3 74 39 48 c7 c7 3c 89 6e a0 0f b7 db e8 b7 d2 b3 e0 49 8b 86 60 03 00 00 48 c7 c7 4a 89 6e a0 <0f> b7 5c 98 02 e8 9f d2 b3 e0 41 0f b7 86 78 03 00 00 83 e8 01 21
 RSP: 0018:ffff88810511bd60 EFLAGS: 00010046
 RAX: 0000000000000010 RBX: 0000000000000000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffff88885fa1b3c0 RDI: ffffffffa06e894a
 RBP: 00000000000000b0 R08: 0000000000000000 R09: ffff88810511bc10
 R10: 0000000000000001 R11: 0000000000000001 R12: ffff88810d593000
 R13: ffff88810e579108 R14: ffff888105146000 R15: 00000000000000b0
 FS:  0000000000000000(0000) GS:ffff88885fa00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000012 CR3: 00000001077e6001 CR4: 0000000000370eb0
 Call Trace:
  <TASK>
  ? __die+0x20/0x60
  ? page_fault_oops+0x150/0x3e0
  ? exc_page_fault+0x74/0x130
  ? asm_exc_page_fault+0x22/0x30
  ? mlx5_ib_poll_cq+0x4c7/0xd90 [mlx5_ib]
  __ib_process_cq+0x5a/0x150 [ib_core]
  ib_cq_poll_work+0x31/0x90 [ib_core]
  process_one_work+0x169/0x320
  worker_thread+0x288/0x3a0
  ? work_busy+0xb0/0xb0
  kthread+0xd7/0x1f0
  ? kthreads_online_cpu+0x130/0x130
  ? kthreads_online_cpu+0x130/0x130
  ret_from_fork+0x2d/0x50
  ? kthreads_online_cpu+0x130/0x130
  ret_from_fork_asm+0x11/0x20
  </TASK>

Fixes: e126ba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Patrisious Haddad <[email protected]>
Reviewed-by: Edward Srouji <[email protected]>
Link: https://patch.msgid.link/4ada09d41f1e36db62c44a9b25c209ea5f054316.1741875692.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
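
The comparison bug described above can be shown with a small model in which a QP carries two numbers. This is a sketch under assumptions, not mlx5 code: `struct qp_model`, `matches_cached_qp()` and `matches_cached_qp_buggy()` are hypothetical names, and the numeric values in the usage note are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical QP with a verbs-level number and a firmware number. */
struct qp_model {
	uint32_t ib_qpn; /* number exposed to the verbs layer */
	uint32_t fw_qpn; /* number assigned by firmware, as carried in a CQE */
};

/* Correct check: a CQE carries the firmware number, so compare against it. */
static bool matches_cached_qp(const struct qp_model *cached, uint32_t cqe_qpn)
{
	return cached != NULL && cached->fw_qpn == cqe_qpn;
}

/* Buggy check, for contrast: compares against the verbs-level number. */
static bool matches_cached_qp_buggy(const struct qp_model *cached,
				    uint32_t cqe_qpn)
{
	return cached != NULL && cached->ib_qpn == cqe_qpn;
}
```

For a special QP modelled as `{ .ib_qpn = 1, .fw_qpn = 0x4a }`, a CQE carrying 0x4a matches only under the correct check, while a CQE carrying 1 (which belongs to some other QP) is wrongly accepted by the buggy check. For ordinary QPs the two numbers coincide, which is why the difference shows up mainly on QPs 0 and 1.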
popcornmix pushed a commit that referenced this issue Apr 14, 2025
[ Upstream commit 5ed3b0c ]

popcornmix pushed a commit that referenced this issue Apr 26, 2025
[ Upstream commit 5ed3b0c ]

Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>