Skip to content

raspbian: kernel OOPS following page allocation failures under USB/network load #189

Closed
@P33M

Description

@P33M

Hello.

I have a stock raspbian + kernel 3.6.11+ #350, and can reliably oops the kernel by doing the following:

  1. ntfs-3g mount an external USB HDD
  2. Share this HDD with samba
  3. stream a large high bandwidth video (although standard resolution produces a similar result eventually)
  4. Seek repeatedly within the file on Windows

Over time multiple page allocation failures occur, all with the typical stack trace:

[ 2099.187501] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[ 2099.187665] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[ 2099.187843] smbd: page allocation failure: order:0, mode:0x20
[ 2099.187905] [<c0013a7c>] (unwind_backtrace+0x0/0xf0) from [<c0092760>] (warn_alloc_failed+0xc4/0x11c)
[ 2099.187937] [<c0092760>] (warn_alloc_failed+0xc4/0x11c) from [<c0094a70>] (__alloc_pages_nodemask+0x3e0/0x63c)
[ 2099.187967] [<c0094a70>] (__alloc_pages_nodemask+0x3e0/0x63c) from [<c02e4598>] (__netdev_alloc_frag+0x90/0x118)
[ 2099.187997] [<c02e4598>] (__netdev_alloc_frag+0x90/0x118) from [<c02e8708>] (__netdev_alloc_skb+0x40/0xd0)
[ 2099.188031] [<c02e8708>] (__netdev_alloc_skb+0x40/0xd0) from [<c026a870>] (rx_submit+0x1c/0x1f8)
[ 2099.188056] [<c026a870>] (rx_submit+0x1c/0x1f8) from [<c026aa88>] (rx_alloc_submit+0x3c/0x98)
[ 2099.188079] [<c026aa88>] (rx_alloc_submit+0x3c/0x98) from [<c026b83c>] (usbnet_bh+0x1e4/0x254)
[ 2099.188116] [<c026b83c>] (usbnet_bh+0x1e4/0x254) from [<c0025724>] (tasklet_action+0x60/0xb4)
[ 2099.188142] [<c0025724>] (tasklet_action+0x60/0xb4) from [<c0025854>] (__do_softirq+0xa0/0x154)
[ 2099.188168] [<c0025854>] (__do_softirq+0xa0/0x154) from [<c0025d1c>] (irq_exit+0x8c/0x94)
[ 2099.188201] [<c0025d1c>] (irq_exit+0x8c/0x94) from [<c000e920>] (handle_IRQ+0x34/0x84)
[ 2099.188227] [<c000e920>] (handle_IRQ+0x34/0x84) from [<c03997d4>] (__irq_svc+0x34/0xc8)
[ 2099.188252] [<c03997d4>] (__irq_svc+0x34/0xc8) from [<c02e7528>] (__alloc_skb+0xac/0x158)
[ 2099.188292] [<c02e7528>] (__alloc_skb+0xac/0x158) from [<c03272b4>] (sk_stream_alloc_skb+0x2c/0xe8)
[ 2099.188319] [<c03272b4>] (sk_stream_alloc_skb+0x2c/0xe8) from [<c0327cec>] (tcp_sendmsg+0x29c/0xe68)
[ 2099.188346] [<c0327cec>] (tcp_sendmsg+0x29c/0xe68) from [<c0349870>] (inet_sendmsg+0x40/0x78)
[ 2099.188379] [<c0349870>] (inet_sendmsg+0x40/0x78) from [<c02dd1e0>] (sock_aio_write+0x110/0x144)
[ 2099.188421] [<c02dd1e0>] (sock_aio_write+0x110/0x144) from [<c00c1b58>] (do_sync_readv_writev+0x94/0xdc)
[ 2099.188451] [<c00c1b58>] (do_sync_readv_writev+0x94/0xdc) from [<c00c1dfc>] (do_readv_writev+0xa8/0x178)
[ 2099.188476] [<c00c1dfc>] (do_readv_writev+0xa8/0x178) from [<c00c1f18>] (vfs_writev+0x4c/0x70)
[ 2099.188500] [<c00c1f18>] (vfs_writev+0x4c/0x70) from [<c00c20a4>] (sys_writev+0x38/0xc8)
[ 2099.188527] [<c00c20a4>] (sys_writev+0x38/0xc8) from [<c000da60>] (ret_fast_syscall+0x0/0x30)

Leading to an OOPS with a constant originator, failure of memcpy to allocate memory.

This behaviour can also be replicated by eliminating the network stack entirely from proceedings. Copying from one USB device to another results in similar failures:

[  599.883068] ntfs-3g: page allocation failure: order:0, mode:0x20
[  599.883121] [<c0014a70>] (unwind_backtrace+0x0/0x130) from [<c00907b0>] (warn_alloc_failed+0xc8/0x10c)
[  599.883149] [<c00907b0>] (warn_alloc_failed+0xc8/0x10c) from [<c0092ad4>] (__alloc_pages_nodemask+0x3e4/0x64c)
[  599.883175] [<c0092ad4>] (__alloc_pages_nodemask+0x3e4/0x64c) from [<c02e785c>] (__netdev_alloc_frag+0x94/0x128)
[  599.883201] [<c02e785c>] (__netdev_alloc_frag+0x94/0x128) from [<c02ebb90>] (__netdev_alloc_skb+0x84/0xe0)
[  599.883230] [<c02ebb90>] (__netdev_alloc_skb+0x84/0xe0) from [<c02683d8>] (rx_submit+0x1c/0x2c8)
[  599.883252] [<c02683d8>] (rx_submit+0x1c/0x2c8) from [<c02686c0>] (rx_alloc_submit+0x3c/0x94)
[  599.883272] [<c02686c0>] (rx_alloc_submit+0x3c/0x94) from [<c0269adc>] (usbnet_bh+0x244/0x334)
[  599.883304] [<c0269adc>] (usbnet_bh+0x244/0x334) from [<c00260f8>] (tasklet_action+0x60/0xb8)
[  599.883327] [<c00260f8>] (tasklet_action+0x60/0xb8) from [<c0026230>] (__do_softirq+0xa4/0x14c)
[  599.883349] [<c0026230>] (__do_softirq+0xa4/0x14c) from [<c0026700>] (irq_exit+0x8c/0x94)
[  599.883378] [<c0026700>] (irq_exit+0x8c/0x94) from [<c000f280>] (handle_IRQ+0x34/0x84)
[  599.883402] [<c000f280>] (handle_IRQ+0x34/0x84) from [<c039c454>] (__irq_svc+0x34/0xc8)
[  599.883429] [<c039c454>] (__irq_svc+0x34/0xc8) from [<c01ea470>] (memcpy+0x50/0x330)
[  599.883437] Mem-info:
[  599.883444] Normal per-cpu:
[  599.883453] CPU    0: hi:   90, btch:  15 usd:  26
[  599.883476] active_anon:1516 inactive_anon:1526 isolated_anon:0
[  599.883476]  active_file:5640 inactive_file:32134 isolated_file:0
[  599.883476]  unevictable:0 dirty:1987 writeback:0 unstable:0
[  599.883476]  free:8903 slab_reclaimable:1010 slab_unreclaimable:999
[  599.883476]  mapped:2354 shmem:98 pagetables:186 bounce:0
[  599.883517] Normal free:35612kB min:16384kB low:20480kB high:24576kB active_anon:6064kB inactive_anon:6104kB active_file:22560kB inactive_file:128536kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:223392kB mlocked:0kB dirty:7948kB writeback:0kB mapped:9416kB shmem:392kB slab_reclaimable:4040kB slab_unreclaimable:3996kB kernel_stack:1152kB pagetables:744kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[  599.883528] lowmem_reserve[]: 0 0
[  599.883544] Normal: 5309*4kB 1753*8kB 22*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 35612kB
[  599.883590] 37872 total pagecache pages
[  599.883598] 0 pages in swap cache
[  599.883606] Swap cache stats: add 0, delete 0, find 0/0
[  599.883612] Free swap  = 102396kB
[  599.883617] Total swap = 102396kB
[  599.894071] 56320 pages of RAM
[  599.894080] 9169 free pages
[  599.894086] 2077 reserved pages
[  599.894092] 2009 slab pages
[  599.894098] 29265 pages shared
[  599.894104] 0 pages swap cached

Although it seems to be because there was network activity also.

I compiled a debugging kernel from the latest pull of this repository with slab allocation debugging enabled plus kmemleak debugging.

On boot I get this:

<..snip normal boot messages..>
[    2.636678] usb 1-1: new high-speed USB device number 2 using dwc_otg
[    2.647828] Indeed it is in host mode hprt0 = 00001101
[    2.857062] usb 1-1: New USB device found, idVendor=0424, idProduct=9512
[    2.868349] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    2.881209] hub 1-1:1.0: USB hub found
[    2.889757] hub 1-1:1.0: 3 ports detected
[    3.177021] usb 1-1.1: new high-speed USB device number 3 using dwc_otg
[    3.287464] usb 1-1.1: New USB device found, idVendor=0424, idProduct=ec00
[    3.300575] usb 1-1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    3.321545] smsc95xx v1.0.4
[    3.395812] smsc95xx 1-1.1:1.0: eth0: register 'smsc95xx' at usb-bcm2708_usb-1.1, smsc95xx USB 2.0 Ethernet, b8:27:eb:16:5c:e9
[    3.527102] usb 1-1.2: new high-speed USB device number 4 using dwc_otg
[    3.667890] usb 1-1.2: New USB device found, idVendor=1a40, idProduct=0201
[    3.679714] usb 1-1.2: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[    3.691535] usb 1-1.2: Product: USB 2.0 Hub [MTT]
[    3.702452] hub 1-1.2:1.0: USB hub found
[    3.711453] hub 1-1.2:1.0: 7 ports detected
[    3.996938] usb 1-1.2.5: new high-speed USB device number 5 using dwc_otg
[    4.108456] usb 1-1.2.5: New USB device found, idVendor=07ab, idProduct=fc88
[    4.129423] usb 1-1.2.5: New USB device strings: Mfr=3, Product=11, SerialNumber=5
[    4.155406] usb 1-1.2.5: Product: Freecom Mobile Drive XXS
[    4.166334] usb 1-1.2.5: Manufacturer: Freecom
[    4.175256] usb 1-1.2.5: SerialNumber: 287011C82395
[    4.197863] scsi0 : usb-storage 1-1.2.5:1.0
[    4.547357] udevd[145]: starting version 175
[    5.208224] scsi 0:0:0:0: Direct-Access     Freecom  Mobile Drive XXS      PQ: 0 ANSI: 2 CCS
[    5.249382] sd 0:0:0:0: [sda] 312581808 512-byte logical blocks: (160 GB/149 GiB)
[    5.287762] sd 0:0:0:0: [sda] Write Protect is off
[    5.316683] sd 0:0:0:0: [sda] Mode Sense: 34 00 00 00
[    5.317761] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[    5.368362] Slab corruption (Not tainted): size-64 start=c7a5b8e0, len=64
[    5.406670] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a0 6a a9 c7  kkkkkkkkkkkk.j..
[    5.441301] Prev obj: start=c7a5b8a0, len=64
[    5.476652] 000: 20 b3 a5 c7 a8 22 80 c7 00 00 00 00 00 10 af c7   ...."..........
[    5.502686] 010: 04 00 00 00 ff ff ff ff 00 00 5a 5a fe ff ff ff  ..........ZZ....
[    5.546631] Next obj: start=c7a5b920, len=64
[    5.555677] 000: 01 00 00 00 5a 5a 5a 5a 01 00 00 00 ad 4e ad de  ....ZZZZ.....N..
[    5.606636] 010: ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00  ................
[    5.644104] Slab corruption (Not tainted): size-64 start=c7b3d3a0, len=64
[    5.683449] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a0 6a a9 c7  kkkkkkkkkkkk.j..
[    5.719188] Prev obj: start=c7b3d360, len=64
[    5.749983] 000: 01 00 00 00 5a 5a 5a 5a 01 00 00 00 ad 4e ad de  ....ZZZZ.....N..
[    5.794431] 010: ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00  ................
[    5.826668] Next obj: start=c7b3d3e0, len=64
[    5.835765] 000: 01 00 00 00 5a 5a 5a 5a 01 00 00 00 ad 4e ad de  ....ZZZZ.....N..
[    5.890040] 010: ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00  ................
[    7.959533]  sda: sda1
[    7.977742] sd 0:0:0:0: [sda] Attached SCSI disk
[    8.003154] Registered led device: led0
[    9.390477] Slab corruption (Not tainted): size-64 start=c7a5b720, len=64
[    9.402192] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 00 6b a9 c7  kkkkkkkkkkkk.k..
[    9.414768] Prev obj: start=c7a5b6e0, len=64
[    9.424032] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[    9.436680] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[    9.449214] Next obj: start=c7a5b760, len=64
[    9.458488] 000: 01 00 00 00 5a 5a 5a 5a 01 00 00 00 ad 4e ad de  ....ZZZZ.....N..
[    9.471242] 010: ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00  ................
[    9.523315] Slab corruption (Not tainted): size-64 start=c7a5b6e0, len=64
[    9.551462] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a0 6a a9 c7  kkkkkkkkkkkk.j..
[    9.564332] Prev obj: start=c7a5b6a0, len=64
[    9.573690] 000: a0 ed a8 c7 e0 06 9c c7 e0 06 9c c7 d0 40 aa c7  .............@..
[    9.586287] 010: e0 06 9c c7 d4 3e 55 c0 d0 4a ab c7 02 00 00 00  .....>U..J......
[    9.599022] Next obj: start=c7a5b720, len=64
[    9.608257] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[    9.620858] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[   14.089511] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[   14.685230] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[   15.849799] bcm2835 ALSA card created!
[   15.868583] bcm2835 ALSA chip created!
[   15.886912] bcm2835 ALSA chip created!
[   15.905486] bcm2835 ALSA chip created!
[   15.917167] bcm2835 ALSA chip created!
[   15.932963] bcm2835 ALSA chip created!
[   15.944510] bcm2835 ALSA chip created!
[   15.955946] bcm2835 ALSA chip created!

Something is corrupting memory on boot. Given that it's on USB data access during enumeration of a device, my feeling is that it's the USB driver.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions