
Raspberry pi as router - very slow #301


Closed
goldie83 opened this issue May 20, 2013 · 14 comments

@goldie83

Hi,
Raspberry Pi Model B, 512MB RAM, working mainly as a router (Raspbian wheezy).
Symptoms: slow internet (not going over 600 kiB/s), and the Raspberry Pi acts as if it were under heavy load.
The second NIC is on the USB bus. Note the time spent servicing software interrupts (si) in the top output below.

top shows:
top - 00:19:56 up 4:28, 1 user, load average: 0.19, 0.49, 0.42
Tasks: 68 total, 1 running, 67 sleeping, 0 stopped, 0 zombie
%Cpu(s): 10.0 us, 7.1 sy, 0.0 ni, 34.3 id, 0.0 wa, 0.0 hi, 48.6 si, 0.0 st
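
Nearly half of the CPU time here is going to software interrupts (48.6 si). A quick way to see which softirqs are busy, as a sketch assuming the sysstat package is installed for mpstat (/proc/softirqs needs nothing extra):

$ watch -n 1 cat /proc/softirqs   # per-softirq event counters
$ mpstat -P ALL 1                 # the %soft column shows the softirq share per CPU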

pi@goldie ~ $ uname -a
Linux goldie 3.6.11+ #446 PREEMPT Fri May 10 20:17:25 BST 2013 armv6l GNU/Linux

pi@goldie ~ $ lsusb
Bus 001 Device 002: ID 0424:9512 Standard Microsystems Corp.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 003: ID 0424:ec00 Standard Microsystems Corp.
Bus 001 Device 004: ID 0a46:9601 Davicom Semiconductor, Inc. DM9601 Fast Ethernet Adapter
Bus 001 Device 005: ID 04fc:0c25 Sunplus Technology Co., Ltd SATALink SPIF225A

pi@goldie ~ $ procinfo
Memory: Total Used Free Buffers
RAM: 383752 249500 134252 23276
Swap: 9767484 0 9767484

Bootup: Mon May 20 19:51:20 2013 Load average: 1.22 0.70 0.43 3/161 4946

user : 00:19:02.52 9.8% page in : 154808
nice : 00:00:00.00 0.0% page out: 66441
system: 00:04:22.47 2.3% page act: 19394
IOwait: 00:00:15.95 0.1% page dea: 3
hw irq: 00:00:02.54 0.0% page flt: 1683962
sw irq: 00:21:39.39 11.2% swap in : 0
idle : 02:28:00.45 76.5% swap out: 0
uptime: 04:24:12.00 context : 1968187

irq 3: 571046 irq 75: 1
irq 32: 382177216 irq 77: 224
irq 52: 0 irq 83: 20
irq 65: 31685 irq 84: 18204
irq 66: 1

mmcblk0 218r mmcblk0p2 80r
mmcblk0p1 50r sda 5830r 5096w

eth0 TX 1.03GiB RX 2.92GiB lo TX 48.25KiB RX 48.25KiB
eth1 TX 2.97GiB RX 1022.39MiB

and

pi@goldie ~ $ ls /proc/irq/32
dwc_otg dwc_otg_hcd:usb1 dwc_otg_pcd spurious
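
That confirms IRQ 32 is the dwc_otg USB host controller interrupt, i.e. the one servicing every USB device here (both NICs and the disk). Its counter can be watched live with, for example:

$ watch -n 1 cat /proc/interrupts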

@ghost ghost assigned ghollingworth May 20, 2013
@Ferroin
Contributor

Ferroin commented May 20, 2013

At least part of this is probably due to bus contention between the Ethernet chips (and other devices).
USB also is not very efficient (comparatively) for devices like network adapters that need high I/O bandwidth because the devices have to deal with the bus management overhead (which can be very high if there are multiple hubs between the device and the bus controller).
Furthermore, all the data being relayed between the adapters has to traverse the bus twice because the adapters can't transfer the data directly between each other.

In fact, assuming that you are using a USB disk as root (I assume that's what the SATALink is for), and that all devices are running at high speed (480Mb/s signaling rate), 600kiB/s is actually very good.

@popcornmix
Collaborator

@Ferroin
are you sure you've got your units right?
600kB/s=4.8Mb/s which is 1% of 480Mb/s signalling rate.

I'm not sure what speed you should expect, but that seems too low to me.

@goldie83
Author

@Ferroin
you are right about the USB disk, but I didn't notice any disk activity while routing packets.

@stegmannt

My Pi (Model B, revision 1) also serves as a router and firewall for a student residence network. Down- and upstream are usually around 3-5 MByte/s, which is roughly 50% of what I get out of my notebook. It's not high speed, but OK for the Pi, isn't it?

Your low speed may also be caused by the USB network adapter. Here's mine:

Bus 001 Device 004: ID 0b95:7720 ASIX Electronics Corp. AX88772

@Ferroin
Contributor

Ferroin commented May 21, 2013

@popcornmix
As it is, the 480 Mb/s signalling rate means maybe 240 Mb/s of actual throughput once the bus management overhead is accounted for. On top of that, most USB network adapters (including the SMSC95xx listed here) are controlled by reading and writing device registers, which is not at all efficient over USB (I2C or SPI would be better suited for that), and they limit the USB packet size for such control data. Finally, many of them can only transfer a single Ethernet frame at a time on slower systems like the Pi.

Also, by 600 kiB/s I mean actual throughput on top of TCP/IP, not Ethernet-level throughput; TCP and IPv4 add a lot of processing overhead (especially if you are doing packet filtering rather than just flat relaying between the interfaces).
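
One quick way to see how much per-packet filtering work the router is actually doing is to list the netfilter rules with their packet/byte counters, for example:

$ sudo iptables -L -v -n          # filter table rules with hit counters
$ sudo iptables -t nat -L -v -n   # NAT rules, relevant if the Pi is masquerading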

@goldie83
Author

@stegmannt
maybe it's not the best testing method, but:

  • eth1 is connected to the internet. Fetching a single file from the web: 7 MB/s peak (written, of course, to the USB disk on the same bus)
  • copying this file between two partitions on the disk: about 10 MB/s
  • fetching this file from the Apache server on the Raspberry Pi over eth0 (the Ethernet card on the Raspberry Pi) runs at about 600 kB/s - and I think this is the problem (how do I investigate this?)
root@goldie:/home/pi# dd if=/home/pi/debian-testing-amd64-DVD-1.iso of=/var/www/debian-testing-amd64-DVD-1.iso bs=4096
973896+0 records in
973896+0 records out
3989078016 bytes (4.0 GB) copied, 385.982 s, 10.3 MB/s
pi@goldie ~ $ dmesg | grep eth1
[   13.408808] dm9601 1-1.2:1.0: eth1: register 'dm9601' at usb-bcm2708_usb-1.2, Davicom DM9601 USB Ethernet, 00:06:06:e0:0a:5f
pi@goldie ~ $ wget http://cdimage.debian.org/cdimage/weekly-builds/amd64/iso-dvd/debian-testing-amd64-DVD-1.iso
...
2013-05-21 15:03:19 (5.08 MB/s) - `debian-testing-amd64-DVD-1.iso' saved [3989078016/3989078016]
root@dlink-4378AA:/mnt/HD/HD_a2/tmp# wget http://192.168.20.1/debian-testing-amd64-DVD-1.iso
--2013-05-21 14:02:23--  http://192.168.20.1/debian-testing-amd64-DVD-1.iso
Connecting to 192.168.20.1:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3989078016 (3.7G) [application/x-iso9660-image]
Saving to: `debian-testing-amd64-DVD-1.iso'
 4% [>                                      ] 196,256,512  614K/s  eta 1h 41m

@stegmannt

@goldie83

I did some testing. What I found is that network speed heavily depends on the tool used to copy a file.

SCP with compression enabled

$ scp -C [email protected]:debian-testing-amd64-CD-1.iso . 
debian-testing-amd64-CD-1.iso                                           10%   67MB   1.0MB/s   09:50 ETA

SCP with default options

$ scp [email protected]:debian-testing-amd64-CD-1.iso . 
debian-testing-amd64-CD-1.iso                                           25%  168MB   3.4MB/s   02:21 ETA

SCP with RC4

$ scp -c arcfour [email protected]:debian-testing-amd64-CD-1.iso .
debian-testing-amd64-CD-1.iso                                           10%   70MB   4.1MB/s   02:21 ETA

Netcat

$ dd if=/dev/zero bs=1024K count=512 | nc -v 192.168.xx.yy 2222
Connection to 192.168.xx.yy 2222 port [tcp/*] succeeded!
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 115.15 s, 4.7 MB/s

In conclusion, I suspect Apache is your bottleneck. Have you tried any other tools?

Edit:

And I just learned about iperf:

$ iperf -c 192.168.xx.yy -f M -t 20
------------------------------------------------------------
Client connecting to 192.168.xx.yy, TCP port 5001
TCP window size: 0.02 MByte (default)
------------------------------------------------------------
[  3] local 192.168.xx.zz port 39326 connected with 192.168.xx.yy port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-20.0 sec   145 MBytes  7.24 MBytes/sec
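
For reference, the receiving end of the nc and iperf tests above would be something along these lines (exact nc flags differ between the traditional and BSD variants):

$ nc -l -p 2222 > /dev/null   # traditional netcat listener, discarding the data
$ iperf -s -f M               # iperf server, reporting in MBytes/sec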

@goldie83
Author

It won't go faster...

pi@goldie ~ $ dd if=/dev/zero bs=1024K count=16 | nc -v 192.168.20.30 2222
Connection to 192.168.20.30 2222 port [tcp/*] succeeded!
16+0 records in
16+0 records out
16777216 bytes (17 MB) copied, 26.5187 s, 633 kB/s
pi@goldie ~ $ iperf -c 192.168.20.30 -f M -t 20 -p 2222
------------------------------------------------------------
Client connecting to 192.168.20.30, TCP port 2222
TCP window size: 0.02 MByte (default)
------------------------------------------------------------
[  3] local 192.168.20.1 port 54523 connected with 192.168.20.30 port 2222
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-20.3 sec  12.4 MBytes  0.61 MBytes/sec

@goldie83 goldie83 reopened this May 22, 2013
@popcornmix
Collaborator

@goldie83
What do you get from the on-board Ethernet interface with the external USB Ethernet adapter unplugged?

pi@raspberrypi:~ $ iperf -c 10.177.66.43 -f M -t 20
------------------------------------------------------------
Client connecting to 10.177.66.43, TCP port 5001
TCP window size: 0.02 MByte (default)
------------------------------------------------------------
[  3] local 10.177.71.41 port 57453 connected with 10.177.66.43 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-20.0 sec   225 MBytes  11.2 MBytes/sec

You could try the fiq_split kernel:
http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=39175&hilit=fiq_split&start=25#p345905

@stegmannt

Also, sdram_freq seems to have an impact on Ethernet/USB(?) performance.

With default options I get ~7 MBytes/s, but with sdram_freq=500 I suddenly get:

iperf -c 192.168.xx.yy -f M -t 20
------------------------------------------------------------
Client connecting to 192.168.xx.yy, TCP port 5001
TCP window size: 0.02 MByte (default)
------------------------------------------------------------
[  3] local 192.168.xx.yy port 39555 connected with 192.168.xx.yy port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-20.0 sec   205 MBytes  10.3 MBytes/sec

This effect is reproducible over here.
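
sdram_freq is set in /boot/config.txt and takes effect after a reboot; it is a mild SDRAM overclock, so stability is worth checking. For example:

# /boot/config.txt
sdram_freq=500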

@hvenzke

hvenzke commented May 24, 2013

Well, at my site I changed a few buffer settings (see http://serverfault.com/questions/277009/what-does-net-ratelimit-44-callbacks-suppressed-mean-on-a-linux-based-router) to get rid of:

[ 1954.422186] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[ 1954.422353] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
[ 1954.422440] smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped
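
The exact settings aren't listed above; purely as an illustration of the kind of buffer tuning discussed in that thread, the core networking buffers and backlog can be raised via sysctl:

$ sudo sysctl -w net.core.rmem_max=1048576          # illustrative value, not the one used above
$ sudo sysctl -w net.core.wmem_max=1048576          # illustrative value
$ sudo sysctl -w net.core.netdev_max_backlog=1000   # illustrative value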

@goldie83
Author

goldie83 commented Jun 7, 2013

I made a mistake: eth1 was an external NIC (dm9601), and it was the bottleneck all along. I've now exchanged the NIC for an:

Bus 001 Device 004: ID 0b95:772a ASIX Electronics Corp. AX88772A Fast Ethernet

and it is fast. Between two Raspberry Pis:

root@goldie:~# iperf -c 192.168.20.30 -f M -t 20
------------------------------------------------------------
Client connecting to 192.168.20.30, TCP port 5001
TCP window size: 0.02 MByte (default)
------------------------------------------------------------
[  3] local 192.168.20.1 port 42496 connected with 192.168.20.30 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-20.0 sec   149 MBytes  7.44 MBytes/sec

I'm closing this issue because people have reported that the dm9601 has performance issues:
http://www.raspberrypi.org/phpBB3/viewtopic.php?p=117823&sid=6de6013e9cc3b222ac9bcee61607acd2#p117823
http://forum.odroid.com/viewtopic.php?f=10&t=1208

@goldie83 goldie83 closed this as completed Jun 7, 2013
@popcornmix
Collaborator

Thanks for the info.

@demian2g

Yep:
ID 0fe6:9700 Kontron (Industrial Computer Source / ICS Advent) DM9601 Fast Ethernet Adapter
Same problem on an OrangePi: <7 Mbit/s peak on iperf and speedtest (https://github.com/raw/sivel/speedtest-cli/master/speedtest.py).
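
For reference, that script can be fetched and run directly, e.g.:

$ wget https://github.com/raw/sivel/speedtest-cli/master/speedtest.py
$ python speedtest.py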

popcornmix pushed a commit that referenced this issue Feb 26, 2018
The Syzbot reported a possible deadlock in the netfilter area caused by
rtnl lock, xt lock and socket lock being acquired with a different order
on different code paths, leading to the following backtrace:
Reviewed-by: Xin Long <[email protected]>

======================================================
WARNING: possible circular locking dependency detected
4.15.0+ #301 Not tainted
------------------------------------------------------
syzkaller233489/4179 is trying to acquire lock:
  (rtnl_mutex){+.+.}, at: [<0000000048e996fd>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74

but task is already holding lock:
  (&xt[i].mutex){+.+.}, at: [<00000000328553a2>]
xt_find_table_lock+0x3e/0x3e0 net/netfilter/x_tables.c:1041

which lock already depends on the new lock.
===

Since commit 3f34cfae1230 ("netfilter: on sockopt() acquire sock lock
only in the required scope"), we already acquire the socket lock in
the innermost scope, where needed. In such commit I forgot to remove
the outer-most socket lock from the getsockopt() path, this commit
addresses the issues dropping it now.

v1 -> v2: fix bad subj, added relavant 'fixes' tag

Fixes: 22265a5 ("netfilter: xt_TEE: resolve oif using netdevice notifiers")
Fixes: 202f59a ("netfilter: ipt_CLUSTERIP: do not hold dev")
Fixes: 3f34cfae1230 ("netfilter: on sockopt() acquire sock lock only in the required scope")
Reported-by: [email protected]
Suggested-by: Florian Westphal <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
nathanchance pushed a commit to nathanchance/pi-kernel that referenced this issue Mar 3, 2018
popcornmix pushed a commit that referenced this issue Mar 8, 2018
ED6E0F17 pushed a commit to ED6E0F17/linux that referenced this issue Mar 15, 2018
popcornmix pushed a commit that referenced this issue Nov 21, 2023
[ Upstream commit d45c4b4 ]

A thread started via eg. user_mode_thread() runs in the kernel to begin
with and then may later return to userspace. While it's running in the
kernel it has a pt_regs at the base of its kernel stack, but that
pt_regs is all zeroes.

If the thread oopses in that state, it leads to an ugly stack trace with
a big block of zero GPRs, as reported by Joel:

  Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc7-00004-gf7757129e3de-dirty #3
  Hardware name: IBM PowerNV (emulated by qemu) POWER9 0x4e1200 opal:v7.0 PowerNV
  Call Trace:
  [c0000000036afb00] [c0000000010dd058] dump_stack_lvl+0x6c/0x9c (unreliable)
  [c0000000036afb30] [c00000000013c524] panic+0x178/0x424
  [c0000000036afbd0] [c000000002005100] mount_root_generic+0x250/0x324
  [c0000000036afca0] [c0000000020057d0] prepare_namespace+0x2d4/0x344
  [c0000000036afd20] [c0000000020049c0] kernel_init_freeable+0x358/0x3ac
  [c0000000036afdf0] [c0000000000111b0] kernel_init+0x30/0x1a0
  [c0000000036afe50] [c00000000000debc] ret_from_kernel_user_thread+0x14/0x1c
  --- interrupt: 0 at 0x0
  NIP:  0000000000000000 LR: 0000000000000000 CTR: 0000000000000000
  REGS: c0000000036afe80 TRAP: 0000   Not tainted  (6.5.0-rc7-00004-gf7757129e3de-dirty)
  MSR:  0000000000000000 <>  CR: 00000000  XER: 00000000
  CFAR: 0000000000000000 IRQMASK: 0
  GPR00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR12: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  NIP [0000000000000000] 0x0
  LR [0000000000000000] 0x0
  --- interrupt: 0

The all-zero pt_regs looks ugly and conveys no useful information, other
than its presence. So detect that case and just show the presence of the
frame by printing the interrupt marker, eg:

  Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc3-00126-g18e9506562a0-dirty #301
  Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries
  Call Trace:
  [c000000003aabb00] [c000000001143db8] dump_stack_lvl+0x6c/0x9c (unreliable)
  [c000000003aabb30] [c00000000014c624] panic+0x178/0x424
  [c000000003aabbd0] [c0000000020050fc] mount_root_generic+0x250/0x324
  [c000000003aabca0] [c0000000020057cc] prepare_namespace+0x2d4/0x344
  [c000000003aabd20] [c0000000020049bc] kernel_init_freeable+0x358/0x3ac
  [c000000003aabdf0] [c0000000000111b0] kernel_init+0x30/0x1a0
  [c000000003aabe50] [c00000000000debc] ret_from_kernel_user_thread+0x14/0x1c
  --- interrupt: 0 at 0x0

To avoid ever suppressing a valid pt_regs make sure the pt_regs has a
zero MSR and TRAP value, and is located at the very base of the stack.

Fixes: 6895dfc ("powerpc: copy_thread fill in interrupt frame marker and back chain")
Reported-by: Joel Stanley <[email protected]>
Reported-by: Nicholas Piggin <[email protected]>
Reviewed-by: Joel Stanley <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
popcornmix pushed a commit that referenced this issue Nov 21, 2023