TCP / UDP / iroute interactions are SLOW #4

Open
cron2 opened this issue May 7, 2025 · 0 comments

cron2 commented May 7, 2025
Hi,

So this might be due to the already-known GRO/GSO issues, but I wanted to log it anyway.
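Since GRO/GSO is the suspected culprit, it might be worth ruling that out directly - a minimal sketch, assuming the underlying NIC is eth0 (adjust to the real device, on both endpoints):

ethtool -K eth0 gro off gso off    # disable the offloads, then re-run the tests below
ethtool -K eth0 gro on gso on      # restore afterwards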

The setup is: client A connects via TCP/IPv6 and sends pings to a server IP and to IPs on client B (reached via iroute/iroute-ipv6).

Packets do arrive, but over TCP it is SLOW - these machines sit in the same datacenter, and ping times should be in the "1-2 ms" ballpark. The latency also oscillates - growing from 43 ms to 150 ms, and then going down again.
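A longer run against one of the slow targets below would quantify the swing - a sketch with plain iputils ping (the 0.2 s interval is a guess; use ping6 on systems where ping does not take IPv6 addresses):

ping -i 0.2 -c 100 fd00:abcd:220:200::74 | tail -2
# a large mdev in the "rtt min/avg/max/mdev" summary line confirms the
# latency is oscillating rather than just uniformly high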

The following is from a call to oping -c20 fd00:abcd:220:1::1 fd00:abcd:220:0::1 fd00:abcd:220:200::74 fd00:abcd:220:201::74 fd00:abcd:220:202::74, on client A:

Ping from client A (TCP) to the server - very fast for "this is the DCO interface" (220:1::1), and sometimes slow for another IP on the host - but most of the time it's consistently fast(ish). Why the "loopback IP" is slower than the "DCO interface IP" is a mystery in itself, but that's not the big issue here.

56 bytes from fd00:abcd:220:1::1 (fd00:abcd:220:1::1): icmp_seq=1 ttl=64 time=1.75 ms
56 bytes from fd00:abcd:220:0::1 (fd00:abcd:220::1): icmp_seq=1 ttl=64 time=43.48 ms
56 bytes from fd00:abcd:220:1::1 (fd00:abcd:220:1::1): icmp_seq=2 ttl=64 time=1.49 ms
56 bytes from fd00:abcd:220:0::1 (fd00:abcd:220::1): icmp_seq=2 ttl=64 time=2.55 ms
56 bytes from fd00:abcd:220:1::1 (fd00:abcd:220:1::1): icmp_seq=3 ttl=64 time=1.86 ms
56 bytes from fd00:abcd:220:0::1 (fd00:abcd:220::1): icmp_seq=3 ttl=64 time=2.84 ms
56 bytes from fd00:abcd:220:1::1 (fd00:abcd:220:1::1): icmp_seq=4 ttl=64 time=1.64 ms
56 bytes from fd00:abcd:220:0::1 (fd00:abcd:220::1): icmp_seq=4 ttl=64 time=2.91 ms
56 bytes from fd00:abcd:220:1::1 (fd00:abcd:220:1::1): icmp_seq=5 ttl=64 time=1.52 ms
56 bytes from fd00:abcd:220:0::1 (fd00:abcd:220::1): icmp_seq=5 ttl=64 time=2.48 ms

Ping from client A (TCP) to client B (UDP, iroute) - three targets are pinged, but since the effect is the same for all of them, only seq 2..7 is shown for one.

56 bytes from fd00:abcd:220:200::74 (fd00:abcd:220:200::74): icmp_seq=1 ttl=63 time=43.48 ms
56 bytes from fd00:abcd:220:201::74 (fd00:abcd:220:201::74): icmp_seq=1 ttl=63 time=43.48 ms
56 bytes from fd00:abcd:220:202::74 (fd00:abcd:220:202::74): icmp_seq=1 ttl=63 time=43.48 ms
...
56 bytes from fd00:abcd:220:200::74 (fd00:abcd:220:200::74): icmp_seq=2 ttl=63 time=42.33 ms
56 bytes from fd00:abcd:220:200::74 (fd00:abcd:220:200::74): icmp_seq=3 ttl=63 time=57.84 ms
56 bytes from fd00:abcd:220:200::74 (fd00:abcd:220:200::74): icmp_seq=4 ttl=63 time=151.53 ms
56 bytes from fd00:abcd:220:200::74 (fd00:abcd:220:200::74): icmp_seq=5 ttl=63 time=139.09 ms
56 bytes from fd00:abcd:220:200::74 (fd00:abcd:220:200::74): icmp_seq=6 ttl=63 time=111.47 ms
56 bytes from fd00:abcd:220:200::74 (fd00:abcd:220:200::74): icmp_seq=7 ttl=63 time=47.50 ms

(In the same oping run, over the same TCP session, a ping from the server to fd00:abcd:220:200::74 takes 1.4 ms.)
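Since the reverse direction over the very same TCP session is fast, capturing on the TCP transport side might show where the delay accrues - a sketch, assuming the server listens on the default 1194/tcp and the public NIC is eth0:

tcpdump -i eth0 -ttt 'tcp port 1194'
# -ttt prints the time delta between packets; recurring ~40 ms gaps before
# ACKs would hint at a delayed-ACK / Nagle-style interaction rather than loss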

Subsequently, the fping calls used by the t_client test, with a 250 ms timeout, do succeed for small packets but increasingly fail at 1440 and 3000 bytes:

fping -b 64 -C 20 -p 250 -q -C 10 10.220.1.1 10.220.0.1 10.220.200.74 10.220.201.74 10.220.202.74
10.220.1.1    : 1.19 1.53 1.30 1.95 1.63 1.64 1.85 1.63 2.36 3.61
10.220.0.1    : 1.37 1.48 1.50 1.54 1.87 1.93 1.61 1.61 1.50 1.55
10.220.200.74 : 2.43 2.63 2.65 2.65 2.99 2.97 2.64 2.55 2.43 3.09
10.220.201.74 : 2.10 2.49 2.34 2.28 2.70 2.69 2.65 2.37 2.39 2.68
10.220.202.74 : 2.37 2.31 2.32 2.63 2.78 2.21 2.82 2.19 2.38 2.81
fping -b 1440 -C 20 -p 250 -q -C 10 10.220.1.1 10.220.0.1 10.220.200.74 10.220.201.74 10.220.202.74
10.220.1.1    : 1.37 1.85 1.70 1.91 1.64 1.82 1.89 1.75 1.50 1.59
10.220.0.1    : 1.85 1.90 2.10 1.90 1.85 1.69 2.03 2.01 1.86 1.65
10.220.200.74 : 2.83 - 2.69 2.95 - 2.84 - - - -
10.220.201.74 : - 2.74 2.54 2.60 - 2.71 2.86 - - -
10.220.202.74 : - - 2.66 2.99 - - - 2.71 2.39 2.48
fping -b 3000 -C 20 -p 250 -q -C 10 10.220.1.1 10.220.0.1 10.220.200.74 10.220.201.74 10.220.202.74
10.220.1.1    : 3.08 3.41 3.48 3.42 2.87 3.29 3.55 3.06 3.33 3.42
10.220.0.1    : 3.52 3.13 3.54 3.28 2.98 3.24 3.21 3.17 4.05 3.13
10.220.200.74 : 4.45 - 4.31 - 4.22 - 4.33 4.19 - -
10.220.201.74 : - - 4.71 - - - - - - -
10.220.202.74 : 47.1 - - - - 4.28 - - - -
fping6 -b 64 -C 20 -p 250 -q -C 10 fd00:abcd:220:1::1 fd00:abcd:220:0::1 fd00:abcd:220:200::74 fd00:abcd:220:201::74 fd00:abcd:220:202::74
fd00:abcd:220:1::1    : 1.20 1.65 1.59 1.51 1.66 1.36 1.34 1.62 1.59 1.64
fd00:abcd:220:0::1    : 1.37 1.65 1.71 1.92 1.76 1.20 1.28 1.66 1.41 1.39
fd00:abcd:220:200::74 : 2.37 2.72 2.65 2.55 2.52 2.48 2.59 2.51 2.36 2.68
fd00:abcd:220:201::74 : 2.52 2.63 2.91 2.66 2.78 1.97 2.22 2.68 2.46 2.65
fd00:abcd:220:202::74 : 3.69 2.65 2.18 2.46 2.70 2.37 2.37 2.50 2.31 2.28
fping6 -b 1440 -C 20 -p 250 -q -C 10 fd00:abcd:220:1::1 fd00:abcd:220:0::1 fd00:abcd:220:200::74 fd00:abcd:220:201::74 fd00:abcd:220:202::74
fd00:abcd:220:1::1    : 1.23 1.63 1.83 1.62 1.55 2.02 2.33 1.45 2.57 1.78
fd00:abcd:220:0::1    : 1.64 1.84 1.31 1.88 2.27 1.77 1.76 1.47 1.43 1.66
fd00:abcd:220:200::74 : 2.58 - 2.40 - - - 3.13 - - -
fd00:abcd:220:201::74 : 3.16 - - 2.56 - - - - 2.68 -
fd00:abcd:220:202::74 : 2.46 - - - - - - 2.80 - 2.83
fping6 -b 3000 -C 20 -p 250 -q -C 10 fd00:abcd:220:1::1 fd00:abcd:220:0::1 fd00:abcd:220:200::74 fd00:abcd:220:201::74 fd00:abcd:220:202::74
fd00:abcd:220:1::1    : 3.74 4.02 3.72 3.78 4.29 3.89 3.49 3.37 3.53 3.55
fd00:abcd:220:0::1    : 3.29 3.49 3.28 3.57 3.56 3.19 3.29 3.20 3.26 3.14
fd00:abcd:220:200::74 : - - 5.17 - - 4.72 - 4.36 - -
fd00:abcd:220:201::74 : - - - - - 5.31 - - - -
fd00:abcd:220:202::74 : - - - - - - - - - -

Note that fping sometimes does succeed, depending on ping times: with 6 packets going back and forth, a reply might just make it within the 250 ms window if the latency is around 45 ms, but will fail once it climbs higher.
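To confirm the large packets are delayed rather than dropped, the same run with a more generous timeout should report all round-trip times, just bigger - a sketch using fping's -t (initial per-target timeout, in ms):

fping6 -b 3000 -t 1000 -p 1000 -q -C 10 fd00:abcd:220:200::74
# if all 10 RTTs show up (merely large), the path is slow, not lossy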
