TCP / UDP / iroute interactions are SLOW #4

Open
cron2 opened this issue May 7, 2025 · 0 comments

cron2 commented May 7, 2025
Hi,

So this might be due to the already-known GRO/GSO issues, but I wanted to log it anyway.
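Since GRO/GSO is the suspected culprit, it might be worth ruling that out directly - a minimal sketch, assuming the underlying NIC is eth0 (adjust to the real device, on both endpoints):

ethtool -K eth0 gro off gso off    # disable the offloads, then re-run the tests below
ethtool -K eth0 gro on gso on      # restore afterwards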

The setup is: client A connects via TCP/IPv6 and sends pings to a server IP and to IPs on client B (reached via iroute/iroute-ipv6).

Packets do arrive, but over TCP it is SLOW - these machines sit in the same datacenter, and ping times should be in the "1-2 ms" ballpark. The latency also oscillates - growing from 43 ms to 150 ms, and then going down again.
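A longer run against one of the slow targets below would quantify the swing - a sketch with plain iputils ping (the 0.2 s interval is a guess; use ping6 on systems where ping does not take IPv6 addresses):

ping -i 0.2 -c 100 fd00:abcd:220:200::74 | tail -2
# a large mdev in the "rtt min/avg/max/mdev" summary line confirms the
# latency is oscillating rather than just uniformly high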

The following is from a call to oping -c20 fd00:abcd:220:1::1 fd00:abcd:220:0::1 fd00:abcd:220:200::74 fd00:abcd:220:201::74 fd00:abcd:220:202::74, on client A:

Ping from client A (TCP) to the server - very fast for "this is the DCO interface" (220:1::1), and sometimes slow for another IP on the host - but most of the time it's consistently fast(ish). Why the "loopback IP" is slower than the "DCO interface IP" is a mystery in itself, but that's not the big issue here.

56 bytes from fd00:abcd:220:1::1 (fd00:abcd:220:1::1): icmp_seq=1 ttl=64 time=1.75 ms
56 bytes from fd00:abcd:220:0::1 (fd00:abcd:220::1): icmp_seq=1 ttl=64 time=43.48 ms
56 bytes from fd00:abcd:220:1::1 (fd00:abcd:220:1::1): icmp_seq=2 ttl=64 time=1.49 ms
56 bytes from fd00:abcd:220:0::1 (fd00:abcd:220::1): icmp_seq=2 ttl=64 time=2.55 ms
56 bytes from fd00:abcd:220:1::1 (fd00:abcd:220:1::1): icmp_seq=3 ttl=64 time=1.86 ms
56 bytes from fd00:abcd:220:0::1 (fd00:abcd:220::1): icmp_seq=3 ttl=64 time=2.84 ms
56 bytes from fd00:abcd:220:1::1 (fd00:abcd:220:1::1): icmp_seq=4 ttl=64 time=1.64 ms
56 bytes from fd00:abcd:220:0::1 (fd00:abcd:220::1): icmp_seq=4 ttl=64 time=2.91 ms
56 bytes from fd00:abcd:220:1::1 (fd00:abcd:220:1::1): icmp_seq=5 ttl=64 time=1.52 ms
56 bytes from fd00:abcd:220:0::1 (fd00:abcd:220::1): icmp_seq=5 ttl=64 time=2.48 ms

Ping from client A (TCP) to client B (UDP, iroute) - three targets are pinged, but since the effect is the same for all of them, only seq 2..7 is shown for one.

56 bytes from fd00:abcd:220:200::74 (fd00:abcd:220:200::74): icmp_seq=1 ttl=63 time=43.48 ms
56 bytes from fd00:abcd:220:201::74 (fd00:abcd:220:201::74): icmp_seq=1 ttl=63 time=43.48 ms
56 bytes from fd00:abcd:220:202::74 (fd00:abcd:220:202::74): icmp_seq=1 ttl=63 time=43.48 ms
...
56 bytes from fd00:abcd:220:200::74 (fd00:abcd:220:200::74): icmp_seq=2 ttl=63 time=42.33 ms
56 bytes from fd00:abcd:220:200::74 (fd00:abcd:220:200::74): icmp_seq=3 ttl=63 time=57.84 ms
56 bytes from fd00:abcd:220:200::74 (fd00:abcd:220:200::74): icmp_seq=4 ttl=63 time=151.53 ms
56 bytes from fd00:abcd:220:200::74 (fd00:abcd:220:200::74): icmp_seq=5 ttl=63 time=139.09 ms
56 bytes from fd00:abcd:220:200::74 (fd00:abcd:220:200::74): icmp_seq=6 ttl=63 time=111.47 ms
56 bytes from fd00:abcd:220:200::74 (fd00:abcd:220:200::74): icmp_seq=7 ttl=63 time=47.50 ms

(In the same oping run, over the same TCP session, a ping from the server to fd00:abcd:220:200::74 takes 1.4 ms.)
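Since the reverse direction over the very same TCP session is fast, capturing on the TCP transport side might show where the delay accrues - a sketch, assuming the server listens on the default 1194/tcp and the public NIC is eth0:

tcpdump -i eth0 -ttt 'tcp port 1194'
# -ttt prints the time delta between packets; recurring ~40 ms gaps before
# ACKs would hint at a delayed-ACK / Nagle-style interaction rather than loss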

Subsequently, the fping calls used by the t_client test, with a 250 ms timeout, do succeed for small packets but increasingly fail at 1440 and 3000 bytes:

fping -b 64 -C 20 -p 250 -q -C 10 10.220.1.1 10.220.0.1 10.220.200.74 10.220.201.74 10.220.202.74
10.220.1.1    : 1.19 1.53 1.30 1.95 1.63 1.64 1.85 1.63 2.36 3.61
10.220.0.1    : 1.37 1.48 1.50 1.54 1.87 1.93 1.61 1.61 1.50 1.55
10.220.200.74 : 2.43 2.63 2.65 2.65 2.99 2.97 2.64 2.55 2.43 3.09
10.220.201.74 : 2.10 2.49 2.34 2.28 2.70 2.69 2.65 2.37 2.39 2.68
10.220.202.74 : 2.37 2.31 2.32 2.63 2.78 2.21 2.82 2.19 2.38 2.81
fping -b 1440 -C 20 -p 250 -q -C 10 10.220.1.1 10.220.0.1 10.220.200.74 10.220.201.74 10.220.202.74
10.220.1.1    : 1.37 1.85 1.70 1.91 1.64 1.82 1.89 1.75 1.50 1.59
10.220.0.1    : 1.85 1.90 2.10 1.90 1.85 1.69 2.03 2.01 1.86 1.65
10.220.200.74 : 2.83 - 2.69 2.95 - 2.84 - - - -
10.220.201.74 : - 2.74 2.54 2.60 - 2.71 2.86 - - -
10.220.202.74 : - - 2.66 2.99 - - - 2.71 2.39 2.48
fping -b 3000 -C 20 -p 250 -q -C 10 10.220.1.1 10.220.0.1 10.220.200.74 10.220.201.74 10.220.202.74
10.220.1.1    : 3.08 3.41 3.48 3.42 2.87 3.29 3.55 3.06 3.33 3.42
10.220.0.1    : 3.52 3.13 3.54 3.28 2.98 3.24 3.21 3.17 4.05 3.13
10.220.200.74 : 4.45 - 4.31 - 4.22 - 4.33 4.19 - -
10.220.201.74 : - - 4.71 - - - - - - -
10.220.202.74 : 47.1 - - - - 4.28 - - - -
fping6 -b 64 -C 20 -p 250 -q -C 10 fd00:abcd:220:1::1 fd00:abcd:220:0::1 fd00:abcd:220:200::74 fd00:abcd:220:201::74 fd00:abcd:220:202::74
fd00:abcd:220:1::1    : 1.20 1.65 1.59 1.51 1.66 1.36 1.34 1.62 1.59 1.64
fd00:abcd:220:0::1    : 1.37 1.65 1.71 1.92 1.76 1.20 1.28 1.66 1.41 1.39
fd00:abcd:220:200::74 : 2.37 2.72 2.65 2.55 2.52 2.48 2.59 2.51 2.36 2.68
fd00:abcd:220:201::74 : 2.52 2.63 2.91 2.66 2.78 1.97 2.22 2.68 2.46 2.65
fd00:abcd:220:202::74 : 3.69 2.65 2.18 2.46 2.70 2.37 2.37 2.50 2.31 2.28
fping6 -b 1440 -C 20 -p 250 -q -C 10 fd00:abcd:220:1::1 fd00:abcd:220:0::1 fd00:abcd:220:200::74 fd00:abcd:220:201::74 fd00:abcd:220:202::74
fd00:abcd:220:1::1    : 1.23 1.63 1.83 1.62 1.55 2.02 2.33 1.45 2.57 1.78
fd00:abcd:220:0::1    : 1.64 1.84 1.31 1.88 2.27 1.77 1.76 1.47 1.43 1.66
fd00:abcd:220:200::74 : 2.58 - 2.40 - - - 3.13 - - -
fd00:abcd:220:201::74 : 3.16 - - 2.56 - - - - 2.68 -
fd00:abcd:220:202::74 : 2.46 - - - - - - 2.80 - 2.83
fping6 -b 3000 -C 20 -p 250 -q -C 10 fd00:abcd:220:1::1 fd00:abcd:220:0::1 fd00:abcd:220:200::74 fd00:abcd:220:201::74 fd00:abcd:220:202::74
fd00:abcd:220:1::1    : 3.74 4.02 3.72 3.78 4.29 3.89 3.49 3.37 3.53 3.55
fd00:abcd:220:0::1    : 3.29 3.49 3.28 3.57 3.56 3.19 3.29 3.20 3.26 3.14
fd00:abcd:220:200::74 : - - 5.17 - - 4.72 - 4.36 - -
fd00:abcd:220:201::74 : - - - - - 5.31 - - - -
fd00:abcd:220:202::74 : - - - - - - - - - -

Note that fping sometimes does succeed, depending on ping times: with 6 packets going back and forth, a reply might just make it within the 250 ms window if the latency is around 45 ms, but will fail once it climbs higher.
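To confirm the large packets are delayed rather than dropped, the same run with a more generous timeout should report all round-trip times, just bigger - a sketch using fping's -t (initial per-target timeout, in ms):

fping6 -b 3000 -t 1000 -p 1000 -q -C 10 fd00:abcd:220:200::74
# if all 10 RTTs show up (merely large), the path is slow, not lossy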
