eth0: hw csum failure #1371

Closed
dtaht opened this issue Mar 24, 2016 · 16 comments
Comments

@dtaht
Contributor

dtaht commented Mar 24, 2016

In testing babeld on this platform, I see tons of these errors in the kernel when handling some ipv6 packets.

[170772.004583] : hw csum failure
[170772.004605] CPU: 0 PID: 1415 Comm: babeld Tainted: G W 4.1.19-v7+ #853
[170772.004614] Hardware name: BCM2709
[170772.004646] <800185e0> from <80013f48>
[170772.004670] <80013f48> from <80572fac>
[170772.004695] <80572fac> from <80497444>
[170772.004718] <80497444> from <8048c6d0>
[170772.004790] <8048c6d0> from [<7f01f048>](udpv6_recvmsg+0x11c/0x7cc [ipv6])
[170772.004857] [<7f01f048>](udpv6_recvmsg [ipv6]) from <80509d18>
[170772.004882] <80509d18> from <8047c3dc>
[170772.004903] <8047c3dc> from <8047e274>
[170772.004921] <8047e274> from <8047f0f8>
[170772.004939] <8047f0f8> from <8047f140>
[170772.004959] <8047f140> from <8000fa20>
[170773.053596] eth0: hw csum failure

@dtaht
Contributor Author

dtaht commented Mar 24, 2016

Hmm. github ate that kernel log. Let's try.

[170772.004583] <unknown>: hw csum failure
[170772.004605] CPU: 0 PID: 1415 Comm: babeld Tainted: G        W       4.1.19-v7+ #853
[170772.004614] Hardware name: BCM2709
[170772.004646] [<800185e0>] (unwind_backtrace) from [<80013f48>] (show_stack+0x20/0x24)
[170772.004670] [<80013f48>] (show_stack) from [<80572fac>] (dump_stack+0xd4/0x118)
[170772.004695] [<80572fac>] (dump_stack) from [<80497444>] (netdev_rx_csum_fault+0x44/0x48)
[170772.004718] [<80497444>] (netdev_rx_csum_fault) from [<8048c6d0>] (skb_copy_and_csum_datagram_msg+0xdc/0xe8)
[170772.004790] [<8048c6d0>] (skb_copy_and_csum_datagram_msg) from [<7f01f048>] (udpv6_recvmsg+0x11c/0x7cc [ipv6])
[170772.004857] [<7f01f048>] (udpv6_recvmsg [ipv6]) from [<80509d18>] (inet_recvmsg+0xa4/0xb8)
[170772.004882] [<80509d18>] (inet_recvmsg) from [<8047c3dc>] (sock_recvmsg+0x20/0x24)
[170772.004903] [<8047c3dc>] (sock_recvmsg) from [<8047e274>] (___sys_recvmsg+0xa4/0x12c)
[170772.004921] [<8047e274>] (___sys_recvmsg) from [<8047f0f8>] (__sys_recvmsg+0x4c/0x7c)
[170772.004939] [<8047f0f8>] (__sys_recvmsg) from [<8047f140>] (SyS_recvmsg+0x18/0x1c)
[170772.004959] [<8047f140>] (SyS_recvmsg) from [<8000fa20>] (ret_fast_syscall+0x0/0x54)
[170773.053596] eth0: hw csum failure

To reproduce, install babeld on two systems and give them the same ESSID and different IP addresses:

iwconfig wlan0 mode ad-hoc essid babel
iwconfig wlan0 channel 6
ifconfig wlan0 172.26.17.230 netmask 255.255.255.255
ifconfig wlan0 up

@dtaht
Contributor Author

dtaht commented Mar 27, 2016

I can clarify this somewhat: this is only a bug on the rpi3 (I also tested an rpi2), and it appears to be more generic to ipv6 support than just babeld.

@dtaht
Contributor Author

dtaht commented Mar 28, 2016

here is a much more complete log: http://www.taht.net/~d/rpi3_bad_ipv6_issues.dmesg
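For anyone comparing logs like the one linked above, a rough way to tally the failures per interface is sketched below. This is only an illustration: the function name `count_csum_failures` is made up here, and the pattern assumes the failure lines look like the ones quoted earlier in this thread (`[timestamp] <device>: hw csum failure`).

```python
import re
from collections import Counter

# Matches lines such as "[170773.053596] eth0: hw csum failure".
# The device field can also be "<unknown>", as in the first trace above.
CSUM_RE = re.compile(r"^\[\s*\d+\.\d+\]\s+(\S+): hw csum failure\s*$")


def count_csum_failures(lines):
    """Count 'hw csum failure' lines per device name.

    `lines` is any iterable of strings, e.g. an open dmesg dump.
    Stack trace lines and everything else are ignored.
    """
    counts = Counter()
    for line in lines:
        match = CSUM_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open file works, for example `count_csum_failures(open("saved.dmesg"))`, where `saved.dmesg` is a hypothetical local copy of the log.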

@pelwell
Contributor

pelwell commented Mar 28, 2016

Duplicate of #1083?

@dtaht
Contributor Author

dtaht commented Mar 31, 2016

Could be. Can I pull down an rpi3 kernel from "Devel", wherever that is?

On 3/28/16 11:24 AM, Phil Elwell wrote:

Duplicate of #1083?

@popcornmix
Collaborator

@pelwell I have the 3 suggested commits in my local tree and they build.
I can't be sure they fix the issue. Shall I push them and include in BRANCH=next build?

@pelwell
Contributor

pelwell commented Mar 31, 2016

Yes - go for it.

popcornmix added a commit to raspberrypi/firmware that referenced this issue Mar 31, 2016
kernel:  bcm2835-sdhost: Precalc divisors and overclocks

kernel: cherry-pick upstream fixes for eth0 hw csum failures
See: raspberrypi/linux#1371
See: raspberrypi/linux#1083

kernel: Add configs and overlay for PCA9548 I2C mux

kernel: BCM270X_DT: Add DS1339 to i2c-rtc overlay

@popcornmix
Collaborator

Latest sudo BRANCH=next rpi-update firmware cherry-picks some upstream commits that may fix this issue. Can you update and test?

popcornmix added a commit to Hexxeh/rpi-firmware that referenced this issue Mar 31, 2016
kernel:  bcm2835-sdhost: Precalc divisors and overclocks

kernel: cherry-pick upstream fixes for eth0 hw csum failures
See: raspberrypi/linux#1371
See: raspberrypi/linux#1083

kernel: Add configs and overlay for PCA9548 I2C mux

kernel: BCM270X_DT: Add DS1339 to i2c-rtc overlay

@dtaht
Contributor Author

dtaht commented Mar 31, 2016

ah. "firmware" was not part of the command line. Downloading....

@dtaht
Contributor Author

dtaht commented Mar 31, 2016

whilst I'm making feature requests please also see #1370

@dtaht
Contributor Author

dtaht commented Mar 31, 2016

Nope. Here's some big logs for you.

http://www.taht.net/~d/rpi3-4.4.6-v7+.dmesg
http://www.taht.net/~d/rpi3-4.4.6-v7+.syslog (this one is still uploading, it's claiming 25 minutes more - running at 50KB/sec on a link capable of 10x that for some reason)

It's very cool you are getting up to 4.4 though. :)

@dtaht
Contributor Author

dtaht commented Apr 1, 2016

I have poked into this a bit harder. It only happens when a babel multicast udp packet is on the eth0 interface on the rpi3, so I suspect some driver offload there is causing the error. It works on the rpi2, and it works on wlan interfaces. I will try to find some other tool to generate multicast udp packets to see if it is generic.
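One way to generate such traffic without babeld might be a few lines of Python, sketched here. It sends a single UDP datagram to Babel's link-local multicast group and port (ff02::1:6, UDP 6696, per RFC 6126). The helper names `check_group` and `send_multicast` and the all-zero payload are assumptions for illustration; the payload is not a valid Babel packet, and whether an arbitrary payload triggers the same kernel message is untested.

```python
import ipaddress
import socket

BABEL_GROUP = "ff02::1:6"  # Babel link-local multicast group (RFC 6126)
BABEL_PORT = 6696          # Babel UDP port (RFC 6126)


def check_group(group):
    """Return the normalized address, or raise if it is not IPv6 multicast."""
    addr = ipaddress.IPv6Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv6 multicast address")
    return str(addr)


def send_multicast(ifname, payload=b"\x00" * 32,
                   group=BABEL_GROUP, port=BABEL_PORT):
    """Send one UDP datagram to `group` out of interface `ifname`.

    Returns the number of bytes sent. The hop limit is set to 1 so the
    packet stays on the local link.
    """
    ifindex = socket.if_nametoindex(ifname)
    with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_MULTICAST_IF, ifindex)
        sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_MULTICAST_HOPS, 1)
        # The 4-tuple is (host, port, flowinfo, scope_id); the scope id
        # pins the link-local destination to the chosen interface.
        return sock.sendto(payload, (check_group(group), port, 0, ifindex))
```

Run `send_multicast("eth0")` on a second machine on the same wire, then watch dmesg on the rpi3. Changing `group` and `port` would show whether the problem is specific to Babel's group or generic to IPv6 multicast UDP.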

@popcornmix
Collaborator

Assuming you are not using wifi, I can't think of any reason Pi3 will behave differently to Pi2.
Can you confirm with the same sdcard, that Pi3 has csum errors, and Pi2 does not?

@dtaht
Contributor Author

dtaht commented Apr 6, 2016

ok, I will swap cards and change my networks around a little.

@dtaht
Contributor Author

dtaht commented Apr 6, 2016

Narrowing it down still further. The mere presence of a babel multicast udp packet on the ethernet wire causes the issue on the rpi3. No daemon needs to be running on the rpi to trigger the kernel messages. I also switched cables with the correctly working rpi2.

I continue to look for other things that do multicast udp on ipv6 to trigger this. I would have thought ra, for example, might do it. It doesn't.

Off to try this exact kernel and filesystem from the rpi2 on the rpi3...

This is the earliest occurrence in the boot log from the rpi3:

Linux pi3 4.1.21-v7+ #872

[   33.330612] eth0: hw csum failure
[   33.330627] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.1.21-v7+ #872
[   33.330632] Hardware name: BCM2709
[   33.330656] [<800185d0>] (unwind_backtrace) from [<80013f38>] (show_stack+0x20/0x24)
[   33.330669] [<80013f38>] (show_stack) from [<80573120>] (dump_stack+0xd4/0x118)
[   33.330684] [<80573120>] (dump_stack) from [<80497544>] (netdev_rx_csum_fault+0x44/0x48)
[   33.330696] [<80497544>] (netdev_rx_csum_fault) from [<8048c410>] (__skb_checksum_complete+0xb4/0xb8)
[   33.330707] [<8048c410>] (__skb_checksum_complete) from [<8053ace4>] (udp6_csum_init+0x1cc/0x218)
[   33.330753] [<8053ace4>] (udp6_csum_init) from [<7f021ddc>] (__udp6_lib_rcv+0x274/0x4b8 [ipv6])
[   33.330807] [<7f021ddc>] (__udp6_lib_rcv [ipv6]) from [<7f01e904>] (udpv6_rcv+0x1c/0x20 [ipv6])
[   33.330851] [<7f01e904>] (udpv6_rcv [ipv6]) from [<7f006a8c>] (ip6_input_finish+0x188/0x5d8 [ipv6])
[   33.330886] [<7f006a8c>] (ip6_input_finish [ipv6]) from [<7f0075c8>] (ip6_input+0x30/0x84 [ipv6])
[   33.330921] [<7f0075c8>] (ip6_input [ipv6]) from [<7f006884>] (ip6_rcv_finish+0x44/0xc4 [ipv6])
[   33.330957] [<7f006884>] (ip6_rcv_finish [ipv6]) from [<7f007368>] (ipv6_rcv+0x48c/0x6bc [ipv6])
[   33.330981] [<7f007368>] (ipv6_rcv [ipv6]) from [<80495394>] (__netif_receive_skb_core+0x694/0xa40)
[   33.330994] [<80495394>] (__netif_receive_skb_core) from [<80497604>] (__netif_receive_skb+0x20/0x7c)
[   33.331006] [<80497604>] (__netif_receive_skb) from [<8049768c>] (netif_receive_skb_internal+0x2c/0xa4)
[   33.331017] [<8049768c>] (netif_receive_skb_internal) from [<80497728>] (netif_receive_skb_sk+0x24/0x9c)
[   33.331029] [<80497728>] (netif_receive_skb_sk) from [<7f49e2f8>] (ri_tasklet+0xec/0x28c [ifb])
[   33.331050] [<7f49e2f8>] (ri_tasklet [ifb]) from [<8002b928>] (tasklet_action+0x74/0x10c)
[   33.331060] [<8002b928>] (tasklet_action) from [<8002adc4>] (__do_softirq+0x1a0/0x3e0)
[   33.331069] [<8002adc4>] (__do_softirq) from [<8002b048>] (run_ksoftirqd+0x44/0x6c)
[   33.331080] [<8002b048>] (run_ksoftirqd) from [<800475ac>] (smpboot_thread_fn+0x124/0x198)
[   33.331090] [<800475ac>] (smpboot_thread_fn) from [<80043bcc>] (kthread+0xec/0x104)
[   33.331101] [<80043bcc>] (kthread) from [<8000faf8>] (ret_from_fork+0x14/0x3c)

@dtaht
Contributor Author

dtaht commented Apr 6, 2016

I swapped out power supplies. This problem went away. !@#@! Thx for taking a look at it. Why it would only show up on ipv6 udp is totally beyond me.

@dtaht dtaht closed this as completed Apr 6, 2016