TCP connections stall, then kernel hangs on reboot request #2709

Closed
scottmayo opened this issue Oct 9, 2018 · 6 comments
Labels
Close within 30 days: Issue will be closed within 30 days unless requested to stay open.
Waiting for external input: Waiting for a comment from the originator of the issue, or a collaborator.

Comments

@scottmayo

Raspberry Pi 3B+,
Linux lachesis 4.14.50-v7+ #1122 SMP Tue Jun 19 12:26:26 BST 2018 armv7l

This node is part of a cluster of processors (not all Pis) that open TCP connections to each other, some transiently and some long term (potentially months). There are perhaps a dozen or fewer connections at any one time. Traffic over them is light: writes range from a dozen bytes to a couple hundred at most, generally every few seconds. wget is sometimes used to pull in a few hundred thousand bytes, rate limited and at slow intervals. One socket streams music on occasion and sends much more traffic. Bottom line, there's nothing challenging going on, and load averages are usually below 0.10 even when streaming.

Randomly, after some number of days, one (maybe more) TCP connection freezes - no data transferred. From that point on, other sockets may or may not work; I generally can't get an ssh session into the pi, but I can sometimes request a reboot over an existing one. The application that handles the reboot request definitely receives it because it shuts itself down, but the pi has to be power cycled to be recovered.

There's no obvious rhyme or reason to which socket(s) fail; for example, it doesn't seem to happen more often when streaming music. I suspect a race condition in the TCP stack but I have no evidence. I can go several weeks without an issue.

I'm experienced enough with TCP to know I'm not doing anything wonky with the sockets - other than turning Nagle off on some, this is very vanilla code. I have not seen similar behaviour on my other Pis (which are not 3B+), so I wonder if there's a multiprocessor issue.
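For readers unfamiliar with the one non-default option mentioned above, turning Nagle off means setting TCP_NODELAY on the socket. The following is a minimal Python sketch of that setting only; the helper name make_nodelay_socket is hypothetical, and this is not the application code from this report, which was not posted.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create an IPv4 TCP socket with Nagle's algorithm disabled.

    Hypothetical helper for illustration; not taken from the reporter's code.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 disables Nagle, so small writes are sent
    # immediately rather than being coalesced into larger segments.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    s = make_nodelay_socket()
    # Read the option back to confirm the kernel accepted it (non-zero means set).
    print("TCP_NODELAY set:", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
    s.close()
```

The option changes only how small writes are batched, which is consistent with the description of the socket code as otherwise vanilla.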

This is a major issue for me; people indirectly using these Pis don't know how to reboot them and have no direct interaction with them except "things in the house stop working."

@pelwell
Contributor

pelwell commented Oct 10, 2018

There have been several improvements to the LAN7800 driver since the kernel version you are running. Please run sudo rpi-update to get the latest version and see if the connections are more reliable.

@JamesH65
Contributor

@scottmayo Did updating to the latest driver fix this issue?

This issue will be closed within 30 days unless further interactions are posted. If you wish this issue to remain open, please add a comment. A closed issue may be reopened if requested.

JamesH65 added the "Close within 30 days" and "Waiting for external input" labels Jul 31, 2019
@scottmayo
Author

scottmayo commented Jul 31, 2019 via email

@JamesH65
Contributor

You should never update a critical system without testing on an offline one first, which is what I would suggest in this case. Testing with the current release will be sufficient; there is no need to use rpi-update.

@JamesH65
Contributor

@scottmayo Any test results?

This issue will be closed within 30 days unless further interactions are posted. If you wish this issue to remain open, please add a comment. A closed issue may be reopened if requested.

@JamesH65
Contributor

Closing due to lack of activity. Please request to be reopened if you feel this issue is still relevant.
