net: Conn.Read() stuck in runtime_pollWait after internet connection loss and recovery #41549

Closed
@leventov

Description

What version of Go are you using (go version)?

go1.15

What operating system and processor architecture are you using (go env)?

Raspberry Pi Compute Module 3+, 4.19.88 #1 SMP Fri Jul 17 09:42:11 UTC 2020 armv7l GNU/Linux.

Additionally, the process runs within a Docker container.

What did you do?

The internet connection broke and then recovered.

tls.Conn.Read() got stuck in runtime_pollWait and stayed stuck after the connection came back. Stack trace of the blocked goroutine:

1 @ 0x48608 0x412e8 0x75ffc 0xde6a8 0xdf670 0xdf655 0x1d1ab0 0x1e24e8 0x21a5ec 0x10f724 0x21a834 0x2180a4 0x21d074 0x21d07d 0xda238 0x4cd528 0x4cd4fd 0x4e54e8 0x7ad2c
#	0x75ffb		internal/poll.runtime_pollWait+0x43				runtime/netpoll.go:220
#	0xde6a7		internal/poll.(*pollDesc).wait+0x2f				internal/poll/fd_poll_runtime.go:87
#	0xdf66f		internal/poll.(*pollDesc).waitRead+0x17b			internal/poll/fd_poll_runtime.go:92
#	0xdf654		internal/poll.(*FD).Read+0x160					internal/poll/fd_unix.go:159
#	0x1d1aaf	net.(*netFD).Read+0x37						net/fd_posix.go:55
#	0x1e24e7	net.(*conn).Read+0x63						net/net.go:182
#	0x21a5eb	crypto/tls.(*atLeastReader).Read+0x77				crypto/tls/conn.go:779
#	0x10f723	bytes.(*Buffer).ReadFrom+0xa3					bytes/buffer.go:204
#	0x21a833	crypto/tls.(*Conn).readFromUntil+0xc3				crypto/tls/conn.go:801
#	0x2180a3	crypto/tls.(*Conn).readRecordOrCCS+0xfb				crypto/tls/conn.go:608
#	0x21d073	crypto/tls.(*Conn).readRecord+0x14f				crypto/tls/conn.go:576
#	0x21d07c	crypto/tls.(*Conn).Read+0x158					crypto/tls/conn.go:1252
#	0xda237		io.ReadAtLeast+0x6b						io/io.go:314
#	0x4cd527	io.ReadFull+0x67						io/io.go:333
#	0x4cd4fc	github.com/eclipse/paho.mqtt.golang/packets.ReadPacket+0x3c	github.com/eclipse/paho.mqtt.golang@v1.2.0/packets/packets.go:105
#	0x4e54e7	github.com/eclipse/paho%2emqtt%2egolang.incoming+0xe7		github.com/eclipse/paho.mqtt.golang@v1.2.0/net.go:132

Might be related to #27752

Activity

davecheney

davecheney commented on Sep 22, 2020

@davecheney
Contributor

This is expected if a timeout has not been set on the connection. Has a timeout been set before calling Read?

leventov

leventov commented on Sep 22, 2020

@leventov
Author

So there should probably be a SetReadDeadline() call before this line?
https://github.com/eclipse/paho.mqtt.golang/blob/ba85050a1f239f4e954dc95920213db51f937df1/net.go#L119

Still, I would expect a read call (even one without a deadline) to either fail with an "internet disconnected" error when the connection drops, or to become unstuck once the connection recovers, rather than staying stuck.

davecheney

davecheney commented on Sep 22, 2020

@davecheney
Contributor

Yup, if it’s important, it needs a timeout.

> Still, I would expect a read call (even one without a deadline) to either fail with an "internet disconnected" error when the connection drops, or to become unstuck once the connection recovers, rather than staying stuck.

If the operating system has not signalled that the TCP connection has been closed or reset, there’s not much the runtime can do from user space.

leventov

leventov commented on Sep 22, 2020

@leventov
Author

So you think this is a kernel/Docker problem, in that the socket isn't closed on internet disconnection, or is it no one's problem at all?

The runtime could probably detect the internet disconnection event and fail all outstanding Reads.

davecheney

davecheney commented on Sep 22, 2020

@davecheney
Contributor

> The runtime could probably detect the internet disconnection event and fail all outstanding Reads.

The network fd is handled by epoll (on Linux), and if no event is received from the kernel, there's nothing the runtime can do.

networkimprov

networkimprov commented on Sep 23, 2020

@networkimprov

See also #31490 regarding TCP keepalive problems.

TCP keepalive is on by default for both client and server net.Conns.

changed the title [-]tls.Conn.Read() stuck in runtime_pollWait after internet connection loss and recovery[/-] [+]tls: Conn.Read() stuck in runtime_pollWait after internet connection loss and recovery[/+] on Sep 28, 2020
changed the title [-]tls: Conn.Read() stuck in runtime_pollWait after internet connection loss and recovery[/-] [+]crypto/tls: Conn.Read() stuck in runtime_pollWait after internet connection loss and recovery[/+] on Sep 28, 2020
added the NeedsInvestigation label (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.) on Sep 28, 2020
added this to the Backlog milestone on Sep 28, 2020
cagedmantis

cagedmantis commented on Sep 28, 2020

@cagedmantis
Contributor
changed the title [-]crypto/tls: Conn.Read() stuck in runtime_pollWait after internet connection loss and recovery[/-] [+]net: Conn.Read() stuck in runtime_pollWait after internet connection loss and recovery[/+] on Oct 5, 2020
FiloSottile

FiloSottile commented on Oct 5, 2020

@FiloSottile
Contributor

Doesn't look like a crypto/tls specific issue, please tag me back in if I'm wrong.

ianlancetaylor

ianlancetaylor commented on Oct 5, 2020

@ianlancetaylor
Contributor

I don't think there is anything we can change in the Go standard library here, so I'm going to close the issue.

Please comment if you disagree.

locked and limited conversation to collaborators on Oct 5, 2021