Closed
Description
What version of Go are you using (go version)?
go1.15
What operating system and processor architecture are you using (go env)?
Raspberry Pi Compute Module 3+, 4.19.88 #1 SMP Fri Jul 17 09:42:11 UTC 2020 armv7l GNU/Linux.
Additionally, the process runs within a Docker container.
What did you do?
The internet connection dropped and then recovered.
tls.Conn.Read() remained stuck in runtime_pollWait.
1 @ 0x48608 0x412e8 0x75ffc 0xde6a8 0xdf670 0xdf655 0x1d1ab0 0x1e24e8 0x21a5ec 0x10f724 0x21a834 0x2180a4 0x21d074 0x21d07d 0xda238 0x4cd528 0x4cd4fd 0x4e54e8 0x7ad2c
# 0x75ffb internal/poll.runtime_pollWait+0x43 runtime/netpoll.go:220
# 0xde6a7 internal/poll.(*pollDesc).wait+0x2f internal/poll/fd_poll_runtime.go:87
# 0xdf66f internal/poll.(*pollDesc).waitRead+0x17b internal/poll/fd_poll_runtime.go:92
# 0xdf654 internal/poll.(*FD).Read+0x160 internal/poll/fd_unix.go:159
# 0x1d1aaf net.(*netFD).Read+0x37 net/fd_posix.go:55
# 0x1e24e7 net.(*conn).Read+0x63 net/net.go:182
# 0x21a5eb crypto/tls.(*atLeastReader).Read+0x77 crypto/tls/conn.go:779
# 0x10f723 bytes.(*Buffer).ReadFrom+0xa3 bytes/buffer.go:204
# 0x21a833 crypto/tls.(*Conn).readFromUntil+0xc3 crypto/tls/conn.go:801
# 0x2180a3 crypto/tls.(*Conn).readRecordOrCCS+0xfb crypto/tls/conn.go:608
# 0x21d073 crypto/tls.(*Conn).readRecord+0x14f crypto/tls/conn.go:576
# 0x21d07c crypto/tls.(*Conn).Read+0x158 crypto/tls/conn.go:1252
# 0xda237 io.ReadAtLeast+0x6b io/io.go:314
# 0x4cd527 io.ReadFull+0x67 io/io.go:333
# 0x4cd4fc github.com/eclipse/paho.mqtt.golang/packets.ReadPacket+0x3c github.com/eclipse/paho.mqtt.golang@v1.2.0/packets/packets.go:105
# 0x4e54e7 github.com/eclipse/paho%2emqtt%2egolang.incoming+0xe7 github.com/eclipse/paho.mqtt.golang@v1.2.0/net.go:132
Might be related to #27752
Activity
davecheney commented on Sep 22, 2020
This is expected if a timeout has not been set on the connection. Has a timeout been set before calling Read?
leventov commented on Sep 22, 2020
So there should probably be a SetReadDeadline() call before this line? https://github.com/eclipse/paho.mqtt.golang/blob/ba85050a1f239f4e954dc95920213db51f937df1/net.go#L119
Still, I would expect a read call (even one without a deadline) either to fail with an "internet disconnected" error when the connection is lost, or to resume once the connection has recovered, rather than stay stuck.
davecheney commented on Sep 22, 2020
Yup, if it’s important, it needs a timeout.
If the operating system has not signalled that the tcp connection has been closed or reset, there’s not much the runtime can do from user space.
leventov commented on Sep 22, 2020
So you think this is a kernel/Docker problem, in that it doesn't close the socket when the internet connection is lost, or no one's problem at all?
The runtime could probably detect the internet disconnection event and fail all outstanding Reads.
davecheney commented on Sep 22, 2020
The network fd is handled by epoll (on Linux), and if no event is received from the kernel, there's nothing the runtime can do.
networkimprov commented on Sep 23, 2020
See also #31490 re TCP keepalive problems.
TCP keepalive is on by default for both client and server net.Conns.
Title changed from "tls.Conn.Read() stuck in runtime_pollWait after internet connection loss and recovery" to "tls: Conn.Read() stuck in runtime_pollWait after internet connection loss and recovery"
Title changed from "tls: Conn.Read() stuck in runtime_pollWait after internet connection loss and recovery" to "crypto/tls: Conn.Read() stuck in runtime_pollWait after internet connection loss and recovery"
cagedmantis commented on Sep 28, 2020
/cc @FiloSottile
Title changed from "crypto/tls: Conn.Read() stuck in runtime_pollWait after internet connection loss and recovery" to "net: Conn.Read() stuck in runtime_pollWait after internet connection loss and recovery"
FiloSottile commented on Oct 5, 2020
Doesn't look like a crypto/tls specific issue, please tag me back in if I'm wrong.
ianlancetaylor commented on Oct 5, 2020
I don't think there is anything we can change in the Go standard library here, so I'm going to close the issue.
Please comment if you disagree.