log/syslog, net: goroutine permanently stuck in waitWrite() to /dev/log using a unix datagram socket #23604
This issue is closely related to, if not the same as, #5932 (there are multiple trains of thought in that issue).
Sorry to hear you're having problems. Sadly, Go 1.7 is long out of support. Can you upgrade to 1.9.x or 1.10rc1 and verify whether the problem still occurs?
I was able to reproduce just now using Go 1.10rc1. Here's the full goroutine stack trace: https://pastebin.com/xwnYJKi7
I modified the runtime to print certain key pollDesc events: close, evict, pollUnblock, etc. I noticed that when I restart syslog-ng and see two log lines, one close/evict/unblock group for each syslog connection, the process ends up not hanging. When I only see one group of those calls, the other connection hangs. My current theory is that something is not calling FD.Close(), which calls fd.pd.evict(), which calls runtime_pollUnblock. Still digging...
After much more digging I'm pretty certain that we should be seeing an EPOLLHUP or EPOLLERR condition in netpoll() when the /dev/log read side closes unexpectedly, but we're not. Since we don't wake up the waitWrite method and the fd is half shut down, the syslog write stays stuck.
It's worth mentioning that I've reproduced this on Linux kernel versions 3.13.0 and 4.4.0.
I have to revise my analysis. EPOLLHUP and EPOLLERR are probably not expected on a datagram socket, since it's connectionless, and I think a hang-up only makes sense on a connection-oriented socket. The real event that should be observed when the reading end closes is EPOLLOUT, because it means our next write is non-blocking and will fail immediately with ECONNREFUSED. Now the question is: are we missing the delivery of an EPOLLOUT event to the pollDesc in the Go runtime, or does the EPOLLOUT never come from epoll/Linux?

I tried reproducing the environment in C, using a client and server: https://pastebin.com/59AT01AT . I was not able to reproduce the issue with 8 concurrent clients and a single server. Using strace and some old logging code that's no longer present, I was able to confirm that the clients are indeed bouncing between EAGAIN and epoll_wait (as expected) and that when the server dies, they immediately transition to ECONNREFUSED until the server comes back, at which point everything is OK again.

To reiterate: in the Go reproducer, when the server dies, the clients sometimes get stuck in code that is waiting for an EPOLLOUT, and it seemingly never comes. The negative result with C suggests, but does not prove, that this could be an issue with the way the Go runtime is using epoll and pollDescs, as opposed to a bug in epoll itself. But a lot of digging around the internet suggests that epoll is far from perfect and could certainly be the root cause, especially when there are multiple clients connected to the same server socket.

I'd appreciate any feedback on the analysis so far and would love some suggestions on where to look next in the pollDesc / netpoll code for possible bugs.
/cc @ianlancetaylor |
has this been fixed? |
This bug is still open. |
Ran the example from above and it exited on my machine (Ubuntu 18.04, kernel 4.15.0).
I think I may have been able to recreate the problem on tip with this program.

package main
import (
"fmt"
"io/ioutil"
"net"
"os"
"runtime"
"sync"
"time"
)
func main() {
temp, err := ioutil.TempFile("", "test")
if err != nil {
die(err)
}
addr := temp.Name()
temp.Close()
os.Remove(addr)
go server(addr)
var wg sync.WaitGroup
for i := 0; i < 20; i++ {
wg.Add(1)
go func() {
defer wg.Done()
client(addr)
}()
}
wg.Wait()
}
func server(addr string) {
for {
l, err := net.ListenPacket("unixgram", addr)
if err != nil {
die(err)
}
for i := 0; i < 20; i++ {
var buf [4096]byte
if _, _, err := l.ReadFrom(buf[:]); err != nil {
die(err)
}
}
if err := l.Close(); err != nil {
die(err)
}
os.Remove(addr)
time.Sleep(time.Millisecond)
}
}
func client(addr string) {
var c net.Conn
for i := 0; i < 500000; i++ {
for c == nil {
var err error
c, err = net.Dial("unixgram", addr)
if err != nil {
time.Sleep(time.Microsecond)
}
}
if _, err := c.Write([]byte("test line")); err != nil {
c.Close()
c = nil
}
if i%10000 == 0 {
time.Sleep(time.Millisecond)
}
}
}
func die(err error) {
_, file, line, _ := runtime.Caller(1)
fmt.Fprintf(os.Stderr, "%s:%d: %v\n", file, line, err)
os.Exit(1)
}
I captured an strace of the program in the previous comment. Some things I see:
- Descriptor 19 is added to the epoll set with epoll_ctl.
- Descriptor 19 is returned by epoll_wait a number of times.
- After the last time descriptor 19 is returned by epoll_wait, writes to it continue, but we never see descriptor 19 returned by epoll_wait again.
The upshot of the last comment is: as far as I can tell, the kernel is failing to return EPOLLOUT for the descriptor. Searching the web, I see a similar question at https://stackoverflow.com/questions/38441059/edge-triggered-epoll-for-unix-domain-socket . This does make me suspect that this is a kernel bug. I'm running 4.9.0-6-amd64.
I am on a 4.15 kernel and I see similar traces. |
I suspect that this C program shows the same problem. It's hard to be really sure. I would appreciate it if others could look over this code and see if they can spot any bugs. Thanks.

#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/un.h>
static char temp[11];
static void die(const char* msg, int err) {
fprintf(stderr, "%s: %s\n", msg, strerror(err));
exit(EXIT_FAILURE);
}
struct condlock {
pthread_mutex_t mutex;
pthread_cond_t cond;
int ready;
};
void condlock_init(struct condlock* cl) {
int err;
err = pthread_mutex_init(&cl->mutex, NULL);
if (err != 0) {
die("pthread_mutex_init", err);
}
err = pthread_cond_init(&cl->cond, NULL);
if (err != 0) {
die("pthread_cond_init", err);
}
cl->ready = 0;
}
void condlock_signal(struct condlock* cl) {
int err;
err = pthread_mutex_lock(&cl->mutex);
if (err != 0) {
die("pthread_mutex_lock", err);
}
cl->ready = 1;
err = pthread_cond_signal(&cl->cond);
if (err != 0) {
die("pthread_cond_signal", err);
}
err = pthread_mutex_unlock(&cl->mutex);
if (err != 0) {
die("pthread_mutex_unlock", err);
}
}
void condlock_wait(struct condlock* cl) {
int err;
struct timespec ts;
/* If everything worked as expected, the poller would always report
that a descriptor is ready and call condlock_signal. We give it
20 seconds to do so. If it fails, we are seeing a bug. */
memset(&ts, 0, sizeof ts);
err = pthread_mutex_lock(&cl->mutex);
if (err != 0) {
die("pthread_mutex_lock", err);
}
if (!cl->ready) {
ts.tv_sec = time(NULL) + 20;
err = pthread_cond_timedwait(&cl->cond, &cl->mutex, &ts);
if (err != 0) {
if (err == ETIMEDOUT) {
fprintf(stderr, "waited more than 20 seconds for descriptor to be ready\n");
exit(EXIT_FAILURE);
} else {
die("pthread_cond_timedwait", err);
}
}
}
cl->ready = 0;
err = pthread_mutex_unlock(&cl->mutex);
if (err != 0) {
die("pthread_mutex_unlock", err);
}
}
static int pollfd;
static void poll_init() {
int d;
d = epoll_create(20);
if (d < 0) {
die("epoll_create", errno);
}
pollfd = d;
}
static void poll_add(int d, struct condlock *cond) {
struct epoll_event ev;
memset(&ev, 0, sizeof ev);
ev.events = EPOLLIN | EPOLLOUT | EPOLLRDHUP | EPOLLET;
ev.data.u64 = (uint64_t)(uintptr_t)(void*)(cond);
if (epoll_ctl(pollfd, EPOLL_CTL_ADD, d, &ev) < 0) {
die("epoll_ctl add", errno);
}
}
static void poll_del(int d) {
if (epoll_ctl(pollfd, EPOLL_CTL_DEL, d, NULL) < 0) {
die("epoll_ctl del", errno);
}
}
static void *poller(void *arg __attribute__((unused))) {
int c;
struct epoll_event events[128];
int i;
struct condlock *cl;
while (1) {
c = epoll_wait(pollfd, events, 128, -1);
for (i = 0; i < c; i++) {
cl = (struct condlock*)(void*)(uintptr_t)(events[i].data.u64);
condlock_signal(cl);
}
}
abort(); /* unreachable */
}
static void *server(void *arg __attribute__((unused))) {
struct sockaddr_un sun;
struct condlock cl;
int s;
int i;
char buf[4096];
memset(&sun, 0, sizeof sun);
sun.sun_family = AF_UNIX;
memcpy(sun.sun_path, temp, sizeof temp);
condlock_init(&cl);
while (1) {
s = socket(AF_UNIX, SOCK_DGRAM | SOCK_NONBLOCK, 0);
if (s < 0) {
die("socket", errno);
}
if (bind(s, (struct sockaddr*)(&sun), sizeof sun) < 0) {
die("bind", errno);
}
poll_add(s, &cl);
for (i = 0; i < 20; i++) {
if (recv(s, buf, sizeof buf, 0) < 0) {
if (errno != EAGAIN && errno != EWOULDBLOCK) {
die("recv", errno);
}
condlock_wait(&cl);
}
}
poll_del(s);
if (close(s) < 0) {
die("close", errno);
}
if (remove(temp) < 0) {
die("remove", errno);
}
usleep(1000); /* one millisecond */
}
abort(); /* unreachable */
}
static void *client(void *arg __attribute__((unused))) {
const char * const msg = "test line";
struct sockaddr_un sun;
struct condlock cl;
int s;
int i;
int flags;
memset(&sun, 0, sizeof sun);
sun.sun_family = AF_UNIX;
memcpy(sun.sun_path, temp, sizeof temp);
condlock_init(&cl);
s = -1;
for (i = 0; i < 500000; i++) {
while (s < 0) {
s = socket(AF_UNIX, SOCK_DGRAM, 0);
if (s < 0) {
die("socket", errno);
}
if (connect(s, (struct sockaddr *)(&sun), sizeof sun) < 0) {
if (errno != ECONNREFUSED && errno != ENOENT) {
die("connect", errno);
}
if (close(s) < 0) {
die("close", errno);
}
s = -1;
usleep(1); /* one microsecond */
} else {
flags = fcntl(s, F_GETFL, 0);
if (flags < 0) {
die("fcntl", errno);
}
if (fcntl(s, F_SETFL, flags | O_NONBLOCK) < 0) {
die("fcntl", errno);
}
poll_add(s, &cl);
}
}
if (write(s, msg, strlen(msg)) < 0) {
if (errno == EAGAIN || errno == EWOULDBLOCK) {
condlock_wait(&cl);
} else if (errno == ECONNREFUSED) {
poll_del(s);
if (close(s) < 0) {
die("close", errno);
}
s = -1;
} else {
die("write", errno);
}
}
if (i % 10000 == 0) {
usleep(1000); /* one millisecond */
}
}
if (s > 0) {
poll_del(s);
if (close(s) < 0) {
die("close", errno);
}
}
return NULL;
}
#define CLIENT_COUNT (20)
int main() {
int d;
pthread_attr_t attr;
int err;
pthread_t tid;
int i;
pthread_t clients[CLIENT_COUNT];
memcpy(temp, "testXXXXXX", sizeof temp);
d = mkstemp(temp);
if (d < 0) {
die("mkstemp", errno);
}
close(d);
remove(temp);
err = pthread_attr_init(&attr);
if (err != 0) {
die("pthread_attr_init", err);
}
err = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
if (err != 0) {
die("pthread_attr_setdetachstate", err);
}
poll_init();
err = pthread_create(&tid, &attr, poller, NULL);
if (err != 0) {
die("pthread_create", err);
}
err = pthread_create(&tid, &attr, server, NULL);
if (err != 0) {
die("pthread_create", err);
}
for (i = 0; i < CLIENT_COUNT; i++) {
err = pthread_create(&clients[i], NULL, client, NULL);
if (err != 0) {
die("pthread_create", err);
}
}
for (i = 0; i < CLIENT_COUNT; i++) {
err = pthread_join(clients[i], NULL);
if (err != 0) {
die("pthread_join", err);
}
}
return 0;
}
The above looks correct and does fail the same way when it fails. I did happen to have the C version succeed once, but only once. The one thing that bothers me is that my reading of all the various issues people have had with epoll always comes down to how file descriptors are managed and how you can get into trouble. The client never really closes its connection unless there is some error; whether that is contributing to the issue, I really don't know. I have been playing with the code, but nothing seems to fix the issue or increase the percentage of success.
Sent bug report to netdev mailing list: https://www.spinics.net/lists/netdev/msg509646.html . |
Reply on netdev mailing list, with kernel patch: https://www.spinics.net/lists/netdev/msg512686.html
Closing this issue since it doesn't seem to be a Go bug.
Writes to syslog using a unix datagram socket (the default behavior on Ubuntu 14.04, probably more) may become stuck indefinitely if the syslog daemon closes /dev/log.
The following reproduces the issue. This code is derived from a reproducer found in the comments section of #5932 - https://play.golang.org/p/vp_e6n8VJuX
To reproduce, running

sudo service syslog-ng restart

suffices.
We first observed this bad behavior in a production environment last week during a DNS outage. The outage prevented our syslog-ng daemon from reloading configuration properly, since it exits uncleanly on configuration reload when DNS is unavailable. We then observed a large portion of production boxes hung completely, with one goroutine waiting on syslog and the rest waiting to acquire the log package mutex. I believe the syslog-ng daemon restart caused the bad behavior to be exposed in the syslog and/or net packages. Our syslog write volume is fairly high.
After that, I reproduced the issue locally using the above reproducer. I think the badness has to do with missing a socket close event, or some other invalid state transition in the net code.
Goroutine stack after killing the stuck process with -SIGQUIT: https://pastebin.com/GuV5JZDS
Environment:
$ go version
go version go1.7 linux/amd64
$ go env
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/jesmet/.gvm/pkgsets/go1.7/global"
GORACE=""
GOROOT="/home/jesmet/.gvm/gos/go1.7"
GOTOOLDIR="/home/jesmet/.gvm/gos/go1.7/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build126114309=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"