Skip to content

[lldb] Unify/improve MainLoop signal handling #115197

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 18, 2024
Merged

[lldb] Unify/improve MainLoop signal handling #115197

merged 3 commits into from
Nov 18, 2024

Conversation

labath
Copy link
Collaborator

@labath labath commented Nov 6, 2024

Change the signal handler to use a pipe to notify about incoming signals. This has two benefits:

  • the signal no longer has to happen on the MainLoop thread. With the previous implementation, this had to be the case as that was the only way to ensure that ppoll gets interrupted correctly. In a multithreaded process, this would mean that all other threads have to have the signal blocked at all times.
  • we don't need the android-specific workaround, which was necessary due to the syscall being implemented in a non-atomic way

When the MainLoop class was first implemented, we did not have the interrupt self-pipe, so syscall interruption was the most straight-forward implementation. Over time, the class gained new abilities (the pipe being one of them), so we can now piggy-back on those.

This patch also changes the kevent-based implementation to use the pipe for signal notification as well. The motivation there is slightly different:

  • it makes the implementations more uniform
  • it makes sure we handle all kinds of signals, like we do with the linux version (EVFILT_SIGNAL only catches process-directed signals)

Change the signal handler to use a pipe to notify about incoming
signals. This has two benefits:
- the signal no longer has to happen on the MainLoop thread. With the
  previous implementation, this had to be the case as that was the only
  way to ensure that ppoll gets interrupted correctly. In a
  multithreaded process, this would mean that all other threads have
  to have the signal blocked at all times.
- we don't need the android-specific workaround, which was necessary due
  to the syscall being implemented in a non-atomic way

When the MainLoop class was first implemented, we did not have the
interrupt self-pipe, so syscall interruption was the most
straight-forward implementation. Over time, the class gained new
abilities (the pipe being one of them), so we can now piggy-back on
those.

This patch also changes the kevent-based implementation to use the pipe
for signal notification as well. The motivation there is slightly
different:
- it makes the implementations more uniform
- it makes sure we handle all kinds of signals, like we do with the
  linux version (EVFILT_SIGNAL only catches process-directed signals)
@labath labath requested review from mgorny and Jlalond November 6, 2024 19:13
@labath labath requested a review from JDevlieghere as a code owner November 6, 2024 19:13
@llvmbot llvmbot added the lldb label Nov 6, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 6, 2024

@llvm/pr-subscribers-lldb

Author: Pavel Labath (labath)

Changes

Change the signal handler to use a pipe to notify about incoming signals. This has two benefits:

  • the signal no longer has to happen on the MainLoop thread. With the previous implementation, this had to be the case as that was the only way to ensure that ppoll gets interrupted correctly. In a multithreaded process, this would mean that all other threads have to have the signal blocked at all times.
  • we don't need the android-specific workaround, which was necessary due to the syscall being implemented in a non-atomic way

When the MainLoop class was first implemented, we did not have the interrupt self-pipe, so syscall interruption was the most straight-forward implementation. Over time, the class gained new abilities (the pipe being one of them), so we can now piggy-back on those.

This patch also changes the kevent-based implementation to use the pipe for signal notification as well. The motivation there is slightly different:

  • it makes the implementations more uniform
  • it makes sure we handle all kinds of signals, like we do with the linux version (EVFILT_SIGNAL only catches process-directed signals)

Full diff: https://github.com/llvm/llvm-project/pull/115197.diff

3 Files Affected:

  • (modified) lldb/include/lldb/Host/posix/MainLoopPosix.h (+1)
  • (modified) lldb/source/Host/posix/MainLoopPosix.cpp (+63-119)
  • (modified) lldb/unittests/Host/MainLoopTest.cpp (+11)
diff --git a/lldb/include/lldb/Host/posix/MainLoopPosix.h b/lldb/include/lldb/Host/posix/MainLoopPosix.h
index 07497b7b8c259a..1988dde7c65aee 100644
--- a/lldb/include/lldb/Host/posix/MainLoopPosix.h
+++ b/lldb/include/lldb/Host/posix/MainLoopPosix.h
@@ -59,6 +59,7 @@ class MainLoopPosix : public MainLoopBase {
 private:
   void ProcessReadObject(IOObject::WaitableHandle handle);
   void ProcessSignal(int signo);
+  void ProcessSignals();
 
   class SignalHandle {
   public:
diff --git a/lldb/source/Host/posix/MainLoopPosix.cpp b/lldb/source/Host/posix/MainLoopPosix.cpp
index 6f8eaa55cfdf09..34c937f8a05d6c 100644
--- a/lldb/source/Host/posix/MainLoopPosix.cpp
+++ b/lldb/source/Host/posix/MainLoopPosix.cpp
@@ -17,17 +17,14 @@
 #include <cerrno>
 #include <csignal>
 #include <ctime>
+#include <fcntl.h>
 #include <vector>
 
 // Multiplexing is implemented using kqueue on systems that support it (BSD
-// variants including OSX). On linux we use ppoll, while android uses pselect
-// (ppoll is present but not implemented properly). On windows we use WSApoll
-// (which does not support signals).
+// variants including OSX). On linux we use ppoll.
 
 #if HAVE_SYS_EVENT_H
 #include <sys/event.h>
-#elif defined(__ANDROID__)
-#include <sys/syscall.h>
 #else
 #include <poll.h>
 #endif
@@ -35,11 +32,38 @@
 using namespace lldb;
 using namespace lldb_private;
 
-static sig_atomic_t g_signal_flags[NSIG];
+namespace {
+struct GlobalSignalInfo {
+  sig_atomic_t pipe_fd = -1;
+  static_assert(sizeof(sig_atomic_t) >= sizeof(int),
+                "Type too small for a file descriptor");
+  sig_atomic_t flag = 0;
+};
+} // namespace
+static GlobalSignalInfo g_signal_info[NSIG];
 
 static void SignalHandler(int signo, siginfo_t *info, void *) {
   assert(signo < NSIG);
-  g_signal_flags[signo] = 1;
+
+  // Set the flag before writing to the pipe!
+  g_signal_info[signo].flag = 1;
+
+  char c = '.';
+  int fd = g_signal_info[signo].pipe_fd;
+  if (fd < 0) {
+    // This can happen with the following (unlikely) sequence of events:
+    // 1. Thread 1 gets a signal, starts running the signal handler
+    // 2. Thread 2 unregisters the signal handler, setting pipe_fd to -1
+    // 3. Signal handler on thread 1 reads -1 out of pipe_fd
+    // In this case, we can just ignore the signal because we're no longer
+    // interested in it.
+    return;
+  }
+
+  ssize_t bytes_written = llvm::sys::RetryAfterSignal(-1, ::write, fd, &c, 1);
+  // We're only using the pipe to wake up the reader, so we can safely ignore
+  // EAGAIN (pipe full)
+  assert(bytes_written == 1 || (bytes_written == -1 && errno == EAGAIN));
 }
 
 class MainLoopPosix::RunImpl {
@@ -48,7 +72,7 @@ class MainLoopPosix::RunImpl {
   ~RunImpl() = default;
 
   Status Poll();
-  void ProcessEvents();
+  void ProcessReadEvents();
 
 private:
   MainLoopPosix &loop;
@@ -58,15 +82,9 @@ class MainLoopPosix::RunImpl {
   struct kevent out_events[4];
   int num_events = -1;
 
-#else
-#ifdef __ANDROID__
-  fd_set read_fd_set;
 #else
   std::vector<struct pollfd> read_fds;
 #endif
-
-  sigset_t get_sigmask();
-#endif
 };
 
 #if HAVE_SYS_EVENT_H
@@ -94,7 +112,7 @@ Status MainLoopPosix::RunImpl::Poll() {
   return Status();
 }
 
-void MainLoopPosix::RunImpl::ProcessEvents() {
+void MainLoopPosix::RunImpl::ProcessReadEvents() {
   assert(num_events >= 0);
   for (int i = 0; i < num_events; ++i) {
     if (loop.m_terminate_request)
@@ -103,9 +121,6 @@ void MainLoopPosix::RunImpl::ProcessEvents() {
     case EVFILT_READ:
       loop.ProcessReadObject(out_events[i].ident);
       break;
-    case EVFILT_SIGNAL:
-      loop.ProcessSignal(out_events[i].ident);
-      break;
     default:
       llvm_unreachable("Unknown event");
     }
@@ -113,64 +128,12 @@ void MainLoopPosix::RunImpl::ProcessEvents() {
 }
 #else
 MainLoopPosix::RunImpl::RunImpl(MainLoopPosix &loop) : loop(loop) {
-#ifndef __ANDROID__
   read_fds.reserve(loop.m_read_fds.size());
-#endif
 }
 
-sigset_t MainLoopPosix::RunImpl::get_sigmask() {
-  sigset_t sigmask;
-  int ret = pthread_sigmask(SIG_SETMASK, nullptr, &sigmask);
-  assert(ret == 0);
-  UNUSED_IF_ASSERT_DISABLED(ret);
-
-  for (const auto &sig : loop.m_signals)
-    sigdelset(&sigmask, sig.first);
-  return sigmask;
-}
-
-#ifdef __ANDROID__
-Status MainLoopPosix::RunImpl::Poll() {
-  // ppoll(2) is not supported on older all android versions. Also, older
-  // versions android (API <= 19) implemented pselect in a non-atomic way, as a
-  // combination of pthread_sigmask and select. This is not sufficient for us,
-  // as we rely on the atomicity to correctly implement signal polling, so we
-  // call the underlying syscall ourselves.
-
-  FD_ZERO(&read_fd_set);
-  int nfds = 0;
-  for (const auto &fd : loop.m_read_fds) {
-    FD_SET(fd.first, &read_fd_set);
-    nfds = std::max(nfds, fd.first + 1);
-  }
-
-  union {
-    sigset_t set;
-    uint64_t pad;
-  } kernel_sigset;
-  memset(&kernel_sigset, 0, sizeof(kernel_sigset));
-  kernel_sigset.set = get_sigmask();
-
-  struct {
-    void *sigset_ptr;
-    size_t sigset_len;
-  } extra_data = {&kernel_sigset, sizeof(kernel_sigset)};
-  if (syscall(__NR_pselect6, nfds, &read_fd_set, nullptr, nullptr, nullptr,
-              &extra_data) == -1) {
-    if (errno != EINTR)
-      return Status(errno, eErrorTypePOSIX);
-    else
-      FD_ZERO(&read_fd_set);
-  }
-
-  return Status();
-}
-#else
 Status MainLoopPosix::RunImpl::Poll() {
   read_fds.clear();
 
-  sigset_t sigmask = get_sigmask();
-
   for (const auto &fd : loop.m_read_fds) {
     struct pollfd pfd;
     pfd.fd = fd.first;
@@ -179,55 +142,34 @@ Status MainLoopPosix::RunImpl::Poll() {
     read_fds.push_back(pfd);
   }
 
-  if (ppoll(read_fds.data(), read_fds.size(), nullptr, &sigmask) == -1 &&
+  if (ppoll(read_fds.data(), read_fds.size(),
+            /*timeout=*/nullptr,
+            /*sigmask=*/nullptr) == -1 &&
       errno != EINTR)
     return Status(errno, eErrorTypePOSIX);
 
   return Status();
 }
-#endif
 
-void MainLoopPosix::RunImpl::ProcessEvents() {
-#ifdef __ANDROID__
-  // Collect first all readable file descriptors into a separate vector and
-  // then iterate over it to invoke callbacks. Iterating directly over
-  // loop.m_read_fds is not possible because the callbacks can modify the
-  // container which could invalidate the iterator.
-  std::vector<IOObject::WaitableHandle> fds;
-  for (const auto &fd : loop.m_read_fds)
-    if (FD_ISSET(fd.first, &read_fd_set))
-      fds.push_back(fd.first);
-
-  for (const auto &handle : fds) {
-#else
+void MainLoopPosix::RunImpl::ProcessReadEvents() {
   for (const auto &fd : read_fds) {
     if ((fd.revents & (POLLIN | POLLHUP)) == 0)
       continue;
     IOObject::WaitableHandle handle = fd.fd;
-#endif
     if (loop.m_terminate_request)
       return;
 
     loop.ProcessReadObject(handle);
   }
-
-  std::vector<int> signals;
-  for (const auto &entry : loop.m_signals)
-    if (g_signal_flags[entry.first] != 0)
-      signals.push_back(entry.first);
-
-  for (const auto &signal : signals) {
-    if (loop.m_terminate_request)
-      return;
-    g_signal_flags[signal] = 0;
-    loop.ProcessSignal(signal);
-  }
 }
 #endif
 
 MainLoopPosix::MainLoopPosix() : m_triggering(false) {
   Status error = m_trigger_pipe.CreateNew(/*child_process_inherit=*/false);
   assert(error.Success());
+  assert(fcntl(m_trigger_pipe.GetWriteFileDescriptor(), F_SETFL,
+               O_NONBLOCK | fcntl(m_trigger_pipe.GetWriteFileDescriptor(),
+                                  F_GETFL)) == 0);
   const int trigger_pipe_fd = m_trigger_pipe.GetReadFileDescriptor();
   m_read_fds.insert({trigger_pipe_fd, [trigger_pipe_fd](MainLoopBase &loop) {
                        char c;
@@ -295,7 +237,9 @@ MainLoopPosix::RegisterSignal(int signo, const Callback &callback,
   sigaddset(&new_action.sa_mask, signo);
   sigset_t old_set;
 
-  g_signal_flags[signo] = 0;
+  // Set signal info before installing the signal handler!
+  g_signal_info[signo].pipe_fd = m_trigger_pipe.GetWriteFileDescriptor();
+  g_signal_info[signo].flag = 0;
 
   // Even if using kqueue, the signal handler will still be invoked, so it's
   // important to replace it with our "benign" handler.
@@ -303,18 +247,7 @@ MainLoopPosix::RegisterSignal(int signo, const Callback &callback,
   UNUSED_IF_ASSERT_DISABLED(ret);
   assert(ret == 0 && "sigaction failed");
 
-#if HAVE_SYS_EVENT_H
-  struct kevent ev;
-  EV_SET(&ev, signo, EVFILT_SIGNAL, EV_ADD, 0, 0, 0);
-  ret = kevent(m_kqueue, &ev, 1, nullptr, 0, nullptr);
-  assert(ret == 0);
-#endif
-
-  // If we're using kqueue, the signal needs to be unblocked in order to
-  // receive it. If using pselect/ppoll, we need to block it, and later unblock
-  // it as a part of the system call.
-  ret = pthread_sigmask(HAVE_SYS_EVENT_H ? SIG_UNBLOCK : SIG_BLOCK,
-                        &new_action.sa_mask, &old_set);
+  ret = pthread_sigmask(SIG_UNBLOCK, &new_action.sa_mask, &old_set);
   assert(ret == 0 && "pthread_sigmask failed");
   info.was_blocked = sigismember(&old_set, signo);
   auto insert_ret = m_signals.insert({signo, info});
@@ -349,14 +282,8 @@ void MainLoopPosix::UnregisterSignal(
   assert(ret == 0);
   UNUSED_IF_ASSERT_DISABLED(ret);
 
-#if HAVE_SYS_EVENT_H
-  struct kevent ev;
-  EV_SET(&ev, signo, EVFILT_SIGNAL, EV_DELETE, 0, 0, 0);
-  ret = kevent(m_kqueue, &ev, 1, nullptr, 0, nullptr);
-  assert(ret == 0);
-#endif
-
   m_signals.erase(it);
+  g_signal_info[signo] = {};
 }
 
 Status MainLoopPosix::Run() {
@@ -370,7 +297,9 @@ Status MainLoopPosix::Run() {
     if (error.Fail())
       return error;
 
-    impl.ProcessEvents();
+    impl.ProcessReadEvents();
+
+    ProcessSignals();
 
     m_triggering = false;
     ProcessPendingCallbacks();
@@ -384,6 +313,21 @@ void MainLoopPosix::ProcessReadObject(IOObject::WaitableHandle handle) {
     it->second(*this); // Do the work
 }
 
+void MainLoopPosix::ProcessSignals() {
+  std::vector<int> signals;
+  for (const auto &entry : m_signals)
+    if (g_signal_info[entry.first].flag != 0)
+      signals.push_back(entry.first);
+
+  for (const auto &signal : signals) {
+    if (m_terminate_request)
+      return;
+
+    g_signal_info[signal].flag = 0;
+    ProcessSignal(signal);
+  }
+}
+
 void MainLoopPosix::ProcessSignal(int signo) {
   auto it = m_signals.find(signo);
   if (it != m_signals.end()) {
diff --git a/lldb/unittests/Host/MainLoopTest.cpp b/lldb/unittests/Host/MainLoopTest.cpp
index 4688d4fed475b6..622a547fa22f04 100644
--- a/lldb/unittests/Host/MainLoopTest.cpp
+++ b/lldb/unittests/Host/MainLoopTest.cpp
@@ -254,6 +254,17 @@ TEST_F(MainLoopTest, Signal) {
   ASSERT_EQ(1u, callback_count);
 }
 
+TEST_F(MainLoopTest, SignalOnOtherThread) {
+  MainLoop loop;
+  Status error;
+
+  auto handle = loop.RegisterSignal(SIGUSR1, make_callback(), error);
+  ASSERT_TRUE(error.Success());
+  std::thread([] { pthread_kill(pthread_self(), SIGUSR1); }).join();
+  ASSERT_TRUE(loop.Run().Success());
+  ASSERT_EQ(1u, callback_count);
+}
+
 // Test that a signal which is not monitored by the MainLoop does not
 // cause a premature exit.
 TEST_F(MainLoopTest, UnmonitoredSignal) {

@labath
Copy link
Collaborator Author

labath commented Nov 14, 2024

Any brave souls? :)

@rocallahan, I can't add you as a reviewer (I'm not sure what one needs to do that), but I would totally go with your LGTM :)

@JDevlieghere
Copy link
Member

I'm not sure what one needs to do that

You need to be a member of the LLVM organization (i.e. have commit access). This has tripped me up a few times as well.

@rocallahan
Copy link
Contributor

rocallahan commented Nov 14, 2024 via email

Copy link
Contributor

@Jlalond Jlalond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not expert, but everything looks rigorous and you've added a lot of comments.

I left a few nits and a singular question. I would encourage comments on some file descriptors for the less knowledgeable of us (namely me!)

// Set the flag before writing to the pipe!
g_signal_info[signo].flag = 1;

char c = '.';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, I would like a slightly more descriptive variable name, mostly the intent of .

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment about the intent. I think c is a good name for an entity with such a short scope. The choice of character is irrelevant -- we only need to write something to force poll to return.

@@ -179,55 +142,34 @@ Status MainLoopPosix::RunImpl::Poll() {
read_fds.push_back(pfd);
}

if (ppoll(read_fds.data(), read_fds.size(), nullptr, &sigmask) == -1 &&
if (ppoll(read_fds.data(), read_fds.size(),
/*timeout=*/nullptr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a reasonable timeout make sense as a precaution?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see what that would solve. The goal is to sleep until one of the FDs is readable. If none of them is, the only thing we can do is go back to sleep. Sure, we can have a bug where we e.g. somehow fail to notify the pipe about a signal, but that can also happen with a spurious wakeup (cause then you have to check if the signal really occurred, and maybe the bug is that you forgot to set that flag).

FWIW, there will be a timeout here after #112895 (but there is still the possibility of an infinite wait if/when there are no scheduled callbacks).

}
#endif

MainLoopPosix::MainLoopPosix() : m_triggering(false) {
Status error = m_trigger_pipe.CreateNew(/*child_process_inherit=*/false);
assert(error.Success());
assert(fcntl(m_trigger_pipe.GetWriteFileDescriptor(), F_SETFL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a comment that you're asserting setting the file descriptor and then reading it produces the same output

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm setting the flags on the descriptor. This is essentially fd->flags |= O_NONBLOCK. I added a comment and fixed the embarrassing mistake of putting a load-bearing statement into an assert block.

@@ -384,6 +313,21 @@ void MainLoopPosix::ProcessReadObject(IOObject::WaitableHandle handle) {
it->second(*this); // Do the work
}

void MainLoopPosix::ProcessSignals() {
std::vector<int> signals;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Uneducated question) Is int actually the correct type for signals, and not an unsigned value?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the type that the OS signal APIs use. I think it's best to stick with that even though the values would technically fit into a uint8_t.

@labath
Copy link
Collaborator Author

labath commented Nov 15, 2024

I'm not expert, but everything looks rigorous and you've added a lot of comments.

I left a few nits and a singular question. I would encourage comments on some file descriptors for the less knowledgeable of us (namely me!)

That you for the review. Making sure the code is understandable to the uninitiated is an important part of it.

@labath labath merged commit c25c6c3 into llvm:main Nov 18, 2024
7 checks passed
@labath labath deleted the sig branch November 18, 2024 11:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants