feat(native): add stack overflow handling to advanced usage #13548

Draft
wants to merge 1 commit into
base: master
101 changes: 101 additions & 0 deletions docs/platforms/native/advanced-usage/stack-overflow-handling/index.mdx
@@ -0,0 +1,101 @@
---
title: Handling Stack Overflows
description: "Learn how reporting crashes caused by stack overflows differs across platforms and how Sentry can help."
sidebar_order: 1000
---
Application crashes due to stack overflow differ from other crashes from the handler's perspective because the handler
relies on the resource that ran out: stack space. Since the handler typically runs on the thread whose stack overflowed,
it can no longer use stack variables or call functions. This results in a crashed handler that can't report the initial
crash.

How to handle this issue is different from platform to platform, but options boil down to:

* allocating a stack that only the crash handler can use (Linux and Windows)
* running the handler in a separate thread (or process), which will receive a message of the crash asynchronously (macOS)

Regardless of whether an application crashed due to a stack overflow, handlers should make minimal use of the stack,
because the amount of stack available to the handler can be limited even without an overflow. This is especially true
for users of the `on_crash` or `before_send` hooks, over which Sentry has no control.

On Linux (and other `POSIX` systems), users should preallocate everything before their hooks run and only move data into
preallocated storage because heap allocations can also fail inside the signal handler (constructing `sentry_value_t` is
okay because we use a safe allocator inside the signal handler). See also
[What to consider when writing on_crash hooks](https://docs.sentry.io/platforms/native/advanced-usage/signal-handling/#what-to-consider-when-writing-on_crash-hooks).
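
The preallocation advice above can be sketched as follows. This is a minimal illustration and not Sentry API: `crash_record_t`, `record_crash()` and `last_crash_record()` are hypothetical names. The record lives in static storage, so it exists before any crash, and the function that fills it neither allocates nor calls into library code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type; all of its storage is reserved at load time. */
typedef struct {
    int signum;
    uintptr_t fault_addr;
    char note[64];
} crash_record_t;

/* Static storage: nothing has to be allocated when the crash happens. */
static crash_record_t g_crash_record;

/* Intended to be called from a hook: only moves data into the
 * preallocated record. The copy is a plain loop that truncates
 * instead of growing a buffer. */
void
record_crash(int signum, uintptr_t fault_addr, const char *note)
{
    size_t i = 0;

    g_crash_record.signum = signum;
    g_crash_record.fault_addr = fault_addr;
    if (note != NULL) {
        for (; i + 1 < sizeof g_crash_record.note && note[i] != '\0'; i++) {
            g_crash_record.note[i] = note[i];
        }
    }
    g_crash_record.note[i] = '\0';
}

/* Read access for code that runs after the hook. */
const crash_record_t *
last_crash_record(void)
{
    return &g_crash_record;
}
```

A hook would call `record_crash()` with whatever it learned about the crash and leave any formatting or uploading to code that runs outside the signal handler.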

## How do OSes differ and how can Sentry help?

### Windows

The Windows API provides a [thread-stack guarantee interface](https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-setthreadstackguarantee) where users can give a size in bytes reserved for the handler to run in case of a crash. However, this size is subtracted from the thread stack reserve as it is a direct continuation inside the thread stack, not a separate allocation or memory region.

This means the developer must weigh the thread stack reserve against the handler's guarantee during regular operation.
Otherwise, the guarantee used for the handler could eat enough stack space to lead to an overflow.

This is rarely a problem for most threads on Windows, which have a default stack reserve of 1 MiB (whereas the
required handler guarantee is only tens of KiB). However, some threads created by specific runtimes or the kernel
(for drivers) might have much smaller stack reserves, where a handler guarantee of 32 KiB could already be half or all
of the stack available to the thread.

In short, while Windows provides a very high-level request interface ("guarantee me x bytes for my handler"), it is not
flexible regarding the location of the guaranteed handler stack. As such, you must consider the size of the guarantee in
the context of the stack reserve and the actual stack use in a particular thread. The latter is hard to do for threads
you do not control.

In addition, you must request the stack guarantee from within the thread for which you want it. You cannot set a
guarantee from the outside, which typically limits you to the threads you own.

On Windows, the Native SDK automatically sets a stack guarantee of 64KiB for all threads that start after loading it as
a shared library. For static library builds, we only automatically set the stack guarantee for the thread that calls
`sentry_init()`.

If you need to set stack guarantees manually, you can use the Win32 API directly or `sentry_set_thread_stack_guarantee()`,
which provides logging and prevents overriding a previously set stack guarantee.

The auto initialization is also defensive in requesting the stack reserve for each thread it runs on and only attempts
to set a guarantee if the reserve is at least 10 times larger than the requested default guarantee.
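
The defensive check described above can be sketched roughly like this. It is not the SDK's implementation: `should_set_stack_guarantee()`, `set_guarantee_for_current_thread()` and the two constants are hypothetical names, with the values (64 KiB, factor 10) taken from the text. The Win32 part only compiles on Windows and assumes `GetCurrentThreadStackLimits()` is available (Windows 8 or later).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constants mirroring the values mentioned in the text. */
#define HANDLER_GUARANTEE_BYTES ((size_t)64 * 1024)
#define MIN_RESERVE_FACTOR ((size_t)10)

/* Returns true if the thread's stack reserve is at least `factor` times
 * larger than the requested guarantee. */
bool
should_set_stack_guarantee(size_t reserve_bytes, size_t guarantee_bytes,
    size_t factor)
{
    if (guarantee_bytes == 0 || factor == 0) {
        return false;
    }
    /* Divide instead of multiplying so the comparison cannot overflow. */
    return reserve_bytes / factor >= guarantee_bytes;
}

#ifdef _WIN32
#    include <windows.h>

/* Must run on the thread that should receive the guarantee. */
bool
set_guarantee_for_current_thread(void)
{
    ULONG_PTR low = 0;
    ULONG_PTR high = 0;
    ULONG guarantee = (ULONG)HANDLER_GUARANTEE_BYTES;

    /* The distance between both limits is the stack reserve. */
    GetCurrentThreadStackLimits(&low, &high);
    if (!should_set_stack_guarantee((size_t)(high - low),
            HANDLER_GUARANTEE_BYTES, MIN_RESERVE_FACTOR)) {
        return false; /* reserve too small, leave the thread untouched */
    }
    return SetThreadStackGuarantee(&guarantee) != 0;
}
#endif
```

With these values, a thread with the default 1 MiB reserve passes the check, while a thread with a 256 KiB reserve is left alone.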

You can parameterize this behavior to suit your use case:

* you can disable

### Linux or OSes that primarily use POSIX signal handlers

When you use POSIX signal handlers, you can specify a `sigaltstack`. This alternative signal stack lets the kernel run
the handler on a separate stack, even if the stack of the crashed and preempted thread is exhausted.

This relatively low-level interface allows users to specify an arbitrary memory range (from the heap or any `mmap` a
user can access). Letting the user determine the size _and_ location is more flexible than the Windows approach: the
handler stack is independent of the stack usage and size of the crashed thread, and you can add extra bounds such as
protected regions around it. However, it also adds complexity, because a badly placed or incorrectly set up memory
region can lead to hard-to-identify bugs (consider a handler stack inside the heap, where a handler overflow could
cause arbitrary heap corruption).

Like Windows, you can only assign a `sigaltstack` from within the thread, meaning you can only set the handler region
for threads you own.
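
As an illustration of the points above, here is a minimal sketch of a `sigaltstack` placed in its own `mmap` region with an inaccessible guard page on each side, plus a handler registered with `SA_ONSTACK` so the kernel actually uses that stack. The function names and the 64 KiB size are assumptions made for this example, not what the Native SDK does.

```c
#ifndef _GNU_SOURCE
#    define _GNU_SOURCE /* MAP_ANONYMOUS and stack_t on glibc */
#endif
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical handler stack size for this sketch. */
#define HANDLER_STACK_BYTES ((size_t)64 * 1024)

/* Maps [guard][stack][guard] and registers the middle part as the
 * alternate signal stack of the calling thread. Returns 0 on success. */
int
install_guarded_altstack(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) {
        return -1;
    }
    size_t guard = (size_t)page;
    size_t total = HANDLER_STACK_BYTES + 2 * guard;

    /* Reserve the whole range as inaccessible first. */
    char *base = mmap(NULL, total, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        return -1;
    }
    /* Open up only the middle; an overflow of the handler stack then
     * faults on a guard page instead of corrupting neighboring memory. */
    if (mprotect(base + guard, HANDLER_STACK_BYTES, PROT_READ | PROT_WRITE) != 0) {
        munmap(base, total);
        return -1;
    }

    stack_t ss;
    ss.ss_sp = base + guard;
    ss.ss_size = HANDLER_STACK_BYTES;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) {
        munmap(base, total);
        return -1;
    }
    return 0;
}

/* Minimal handler: reports and exits using async-signal-safe calls only. */
static void
on_crash_signal(int signum, siginfo_t *info, void *ucontext)
{
    static const char msg[] = "crash handler running on alternate stack\n";
    (void)signum;
    (void)info;
    (void)ucontext;
    if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
        /* nothing sensible left to do */
    }
    _exit(1);
}

/* SA_ONSTACK tells the kernel to run the handler on the sigaltstack. */
int
install_crash_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_crash_signal;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Because the alternate stack is a per-thread setting, `install_guarded_altstack()` has to be called on every thread that should be covered, while `install_crash_handler()` is process-wide and only needs to run once.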

### Android

Android automatically configures every thread to use a `sigaltstack` size of 16KiB (on 32-bit systems) and 32KiB (on
64-bit systems). The Android team recommends not overriding these because configuration inconsistencies with the signal
stacks provided by Android can lead to crashes. The `inproc` backend of the Native SDK used in the Android integration
will not define a `sigaltstack` on Linux/Android if one is already specified. Thus, only the default `sigaltstack` will
be used on Android, and you can be sure that one exists for each thread.
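
The "do not override an existing `sigaltstack`" behavior can be approximated with a query before the install. This is a sketch under assumptions, not the `inproc` backend's code: `thread_has_altstack()` and `ensure_altstack()` are hypothetical names, and `malloc` is used only to keep the example short.

```c
#ifndef _GNU_SOURCE
#    define _GNU_SOURCE /* stack_t and sigaltstack on glibc */
#endif
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* True if the calling thread already has an alternate signal stack. */
bool
thread_has_altstack(void)
{
    stack_t current;
    if (sigaltstack(NULL, &current) != 0) {
        return false;
    }
    return (current.ss_flags & SS_DISABLE) == 0;
}

/* Installs an alternate stack only if the thread has none.
 * Returns 0 if one already existed, 1 if a new one was installed,
 * and -1 on error. */
int
ensure_altstack(size_t size)
{
    if (thread_has_altstack()) {
        return 0; /* keep whatever the platform or runtime configured */
    }

    /* Heap memory for brevity only; see the Linux section for why a
     * dedicated mapping is the more robust choice. */
    void *memory = malloc(size);
    if (memory == NULL) {
        return -1;
    }

    stack_t ss;
    ss.ss_sp = memory;
    ss.ss_size = size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) {
        free(memory);
        return -1;
    }
    return 1;
}
```

On a thread that Android has already set up, `ensure_altstack()` would return 0 and leave the platform-provided stack in place.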

### macOS when using Mach exception port listeners

The Mach exception port listener typically blocks in a separate thread until the kernel delivers a Mach exception. Since
the listener thread is entirely independent of the thread that crashed, an exception caused by a stack overflow will
never affect the available stack for the handler. This is even more true for `crashpad` on macOS, where the handler
doesn't only run in a separate thread but in a separate process.

Be aware that, in contrast to Mach exception ports, signal handlers on macOS run on the same thread that caused the
signal and thus also need a `sigaltstack` to handle any crash from a stack overflow.

### What does the Native SDK do when using signal handlers?

All backends that use signal handlers as their primary means of handling

### Windows

### macOS (when using Mach exception ports)