Skip to content

Conversation

NHDaly
Copy link
Member

@NHDaly NHDaly commented Feb 10, 2022

Re-use the shared ptls->bt_data buffer from the thread-local storage
for the buffer, to ensure that each thread has a separate buffer.

This buffer is shared with the exception throwing mechanism, but is safe
to share since julia exception throwing never interleaves with
allocations profiling.

This fixes #44099.

@NHDaly NHDaly force-pushed the nhd-alloc-profiler-thread-safety branch from 9a62751 to d7f3674 Compare February 10, 2022 17:17
@NHDaly NHDaly mentioned this pull request Feb 10, 2022
7 tasks
@NHDaly NHDaly marked this pull request as ready for review February 10, 2022 17:19
@NHDaly NHDaly requested a review from vtjnash February 10, 2022 17:19
@vilterp
Copy link
Contributor

vilterp commented Feb 10, 2022

Could some allocations happen while an exception is being thrown and this buffer is holding the exception backtrace? 🤔🤔

@NHDaly
Copy link
Member Author

NHDaly commented Feb 10, 2022

yeah, that's a fantastic question @vilterp... I think @vtjnash is saying no that can't happen, because we basically immediately copy it off the buffer onto a properly sized object, once we know the size (same as we were doing), such as here:

julia/src/task.c

Lines 621 to 623 in 7ccc83e

jl_push_excstack(&ct->excstack, exception,
ptls->bt_data, ptls->bt_size);
ptls->bt_size = 0;

But I frankly can't tell. The comments indicating that we're supposedly using this buffer for rooting implies to me that YES this can ABSOLUTELY happen, in which case 🙅 🙅 no good.

But i would like to get some confirmation from others on this. Just in case, i've started a second PR with the other, simpler approach, of just creating a separate buffer for this, here: #44116

Re-use the shared `ptls->bt_data` buffer from the thread-local storage
for the buffer, to ensure that each thread has a separate buffer.

This buffer is shared with the exception throwing mechanism, but is safe
to share since julia exception throwing never interleaves with
allocations profiling.
@NHDaly NHDaly force-pushed the nhd-alloc-profiler-thread-safety branch from 04e6c5a to e3fa8eb Compare February 15, 2022 01:23
@vtjnash
Copy link
Member

vtjnash commented Feb 16, 2022

Yes, we use ptls->bt_data so that it is safe across the jl_gc_alloc_buf in that jl_push_excstack. Is that okay?

@c42f
Copy link
Member

c42f commented Feb 25, 2022

To perhaps clear up some confusion here... the preallocated bt_data exists fundamentally to avoid allocation in signal handlers — see for example jl_throw_in_ctx in src/signals-unix.c. The point is that inside a signal handler (for things like stack overflow) we must record the backtrace based on the program stack context we have available only within that signal handler. But we're not allowed to call most libc functions in a signal handler and certainly not malloc! The chosen solution is to have a preallocated buffer bt_data available in some thread local storage.

But we can't store backtraces permanently in bt_data because that's thread-local and we want backtraces to survive task switching. Thus when catching an exception we copy it out of bt_data as soon as possible and into the tasks local exception stack storage. However, this task local storage may need to grow (the jl_gc_alloc_buf inside jl_push_excstack, as pointed out by Jameson). So unless that jl_gc_alloc_buf bypasses the memory allocation profiler, the allocation profiler will overwrite the exception backtrace in bt_data.

@vtjnash vtjnash closed this Feb 25, 2022
@vtjnash
Copy link
Member

vtjnash commented Feb 25, 2022

Fixed by #44116

@vtjnash vtjnash deleted the nhd-alloc-profiler-thread-safety branch February 25, 2022 04:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

allocation profiler not threadsafe
4 participants