runtime/trace: flush trace data on non-throw crashes #65319

Closed
@mknyszek

Description

If a Go program has tracing enabled and crashes, chances are that the most recent data (the most useful data) won't be properly flushed, and the trace will be broken. We can discard this broken part of the trace in the tooling (#65316), but that doesn't change the fact that we might lose a lot of information.

The thing is, many crashes that only impact user program state (such as nil dereferences and uncaught-but-recoverable panics) can absolutely still go through with a global buffer flush (runtime.traceAdvance) since the runtime state is still OK.

I'd like to suggest explicitly flushing all trace data on an uncaught panic or a crash due to some "easier" case, like nil dereferences, so that as much of the data comes out intact as possible.

Activity

mknyszek commented on Jan 26, 2024

Contributor, Author

Note: this is related to #63185 (flight recording) as well, since this could make recovering trace data from a crash while flight recording was enabled much more successful in the future. We could consider also adding the ability to install an optional handler to the flight recorder for writing out trace data in these cases, though that should probably go in the flight recording proposal.

added this to the Backlog milestone on Jan 26, 2024
mknyszek commented on Jan 26, 2024

Contributor, Author

This also goes hand-in-hand with #65316: the tail end of the trace data will likely still be broken, because the crash still has to happen.

added the NeedsFix label (The path to resolution is known, but the work has not been done.) on Jan 26, 2024
self-assigned this on Jan 31, 2024
gopherbot commented on Feb 9, 2024

Contributor

Change https://go.dev/cl/562616 mentions this issue: runtime: call traceAdvance before exiting

added a commit that references this issue on Feb 10, 2024
20f4b6d
added a commit that references this issue on Feb 18, 2024
64ce206
modified the milestones: Backlog, Go1.23 on May 24, 2024
aktau commented on Oct 22, 2024

Contributor

I'm wondering if we could do this in more cases. I think I see three top-level crashing functions (all calling startpanic_m):

Which of these paths could we conceivably add a traceAdvance to? The most interesting case for us would be sighandler(), but that is annotated //go:nowritebarrierrec, which conflicts with traceAdvance. Is there a way to get around this? Or is there a way to create a traceAdvanceLite which does as much as feasible?

Metadata

Labels: NeedsFix (The path to resolution is known, but the work has not been done.), compiler/runtime (Issues related to the Go compiler and/or runtime.)

Participants: @aktau, @mknyszek, @dmitshur, @gopherbot, @cherrymui