Using the inspector protocol for the JavaScript postmortem use case #320

mmarchini · 2019-07-23T22:08:04Z

I've been looking into ways to improve the story around the following diagnostics use case:

In a production environment, how do we get rich information about the JavaScript context when a process crashes and there isn't enough observability into the path that caused the crash?

Today we have llnode, which can fulfill this use case by allowing users to inspect the value of any variable on the heap after the process crashes, as well as to look at which variables were available in the scope of each frame on the stack at the point of the crash. llnode and core dumps have a lot of caveats, which we have discussed several times, so having a more stable alternative which covers the most common use cases (uncaught exceptions/unhandled rejections in JavaScript) would be nice.

Following a suggestion by @hashseed, I've been looking into using the inspector protocol to fulfill this use case. I came up with a proof-of-concept which uses the inspector protocol to save the state of the process before crashing. This saved state can later be loaded by a separate application which exposes an inspector protocol-compliant websocket. Essentially, this allows us to use Chrome DevTools to look at the state of a process after it crashed, giving users a postmortem tool with excellent usability.

The proof-of-concept is available at: https://github.com/mmarchini/inspector-postmortem

I found several caveats while working on that proof-of-concept (which is why it's not usable in production):

We have to use the Debugger domain to properly capture the state of the process (with the correct exception stack trace and the variables available on that stack trace). Using the Debugger domain will cause V8 to bail on some optimizations and take the slow path instead, which can lead to slowdowns of 200% in some cases (this slowdown was measured with the following benchmark: https://github.com/v8/promise-performance-tests).
We might run out of memory while trying to capture the state of the VM before exiting.
Exit will be delayed until we finish capturing the state of the process. Some applications won't have a problem with that, but other applications can't cope with exit delay.
We can't recursively get all objects accessible from the current scope because the Inspector Protocol doesn't use unique RemoteObjectId for each object, which means we'll get into an infinite recursive loop if we try to get all objects.
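
The last caveat can be sketched without touching a real process. The snippet below is a simulation only: `mockRemoteObject` and `mockGetProperties` are hypothetical stand-ins (not inspector APIs) that hand out a fresh id on every request, as described above. It shows why a visited set keyed by id never detects the cycle, and why the walk needs some other bound, here an assumed `maxDepth`.

```javascript
// Two objects that reference each other, standing in for heap state.
const a = { name: 'a' };
const b = { name: 'b', peer: a };
a.peer = b;

// Hypothetical mock of a remote-object backend (NOT the real inspector API):
// every request for a handle returns a fresh id, even for an object that
// already has one, as the caveat above describes.
let nextId = 0;
const handles = new Map();
function mockRemoteObject(obj) {
  const objectId = `obj-${nextId++}`;
  handles.set(objectId, obj);
  return { objectId };
}

// Hypothetical stand-in for a Runtime.getProperties round trip.
function mockGetProperties(objectId) {
  return Object.entries(handles.get(objectId)).map(([name, value]) => ({
    name,
    value: typeof value === 'object' && value !== null
      ? mockRemoteObject(value)
      : { value },
  }));
}

const stats = { cycleHits: 0 };

// Walk the object graph. The `seen` check is the obvious cycle guard, but it
// never fires when ids are not stable, so `maxDepth` is what ends the walk.
function snapshot(objectId, maxDepth, seen = new Set(), depth = 0) {
  if (depth >= maxDepth) return { truncated: true };
  seen.add(objectId);
  const out = {};
  for (const { name, value } of mockGetProperties(objectId)) {
    if (value.objectId === undefined) {
      out[name] = value.value; // primitive, stored inline
    } else if (seen.has(value.objectId)) {
      stats.cycleHits++;
      out[name] = { cycle: true };
    } else {
      out[name] = snapshot(value.objectId, maxDepth, seen, depth + 1);
    }
  }
  return out;
}

const root = mockRemoteObject(a);
console.log(JSON.stringify(snapshot(root.objectId, 3)));
console.log('cycle hits:', stats.cycleHits);
```

A depth limit is only one possible bound; it trades completeness for termination, and the same object may still be serialized several times.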

(Dealing with Promises also brings some caveats, but I'll open separate issues for those.)

Despite the caveats, the end result is amazing. Being able to use Chrome DevTools to look at any crashed Node.js process would allow developers to understand why their application crashed by using a well-known and well-established debugging interface.

So what do we need to make this production ready? I think the biggest concern today is using the Debugger domain (both because of performance and safety issues). If we could have access to the same information via the Runtime domain, that would be great (Runtime.exceptionThrown doesn't expose information about the variables in the execution scope). Another alternative would be to have the state saving functionality built into V8, as this could benefit other projects as well. The other issues are manageable and we could work on those in the future.
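
To make the Runtime-domain gap concrete, here is a minimal sketch using Node's built-in `inspector` module. It does not reproduce a real crash: it uses `Runtime.evaluate`, which reports a thrown exception through the same `ExceptionDetails` type that `Runtime.exceptionThrown` carries. The helper name `evaluateThrowing` and the `secret` variable are made up for illustration.

```javascript
const inspector = require('inspector');

// Hypothetical helper: evaluate an expression through the Runtime domain
// and hand the raw protocol response to the callback.
function evaluateThrowing(expression, callback) {
  const session = new inspector.Session();
  session.connect();
  session.post('Runtime.evaluate', { expression }, (err, response) => {
    session.disconnect();
    callback(err, response);
  });
}

// `secret` is the kind of local state we would want in a postmortem.
const expression = `(function fail() {
  const secret = 42;
  throw new Error('boom');
})()`;

evaluateThrowing(expression, (err, response) => {
  if (err) throw err;
  const details = response.exceptionDetails;
  // The details describe the exception itself (message, position, stack),
  // but carry no scope chain, so `secret` is not reachable from here.
  console.log(Object.keys(details));
  console.log(details.exception.description);
});
```

Getting at `secret` requires the call frames and scope chains that only the Debugger domain reports when execution is paused, which is exactly the dependency discussed above.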

Thoughts?

hashseed · 2019-07-24T05:57:42Z

This sounds pretty cool!

I think the slowdown caused by the Debugger domain mainly comes from tracking promises for exception prediction.

mhdawson · 2019-07-25T20:04:09Z

Definitely sounds interesting.

github-actions · 2020-07-16T00:36:48Z

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

github-actions · 2022-07-22T00:29:34Z

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

github-actions bot added the stale label Jul 16, 2020

mmarchini added never stale and removed stale labels Jul 17, 2020

mmarchini mentioned this issue Oct 17, 2021

Enrich stack traces with an API to add notes to a current stack frame nodejs/node#40331

Closed

github-actions bot added the stale label Jul 22, 2022

github-actions bot closed this as completed Aug 6, 2022
