Using the inspector protocol for the JavaScript postmortem use case #320

Closed
mmarchini opened this issue Jul 23, 2019 · 4 comments


@mmarchini
Contributor

I've been looking into ways to improve the story around the following diagnostics use case:

In a production environment, how can we get rich information about the JavaScript context when a process crashes and there isn't enough observability into the code path that caused the crash?

Today we have llnode, which can fulfill this use case: it lets users inspect the value of any variable on the heap after the process crashes, and see which variables were available in the scope of each stack frame at the point of the crash. llnode and core dumps have a lot of caveats, which we have discussed several times, so it would be nice to have a more stable alternative that covers the most common use cases (uncaught exceptions and unhandled rejections in JavaScript).

Following a suggestion by @hashseed, I've been looking into using the inspector protocol to fulfill this use case. I came up with a proof-of-concept which uses the inspector protocol to save the state of the process before crashing. This saved state can later be loaded by a separate application which exposes an inspector protocol-compliant websocket. Essentially, this allows us to use Chrome DevTools to look at the state of a process after it crashed, giving users a postmortem tool with excellent usability.

The proof-of-concept is available at: https://github.com/mmarchini/inspector-postmortem
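
For illustration, here is a minimal sketch of the capture side of this idea. It is not the code from the repository above. It uses Node's built-in `inspector` module with a same-thread `Session`, and it assumes that callbacks passed to `session.post` run synchronously while the debugger is paused. The helper `captureOnException` and the demo function `failingHandler` are hypothetical names. The sketch pauses on `all` exceptions so that the demo can catch the error and keep running; a real postmortem hook would more likely use `uncaught`.

```javascript
const inspector = require('node:inspector');

// Hypothetical helper: runs `run()` and, if an exception is thrown, records
// the local variables of every call frame at the throw site.
function captureOnException(run) {
  const session = new inspector.Session();
  session.connect();
  const captured = [];

  session.on('Debugger.paused', ({ params }) => {
    if (params.reason !== 'exception') return;
    for (const frame of params.callFrames) {
      const local = frame.scopeChain.find((scope) => scope.type === 'local');
      if (!local || !local.object.objectId) continue;
      // Assumption: on a same-thread session this callback runs synchronously.
      session.post(
        'Runtime.getProperties',
        { objectId: local.object.objectId },
        (err, res) => {
          if (err) return;
          const variables = {};
          for (const prop of res.result) {
            if (!prop.value) continue;
            // Primitives carry `value`; objects only carry a `description`.
            variables[prop.name] = prop.value.value ?? prop.value.description;
          }
          captured.push({ functionName: frame.functionName, variables });
        }
      );
    }
  });

  session.post('Debugger.enable');
  // 'all' also pauses on caught exceptions, which keeps this demo alive.
  session.post('Debugger.setPauseOnExceptions', { state: 'all' });
  try {
    run();
  } catch {
    // The state was already captured at the throw site.
  }
  session.post('Debugger.disable');
  session.disconnect();
  return captured;
}

// Hypothetical failing code path.
function failingHandler(orderId) {
  const retries = 3;
  throw new Error(`order ${orderId} failed after ${retries} retries`);
}

const frames = captureOnException(() => failingHandler(42));
console.log(JSON.stringify(frames[0], null, 2));
```

The captured frames are plain data, so they could be serialized to disk before exit and served later by a separate process, which is the role the proof-of-concept's replay application plays.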

I found several caveats while working on that proof-of-concept (which is why it's not usable in production):

  1. We have to use the Debugger domain to properly capture the state of the process (with the correct exception stack trace and the variables available on that stack trace). Using the Debugger domain will cause V8 to bail on some optimizations and take the slow path instead, which can lead to slowdowns of 200% in some cases (this slowdown was measured with the following benchmark: https://github.com/v8/promise-performance-tests).
  2. We might run out of memory while trying to capture the state of the VM before exiting.
  3. Exit will be delayed until we finish capturing the state of the process. Some applications won't have a problem with that, but other applications can't cope with exit delay.
  4. We can't recursively get all objects accessible from the current scope because the Inspector Protocol doesn't use a unique RemoteObjectId for each object. We therefore can't tell whether an object has already been visited, and we'll get into an infinite recursive loop if we try to get all objects.
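
One possible way to work around caveat 4 is to bound the traversal by depth instead of trying to detect objects that were already visited. The sketch below is an assumption about how that could look, not what the proof-of-concept does. `snapshot` is a hypothetical helper, and the code again assumes that `session.post` callbacks run synchronously on a same-thread session.

```javascript
const inspector = require('node:inspector');

// Hypothetical helper: converts a RemoteObject into plain data, following
// object properties at most `maxDepth` levels deep.
function snapshot(session, remoteObject, maxDepth) {
  // Primitives, null and functions are treated as leaves.
  if (remoteObject.type !== 'object' || !remoteObject.objectId) {
    return 'value' in remoteObject ? remoteObject.value : remoteObject.description;
  }
  // Depth budget exhausted: keep only a placeholder such as "[Object]".
  if (maxDepth === 0) return `[${remoteObject.description}]`;

  const out = {};
  // Assumption: on a same-thread session this callback runs synchronously.
  session.post(
    'Runtime.getProperties',
    { objectId: remoteObject.objectId, ownProperties: true },
    (err, res) => {
      if (err) return;
      for (const prop of res.result) {
        if (!prop.value || prop.name === '__proto__') continue;
        out[prop.name] = snapshot(session, prop.value, maxDepth - 1);
      }
    }
  );
  return out;
}

const session = new inspector.Session();
session.connect();

// Build an object that references itself, which would never terminate
// under an unbounded traversal.
let root;
session.post(
  'Runtime.evaluate',
  { expression: "(() => { const a = { name: 'a' }; a.self = a; return a; })()" },
  (err, res) => {
    if (!err) root = res.result;
  }
);

const tree = snapshot(session, root, 2);
session.disconnect();
console.log(JSON.stringify(tree));
```

A depth limit trades completeness for guaranteed termination, and it also puts a ceiling on capture time and memory, which are the concerns in caveats 2 and 3.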

(Dealing with Promises also brings some caveats, but I'll open separate issues for those.)

Despite the caveats, the end result is amazing. Being able to use Chrome DevTools to look at any crashed Node.js process would allow developers to understand why their application crashed by using a well-known and well-established debugging interface.

So what do we need to make this production-ready? I think the biggest concern today is the use of the Debugger domain, because of both performance and safety issues. It would be great if we could get the same information via the Runtime domain (today, Runtime.exceptionThrown doesn't expose information about the variables in the execution scope). Another alternative would be to have the state-saving functionality built into V8, as this could benefit other projects as well. The other issues are manageable, and we could work on those in the future.

Thoughts?

@hashseed
Member

This sounds pretty cool!

I think the slowdown caused by the Debugger domain mainly comes from tracking promises for exception prediction.

@mhdawson
Member

Definitely sounds interesting.

@github-actions

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.



3 participants