-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Stackwalk support for interpreted frames #113900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stackwalk support for interpreted frames #113900
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 1 out of 20 changed files in this pull request and generated no comments.
Files not reviewed (19)
- src/coreclr/debug/daccess/dacdbiimplstackwalk.cpp: Language not supported
- src/coreclr/debug/daccess/stack.cpp: Language not supported
- src/coreclr/debug/inc/dacdbistructures.inl: Language not supported
- src/coreclr/inc/eetwain.h: Language not supported
- src/coreclr/vm/FrameTypes.h: Language not supported
- src/coreclr/vm/codeman.cpp: Language not supported
- src/coreclr/vm/codeman.h: Language not supported
- src/coreclr/vm/eetwain.cpp: Language not supported
- src/coreclr/vm/eventtrace.cpp: Language not supported
- src/coreclr/vm/exceptionhandling.cpp: Language not supported
- src/coreclr/vm/frames.cpp: Language not supported
- src/coreclr/vm/frames.h: Language not supported
- src/coreclr/vm/gcinfodecoder.cpp: Language not supported
- src/coreclr/vm/interpexec.cpp: Language not supported
- src/coreclr/vm/interpexec.h: Language not supported
- src/coreclr/vm/jitinterface.h: Language not supported
- src/coreclr/vm/prestub.cpp: Language not supported
- src/coreclr/vm/stackwalk.cpp: Language not supported
- src/coreclr/vm/threadsuspend.cpp: Language not supported
Tagging subscribers to this area: @mangod9 |
src/coreclr/vm/FrameTypes.h
Outdated
FRAME_TYPE_NAME(DebuggerU2MCatchHandlerFrame) | ||
FRAME_TYPE_NAME(ExceptionFilterFrame) | ||
#ifdef FEATURE_INTERPRETER | ||
FRAME_TYPE_NAME(InterpreterEntryFrame) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need really to have both entry and exit frames? I would expect that one InterpreterFrame
should be enough. We will know that we have exited the interpreter if we find some other Frame or managed code first on the stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was considering that and this seems to work better with the stack walker. We could possibly have just the InterpreterEntryFrame
, but then upon reaching it, we would immediately need to switch to the underlying interpreter frames and then once we get out of them, hit the InterpreterEntryFrame
again and copy the transition record to the context to move to the caller of the sequence of interpreted frames. Also, the InterpreterEntryFrame
doesn't have a direct pointer to the topmost frame in the InterpExecMethod
, Vlad was concerned about updating that on every interpreter to interpreter call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the interpreter -> AOT code calls going to push InterpreterExitFrame?
I think we should be optimizing for making the interpreter -> AOT calls as cheap as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, those would need it too. We can pass current thread as a parameter to the InterpMethodFrame and then use it to push that frame without the thread local access overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the purpose of the InterpreterEntryFrame
is to store the context so can unwind to AOT callers, while the purpose of the InterpreterExitFrame
is to make it cleaner to access the top most interpreter frame ? Would we always resume execution in the interpreter loop via a try/catch ? So if EH determines that an interpreter method caught the exception, in order to resume in the interpreter it would throw a C++ exception that will get caught at this interpreter exit location ?
src/coreclr/vm/frames.cpp
Outdated
PTR_InterpMethodContextFrame pFrame = m_pInterpMethodContextFrame; | ||
_ASSERTE(pFrame != NULL && pFrame->ip != NULL); | ||
|
||
// The frames of a method are linked in a reverse order (from bottom to top of the part of the stack) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an advantage in this reverse order? I would expect the interpreter frame to point to what we are executing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vlad was concerned about updating such a pointer on every interpreter to interpreter call. The current call implementation is pretty efficient and according to his experience from Mono, this would add noticeable overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moreover, this reverse walk is only needed in the native debugger on desktop scenario when someone has the debugger sitting in the middle of the InterpExecMethod
, nothing else needs to walk that way (provided we have the two interpreter frame kinds).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vlad was concerned about updating such a pointer on every interpreter to interpreter call.
We do not need to update the pointer on every interpreter-to-interpreter call. We only need to update the pointer when we exit the interpreter. ie instead of pushing the full Frame to exit the interpreter, we would just need to patch the current IP or current method in the Interpreter frame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that since only the native debugger needs to walk this list and it would need to walk it even if I changed it the way you suggested in case it was sitting in the InterpExecMethod
and someone has invoked the clrstack
command, it doesn't seem really beneficial to store it. Unless we moved to the single interpreter frame way, which would introduce the unusual double handling of single explicit frame to the StackFrameIterator
or something similar to that.
What is your concern about having two separate explicit frames?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the scheme looks more expensive than it needs to be and non-intuitive (the linked list is in wrong direction). EDIT: I see that it is double linked list so it has links in the right direction too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the direction I am using for this specific native debugger scenario is primarily used for the interpreter internal purposes, I have just reused it for these rare scenarios. This is not used for managed debuggers, GC or EH. In those cases we would always have the interpreter exit frame.
dd98e2b
to
62f3514
Compare
@jkotas I have made changes based on your feedback. There is now a single |
The CI test failures are unrelated. |
InterpExecMethod(&interpFrame, threadContext); | ||
InterpreterFrame interpreterFrame(pTransitionBlock, &interpMethodContextFrame); | ||
|
||
InterpExecMethod(&interpreterFrame, &interpMethodContextFrame, threadContext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks ok for now. Do we have a plan for where and how the arguments are going to converted from the TransitionBlock
to the interpreter representation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking when invoking an interpreted method through the precode thunks, instead of calling ExecuteInterpretedMethod
like we do now, it will call some other assembly thunk that receives a lower level bytecode referenced from the interpreter bytecode (maybe in some Interpreter code header). These assembly thunks will move arguments from the transition block to the interpreter stack, finally dispatching to ExecuteInterpreterMethod
. I described this mechanism a while ago in a doc. Last time I spoke with @janvorli, the plan was to use this mechanism from the start since we will need it as a fallback anyway on iOS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've used the TransitionBlock here for two reasons. One, as a possibly easy starting point for passing arguments until we implement the proper mechanism and second, as a mean to save and restore the callee saved registers. This is necessary as we need to be able to restore them during stack walking from interpreted to JITted / AOTed methods. Once we have the optimized mechanism, we can use a smaller structure than the TransitionBlock that would contain only the callee saved registers.
This change adds support for walking stack with interpreter frames present. The StackFrameIterator now considers the interpreter frames to be frameless frames like regular JITted / crossgened folder. This is a basis for GC, EH and debugger stack walking. The SOS `clrstack` and `clrstack -i` both display correct stack trace with interpreter frames on the stack with this change. I have added some extra handling of the case when the interpreter frames are on top of the stack so that the clrstack can dump them.
In case we were generating the caller context, the SP and ContextFlags were missing.
* Change the GET_CALLER_SP to assert and return NULL * Remove extra assert at the call to GET_CALLER_SP * Revert some changes removing "virtual" keyword that were not intentional
5d43a67
to
a257f58
Compare
} | ||
} | ||
else | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The iteration here seems weird. I don't see why we would ever reach this path, given we always start for from the "list head". Why would we ever need to iterate backwards. Nvm, I noticed this is actually the top and we set it when leaving interp
MethodDesc *pMD = (MethodDesc*)(targetMethod & ~INTERP_METHOD_DESC_TAG); | ||
PCODE code = pMD->GetNativeCode(); | ||
if (!code) { | ||
pInterpreterFrame->SetTopInterpMethodContextFrame(pFrame); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand this correctly, this is just an optimization and not required for correctness. Is that correct? It may be worth mentioning it in a comment.
Does the setting need to be undone when the call returns?
It is not clear to me whether this optimization is worth it. It may be better to do it during stackwalk - so that repeated stackwalks from approximately the same spot avoid walking the whole list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it would still work correctly without it, so it can be viewed as an optimization.
It doesn't need to be undone when the call returns:
- for the SOS scenario, we would seek for the top frame going in one way or the other through the linked list from the last value stored there. It doesn't matter if we start on a frame that's in the middle of the "substack" belonging to the current InterpExecMethod or if we start on an inactive frame that's somewhere by the end of the list. Based on the inactive vs active state of the frame (ip being 0 or non-zero), we know which way to seek for the last active frame.
- for other scenarios, we would maintain the exact value so that we don't waste time scanning the list.
Since this optimization is just one store, I think it is worth doing that instead of updating it during the stack walk. Imagine a case with a deep recursion within interpreted frames only. It would cause a lot of seeking the first time we do a stack walk which is completely unnecessary.
We will need to do other stuff while leaving the interpreter, like catching exceptions from the native code and calling the DispatchManagedException, so this is going to be just one of the things to do. I am planning to add a macro named "EXIT_INTERPRETER_BEGIN/END or something along those lines to contain that stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would cause a lot of seeking the first time we do a stack walk which is completely unnecessary.
This work is small and proportional to the total cost of stack walk (assuming we typically walk large part of the stack).
I am looking at it as "total number of instructions executed" in given scenario. I think updating the pointer in the stackwalker would be better on average in this metric.
This is a micro-optimization. It is fine to keep this as is. Comment would be nice.
We will need to do other stuff while leaving the interpreter, like catching exceptions from the native code and calling the DispatchManagedException, so this is going to be just one of the things to do.
Yep. I expect that the total cost of this other stuff will make us consider building the interpreter executor in C# eventually - but that is for future discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a comment
{ | ||
// We already have the caller's frame context | ||
// We just switch the pointers | ||
PT_CONTEXT temp = pRD->pCurrentContext; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't unwinding the process of walking up the stack? Why are we switching the pointers instead of just overwriting the current context with the caller context? I assume I'm missing something here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we already have the caller context available, then you can just set the current context to it, there is no need to call unwind again to get the same values. The switching of pointer is to avoid copying in this specific case.
There are cases when we want to know the caller context without actually unwinding, so we call EnsureCallerContextIsValid. That's where the caller context gets created and then it can get reused here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/ba-g the test failures in this PR are happening on all PRs. |
This change adds support for walking stack with interpreter frames present. The
StackFrameIterator
now considers the interpreter frames to be frameless frames like regular JITted / crossgened folder. This is a basis for GC, EH and debugger stack walking. The SOSclrstack
andclrstack -i
both display correct stack trace with interpreter frames on the stack with this change.It required changing the
GetCallerSP
andEnsureCallerContextIsValid
to become virtual methods so that the right variant for the interpreter or JITted / AOTed code can be invoked. Before, they were both static methods on theEECodeManager
.I have also added some extra handling of the case when the interpreter frames are on top of the stack so that the clrstack can dump them.