-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Apx requirements for VM and GC stubs #116806
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Tagging subscribers to this area: @mangod9 |
src/coreclr/inc/regdisp.h
Outdated
PDWORD64 R29; | ||
PDWORD64 R30; | ||
PDWORD64 R31; | ||
//X18 is reserved by OS, in userspace it represents TEB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//X18 is reserved by OS, in userspace it represents TEB |
DWORD64 R29; | ||
DWORD64 R30; | ||
DWORD64 R31; | ||
struct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dotnet/dotnet-diag I believe CONTEXT
structure is used by the debugger interfaces. Is this modification going to break them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot say for sure at the moment. It depends on the decision if we want the debugger to track these new EGPRs. Once the context structure gets finalized is when we can say with surety about what might get impacted. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of now, we have avoided making any kind of debugger changes since they would need windows OS to have XSTATE_APX
support for extended CONTEXT. Once we have complete EGPR support, we can extend debugger support as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry missed this comment earlier when @jkotas first pinged. Generally the managed debugger needs to be aware of all registers because at various points it saves them and later restores them. If the debugger isn't aware of all the registers it needs to save then we can easily end up in situations where an app works without the debugger, but trying to step through the code in the debugger behaves unpredictably. For example here is a recent debugger bug where x86 floating point register capture/restore got inadvertently broken: https://devdiv.visualstudio.com/DevDiv/_workitems/edit/2428652
If we wanted to make this work available prior to having the debugger portions implemented and tested I think it would need to be explicitly an opt-in feature with warnings that debugging is unsupported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @noahfalk
So what needs to be done in order to make this an opt in feature? Do I need to open a new issue to make this a opt in feature only with warnings or is that handled by the debugger team after this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A common way we've done it in the past is to define an environment variable and then only use the new registers if the env var is set. Later on once it is fully supported we can switch the default value so the env var becomes an opt-out rather than opt-in.
Here is an example for AVX registers: https://github.com/dotnet/runtime/blob/main/src/coreclr/inc/clrconfigvalues.h#L673
I'd prefer if the opt-in mechanism was included in this PR so that we don't create any window of builds where debugging is broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@noahfalk, The opt-in for APX already exists on L681 (DOTNET_EnableAPX
) and it is already defaulted to off (same with AVX10v2) since the hardware isn't available yet.
We automatically add a config switch per ISA (or ISA group in some cases) when the detection logic is added to the VM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tannergooding! Glad to see its already in place. I'd also ask that anywhere we advertise the env var should make it clear managed debugging isn't supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍. We notably do not document these switches and rarely advertise them. They are considered advanced and primarily for people doing local testing, so they can validate downlevel hardware.
There will likely be a blog post that lets people know about the switch when hardware does become available (so not until after .NET 10 ships) and we can ensure any nuances, such as the GC or debugger experience not working end to end, are called out there.
c4c3fd4
to
be67cad
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
@@ -59,6 +59,11 @@ void ClearRegDisplayArgumentAndScratchRegisters(REGDISPLAY * pRD) | |||
pContextPointers->R9 = NULL; | |||
pContextPointers->R10 = NULL; | |||
pContextPointers->R11 = NULL; | |||
|
|||
#if defined(TARGET_UNIX) | |||
for (int i=0; i < 16; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar question. Can/should we limit this to only scenarios with APX enabled?
Same question applies to all the other loops/handling for the extended registers here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notably this can also just be a memset
call and avoid the loop so its easier for the compiler to optimize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar question. Can/should we limit this to only scenarios with APX enabled?
Same question applies to all the other loops/handling for the extended registers here.
Same question as here https://github.com/dotnet/runtime/pull/116806/files#r2208598605
Notably this can also just be a
memset
call and avoid the loop so its easier for the compiler to optimize.
Yes this can be a memset. But then it would differ from how things are done under ARM. Current coding convention only extends what was already present.
#if defined(TARGET_UNIX) | ||
_ASSERTE(regNum >= 0 && regNum <= 32); | ||
#else // TARGET_UNIX | ||
_ASSERTE(regNum >= 0 && regNum <= 16); | ||
#endif // TARGET_UNIX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like most of this handling isn't Unix specific. It's APX enabled vs disabled with Windows just not having it enabled yet.
Similar comment to other new #if TARGET_UNIX
areas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. Once windows OS gets support for APX XSTATE registers, we can get rid of UNIX
checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tannergooding some places access the R16
inside the CONTEXT
. can you guide me or point me to a similar place as to how those CONTEXT
are handled by windows for extended context? I wanted to reomve the #ifdefs LINUX
but since I dont have access to the extended CONTEXT for windows, the compilation fail on windows right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As elaborated in a few other places, the "proper" way to deal with extended state is to query for the offset and size via LocateXStateFeature
. This will tell you where the APX
region begins and therefore where you can start accessing the relevant data.
This can be unique per CONTEXT
given that a given instance may be using things like XSAVEOPT
, XSAVEC
, or even not saving a particular bit of extended state depending on scenario. The set of xstate features enabled for a given instance is queried using GetXStateFeaturesMask
.
[MethodImpl(MethodImplOptions.NoInlining)] | ||
static void StressRegisters() | ||
{ | ||
// 32 reference variables to force JIT to use all GPRs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the JIT actually enregistering them all? I don't think simply having 32 objects is enough to guarantee that, particularly if calls and so some are going to require spilling anyways
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.. I would look into the test one more time. We can force APX EPRs using JitStressRegs=4000
with what the testing was done but I added this test thinking JIT would enregister all registers. is there another way of using all the registers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is any way to guarantee it and the stress modes are primarily there to just help default to different registers to ensure they get tested as well.
The question was more because the comment is claiming to do x
, but I don't think we can actually guarantee that happens. We could instead comment that we're using enough and at the time the test was written it was generating a particular bit of assembly, which is likely good enough to show that they're all being used.
We should have a test covering P/Invokes and showing what the codegen difference is for those scenarios. If we're actually allowing the GC to use the extended registers, I would expect we are having to spill all 16 additional registers as part of the transition. -- We can, however, keep costs lower by spilling/loading via push2/pop2 and so the amount of instructions/latency should remain nearly the same. @jkotas are there any other special cases we likely want to check as part of having an extended set of registers available to the GC? |
My top concern with these changes is impact on debugger/diagnostic. I have commented on this above #116806 (comment) . I would like to see @dotnet/dotnet-diag signoff and validate that this is not breaking debugger/diagnostic.
I expect that many of these changes will need to be redone to address these concerns. |
The additional registers are volatile registers in the calling convention. Is that correct? It means they do not need saved as part of GC transition. |
The debugger context structure does not take into consideration the
I was under the impression that we can get the state of the extended CONTEXT using OS APIs and load the EGPRs in debugger context. @tannergooding please correct me if I am wrong.
Yes. Right now, cross compile support is not available since that would lead to using windows provided context in @jkotas the current changes are aimed at having GC support only for |
…ContextPointers for windows
src/coreclr/vm/codeman.h
Outdated
return m_CPUCompileFlags.IsSet(InstructionSet_APX); | ||
} | ||
#endif // TARGET_AMD64 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what we want @khushal1996 is something like
inline bool IsAPXSupported()
{
#if defined(HOST_WINDOWS)
return false;
#elif defined(HOST_UNIX)
return m_CPUCompileFlags.IsSet(InstructionSet_APX);
#endif
}
and now instead of having the #ifdef
for HOST_UNIX, we are simply checking IsAPXSupported()
. When windows APIs become available to use, we will enabled that but the rest of the code should remain unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anthonycanino I would actually expect we just do return m_CPUCompileFlags.IsSet(InstructionSet_APX);
always.
I would rather expect that the minipal_getcpufeatures
doesn't set InstructionSet_APX
for Windows, given that the OS enablement query should be returning false
today (as xgetbv
should not have it set and the IsApxEnabled()
query should end up reporting false
on Windows, since there is no XSTATE_MASK_APX
through which we can check that GetEnabledXStateFeatures()
reports support).
@jkotas, the modified struct is Unix specific, we directly use the struct and appropriate APIs from the SDK on Windows. The PAL layer "should" be handling the mapping of this for the debug layer, it wouldn't work at all otherwise.
However, this is just the "base size" and the actual size is dependent on the On Unix, we're "mirroring" the layout in the Realistically (assuming we need to keep emulating I don't believe the debugger ever directly tries to read the |
Right, but those are part of the base `CONTEXT` struct as defined by `winnt.h`:typedef struct DECLSPEC_ALIGN(16) DECLSPEC_NOINITALL _CONTEXT {
//
// Register parameter home addresses.
//
// N.B. These fields are for convience - they could be used to extend the
// context record in the future.
//
DWORD64 P1Home;
DWORD64 P2Home;
DWORD64 P3Home;
DWORD64 P4Home;
DWORD64 P5Home;
DWORD64 P6Home;
//
// Control flags.
//
DWORD ContextFlags;
DWORD MxCsr;
//
// Segment Registers and processor flags.
//
WORD SegCs;
WORD SegDs;
WORD SegEs;
WORD SegFs;
WORD SegGs;
WORD SegSs;
DWORD EFlags;
//
// Debug registers
//
DWORD64 Dr0;
DWORD64 Dr1;
DWORD64 Dr2;
DWORD64 Dr3;
DWORD64 Dr6;
DWORD64 Dr7;
//
// Integer registers.
//
DWORD64 Rax;
DWORD64 Rcx;
DWORD64 Rdx;
DWORD64 Rbx;
DWORD64 Rsp;
DWORD64 Rbp;
DWORD64 Rsi;
DWORD64 Rdi;
DWORD64 R8;
DWORD64 R9;
DWORD64 R10;
DWORD64 R11;
DWORD64 R12;
DWORD64 R13;
DWORD64 R14;
DWORD64 R15;
//
// Program counter.
//
DWORD64 Rip;
//
// Floating point state.
//
union {
XMM_SAVE_AREA32 FltSave;
struct {
M128A Header[2];
M128A Legacy[8];
M128A Xmm0;
M128A Xmm1;
M128A Xmm2;
M128A Xmm3;
M128A Xmm4;
M128A Xmm5;
M128A Xmm6;
M128A Xmm7;
M128A Xmm8;
M128A Xmm9;
M128A Xmm10;
M128A Xmm11;
M128A Xmm12;
M128A Xmm13;
M128A Xmm14;
M128A Xmm15;
} DUMMYSTRUCTNAME;
} DUMMYUNIONNAME;
//
// Vector registers.
//
M128A VectorRegister[26];
DWORD64 VectorControl;
//
// Special debug control registers.
//
DWORD64 DebugControl;
DWORD64 LastBranchToRip;
DWORD64 LastBranchFromRip;
DWORD64 LastExceptionToRip;
DWORD64 LastExceptionFromRip;
} CONTEXT, *PCONTEXT;
I believe the "correct" thing, particularly on Windows, is to get the base offset of the APX extended state by querying I don't think the GC has had to care about XSTATE up until this point since it's really only been interested in general-purpose registers, which were all part of the baseline It looks like the The debugger team likely needs to comment on how they expect cross debugging to work given the context and that there doesn't appear to be a way to query the extended XSTATE features in a cross platform way. If there were, which likely just requires them to have a way to call |
Do you think its worth to mimic the |
I would defer to @jkotas and the debugger team. I expect that for the APX feature to work on Windows, we will have to use As iterated above, I believe the GC hasn't had to deal with this yet simply because it's only been concerned about looking at general purpose registers and those have always been part of the core state. APX is changing that to have some general-purpose registers as part of the XSTATE. The debugger likewise has an 11 year old issue of not being able to observe or surface extended state. As such, you cannot observe YMM, ZMM, or KMASK state in the register window and a few other scenarios. Locals are still observable because they get spilled to the stack and the debugger just reads the memory for each backing field (there are some quirks with Them not looking at extended state hasn't been a critical priority up to this point because they still pass The GC starting to have values stored in the XSTATE area is really the "forcing factor" here, since it will have to start observing these registers for correctness reasons. The debugger would be great to fix, but I don't think the register window not working is "blocking" (it hasn't been blocking for any of the other areas we've touched). |
I've got this in my todo list but I am heads down at the moment making sure one of our new diagnostics features is working properly for the Preview7 snap coming imminently. Hopefully next week things will be a little calmer. |
src/coreclr/vm/gcinfodecoder.cpp
Outdated
ULONGLONG **ppRax = &pRD->pCurrentContextPointers->Rax; | ||
#endif | ||
if(ExecutionManager::GetEEJitManager()->IsAPXSupported() && regNum >= 16) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When APX is not supported, we should never see regNum >= 16
here. It is sufficient and more efficient to just check regNum >= 16
.
IsAPXSupported
can be assert.
Also, GCInfoDecoder is used in number of cross-process diagnostic/debugger scenarios (see https://github.com/dotnet/runtime/blob/main/src/coreclr/inc/gcinfodecoder.h#L13). If this code was dependent on checking IsAPXSupported, you would have to figure out how to get this value for the target.
Terminology nit: I would call this "GC stack root enumeration" to avoid confusion. Unqualified "GC" typically means the code under src\coreclr\gc that I do not expect to affect by any of these changes. |
I think these changes should wait for .NET 11. We have started stabilizing .NET 10 (main is going to switch to .NET 11 early august). We do not want to be merging new potentially destabilizing features for next few weeks. Is there a reason why this change needs to be in .NET 10? |
It is probably the easiest option. It matches current CoreCLR PAL architecture that tries to emulate Windows. An alternative is to create our own context abstraction that is optimized for the given job. |
…lso fix small bugs and remove ifdefs for linux wherever possible.
// Stores the ISA capability of the hardware | ||
int cpuFeatures = 0; | ||
#if defined(TARGET_AMD64) | ||
inline bool IsAPXSupported() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unnecessary. We should introduce this in the GC only if the GC ever needs conditional paths for Apx - that is unlikely.
We've been developing APX support for .NET 10 so as to enable it as an opt in preview feature. If changing the PAL is too much, it would be helpful to allow some kind solution that would only be enabled if the APX flags are explicitly turned on. |
I've been looking into this more. The current way we are doing our CONTEXT handling appears a bit fragile and challenging to maintain as we want to add support for more registers. A couple thoughts:
I'd recommend rather than allowing them to remain out-of-sync we should restore PAL CONTEXT to its fixed pre-March 2023 definition and define new PAL APIs like LocateXStateFeature to work with the additional registers.
None of this has to happen as part of this PR, but we would need to address it at some point as part of the path to making the debugger functional with these registers in use. There would almost certainly be other elements of the work as well. I agree with you @tannergooding that using new registers as general purpose registers indeed raises the bar on what portions of the debugger functionality need to handle them correctly. When folks are ready I'm happy to discuss that in more detail but its probably out-of-scope for this PR. |
@dotnet/samsung Could you please take a look? These changes may be related to riscv64. |
Just an FYI. We implemented a work around to prevent GC references from using the APX extended GPRs here #117991 (comment). in order to allow for the APX features in .NET 10 without requiring these context changes. We plan pick this item up for .NET 11. |
Addresses Issue #112587
Current PR focusses on tracking the new APX EGPRs (R16 - R31) during GC tracking. It follows the discussion in issue #112587. APX registers are volatile and hence tracked in a separate
volatileContextPointers
likeARM64
.Current work focusses on enabling GC tracking for EGPRs in
TARGET_UNIX
There are some things I need help with -
asmconstants.h
https://github.com/khushal1996/runtime/blob/ecdfb14194fce9462a612733c579fc6e55ffabe7/src/coreclr/vm/amd64/asmconstants.h#L344.FaultingExceptionFrame::UpdateRegDisplay_Impl
https://github.com/khushal1996/runtime/blob/ecdfb14194fce9462a612733c579fc6e55ffabe7/src/coreclr/vm/amd64/cgenamd64.cpp#L144 ? This would lead to updating thedbgtargetcontext.h
https://github.com/khushal1996/runtime/blob/ecdfb14194fce9462a612733c579fc6e55ffabe7/src/coreclr/debug/inc/dbgtargetcontext.h#L215HijackFrame::UpdateRegDisplay_Impl
https://github.com/khushal1996/runtime/blob/ecdfb14194fce9462a612733c579fc6e55ffabe7/src/coreclr/vm/amd64/cgenamd64.cpp#L229 ? Not sure which registers need to updated here (not all volatile registers are updated here)TESTING
SDE test run with APX off

SDE test run with APX ON CCMP ON
