-
Notifications
You must be signed in to change notification settings - Fork 2.6k
[Local GC] FEATURE_EVENT_TRACE 1/n: Tracking Event State #15873
Conversation
…ent state within the GC and plumbing to communicate event state changes
cc @brianrob or @nategraf do you mind taking a look at this to see if I need to do anything else to make this work with LTTNG (particularly around https://github.com/dotnet/coreclr/issues/14327) and EventPipe? I looked around the codebase for other callbacks like I've already tested this PR a bunch with ETW but I'll be kicking the tires with LTTNG and EventPipe tomorrow. |
src/gc/gceventstatus.h
Outdated
assert(level >= GCEventLevel_None && level < GCEventLevel_Max); | ||
|
||
size_t index = static_cast<size_t>(provider); | ||
return (enabledLevels[index] >= level) && (enabledKeywords[index] & keyword); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Volatile
template inserts explicit memory barriers on non-Intel platforms.
I think these reads should be VolatileLoadWithoutBarrier
. We do not need a strict synchronization here, but we do need this to be cheap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok!
} | ||
|
||
private: | ||
static void DebugDumpState(GCEventProvider provider) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Should this be under TRACE_GC_EVENT_STATE
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, sure - it's just dead code otherwise.
…ut debug-only code under TRACE_GC_EVENT_STATE
@brianrob Have you had a chance to take a look at this? I'm hoping to get this merged soon so follow-up PRs can land. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks good, but a few things that should be addressed.
GCEventStatus::Set(GCEventProvider_Default, keyword, level); | ||
} | ||
|
||
void GCHeap::ControlPrivateEvents(GCEventKeyword keyword, GCEventLevel level) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize that I missed the end of the review for the design so this probably came up then. Is there a reason to have two functions here for controlling events instead of just one that takes the provider info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no particular reason other than not exposing GCEventProvider
across the interface. I don't mind either way!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'm fine either way as well. I doubt we'd add more providers - likely we'd reduce to just one (if that ever happens).
|
||
enum GCEventKeyword | ||
{ | ||
GCEventKeyword_None = 0x0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This information needs to be kept in sync with the manifest, which is currently the one source of truth for all events.
At a minimum, there should be a comment indicating this, but ideally, this information is pulled from the manifest and generated. Adding new events won't touch this, but if you want to modify the set of keywords you'll have to know to touch this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point - I'll call this out in a comment.
src/vm/eventtrace.cpp
Outdated
|
||
// The GC also needs to be informed of changes to keywords and levels. | ||
IGCHeap *heap = GCHeapUtilities::GetGCHeap(); | ||
GCEventKeyword keywords = static_cast<GCEventKeyword>(MatchAnyKeyword); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation supports MatchAllKeywords as well, so you should probably support this as well. From MSDN:
This bitmask is optional. This mask further restricts the category of events that you want the provider to write. If the event's keyword meets the MatchAnyKeyword condition, the provider will write the event only if all of the bits in this mask exist in the event's keyword. This mask is not used if MatchAnyKeyword is zero. See Remarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any events with multiple keywords? (I'm not aware of any for the GC - not sure if there are others elsewhere.) If I'm reading the documentation correctly, it seems to me that MatchAllKeywords
is only useful in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's an example:
<event value="11" version="1" level="win:Informational" template="GCCreateConcurrentThread"
keywords ="GCKeyword ThreadingKeyword" opcode="GCCreateConcurrentThread"
task="GarbageCollection"
symbol="GCCreateConcurrentThread_V1" message="$(string.RuntimePublisher.GCCreateConcurrentThread_V1EventMessage)"/>
I suspect it's not a huge deal. LTTng and EventPipe don't have this concept, but I don't want us to inadvertently miss it.
@vancem, do you know how big of a deal it is to not support MatchAllKeywords for Local GC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do want to support MatchAllKeywords I think we'll need some additional design work. The GC would need to keep track of both the Any and All keyword masks in order to stay correct.
src/vm/eventtrace.cpp
Outdated
ETW::TypeSystemLog::OnKeywordsChanged(); | ||
} | ||
|
||
if (ControlCode == EVENT_CONTROL_CODE_ENABLE_PROVIDER || ControlCode == EVENT_CONTROL_CODE_DISABLE_PROVIDER) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to support EventPipe, you will need this code to execute when the providers are enabled through EventPipe as well. Otherwise, the GC won't get the message that the events have been enabled. This can be done by adding a callback when the EventPipe provider is created, which occurs at
coreclr/src/scripts/genEventPipe.py
Line 141 in c1bbdae
" = EventPipe::CreateProvider(SL(" + |
The callback that you've modified here is only used by ETW - not EventPipe. Ideally it would get used by both, but I can tell you that there are things in this callback that EventPipe doesn't know how to handle, and so rather than you getting to feel that pain, I think you should do the following:
- Add a new callback right next to this one with the same function signature. You can call it something like EventPipeCallback.
- Put your code in it.
- Register the callback with EventPipe at the CreateProvider call in the Python script.
- Call the new callback from within the existing callback that you've modified here so that you don't have to duplicate the code.
Then, as we find things here that we need to support in EventPipe, we can move them into the EventPipe callback which is a subset of the ETW callback. Eventually, we'll get to the point where the ETW callback just calls the EventPipe callback and they merge back into one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do - thanks!
We should not support MatchAllKeywords. There are places in the pipeline where we don't support it already (because it frankly is not that useful). |
I'm about to push another iteration addressing @brianrob 's feedback but I have had trouble today dealing with ETW sessions that exist before a .NET process is launched. Windows fires the ETW callback during EEStartup, well before the GC is initialized and ready to accept changes in the event state. As a result we miss the first callback and the GC isn't ever informed that events are enabled. I'm thinking it should be possible to "stash" early ETW callbacks in |
Thanks @vancem for the confirmation. @swgillespie, sounds like a plan. |
…fore the GC is initialized (e.g. on startup when an ETW session is already active)
My most recent commit addresses the session-on-startup problem by stashing ETW event level and keyword information on I have one more problem: EventPipe invokes my ETW callback in a much different way than ETW does, which is what's causing the AVs in the Windows tests. There are two problems in particular: one, I need a way to determine which provider the callback is being fired for (i.e. which provider the level and keyword info apply to), and two I need to know which mechanism (ETW or EventPipe) invoked the callback so i can do potentially mechanism-specific things to get at the provider that changed. The code in Lines 4441 to 4443 in 59714b6
EventPipe passes (Thinking aloud - I suppose that I could have a separate callback function for each provider that EventPipe knows about...) |
@dotnet-bot test this please |
I think that I've addressed all problems and feedback so far in this PR - does anyone else have any comments or concerns or can I go ahead and merge this? |
LGTM. Thank you! |
thanks for the reviews! |
If the jit decides it needs a return spill temp, and the return value has already been spilled to a single-def temp, re-use the existing for the return temp rather than creating a new one. In conjunction with dotnet#20553 this allows late devirtualization for calls where the object in the virtual call is the result of an inline that provides a better type. In particular we see this pattern for `ArrayPool<T>.Shared.Rent/Release`. Closes dotnet#15873.
If the jit decides it needs a return spill temp, and the return value has already been spilled to a single-def temp, re-use the existing for the return temp rather than creating a new one. In conjunction with dotnet#20553 this allows late devirtualization for calls where the object in the virtual call is the result of an inline that provides a better type, and the objected formerly reached the call via one or more intermediate temps. Closes dotnet#15873.
If the jit decides it needs a return spill temp, and the return value has already been spilled to a single-def temp, re-use the existing for the return temp rather than creating a new one. In conjunction with #20553 this allows late devirtualization for calls where the object in the virtual call is the result of an inline that provides a better type, and the objected formerly reached the call via one or more intermediate temps. Closes #15873.
If the jit decides it needs a return spill temp, and the return value has already been spilled to a single-def temp, re-use the existing for the return temp rather than creating a new one. In conjunction with dotnet#20553 this allows late devirtualization for calls where the object in the virtual call is the result of an inline that provides a better type, and the objected formerly reached the call via one or more intermediate temps. Closes dotnet#15873.
If the jit decides it needs a return spill temp, and the return value has already been spilled to a single-def temp, re-use the existing for the return temp rather than creating a new one. In conjunction with dotnet/coreclr#20553 this allows late devirtualization for calls where the object in the virtual call is the result of an inline that provides a better type, and the objected formerly reached the call via one or more intermediate temps. Closes dotnet/coreclr#15873. Commit migrated from dotnet/coreclr@ccc18a6
This PR is the first of several PRs implementing this design bringing FEATURE_EVENT_TRACE to standalone GCs. This PR implements the portion of the design that keeps track of what events are enabled.
The approach taken in this PR is fundamentally the same as the one described in the design document, with some minor tweaks to
GCEventState
.The
GCEventState
class described in the spec was simplified somewhat, based on some insights I had when experimenting with ETW. There is no need to draw any distinction between enablingor disabling a provider, since the
EtwCallback
installed by the runtime receives the level and keyword state after applying the delta that a log enabler (e.g. logman) has created. For example, for the following sequence of events:EtwCallback
is invoked four times, with the following arguments:We can pass the level and keyword information verbatim to the GC and no additional logic is
necessary; the ETW subsystem is already keeping track of which trace client has what level and keyword enabled so the GC doesn't need to do it. The GC doesn't even need to know if a provider
is being enabled or disabled since it can just take the information ETW gives it.
Instead of having separate
Enable
andDisable
code paths onGCEventState
, as written in the spec, this PR has a singleSet
entry point that sets the GC's level and keyword state for a provider to exactly what is given toSet
as arguments, which in turn comes directly from ETW.