[Local GC] FEATURE_EVENT_TRACE 1/n: Tracking Event State #15873

swgillespie · 2018-01-16T03:00:42Z

This PR is the first of several PRs implementing this design bringing FEATURE_EVENT_TRACE to standalone GCs. This PR implements the portion of the design that keeps track of what events are enabled.

The approach taken in this PR is fundamentally the same as the one described in the design document, with some minor tweaks to GCEventState.

The GCEventState class described in the spec was simplified somewhat, based on some insights I had when experimenting with ETW. There is no need to draw any distinction between enabling or disabling a provider, since the EtwCallback installed by the runtime receives the level and keyword state after applying the delta that a log enabler (e.g. logman) has created. For example, for the following sequence of commands:

logman start trace1 -p {clr-provider-guid} 0x1 0x5 -ets
logman start trace2 -p {clr-provider-guid} 0x2 0x4 -ets
logman stop trace1 -ets
logman stop trace2 -ets

EtwCallback is invoked four times, with the following arguments:

EtwCallback(Level=5, Keyword=1, EVENT_CONTROL_CODE_ENABLE_PROVIDER)
EtwCallback(Level=5, Keyword=3, EVENT_CONTROL_CODE_ENABLE_PROVIDER)
EtwCallback(Level=4, Keyword=2, EVENT_CONTROL_CODE_ENABLE_PROVIDER)
EtwCallback(Level=0, Keyword=0, EVENT_CONTROL_CODE_DISABLE_PROVIDER)
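To make the sequence above concrete, here is a small model of how the per-session settings could be combined before the provider callback fires. It assumes the combination rule that is consistent with the four callbacks shown: the effective level is the maximum over all active sessions, and the effective keyword mask is the bitwise OR. `ProviderModel`, `SessionSettings` and `CallbackArgs` are hypothetical names for illustration only; this is not ETW's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Settings one trace session (e.g. one logman "start") asked for.
struct SessionSettings {
    uint8_t level;
    uint64_t keywords;
};

// What the provider callback would receive after aggregation.
struct CallbackArgs {
    uint8_t level;
    uint64_t keywords;
    bool enable;  // true ~ EVENT_CONTROL_CODE_ENABLE_PROVIDER, false ~ DISABLE
};

// Hypothetical model of the bookkeeping the tracing system does for one
// provider, so that the provider itself never has to track sessions.
class ProviderModel {
public:
    CallbackArgs Start(const std::string& session, uint8_t level, uint64_t keywords) {
        sessions_[session] = SessionSettings{level, keywords};
        return Aggregate();
    }

    CallbackArgs Stop(const std::string& session) {
        sessions_.erase(session);
        return Aggregate();
    }

private:
    // Assumed rule: max of levels, OR of keywords; disabled when no sessions remain.
    CallbackArgs Aggregate() const {
        CallbackArgs args{0, 0, !sessions_.empty()};
        for (const auto& entry : sessions_) {
            args.level = std::max(args.level, entry.second.level);
            args.keywords |= entry.second.keywords;
        }
        return args;
    }

    std::map<std::string, SessionSettings> sessions_;
};
```

Replaying the four logman commands against this model yields the same (level, keyword) pairs as the four EtwCallback invocations, which is why the GC can simply overwrite its state with whatever each callback delivers.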

We can pass the level and keyword information verbatim to the GC and no additional logic is necessary; the ETW subsystem is already keeping track of which trace client has what level and keyword enabled, so the GC doesn't need to do it. The GC doesn't even need to know whether a provider is being enabled or disabled, since it can just take the information ETW gives it.

Instead of having separate Enable and Disable code paths on GCEventState, as written in the spec, this PR has a single Set entry point that sets the GC's level and keyword state for a provider to exactly what is given to Set as arguments, which in turn comes directly from ETW.
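A minimal sketch of the single-entry-point shape described above, assuming one level and one keyword mask per provider. Only `GCEventProvider_Default`, `GCEventLevel_None`, `GCEventLevel_Max`, `GCEventKeyword_None` and the `Set(provider, keyword, level)` argument order come from this PR; `GCEventStatusSketch`, `GCEventProvider_Private`, the other enumerator values and the `IsEnabled` argument order are illustrative assumptions. It is single-threaded and is not the PR's actual gceventstatus.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative stand-ins for the GC's enums (values partly assumed).
enum GCEventProvider { GCEventProvider_Default, GCEventProvider_Private, GCEventProvider_Max };
enum GCEventLevel { GCEventLevel_None = 0, GCEventLevel_Information = 4, GCEventLevel_Verbose = 5, GCEventLevel_Max = 6 };
enum GCEventKeyword : uint64_t { GCEventKeyword_None = 0x0, GCEventKeyword_GC = 0x1 };

class GCEventStatusSketch {
public:
    // Overwrites the provider's state with exactly what the tracing system
    // reported. There is no separate enable/disable path: "disable" is just
    // Set(provider, GCEventKeyword_None, GCEventLevel_None).
    static void Set(GCEventProvider provider, GCEventKeyword keywords, GCEventLevel level) {
        assert(provider < GCEventProvider_Max);
        assert(level >= GCEventLevel_None && level < GCEventLevel_Max);
        size_t index = static_cast<size_t>(provider);
        enabledLevels[index] = level;
        enabledKeywords[index] = keywords;
    }

    // An event is enabled if its level is at or below the enabled level and
    // it shares at least one keyword bit with the enabled mask.
    static bool IsEnabled(GCEventProvider provider, GCEventKeyword keyword, GCEventLevel level) {
        size_t index = static_cast<size_t>(provider);
        return enabledLevels[index] >= level && (enabledKeywords[index] & keyword) != 0;
    }

private:
    static GCEventLevel enabledLevels[GCEventProvider_Max];
    static uint64_t enabledKeywords[GCEventProvider_Max];
};

GCEventLevel GCEventStatusSketch::enabledLevels[GCEventProvider_Max] = {};
uint64_t GCEventStatusSketch::enabledKeywords[GCEventProvider_Max] = {};
```

For example, after `Set(GCEventProvider_Default, GCEventKeyword_GC, GCEventLevel_Verbose)`, an Information-level GC event on the default provider is enabled, while the private provider stays untouched.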


swgillespie · 2018-01-16T03:05:11Z

cc @brianrob or @nategraf do you mind taking a look at this to see if I need to do anything else to make this work with LTTng (particularly around https://github.com/dotnet/coreclr/issues/14327) and EventPipe? I looked around the codebase for other callbacks like EtwCallback and didn't see any; are there others I missed?

I've already tested this PR a bunch with ETW, but I'll be kicking the tires with LTTng and EventPipe tomorrow.

jkotas · 2018-01-16T03:40:15Z

src/gc/gceventstatus.h

+        assert(level >= GCEventLevel_None && level < GCEventLevel_Max);
+
+        size_t index = static_cast<size_t>(provider);
+        return (enabledLevels[index] >= level) && (enabledKeywords[index] & keyword);


The Volatile template inserts explicit memory barriers on non-Intel platforms.

I think these reads should be VolatileLoadWithoutBarrier. We do not need a strict synchronization here, but we do need this to be cheap.
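The trade-off jkotas describes (atomic but unordered reads on a hot path) can be sketched portably with `std::memory_order_relaxed`. This is an assumption-laden analogue, not the runtime's `VolatileLoadWithoutBarrier`; `ProviderState`, `SetState` and `IsEnabled` here are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

enum GCEventLevel { GCEventLevel_None = 0, GCEventLevel_Information = 4, GCEventLevel_Verbose = 5, GCEventLevel_Max = 6 };

// Per-provider state; relaxed atomics avoid torn reads without emitting
// the memory fences a fully ordered (Volatile<T>-style) access may need
// on weakly ordered platforms.
struct ProviderState {
    std::atomic<int> level{static_cast<int>(GCEventLevel_None)};
    std::atomic<uint64_t> keywords{0};
};

// Writer side: runs rarely (on tracing callbacks), so its cost is irrelevant.
inline void SetState(ProviderState& state, uint64_t keywords, GCEventLevel level) {
    state.keywords.store(keywords, std::memory_order_relaxed);
    state.level.store(level, std::memory_order_relaxed);
}

// Reader side: called on hot GC paths, so it must stay cheap. A reader that
// races with SetState may briefly see a mix of old and new values; the only
// consequence is that an event near the transition is emitted or dropped.
inline bool IsEnabled(const ProviderState& state, uint64_t keyword, GCEventLevel level) {
    assert(level >= GCEventLevel_None && level < GCEventLevel_Max);
    return state.level.load(std::memory_order_relaxed) >= level &&
           (state.keywords.load(std::memory_order_relaxed) & keyword) != 0;
}
```

The design choice is that strict synchronization buys nothing here: tracing state changes are asynchronous with respect to the GC anyway, so a slightly stale read is acceptable as long as it is cheap.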

jkotas · 2018-01-16T03:41:07Z

src/gc/gceventstatus.h

+    }
+
+private:
+    static void DebugDumpState(GCEventProvider provider)


Nit: Should this be under TRACE_GC_EVENT_STATE too?

yeah, sure - it's just dead code otherwise.


swgillespie · 2018-01-18T18:05:03Z

@brianrob Have you had a chance to take a look at this? I'm hoping to get this merged soon so follow-up PRs can land.

brianrob

Overall, looks good, but a few things that should be addressed.

brianrob · 2018-01-18T18:34:16Z

src/gc/gcee.cpp

+    GCEventStatus::Set(GCEventProvider_Default, keyword, level);
+}
+
+void GCHeap::ControlPrivateEvents(GCEventKeyword keyword, GCEventLevel level)


I realize that I missed the end of the review for the design so this probably came up then. Is there a reason to have two functions here for controlling events instead of just one that takes the provider info?

no particular reason other than not exposing GCEventProvider across the interface. I don't mind either way!

Ok, I'm fine either way as well. I doubt we'd add more providers - likely we'd reduce to just one (if that ever happens).

brianrob · 2018-01-18T18:43:14Z

src/gc/gcinterface.h

+
+enum GCEventKeyword
+{
+    GCEventKeyword_None                          =       0x0,


This information needs to be kept in sync with the manifest, which is currently the one source of truth for all events.

At a minimum, there should be a comment indicating this, but ideally, this information is pulled from the manifest and generated. Adding new events won't touch this, but if you want to modify the set of keywords you'll have to know to touch this.

good point - I'll call this out in a comment.

brianrob · 2018-01-18T18:49:33Z

src/vm/eventtrace.cpp

+
+                // The GC also needs to be informed of changes to keywords and levels.
+                IGCHeap *heap = GCHeapUtilities::GetGCHeap();
+                GCEventKeyword keywords = static_cast<GCEventKeyword>(MatchAnyKeyword);


The current implementation supports MatchAllKeywords as well, so you should probably support it too. From MSDN:

This bitmask is optional. This mask further restricts the category of events that you want the provider to write. If the event's keyword meets the MatchAnyKeyword condition, the provider will write the event only if all of the bits in this mask exist in the event's keyword. This mask is not used if MatchAnyKeyword is zero. See Remarks.

Do we have any events with multiple keywords? (I'm not aware of any for the GC - not sure if there are others elsewhere.) If I'm reading the documentation correctly, it seems to me that MatchAllKeywords is only useful in that case.

Here's an example:

<event value="11" version="1" level="win:Informational" template="GCCreateConcurrentThread" keywords ="GCKeyword ThreadingKeyword" opcode="GCCreateConcurrentThread" task="GarbageCollection" symbol="GCCreateConcurrentThread_V1" message="$(string.RuntimePublisher.GCCreateConcurrentThread_V1EventMessage)"/>

I suspect it's not a huge deal. LTTng and EventPipe don't have this concept, but I don't want us to inadvertently miss it.

@vancem, do you know how big of a deal it is to not support MatchAllKeywords for Local GC?

If we do want to support MatchAllKeywords I think we'll need some additional design work. The GC would need to keep track of both the Any and All keyword masks in order to stay correct.
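To illustrate why tracking only the MatchAnyKeyword mask can diverge from ETW's behavior, here is a sketch of ETW-style level and keyword filtering. It is modeled on the check that MC-generated headers perform (keyword-less events pass; otherwise the event must share a bit with MatchAny and contain every bit of MatchAll), but it is written from memory and should be treated as an approximation. The keyword constants are illustrative values, not taken from this PR.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative keyword values (assumed for this sketch).
constexpr uint64_t kGCKeyword = 0x1;
constexpr uint64_t kThreadingKeyword = 0x10000;

struct SessionFilter {
    uint8_t level;             // 0 means "any level"
    uint64_t matchAnyKeyword;  // event must share at least one bit
    uint64_t matchAllKeyword;  // event must contain all of these bits
};

inline bool EventPassesFilter(const SessionFilter& f, uint8_t eventLevel, uint64_t eventKeywords) {
    // Lower level numbers are more severe; a session at level N accepts <= N.
    bool levelOk = f.level == 0 || eventLevel <= f.level;
    if (!levelOk) {
        return false;
    }
    if (eventKeywords == 0) {
        return true;  // events without keywords ignore both masks
    }
    bool anyOk = (eventKeywords & f.matchAnyKeyword) != 0;
    bool allOk = (eventKeywords & f.matchAllKeyword) == f.matchAllKeyword;
    return anyOk && allOk;
}
```

With MatchAny = GC and MatchAll = GC | Threading, a multi-keyword event like GCCreateConcurrentThread passes but a GC-only event does not; a GC that tracked only MatchAny would still emit the GC-only event. Per @vancem's reply, Local GC intentionally does not model MatchAllKeywords.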

brianrob · 2018-01-18T18:57:41Z

src/vm/eventtrace.cpp

+                ETW::TypeSystemLog::OnKeywordsChanged();
+            }
+
+            if (ControlCode == EVENT_CONTROL_CODE_ENABLE_PROVIDER || ControlCode == EVENT_CONTROL_CODE_DISABLE_PROVIDER)


In order to support EventPipe, you will need this code to execute when the providers are enabled through EventPipe as well. Otherwise, the GC won't get the message that the events have been enabled. This can be done by adding a callback when the EventPipe provider is created, which occurs at coreclr/src/scripts/genEventPipe.py, line 141 in c1bbdae:

" = EventPipe::CreateProvider(SL(" +

The callback that you've modified here is only used by ETW - not EventPipe. Ideally it would get used by both, but I can tell you that there are things in this callback that EventPipe doesn't know how to handle, and so rather than you getting to feel that pain, I think you should do the following:

1. Add a new callback right next to this one with the same function signature. You can call it something like EventPipeCallback.
2. Put your code in it.
3. Register the callback with EventPipe at the CreateProvider call in the Python script.
4. Call the new callback from within the existing callback that you've modified here so that you don't have to duplicate the code.

Then, as we find things here that we need to support in EventPipe, we can move them into the EventPipe callback which is a subset of the ETW callback. Eventually, we'll get to the point where the ETW callback just calls the EventPipe callback and they merge back into one.
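The layering suggested above (the EventPipe callback as the shared subset, with the ETW callback doing ETW-only work and then delegating) can be sketched as follows. The signatures are deliberately simplified and hypothetical; the real callbacks take the ETW-style enable-callback arguments (source id, control code, level, match-any and match-all keywords, filter data, callback context). `GCState`, `g_gcState` and `g_etwOnlyWork` are stand-ins for illustration.

```cpp
#include <cassert>
#include <cstdint>

struct GCState {
    uint8_t level = 0;
    uint64_t keywords = 0;
};

GCState g_gcState;      // stand-in for the GC's event state
int g_etwOnlyWork = 0;  // counts ETW-specific work, for illustration only

// Shared subset: everything every tracing mechanism needs, including
// informing the GC of the new level and keywords.
void EventPipeCallback(uint8_t level, uint64_t matchAnyKeyword) {
    g_gcState.level = level;
    g_gcState.keywords = matchAnyKeyword;
}

// ETW-only callback: performs work EventPipe cannot handle yet, then reuses
// the shared subset so the GC notification is not duplicated.
void EtwCallback(uint8_t level, uint64_t matchAnyKeyword) {
    ++g_etwOnlyWork;  // placeholder for ETW-specific handling
    EventPipeCallback(level, matchAnyKeyword);
}
```

As more behavior moves into `EventPipeCallback`, `EtwCallback` shrinks until it only delegates, which is the convergence brianrob describes.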

will do - thanks!

vancem · 2018-01-18T23:56:41Z

@vancem, do you know how big of a deal it is to not support MatchAllKeywords for Local GC?

We should not support MatchAllKeywords. There are places in the pipeline where we don't support it already (because it frankly is not that useful).

swgillespie · 2018-01-18T23:57:10Z

I'm about to push another iteration addressing @brianrob 's feedback but I have had trouble today dealing with ETW sessions that exist before a .NET process is launched. Windows fires the ETW callback during EEStartup, well before the GC is initialized and ready to accept changes in the event state. As a result we miss the first callback and the GC isn't ever informed that events are enabled.

I'm thinking it should be possible to "stash" early ETW callbacks in GCHeapUtilities and, after the GC is initialized, pass the stashed event information to the GC as soon as it can.
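A minimal sketch of the stashing idea, assuming a single provider and a mutex shared by the callback and the initialization path. `GCEventForwarder` and its methods are hypothetical names; the actual GCHeapUtilities change may look different. Because each callback carries the complete aggregated state, only the most recent stashed value needs to be kept.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

struct EventState {
    uint64_t keywords;
    uint8_t level;
};

class GCEventForwarder {
public:
    // Called from the tracing callback, possibly before the GC exists.
    void OnTracingCallback(uint64_t keywords, uint8_t level) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!gcReady_) {
            stashed_ = EventState{keywords, level};  // latest full state wins
            return;
        }
        gcState_ = EventState{keywords, level};
    }

    // Called once the GC has finished initializing. Holding the lock ensures
    // a callback racing with initialization is neither lost nor overwritten.
    void OnGCInitialized() {
        std::lock_guard<std::mutex> lock(mutex_);
        gcReady_ = true;
        gcState_ = stashed_;  // replay whatever arrived early (possibly nothing)
    }

    EventState CurrentGCState() {
        std::lock_guard<std::mutex> lock(mutex_);
        return gcState_;
    }

private:
    std::mutex mutex_;
    bool gcReady_ = false;
    EventState stashed_{0, 0};
    EventState gcState_{0, 0};
};
```

In the startup scenario, a session that is already active delivers its callback while `gcReady_` is still false; the state is held until `OnGCInitialized` replays it.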

brianrob · 2018-01-18T23:58:36Z

Thanks @vancem for the confirmation.

@swgillespie, sounds like a plan.


swgillespie · 2018-01-22T02:57:05Z

My most recent commit addresses the session-on-startup problem by stashing ETW event level and keyword information on GCHeapUtilities if we receive an ETW callback before the GC is fully initialized.

I have one more problem: EventPipe invokes my ETW callback in a much different way than ETW does, which is what's causing the AVs in the Windows tests. There are two problems in particular: one, I need a way to determine which provider the callback is being fired for (i.e. which provider the level and keyword info apply to), and two, I need to know which mechanism (ETW or EventPipe) invoked the callback so I can do potentially mechanism-specific things to get at the provider that changed. The code in eventtrace.cpp today reaches into ETW/MCGEN internals to figure out which provider is being enabled or disabled (coreclr/src/vm/eventtrace.cpp, lines 4441 to 4443 in 59714b6):

PMCGEN_TRACE_CONTEXT context = (PMCGEN_TRACE_CONTEXT)CallbackContext;

BOOLEAN bIsPublicTraceHandle = (context->RegistrationHandle==Microsoft_Windows_DotNETRuntimeHandle);

EventPipe passes nullptr as the CallbackContext (which makes sense, because we gave nullptr as the CallbackContext when constructing all of our EventPipeProviders). I suppose I can use nullptr as a sentinel to figure out if I'm being invoked by EventPipe, but I still need some way to figure out which provider was enabled.

(Thinking aloud - I suppose that I could have a separate callback function for each provider that EventPipe knows about...)
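The per-provider approach (which the later commit "Add a separate callback for each EventPipe provider and funnel them all through a common handler" adopts) sidesteps the null `CallbackContext`: each provider gets its own thin callback, so the function being invoked already identifies the provider. The sketch below uses simplified, hypothetical signatures and names (`CommonCallbackHandler`, `EventPipeDefaultProviderCallback`, `EventPipePrivateProviderCallback`, `g_state`).

```cpp
#include <cassert>
#include <cstdint>

enum GCEventProvider { GCEventProvider_Default, GCEventProvider_Private, GCEventProvider_Max };

struct ProviderState {
    uint8_t level;
    uint64_t keywords;
};

ProviderState g_state[GCEventProvider_Max] = {};

// Common handler: the caller supplies the provider identity instead of it
// being recovered from CallbackContext (which EventPipe passes as nullptr).
void CommonCallbackHandler(GCEventProvider provider, uint8_t level, uint64_t keywords) {
    assert(provider < GCEventProvider_Max);
    g_state[provider] = ProviderState{level, keywords};
}

// One thin callback per provider, each registered with its own provider.
void EventPipeDefaultProviderCallback(uint8_t level, uint64_t keywords, void* /*context*/) {
    CommonCallbackHandler(GCEventProvider_Default, level, keywords);
}

void EventPipePrivateProviderCallback(uint8_t level, uint64_t keywords, void* /*context*/) {
    CommonCallbackHandler(GCEventProvider_Private, level, keywords);
}
```

Enabling the private provider through its callback updates only that provider's slot, leaving the default provider's state unchanged.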


swgillespie · 2018-01-23T07:22:42Z

@dotnet-bot test this please

swgillespie · 2018-01-23T19:11:40Z

I think that I've addressed all problems and feedback so far in this PR - does anyone else have any comments or concerns or can I go ahead and merge this?

sergiy-k · 2018-01-24T02:45:55Z

LGTM. Thank you!

swgillespie · 2018-01-24T02:53:38Z

thanks for the reviews!

If the jit decides it needs a return spill temp, and the return value has already been spilled to a single-def temp, re-use the existing temp for the return temp rather than creating a new one. In conjunction with dotnet#20553 this allows late devirtualization for calls where the object in the virtual call is the result of an inline that provides a better type. In particular we see this pattern for `ArrayPool<T>.Shared.Rent/Release`. Closes dotnet#15873.

If the jit decides it needs a return spill temp, and the return value has already been spilled to a single-def temp, re-use the existing temp for the return temp rather than creating a new one. In conjunction with dotnet#20553 this allows late devirtualization for calls where the object in the virtual call is the result of an inline that provides a better type, and the object formerly reached the call via one or more intermediate temps. Closes dotnet#15873.


If the jit decides it needs a return spill temp, and the return value has already been spilled to a single-def temp, re-use the existing temp for the return temp rather than creating a new one. In conjunction with dotnet/coreclr#20553 this allows late devirtualization for calls where the object in the virtual call is the result of an inline that provides a better type, and the object formerly reached the call via one or more intermediate temps. Closes dotnet/coreclr#15873. Commit migrated from dotnet/coreclr@ccc18a6

[Local GC] FEATURE_EVENT_TRACE 1/n: Add infrastructure for keeping event state within the GC and plumbing to communicate event state changes

f534c80

swgillespie requested review from Maoni0, jkotas and sergiy-k January 16, 2018 03:01

jkotas reviewed Jan 16, 2018


Code review feedback: use a load without a barrier in IsEnabled and put debug-only code under TRACE_GC_EVENT_STATE

90636e3

sergiy-k approved these changes Jan 16, 2018


Maoni0 approved these changes Jan 17, 2018


swgillespie mentioned this pull request Jan 17, 2018

[Local GC] [WIP] FEATURE_EVENT_TRACE 2/n: Scaffolding for emitting known events #15905

Closed

swgillespie added the area-GC label Jan 17, 2018

brianrob reviewed Jan 18, 2018


swgillespie added 3 commits January 18, 2018 16:09

Address code review feedback: add EventPipe callback and comments

1f6c620

Fix the non-FEATURE_PAL build

fe1f71f

Fix an issue where the GC fails to react to ETW callbacks that occur before the GC is initialized (e.g. on startup when an ETW session is already active)

8e8a773

swgillespie mentioned this pull request Jan 22, 2018

[Local GC] FEATURE_EVENT_TRACE 2/n: Scaffolding for emitting known events #15957

Merged

swgillespie added 3 commits January 22, 2018 14:50

Simplify callback locking scheme

f1f3185

Add a separate callback for each EventPipe provider and funnel them all through a common handler

df7c609

Fix non-FEATURE_PAL build

6f802f7

swgillespie merged commit facdc8b into dotnet:master Jan 24, 2018

AndyAyersMS mentioned this pull request Oct 26, 2018

JIT: streamline temp usage for returns #20640

Merged
