-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Enable Portable ThreadPool events to be written using NativeRuntimeEventSource #47829
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable Portable ThreadPool events to be written using NativeRuntimeEventSource #47829
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome to see this. A few things that come to mind.
src/libraries/System.Private.CoreLib/src/System/Diagnostics/Tracing/EventSource.cs
Show resolved
Hide resolved
...aries/System.Private.CoreLib/src/System/Diagnostics/Tracing/TraceLogging/XplatEventLogger.cs
Show resolved
Hide resolved
uint RetiredWorkerThreadCount = 0, | ||
ushort ClrInstanceID = DefaultClrInstanceId) | ||
{ | ||
if (IsEnabled(EventLevel.Informational, Keywords.ThreadingKeyword)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have missed it, but what ensures that IsEnabled
returns true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't think I quite got what you're asking here.. Do you mean what ensures that the DoCommand callback is received for the NativeRuntimeEventSource?
NativeRuntimeEventSource registers itself with the native runtime upon initialization, and it will receive callback from the runtime from the standard callback handler, just like how any other managed EventSources receive callbacks when being enabled.
Essentially the runtime "thinks" there are two providers -- the native provider whose callback is the EtwCallbackCommon function and the managed provider whose callback is the managed callback function in EventSource.
public static void ThreadPoolWorkerThreadStart(uint ActiveWorkerThreadCount, uint RetiredWorkerThreadCount, ushort ClrInstanceID) | ||
{ | ||
LogThreadPoolWorkerThreadStart(ActiveWorkerThreadCount, RetiredWorkerThreadCount, ClrInstanceID); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This level of indirection is required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indirection is mainly because the QCall methods that call into the runtime need to be private, and the public wrappers are for the ThreadPool to be able to call them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure? Here is an example of a QCall with the internal accesibility modifier:
Line 24 in 69e114c
internal static extern IntPtr LoadFromPath(string libraryName, bool throwOnError); |
I wouldn't have expected QCall to care what accesibility modifier was assigned. If it does need to be non-public internal should still be sufficient to let the intra-assembly calls work without needing the wrapper method.
|
||
#if FEATURE_EVENTSOURCE_XPLAT | ||
XplatEventLogger.ThreadPoolWorkerThreadAdjustmentAdjustment(AverageThroughput, NewWorkerThreadCount, Reason, ClrInstanceID); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could use some education... why is this needed, rather than for example registering a listener that routes events to LTTng?
Also, moving forward is LTTng a path we're still excited about / supporting? I assumed with event pipe and the rest of the dotnet tooling around tracing cross-platform that we'd be deprecating this support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do still want to support LTTng, as we're seeing some amount of uptake across other Microsoft products, and we want to be able to participate in this. That said, this code is essentially making it look like the thread pool is still implemented in native code. So rather than using an EventSource in managed code, this is invoking the native FireETW* functions, which result in calls to EventPipe, ETW, and/or LTTng. So, this code is actually platform agnostic, and takes advantage of the platform-specific logger code in the runtime itself.
That said, thinking about this a bit further, I am wondering if there's something that we can't do here in a more generic direction. Rather than having to create a new method and QCALL for every event, can we just use the NativeRuntimeEventSource
as is, and modify EventPipe
to be OK with multiple provider registrations (at least for the runtime provider)? I could believe that over time there will be more components that move to managed code, and we'll want to keep logging the same events. ETW will already allow multiple provider registrations. LTTng might need something special - possibly some pre-generated code that would translate between the EventSource
and the LTTng tracepoints. But, I'd be tempted to say that we shouldn't do something special for it, and just treat the events like any other EventSource
events with respect to LTTng.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is invoking the native FireETW* functions, which result in calls to EventPipe, ETW, and/or LTTng
I don't think that is quite what this code is doing? The code is invoking XplatEventLogger.ThreadPoolWorkerThreadAdjustmentAdjustment() which then invokes FireEtXPlat* functions. The handling for EventPipe, ETW, EventListener, and LTTng (a 2nd time?) is coming the from the WriteEventCore() call.
I am suspicious that we are double emitting events on both the EventListener and the Lttng paths. For EventListener there is the standard WriteToAllListeners() call here and then the special case handling for NativeRuntimeEventSource where ProcessEvent calls WriteToAllListeners() here. For Lttng we've got the standard EventListener that forwards to Lttng here and then the call to XplatEventLogger native code would log a 2nd time. @sywhang could you check if my suspicions are correct or let me know what I missed that causes us to avoid the duplicate logs?
But, I'd be tempted to say that we shouldn't do something special for it, and just treat the events like any other EventSource events with respect to LTTng.
Our native code LTTng output doesn't match the generic EventSource LTTng output as I understand it. The native code emitted events have dedicated ids and strongly typed event fields whereas the EventSource generated events all share a single event id and use JSON serialized event payloads. Were you suggesting that we would ignore this difference and allow the native event ids and payloads to change as a side-effect of porting code from native to managed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right - this is calling FireEtXPlat
which is beyond the decision of which logger to call - this will call LTTng on Linux directly. You also bring up a good point that we may be double emitting.
Yes, I was proposing that we not worry about the difference in how the events are emitted when they are moved to managed code. If we wanted to be more aggressive about maintaining perfect compatibility, we could also choose to build auto-generated code that would allow us to call any of the events and have them emitted as a strongly typed LTTng event. In the end, I think that the thing I want to encourage is that it would be nice if we didn't do special casing for these events here. I suspect that we'll end up doing more of this type of thing, and it would be great if calling into NativeRuntimeEventSource
to emit an event "just worked".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestions @noahfalk and @brianrob. I went and checked and indeed this causes it to double-emit events for EventListener. For LTTng it ends up being emitted as an "EventSourceEvent" event as well as the individual threadpool event. I'll fix this to use FireETW* events instead like @noahfalk suggested.
Yes, I was proposing that we not worry about the difference in how the events are emitted when they are moved to managed code.
I think this wouldn't really be a great experience for devs trying to parse an LTTng trace for a specific type of event by filtering via the name, so that was the main reason I added all these glue to make it seem indifferent from when we had the native threadpool.
But I also agree with your sentiment towards having some kind of auto-generated code that'll allow devs to add events to ClrEtwAll.man and call it from managed side, and the whole thing "just works". I can try to tackle that later by modifying the scripts to autogenerate the QCall helpers and forward to corresponding LTTng sink.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad to see this! A few questions and comments inline
|
||
#if FEATURE_EVENTSOURCE_XPLAT | ||
XplatEventLogger.ThreadPoolWorkerThreadAdjustmentAdjustment(AverageThroughput, NewWorkerThreadCount, Reason, ClrInstanceID); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is invoking the native FireETW* functions, which result in calls to EventPipe, ETW, and/or LTTng
I don't think that is quite what this code is doing? The code is invoking XplatEventLogger.ThreadPoolWorkerThreadAdjustmentAdjustment() which then invokes FireEtXPlat* functions. The handling for EventPipe, ETW, EventListener, and LTTng (a 2nd time?) is coming the from the WriteEventCore() call.
I am suspicious that we are double emitting events on both the EventListener and the Lttng paths. For EventListener there is the standard WriteToAllListeners() call here and then the special case handling for NativeRuntimeEventSource where ProcessEvent calls WriteToAllListeners() here. For Lttng we've got the standard EventListener that forwards to Lttng here and then the call to XplatEventLogger native code would log a 2nd time. @sywhang could you check if my suspicions are correct or let me know what I missed that causes us to avoid the duplicate logs?
But, I'd be tempted to say that we shouldn't do something special for it, and just treat the events like any other EventSource events with respect to LTTng.
Our native code LTTng output doesn't match the generic EventSource LTTng output as I understand it. The native code emitted events have dedicated ids and strongly typed event fields whereas the EventSource generated events all share a single event id and use JSON serialized event payloads. Were you suggesting that we would ignore this difference and allow the native event ids and payloads to change as a side-effect of porting code from native to managed?
...s/System.Private.CoreLib/src/System/Threading/NativeRuntimeEventSource.PortableThreadPool.cs
Show resolved
Hide resolved
...s/System.Private.CoreLib/src/System/Threading/NativeRuntimeEventSource.PortableThreadPool.cs
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Diagnostics/Tracing/EventSource.cs
Show resolved
Hide resolved
public static void ThreadPoolWorkerThreadStart(uint ActiveWorkerThreadCount, uint RetiredWorkerThreadCount, ushort ClrInstanceID) | ||
{ | ||
LogThreadPoolWorkerThreadStart(ActiveWorkerThreadCount, RetiredWorkerThreadCount, ClrInstanceID); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure? Here is an example of a QCall with the internal accesibility modifier:
Line 24 in 69e114c
internal static extern IntPtr LoadFromPath(string libraryName, bool throwOnError); |
I wouldn't have expected QCall to care what accesibility modifier was assigned. If it does need to be non-public internal should still be sufficient to let the intra-assembly calls work without needing the wrapper method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its pretty close, I suggested a few more tweaks
This removes the temporary solution that the portable ThreadPool was using to emit events from managed side. Specifically, the following changes were made:
Fix #45384
cc @brianrob @josalem @kouvel @noahfalk @tommcdon