Skip to content

Remove KestrelEventSource events with broken ActivityId tracking #26684

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 8, 2020

Conversation

halter73
Copy link
Member

@halter73 halter73 commented Oct 8, 2020

@MihaZupan reported that many Kestrel Stop events did not have an ActivityId. TplEventSource's automatic ActivityId tracking did not work for the following 3 Kestrel Start/Stop event pairs because the ExecutionContext did not flow properly between Start and Stop:

  1. ConnectionQueuedStart/ConnectionQueuedStop
  2. RequestQueuedStart/RequestQueuedStop
  3. RequestUpgradedStart/RequestUpgradedStop

These 3 event pairs (6 events total) are new in .NET 5 and are the only events being removed by this PR.

The reason the ExecutionContext did not flow properly between the "Queued" Start and Stop events is because of Kestrel's use of ThreadPool.UnsafeQueueUserWorkItem instead of ThreadPool.QueueUserWorkItem which would capture and restore the ExecutionContext.

The "RequestUpgraded" Start and Stop events are broken because the Start event is written inside a call to UpgradeAsync() made by middleware and the Stop event is written after awaiting middleware completion.

We don't want to switch to ThreadPool.QueueUserWorkItem because there's no QueueUserWorkItem overload that takes an IThreadPoolWorkItem meaning that QueueUserWorkItem needs to allocate an IThreadPoolWorkItem itself.

These issues can all be fixed by manually capturing and restoring the ExecutionContext and I've started experimenting with that here, but that's too big of a change to make this late in .NET 5 in my opinion.

Addresses #26560

@halter73 halter73 added bug This issue describes a behavior which is not expected - a bug. area-servers feature-kestrel labels Oct 8, 2020
@stephentoub
Copy link
Member

We don't want to switch to ThreadPool.QueueUserWorkItem because there's no QueueUserWorkItem overload that takes an IThreadPoolWorkItem meaning that QueueUserWorkItem needs to allocate an IThreadPoolWorkItem itself.

FWIW, a ThreadPool.QueueUserWorkItem(IThreadPoolWorkItem) would almost certainly do the exact same thing, allocating a new IThreadPoolWorkItem that wraps the provided one and also captures the context / uses it when executing the provided work item.

@halter73
Copy link
Member Author

halter73 commented Oct 8, 2020

Description

This removes Kestrel EventSource events that tracked the time connections and requests spend in the ThreadPool queue. This also removes an event that tracked the time a request is "upgraded" (presumably for WebSockets). These events are:

  1. ConnectionQueuedStart/ConnectionQueuedStop
  2. RequestQueuedStart/RequestQueuedStop
  3. RequestUpgradedStart/RequestUpgradedStop

All of these events broke TplEventSource's automatic ActivityId tracking when TasksFlowActivityIds is enabled. This in turn breaks the automatic grouping of Start and Stop events using ActivityId like PerfView has (which enables TasksFlowActivityIds by default when attached).

Furthermore, the events that tracked the time connections and requests spend in the ThreadPool queue were sibling activities rather than child activities to the connection and request lifetime activities meaning even if that ActivityId tracking worked, these events couldn't be correlated with ActivityId's. This just seems like an oversight.

Customer Impact

If customers use ActivityId's to correlate EventSource events, leaving these events will result in a bunch of uncorrelated Start and Stop events and likely make a mess of something like PerfView's "StartStopTree" view.

Regression?

I would say yes, because these are the first Start and Stop EventSource event pairs in Kestrel that break ActivityId tracking that I know of.

Risk

Low. This PR simply removes some calls to KestrelEventSource.WriteEvent() that didn't exist in 3.1.

@halter73 halter73 added the Servicing-consider Shiproom approval is required for the issue label Oct 8, 2020
@ghost
Copy link

ghost commented Oct 8, 2020

Hello human! Please make sure you've included the Shiproom Template in a comment or (preferably) the PR description. Also, make sure this PR is not marked as a draft and is ready-to-merge.

@davidfowl
Copy link
Member

The commenting out does look gross, I'm OK deleting it 😄

@halter73 halter73 force-pushed the halter73/26560-remove-events branch from 49f92cd to 2c77393 Compare October 8, 2020 02:15
@halter73
Copy link
Member Author

halter73 commented Oct 8, 2020

Deleted.

@leecow leecow added Servicing-approved Shiproom has approved the issue and removed Servicing-consider Shiproom approval is required for the issue labels Oct 8, 2020
@leecow leecow added this to the 5.0.0 milestone Oct 8, 2020
@halter73
Copy link
Member Author

halter73 commented Oct 8, 2020

@Pilchie I think this is good to merge.

@Pilchie Pilchie merged commit 053052d into release/5.0 Oct 8, 2020
@Pilchie Pilchie deleted the halter73/26560-remove-events branch October 8, 2020 18:47
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions bug This issue describes a behavior which is not expected - a bug. feature-kestrel Servicing-approved Shiproom has approved the issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants