@mdh1418 mdh1418 commented Sep 17, 2025

Blocked on: Microsoft.OneCollect.RecordTrace version with FFI support

Implements dotnet/docs#47894

dotnet/runtime#115265 added support for emitting native runtime and custom EventSource events as user_events, and the publicly released https://github.com/microsoft/one-collect supports collecting both .NET user_events and Linux perf events into a single .nettrace file. Building on both, dotnet-trace will support a new verb, collect-linux, that wraps around record-trace.

This PR does the following:

  • Adds the collect-linux verb, which serializes a subset of the dotnet-trace collect options, plus a collect-linux-specific --perf-events option, into record-trace args (see dotnet/docs#47894, "[Diagnostics][dotnet-trace] Add collect-linux verb", for overarching details)
  • Adds record-trace dynamic library to dotnet-trace
  • Updates existing profiles (cpu-sampling -> dotnet-common + dotnet-sampled-thread-time) and adds collect-linux specific profiles
  • Updates list-profiles verb with revamped profiles + multiline description formatting
  • Refactors EventPipeProvider composition logic (MergeProfileAndProviders + ToProviders -> ComputeProviderConfig) and renames Extensions.cs -> ProviderUtils.cs
  • Revamps EventPipeProvider composition tests (ProviderParsing.cs -> ProviderCompositionTests.cs)
  • Various cleanup: Update CLREventKeywords + Update logging + refactor collect logic + expand dotnet-trace common options
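
To illustrate the composition rules that the help text describes (profiles are unioned, and an explicit --providers entry takes precedence over a profile for the same provider), here is a minimal sketch. `ProviderSpec`, `ProviderCompositionSketch`, and `Compose` are hypothetical names invented for this example; it is not the PR's ComputeProviderConfig, and the case-insensitive name matching and "first profile wins" handling of duplicates are assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for EventPipeProvider (not the real type).
public sealed record ProviderSpec(string Name, long Keywords, int Level);

public static class ProviderCompositionSketch
{
    // Hypothetical helper: merges profile-implied providers with --providers entries.
    public static List<ProviderSpec> Compose(
        IEnumerable<ProviderSpec> fromProfiles,
        IEnumerable<ProviderSpec> fromProvidersOption)
    {
        // Assumption: provider names are compared case-insensitively.
        var merged = new Dictionary<string, ProviderSpec>(StringComparer.OrdinalIgnoreCase);

        // Profiles are combined as a union; a provider that was already added is ignored.
        foreach (ProviderSpec p in fromProfiles)
        {
            merged.TryAdd(p.Name, p);
        }

        // Explicit --providers entries override whatever a profile configured.
        foreach (ProviderSpec p in fromProvidersOption)
        {
            merged[p.Name] = p;
        }

        return new List<ProviderSpec>(merged.Values);
    }
}
```

With this shape, passing the dotnet-common runtime provider from a profile together with an explicit entry for the same provider yields a single entry carrying the explicit keywords and level.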

Testing

dotnet-trace collect-linux

On Linux

collect-linux
collect-linux --help
$ ./dotnet-trace collect-linux -h
Description:
  Collects diagnostic traces using perf_events, a Linux OS technology. collect-linux requires admin privileges to capture kernel- and user-mode events, and by default, captures events from all processes. This Linux-only command includes the same .NET
  events as dotnet-trace collect, and it uses the kernel’s user_events mechanism to emit .NET events as perf events, enabling unification of user-space .NET events with kernel-space system events.

Usage:
  dotnet-trace collect-linux [options]

Options:
  --providers       A comma delimited list of EventPipe providers to be enabled. This is in the form 'Provider[,Provider]',where Provider is in the form: 'KnownProviderName[:[Flags][:[Level][:[KeyValueArgs]]]]', and KeyValueArgs is in the form:
                    '[key1=value1][;key2=value2]'.  Values in KeyValueArgs that contain ';' or '=' characters need to be surrounded by '"', e.g., FilterAndPayloadSpecs="MyProvider/MyEvent:-Prop1=Prop1;Prop2=Prop2.A.B;".  Depending on your shell, you may
                    need to escape the '"' characters and/or surround the entire provider specification in quotes, e.g., --providers 'KnownProviderName:0x1:1:FilterSpec=\"KnownProviderName/EventName:-Prop1=Prop1;Prop2=Prop2.A.B;\"'. These providers are in
                    addition to any providers implied by the --profile argument. If there is any discrepancy for a particular provider, the configuration here takes precedence over the implicit configuration from the profile.  See documentation for
                    examples.
  --clreventlevel   Verbosity of CLR events to be emitted.
  --clrevents       List of CLR runtime events to emit.
  --perf-events     Comma-separated list of perf events (e.g. syscalls:sys_enter_execve,sched:sched_switch).
  --profile         A named, pre-defined set of provider configurations for common tracing scenarios. You can specify multiple profiles as a comma-separated list. When multiple profiles are specified, the providers and settings are combined (union), and
                    duplicates are ignored.
  -o, --output      The output path for the collected trace data. If not specified it defaults to '<appname>_<yyyyMMdd>_<HHmmss>.nettrace', e.g., 'myapp_20210315_111514.nettrace'. [default: default]
  --duration        When specified, will trace for the given timespan and then automatically stop the trace. Provided in the form of dd:hh:mm:ss.
  -n, --name        The name of the process to collect the trace.
  -p, --process-id  The process id to collect the trace.
  -?, -h, --help    Show help and usage information
`collect-linux` without elevated privileges
$ ./dotnet-trace collect-linux
No profile or providers specified, defaulting to trace profiles 'dotnet-common' + 'cpu-sampling'.
Applying profile 'dotnet-common': Microsoft-Windows-DotNETRuntime:0x000000100003801D:4
Applying profile 'cpu-sampling': --on-cpu

Provider Name                           Keywords            Level               Enabled By
Microsoft-Windows-DotNETRuntime         0x000000100003801D  Informational(4)    --profile

Error: Tracefs is not accessible: Permission denied (os error 13)
`collect-linux` with elevated privileges
$ sudo ./dotnet-trace collect-linux
[sudo] password for mihw:
No profile or providers specified, defaulting to trace profiles 'dotnet-common' + 'cpu-sampling'.
Applying profile 'dotnet-common': Microsoft-Windows-DotNETRuntime:0x000000100003801D:4
Applying profile 'cpu-sampling': --on-cpu

Provider Name                           Keywords            Level               Enabled By
Microsoft-Windows-DotNETRuntime         0x000000100003801D  Informational(4)    --profile

Recording started.  Press CTRL+C to stop.
^C
Recording stopped.
Resolving symbols.
Finished recording trace.
Trace written to trace_20250919_205934.nettrace

On Windows (and, I presume, other non-Linux OSes):

`collect-linux`
.\artifacts\bin\dotnet-trace\Debug\net8.0\dotnet-trace.exe collect-linux
The collect-linux command is only supported on Linux.

dotnet-trace list-profiles

`list-profiles`
dotnet-trace profiles:
        dotnet-common                        - Lightweight .NET runtime diagnostics designed to stay low overhead.
                                               Includes:
                                                   GC
                                                   AssemblyLoader
                                                   Loader
                                                   JIT
                                                   Exceptions
                                                   Threading
                                                   JittedMethodILToNativeMap
                                                   Compilation
                                               Equivalent to --providers "Microsoft-Windows-DotNETRuntime:0x100003801D:4".
        dotnet-sampled-thread-time (collect) - Samples .NET thread stacks (~100 Hz) toestimate how much wall clock time code is using.
        gc-verbose                           - Tracks GC collections and samples object allocations.
        gc-collect                           - Tracks GC collections only at very low overhead.
        database                             - Captures ADO.NET and Entity Framework database commands
        cpu-sampling (collect-linux)         - Kernel CPU sampling events for measuring CPU usage.
        thread-time (collect-linux)          - Kernel thread context switch events for measuring CPU usage and wall clock time

return (int)ReturnCode.TracingError;
}

proc.OutputDataReceived += (_, e) =>
Member

Fine for now but once we've got the FFI working I'm hoping that dotnet-trace will control the majority of the output and only pass along warnings/errors.

Member Author

Hooked in the FFI and have the output type routed to Console.Out and Console.Error. Didn't add handling for progress as I haven't seen records yet.

@mdh1418 mdh1418 force-pushed the dotnet_trace_collect_linux branch from c09b095 to 0989d21 Compare September 19, 2025 05:45
@mdh1418 mdh1418 force-pushed the dotnet_trace_collect_linux branch 2 times, most recently from 76e65e1 to 61b2569 Compare September 19, 2025 18:32
@mdh1418 mdh1418 force-pushed the dotnet_trace_collect_linux branch from 61b2569 to c8e53ea Compare September 19, 2025 18:56
@mdh1418 mdh1418 marked this pull request as ready for review September 19, 2025 19:43
@mdh1418 mdh1418 requested a review from a team as a code owner September 19, 2025 19:43
@mdh1418 mdh1418 added the DO NOT MERGE do not merge this PR label Sep 19, 2025
@mdh1418 mdh1418 changed the title [dotnet-trace] Add collect-linux verb [NO-MERGE][dotnet-trace] Add collect-linux verb Sep 19, 2025
</PropertyGroup>

<ItemGroup>
<_RecordTraceResolved Include="$(_RecordTraceLocal)" Condition="'$(RuntimeIdentifier)' != '' AND Exists('$(_RecordTraceLocal)')" />
Member Author

Oops, artifact from before, will remove

Member

@noahfalk noahfalk left a comment

This is looking pretty good. Aside from some small inline comments, a few broader things:

  1. Historically we relied on manual testing to keep these tools operating well but now that we have less time available from the testers I think we need to improve our automated testing. Partly this is to ensure we aren't inadvertently changing the original 'collect' verb and partly to ensure going forward the 'collect-linux' behavior doesn't regress either. I think the best way to do this would be:
  • Open a new PR that we'll check in first containing some basic tests of the existing collect verb.
  • Commit this PR 2nd and all the tests in the 1st PR should continue to pass. This ensures we didn't change anything unintended.

To do the testing we probably need to create some small interface shims. We already have an IConsole interface defined that could be moved to the shared Common folder. We could also create a small interface around the DiagnosticsClient.EventPipeCollect() API so that a test can return some dummy data in a stream instead. dotnet-counters has some example tests that show how we can run some code and then confirm the console output is what we expect. In this case I imagine we'd be running the Collect() function and giving some chosen input arguments.

  2. I think there is a bit more adjustment to be done on some of the output text, but it should be fine to get this one in first, then tweak afterwards in some 3rd PR.
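
As a rough illustration of the interface shims suggested in the first point, the sketch below hides the trace source behind an interface so a test can feed dummy bytes and then check the console output. Everything here is hypothetical: `ITraceStreamSource`, `FakeTraceStreamSource`, and `CollectSketch.Collect` are invented for the example, a plain `TextWriter` stands in for the existing IConsole interface, and the message text is a placeholder rather than real dotnet-trace output.

```csharp
using System;
using System.IO;

// Hypothetical shim: hides whichever DiagnosticsClient call starts the session,
// so a test can supply canned bytes instead of attaching to a live process.
public interface ITraceStreamSource
{
    Stream OpenTraceStream();
}

// Test double that returns dummy data.
public sealed class FakeTraceStreamSource : ITraceStreamSource
{
    private readonly byte[] _data;

    public FakeTraceStreamSource(byte[] data) => _data = data;

    public Stream OpenTraceStream() => new MemoryStream(_data);
}

public static class CollectSketch
{
    // Hypothetical, much-reduced Collect(): copies the trace stream to the output
    // and reports through an injected writer instead of calling Console directly.
    public static int Collect(ITraceStreamSource source, Stream output, TextWriter console)
    {
        using Stream trace = source.OpenTraceStream();
        trace.CopyTo(output);
        console.WriteLine("Trace completed."); // placeholder message
        return 0;
    }
}
```

A test could then pass a `FakeTraceStreamSource`, a `MemoryStream`, and a `StringWriter`, and assert on both the copied bytes and the captured text.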

string options = string.Join(' ', recordTraceArgList);
byte[] command = Encoding.UTF8.GetBytes(options);
int rc;
try
Member

Nit: I'd shift the try { to start prior to registering the keypress handler. We want to ensure that any error from then onwards cleans everything up.
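
A minimal sketch of that ordering, assuming the handler in question is `Console.CancelKeyPress` (the hunk above does not show which one it is). `CleanupSketch`, `RunWithCleanup`, and the `record` and `cleanup` delegates are hypothetical stand-ins for the FFI call and the PR's teardown logic.

```csharp
using System;

public static class CleanupSketch
{
    // Hypothetical wrapper: 'record' stands in for the FFI call into record-trace,
    // 'cleanup' for whatever teardown the real code needs.
    public static int RunWithCleanup(Func<int> record, Action cleanup)
    {
        ConsoleCancelEventHandler onCancel = (_, e) =>
        {
            // Keep the process alive so the finally block below still runs.
            e.Cancel = true;
        };

        try
        {
            // Registration happens inside the try, so any failure from this
            // point onwards still reaches the finally block.
            Console.CancelKeyPress += onCancel;
            return record();
        }
        finally
        {
            Console.CancelKeyPress -= onCancel;
            cleanup();
        }
    }
}
```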


private static int RunRecordTrace(CollectLinuxArgs args)
{
s_recordStatus = 0;
Member

Let's use a boolean flag or an enum rather than magic constants.
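
For instance, something along these lines. The state names are hypothetical, since the hunk does not show what the integer values of s_recordStatus mean; `RecordStatus` and `RecordStateSketch` are invented for illustration.

```csharp
using System;

// Hypothetical states; replace with whatever the integer values actually encode.
public enum RecordStatus
{
    NotStarted,
    Recording,
    Finished,
}

public static class RecordStateSketch
{
    private static RecordStatus s_recordStatus = RecordStatus.NotStarted;

    public static RecordStatus Current => s_recordStatus;

    // Named transitions replace assignments such as 's_recordStatus = 0'.
    public static void Reset() => s_recordStatus = RecordStatus.NotStarted;

    public static void MarkRecording() => s_recordStatus = RecordStatus.Recording;

    public static void MarkFinished() => s_recordStatus = RecordStatus.Finished;
}
```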

return collectLinuxCommand;
}

private static int RunRecordTrace(CollectLinuxArgs args)
Member

The naming and factoring feel a little unexpected. I think of "RunRecordTrace" as invoking the FFI, waiting for it to finish, and maybe updating the UI during the process. I didn't expect it to include parsing configuration, formatting the script file, determining the output path, or any other calculations to produce the record-trace arguments. I'd suggest just folding this into CollectLinux and renaming RecordTrace -> RunRecordTrace.

{
profileEffect = traceProfile.CollectLinuxArgs;
}
Console.WriteLine($"Applying profile '{traceProfile.Name}': {profileEffect}");
Member

This looks like debug output that we should remove

Member Author

I added this and the perf event log because collect-linux introduces .NET Provider-free configurations such as --profile cpu-sampling and --perf-events block:block_io_start. I figured it would be a good sanity check for users to see those effects to know that the tracing session is actually capturing something, rather than

No providers were configured.
Recording started.  Press CTRL+C to stop.

Previously, those would have also just been

Provider Name                           Keywords            Level               Enabled By


Recording started.  Press CTRL+C to stop.

until I added the following to PrintProviders in ProviderUtils.cs:

if (providers.Count == 0)
{
    Console.WriteLine("No providers were configured.");
    return;
}

With this and the below perf-events log, users can see

$ dotnet-trace collect-linux --profile cpu-sampling
Applying profile 'cpu-sampling': --on-cpu
No providers were configured.
Recording started.  Press CTRL+C to stop.

and

$ dotnet-trace collect-linux --perf-events block:block_io_start,block:block_io_done
No providers were configured.
Enabling perf event 'block:block_io_start'
Enabling perf event 'block:block_io_done'
Recording started.  Press CTRL+C to stop.


string perfProvider = split[0];
string perfEventName = split[1];
Console.WriteLine($"Enabling perf event '{perfEvent}'");
Member

Debug output?
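
For context on the hunk under discussion, this is a minimal sketch of how a --perf-events value such as `block:block_io_start,block:block_io_done` could be split into provider and event name pairs without any console output. `PerfEventParsingSketch` and `Parse` are hypothetical names, and the validation rules (exactly one non-empty provider and one non-empty event name per entry) are assumptions rather than the PR's actual behavior.

```csharp
using System;
using System.Collections.Generic;

public static class PerfEventParsingSketch
{
    // Hypothetical parser for a comma-separated list of 'provider:event' entries.
    public static List<(string Provider, string EventName)> Parse(string perfEvents)
    {
        var result = new List<(string Provider, string EventName)>();

        string[] entries = perfEvents.Split(
            ',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);

        foreach (string perfEvent in entries)
        {
            // Split on the first ':' only; both halves must be non-empty (assumption).
            string[] split = perfEvent.Split(':', 2);
            if (split.Length != 2 || split[0].Length == 0 || split[1].Length == 0)
            {
                throw new ArgumentException(
                    $"Invalid perf event '{perfEvent}'; expected 'provider:event'.");
            }

            result.Add((split[0], split[1]));
        }

        return result;
    }
}
```

Returning the parsed pairs leaves it to the caller to decide whether, and at what verbosity, to report each enabled event.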
